Statistical classification, an application to credit default

Sikhakhane, Anele Gcina

Files

Statistical_classification__an_application_to_cred_vital_76570.pdf (1 MB)

Date

11/10/2024

Authors

Sikhakhane, Anele Gcina

Publisher

Rhodes University, Faculty of Science, Department of Statistics

Abstract

Statistical learning has been used in both industry and academia to create credit scoring models. These models predict who might default on their loan repayments, thereby reducing the risk that financial institutions face. In this study, six traditional classifiers and one more recent classifier, namely kNN, LDA, CART, RF, AdaBoost, XGBoost and SynBoost, were used to predict who might default on their loans. The data set used in this study was imbalanced, so sampling and performance evaluation techniques were investigated and used to balance the class distribution and assess the classifiers' performance. In addition to the standard variables and data set, new variables called synthetic variables, along with synthetic data sets, were produced, investigated and used to predict who might default on their loans. This study found that the synthetic data set had strong predictive power and that sampling methods negatively affected the classifiers' performance. The best-performing classifier was XGBoost, with an AUC score of 0.7732.
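For readers unfamiliar with the two ideas the abstract relies on, the following is a minimal sketch in plain Python of an AUC computation and of random undersampling, one common way to balance a class distribution. It is illustrative only and is not taken from the thesis: the function names `auc_score` and `undersample_majority`, the 0/1 label encoding (1 = default) and the toy scores are all assumptions, and the abstract does not say which sampling methods the study actually used.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def undersample_majority(rows, labels, seed=0):
    """Randomly drop rows from the larger class until both classes
    have the same number of rows (hypothetical helper, assumed here)."""
    rng = random.Random(seed)
    idx_by_class = {0: [], 1: []}
    for i, y in enumerate(labels):
        idx_by_class[y].append(i)
    n = min(len(idx_by_class[0]), len(idx_by_class[1]))
    keep = sorted(rng.sample(idx_by_class[0], n) + rng.sample(idx_by_class[1], n))
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy example: two non-defaulters (0) and two defaulters (1).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)

# Imbalanced toy data: eight non-defaulters, two defaulters.
rows = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = undersample_majority(rows, labels)
```

Because AUC depends only on how positives rank against negatives, it does not reward a classifier for simply predicting the majority class, which is why it is a common choice for imbalanced data.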

Keywords

Binary classification, Default (Finance), Credit cards, Credit risk, Machine learning, Variables (Mathematics)

URI

https://researchrepository.ru.ac.za/handle/20.500.14915/3225

Collections

Masters Degrees (Statistics)
