GitHub - jathiel/home_credit

Project Description

The Home Credit Group (HCG) is looking to develop a model that accurately predicts the riskiness of a particular loan. Such models are referred to as 'scorecards'. Because consumer behavior changes over time, scorecards need to be updated, and their predictive ability may change over time as well. The goal of this project is to develop a scorecard that provides time-stable predictions of credit defaults. The provided data cover a wide variety of categories but do not include credit scores.

Data Description

The data we are going to use (train/test) are provided by HCG and are available on Kaggle. The data were derived from various sources, including application forms, socio-demographic data, and previous credit behavior data. Each record has a unique case_id, which corresponds to one applicant. Personal and business-sensitive information is "masked" but still retains as much information as possible.

In total, the data set is around 26 GB and comprises 32 separate files with 465 features. For more information, please refer to the data dictionary.

Stakeholders

HCG is the primary stakeholder in this project. HCG, along with other lending institutions, has a strong financial incentive to make loans that are unlikely to default. Borrowers may also benefit from improved predictions, as lenders may be more inclined to offer favorable loan terms to individuals without an extensive credit history if other indicators (e.g., income and demographics) suggest a low chance of default. Improved loan predictions would also benefit organizations that supply consumer products that often require loans. Car dealerships, for instance, may increase their profits if previously ineligible borrowers are given loans.

Key Performance Indicators (KPIs)

As mentioned in the project description, the main KPIs are the model's ability to accurately predict the likelihood of default and the stability of that ability over time. Default prediction is a binary classification problem, and AUC is the primary metric used. Stability is measured by how consistent the scorecard's performance is from week to week. Grouping predictions by WEEK_NUM, we calculate a Gini score for each week as $2\cdot\text{AUC score}-1$. A linear regression model $y = ax + b$ (where $x$ is the WEEK_NUM and $y$ is the weekly Gini score) is then fitted to evaluate the trend of the Gini scores over time. A negative coefficient $a$ indicates that predictive performance is dropping over time, and the final score of the model is penalized accordingly.

We will also calculate the standard deviation of the residuals from the above linear regression model. The larger this standard deviation, the more the final score of the model is penalized.

The final score is a combination of the computations mentioned above:

$$\text{Stability Score} = \text{mean(gini)} + 88.0 \cdot \min (0,a) - 0.5 \cdot \text{std(residual)}.$$
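The computation above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the official competition implementation: the function names (`auc`, `weekly_gini`, `fit_line`, `stability_score`) are hypothetical, the regression uses WEEK_NUM itself as $x$ as described above, and the residual spread is taken as the population standard deviation, which is an assumption. In practice one would more likely reach for library routines for the AUC and the line fit.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def auc(labels, scores):
    """Pairwise AUC: chance that a random positive outscores a random
    negative, with ties counted as 0.5. Quadratic time; fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weekly_gini(week_nums, labels, scores):
    """Group predictions by WEEK_NUM and return (weeks, gini per week)."""
    groups = defaultdict(lambda: ([], []))
    for w, y, s in zip(week_nums, labels, scores):
        groups[w][0].append(y)
        groups[w][1].append(s)
    weeks = sorted(groups)
    ginis = [2 * auc(*groups[w]) - 1 for w in weeks]
    return weeks, ginis


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (needs >= 2 distinct x)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx


def stability_score(week_nums, labels, scores):
    """mean(gini) + 88.0 * min(0, a) - 0.5 * std(residual)."""
    weeks, ginis = weekly_gini(week_nums, labels, scores)
    a, b = fit_line(weeks, ginis)
    residuals = [g - (a * w + b) for w, g in zip(weeks, ginis)]
    # Assumption: population standard deviation of the residuals.
    return fmean(ginis) + 88.0 * min(0.0, a) - 0.5 * pstdev(residuals)


if __name__ == "__main__":
    # Toy data (made up for illustration): performance degrades each week.
    week_nums = [0, 0, 1, 1, 1, 1, 2, 2]
    labels = [0, 1, 0, 0, 1, 1, 0, 1]
    scores = [0.1, 0.9, 0.1, 0.6, 0.5, 0.9, 0.5, 0.5]
    print(stability_score(week_nums, labels, scores))
```

Because the slope penalty has a weight of 88.0, even a small downward trend in the weekly Gini scores dominates the final score, whereas an upward trend earns no bonus since $\min(0,a)$ is then zero.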

Dataset Link

Link to dataset on Google Drive

Folders and files (34 commits)

.vscode
__pycache__
src
.gitignore
EDA.ipynb
NaN_stats.csv
NaN_stats.ipynb
Notes.md
README.md
adaboost.ipynb
data_reduction.py
data_reduction_PS.py
features_dictionary.py
home-credit-2024-starter-notebook.ipynb
home-credit-data-cleaning-eval-metric.ipynb
home-credit-data-cleaning-exploratory.ipynb
home-credit-data-cleaning-nn.ipynb
home-credit-data-cleaning.ipynb
logistic.ipynb
vectorization.py
