The neglog transformation and quantile regression for the analysis of a large credit scoring database |
| |
Authors: | Joe Whittaker, Chris Whitehead, Mark Somers |
| |
Affiliation: | Lancaster University, UK; Scimetrics Ltd, Reading, UK |
| |
Abstract: | A statistical analysis of a bank's credit card database is presented. The database is a snapshot of accounts whose holders have missed a payment in a given month but who do not subsequently default. The variables on which there is information are observable measures on the account (such as profit and activity) and indicators of whether actions that are available to the bank (such as letters and telephone calls) have been taken. A primary objective for the bank is to gain insight into the effect that collections activity has on ongoing account usage. A neglog transformation is introduced that highlights features hidden on the original scale and improves the joint distribution of the covariates. Quantile regression, a methodology that is new to the credit scoring industry, is used because it is relatively assumption-free and because it is suspected that different relationships may be manifest in different parts of the response distribution. The large size of the database is handled by selecting relatively small subsamples for training and then building empirical distributions from repeated samples for validation. In the application to the database of clients who have missed a single payment, a substantive finding is that the predictor of the median of the target variable contains different variables from those of the predictor of the 30% quantile. This suggests that different mechanisms may be at play in different parts of the distribution. |
| |
Keywords: | Collections; Large data sets; Missed payment database; Profit scoring; Training sample; Validation sample; Yeo–Johnson power transformation |
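As a rough illustration of the two techniques named in the abstract, the sketch below assumes the commonly quoted form of the neglog transformation, sign(x) · log(1 + |x|), and the standard check ("pinball") loss whose minimiser defines a quantile in quantile regression. The abstract itself does not give either formula, so the function names `neglog`, `neglog_inverse` and `pinball_loss` and their exact forms are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def neglog(x):
    """Assumed neglog form: sign(x) * log(1 + |x|).

    Defined for negative, zero and positive values, so it can be applied
    to quantities such as profit that take either sign.
    """
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))


def neglog_inverse(y):
    """Inverse of the assumed neglog form: sign(y) * (exp(|y|) - 1)."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.expm1(np.abs(y))


def pinball_loss(y, q, tau):
    """Mean check loss of predicting the value q for the tau-quantile of y.

    Residuals above q are weighted by tau, residuals below q by (1 - tau).
    """
    r = np.asarray(y, dtype=float) - q
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1) * r)))
```

For example, `neglog([-100.0, 0.0, 100.0])` compresses both tails symmetrically around zero, and comparing `pinball_loss(y, q, 0.5)` with `pinball_loss(y, q, 0.3)` over candidate values `q` shows how the median and the 30% quantile are targeted by different loss weightings.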