20 related references found
1.
Statistical Classification Methods in Consumer Credit Scoring: a Review
D. J. Hand & W. E. Henley 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1997,160(3):523-541
Credit scoring is the term used to describe formal statistical methods used for classifying applicants for credit into 'good' and 'bad' risk classes. Such methods have become increasingly important with the dramatic growth in consumer credit in recent years. A wide range of statistical methods has been applied, though the literature available to the public is limited for reasons of commercial confidentiality. Particular problems arising in the credit scoring context are examined and the statistical methods which have been applied are reviewed.
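The review itself presents no code, but as a loose illustration of the task it surveys, classifying applicants into 'good' and 'bad' risk classes, a minimal logistic-regression scorecard on synthetic data might look like the following sketch; the feature names and coefficients are hypothetical, not taken from the paper.

    # Minimal logistic-regression credit-scoring sketch on synthetic data.
    # Feature names are hypothetical; real scorecards use many more variables.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 5000
    income = rng.lognormal(mean=10, sigma=0.5, size=n)
    utilisation = rng.uniform(0, 1, size=n)           # share of credit limit used
    prior_delinquencies = rng.poisson(0.3, size=n)
    # Synthetic "true" model: higher utilisation and delinquency raise default risk.
    logit = -2.0 + 2.5 * utilisation + 0.8 * prior_delinquencies - 0.3 * np.log(income / 1e4)
    bad = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # 1 = 'bad' risk, 0 = 'good'

    X = np.column_stack([np.log(income), utilisation, prior_delinquencies])
    X_tr, X_te, y_tr, y_te = train_test_split(X, bad, test_size=0.3, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    auc = roc_auc_score(y_te, scores)
    print("AUC:", round(auc, 3), "Gini:", round(2 * auc - 1, 3))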
2.
In recent years, consumer finance in China has developed rapidly, but it also faces increasingly complex fraud and credit risks. To better monitor the credit risk of borrowers in consumer finance, this paper proposes a risk-control method based on a sparse structured continuation ratio model. Compared with traditional binary classification models, this model can handle ordinal data in which borrowers are divided into three or more classes; while estimating the coefficients it automatically selects the important variables from a large and complex pool of candidates, and the selection takes into account the structural relationships among the coefficients of the different sub-models. Monte Carlo simulations show that the proposed sparse structured continuation ratio model performs well in both classification generalization error and variable selection. Finally, the model is applied to a real consumer-finance credit risk analysis: for borrowers with insufficient traditional credit-bureau information, high-frequency e-commerce consumption behaviour data are introduced, and the proposed high-dimensional ordinal multi-class model effectively identifies borrowers' credit risk, compensating for the shortcomings of traditional credit scoring.
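The authors' estimator, with its structured sparsity shared across sub-models, is not reproduced here; the sketch below only illustrates the underlying continuation-ratio decomposition, fitting each conditional binary problem with a plain L1-penalised logistic regression on synthetic data.

    # Rough continuation-ratio sketch: an ordinal response with classes 0 < 1 < 2
    # is decomposed into conditional binary problems P(Y = k | Y >= k), each fitted
    # with an L1-penalised logistic regression for variable selection.
    # The structured (cross-sub-model) penalty of the paper is omitted.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n, p = 2000, 50                       # many candidate predictors, few relevant
    X = rng.normal(size=(n, p))
    eta = 1.2 * X[:, 0] - 0.9 * X[:, 1] + 0.7 * X[:, 2]     # only 3 truly matter
    latent = eta + rng.logistic(size=n)                      # latent score plus noise
    y = np.digitize(latent, bins=[-1.0, 1.5])                # ordinal classes 0, 1, 2

    models = {}
    for k in range(2):                    # sub-models for k = 0 and k = 1
        at_risk = y >= k                  # observations with Y >= k
        target = (y[at_risk] == k).astype(int)
        fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        fit.fit(X[at_risk], target)
        models[k] = fit
        print(f"sub-model {k}: selected variables {np.flatnonzero(fit.coef_[0])}")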
3.
E. Stanghellini K. J. McConway & D. J. Hand 《Journal of the Royal Statistical Society. Series C, Applied statistics》1999,48(2):239-251
A bank offering unsecured personal loans may be interested in several related outcome variables, including defaulting on the repayments, early repayment or failing to take up an offered loan. Current predictive models used by banks typically consider such variables individually. However, the fact that they are related to each other, and to many interrelated potential predictor variables, suggests that graphical models may provide an attractive alternative solution. We developed such a model for a data set of 15 variables measured on a set of 14 000 applications for unsecured personal loans. The resulting global model of behaviour enabled us to identify several previously unsuspected relationships of considerable interest to the bank. For example, we discovered important but obscure relationships between taking out insurance, prior delinquency with a credit card and delinquency with the loan.
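The paper's graphical model handles fifteen mixed discrete and continuous variables; as a loose, purely illustrative analogue, the sketch below estimates a Gaussian graphical model on a few synthetic continuous loan variables with the graphical lasso. Variable names and dependencies are invented for the example.

    # Loose illustration only: a Gaussian graphical model estimated with the
    # graphical lasso; non-zero off-diagonal precision entries suggest edges
    # (conditional dependences) between variables.
    import numpy as np
    from sklearn.covariance import GraphicalLassoCV

    rng = np.random.default_rng(2)
    n = 2000
    income = rng.normal(size=n)
    loan_amount = 0.6 * income + rng.normal(scale=0.8, size=n)
    card_delinquency = -0.4 * income + rng.normal(size=n)
    loan_delinquency = 0.7 * card_delinquency + rng.normal(scale=0.7, size=n)
    insurance = 0.5 * loan_delinquency + rng.normal(size=n)
    X = np.column_stack([income, loan_amount, card_delinquency,
                         loan_delinquency, insurance])
    names = ["income", "loan_amount", "card_delinq", "loan_delinq", "insurance"]

    prec = GraphicalLassoCV().fit(X).precision_
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(prec[i, j]) > 1e-6:
                print(f"edge: {names[i]} -- {names[j]} (precision {prec[i, j]:.2f})")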
4.
《Journal of Statistical Computation and Simulation》2012,82(2):181-184
This paper encompasses three parts of validating risk models. The first part provides an understanding of the precision of the standard statistics used to validate risk models given varying sample sizes. The second part investigates jackknifing as a method to obtain a confidence interval for the Gini coefficient and K–S statistic for small sample sizes. The third and final part investigates the efficiency and appropriateness of the odds at various cutoff points relative to the K–S statistic and Gini coefficient in model validation. There are many parts to understanding the risk associated with the extension of credit. This paper focuses on obtaining a better understanding of present methodology for validating existing risk models used for credit scoring, by investigating the three parts mentioned. The empirical investigation shows that the precision of the Gini coefficient and K–S statistic is driven by the sample size of the smaller group, whether successes or failures. In addition, a simple adaptation of the standard jackknife formula can be used to gauge the variability of the Gini coefficient and K–S statistic. Finally, the odds are not a reliable statistic to use without a considerably large sample of both successes and failures.
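A rough reading of the second part, not the authors' exact formulas: the sketch below computes the Gini coefficient (as 2·AUC − 1) and the K–S statistic for synthetic scores and attaches delete-one jackknife standard errors.

    # Gini and K-S for a scorecard with delete-one jackknife standard errors;
    # data and sample sizes are synthetic and illustrative.
    import numpy as np
    from scipy.stats import ks_2samp
    from sklearn.metrics import roc_auc_score

    def gini_and_ks(y, score):
        gini = 2 * roc_auc_score(y, score) - 1
        ks = ks_2samp(score[y == 1], score[y == 0]).statistic
        return gini, ks

    rng = np.random.default_rng(3)
    n_good, n_bad = 1000, 100                  # precision is driven by the smaller group
    score_good = rng.normal(0.0, 1.0, n_good)
    score_bad = rng.normal(1.0, 1.0, n_bad)    # 'bad' accounts score higher on average
    y = np.r_[np.zeros(n_good), np.ones(n_bad)]
    s = np.r_[score_good, score_bad]

    gini, ks = gini_and_ks(y, s)

    # Delete-one jackknife over all observations.
    n = len(y)
    jk = np.array([gini_and_ks(np.delete(y, i), np.delete(s, i)) for i in range(n)])
    se = np.sqrt((n - 1) / n * ((jk - jk.mean(axis=0)) ** 2).sum(axis=0))
    print(f"Gini {gini:.3f} (jackknife SE {se[0]:.3f}), K-S {ks:.3f} (SE {se[1]:.3f})")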
5.
At present, commercial bank credit remains the main channel for allocating funds in China's economy. Driven by profit and risk considerations, bank lending is naturally procyclical, whereas the government, pursuing stable growth, prefers countercyclical adjustment. Constrained by their fiscal revenue and expenditure positions, provincial governments intervene in the allocation of funds within their provinces in different ways and to different degrees. The report of the 19th National Congress explicitly called for improving the two-pillar framework of monetary policy and macro-prudential policy. Identifying provincial credit risk is therefore a dynamic problem that must be considered within the framework of the business cycle and macro-prudential policy. Against this background, and within a neoclassical economic framework, this paper builds a model for identifying provincial credit risk since 2008. The findings are as follows. First, the ratio of local fiscal expenditure to revenue has a positive effect on the non-performing loan (NPL) ratio, the return on capital has a negative effect, and the fiscal expenditure-to-revenue ratio has the larger effect. Second, according to the classification rule, the high-credit-risk province-level regions are Henan, Hainan, Chongqing, Sichuan, Guizhou, Yunnan, Shaanxi, Gansu, Qinghai, Ningxia, Xinjiang and Tibet. Third, under the significant influence of the fiscal expenditure-to-revenue ratio and the return on capital, provincial NPL ratios follow a U-shaped pattern, with a threshold of 1.49%: when the NPL ratio exceeds 1.49%, provincial credit risk is high. Fourth, when the national return on capital is stabilizing, NPL ratios lie in the trough below the threshold and differences in risk across provinces are small; when the return on capital is declining, NPL ratios rise above the threshold and cross-province differences in risk are large.
6.
《Communications in Statistics - Simulation and Computation》2013,42(3):401-423
In this study, Monte Carlo simulation experiments were employed to examine the performance of four statistical two-group classification methods when the data distributions are skewed and misclassification costs are unequal, conditions frequently encountered in business and economic applications. The classification methods studied are linear and quadratic parametric, nearest neighbor and logistic regression methods. It was found that when skewness is moderate, the parametric methods tend to give best results. Depending on the specific data condition, when skewness is high, either the linear parametric, logistic regression, or the nearest-neighbor method gives the best results. When misclassification costs differ widely across groups, the linear parametric method is favored over the other methods for many of the data conditions studied.
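A single illustrative replication of this kind of experiment, with settings invented for the example rather than taken from the study, might look like the sketch below: two groups with skewed (lognormal) features, unequal misclassification costs, and the four classifiers compared.

    # One replication: skewed two-group data, unequal misclassification costs,
    # and the four classifiers compared by average cost on a test set.
    import numpy as np
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)

    def draw(n0, n1):
        X0 = rng.lognormal(mean=0.0, sigma=1.0, size=(n0, 2))   # skewed group 0
        X1 = rng.lognormal(mean=0.6, sigma=1.0, size=(n1, 2))   # shifted group 1
        return np.vstack([X0, X1]), np.r_[np.zeros(n0, dtype=int), np.ones(n1, dtype=int)]

    X_tr, y_tr = draw(500, 500)
    X_te, y_te = draw(2000, 2000)
    cost = {0: 1.0, 1: 5.0}            # misclassifying a group-1 case is 5x as costly

    models = {"linear (LDA)": LinearDiscriminantAnalysis(),
              "quadratic (QDA)": QuadraticDiscriminantAnalysis(),
              "nearest neighbour": KNeighborsClassifier(n_neighbors=15),
              "logistic regression": LogisticRegression(max_iter=1000)}
    for name, m in models.items():
        pred = m.fit(X_tr, y_tr).predict(X_te)
        avg_cost = sum(cost[int(t)] for t, p in zip(y_te, pred) if t != p) / len(y_te)
        print(f"{name}: average misclassification cost {avg_cost:.3f}")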
7.
《Journal of Statistical Computation and Simulation》2012,82(3-4):281-294
An assumption made in the classification problem is that the distribution of the data being classified has the same parameters as the data used to obtain the discriminant functions. A method based on mixtures of two normal distributions is proposed as a way of checking this assumption and modifying the discriminant functions accordingly. As a first step, the case considered in this paper is that of a shift in the mean of one or two univariate normal distributions, with all other parameters remaining fixed and known. Calculations based on asymptotic theory indicate that the proposed method works well even for small shifts.
8.
Fast and robust bootstrap
Matías Salibián-Barrera Stefan Van Aelst Gert Willems 《Statistical Methods and Applications》2008,17(1):41-71
In this paper we review recent developments on a bootstrap method for robust estimators which is computationally faster and more resistant to outliers than the classical bootstrap. This fast and robust bootstrap method is, under reasonable regularity conditions, asymptotically consistent. We describe the method in general and then consider its application to perform inference based on robust estimators for the linear regression and multivariate location-scatter models. In particular, we study confidence and prediction intervals and tests of hypotheses for linear regression models, inference for location-scatter parameters and principal components, and classification error estimation for discriminant analysis.
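For contrast with the method reviewed, the sketch below runs a classical bootstrap of a robust regression estimator, fully re-fitting the estimator on every resample; this repeated optimisation is exactly the cost (and outlier sensitivity) that the fast and robust bootstrap is designed to avoid. The fast and robust bootstrap itself is not implemented here.

    # Classical (slow) bootstrap of a robust regression estimator on synthetic
    # data with a few gross outliers; every resample triggers a full re-fit.
    import numpy as np
    from sklearn.linear_model import HuberRegressor

    rng = np.random.default_rng(5)
    n = 200
    x = rng.normal(size=(n, 1))
    y = 2.0 + 1.5 * x[:, 0] + rng.normal(scale=0.5, size=n)
    y[:10] += 8.0                      # a few gross outliers

    slopes = []
    for _ in range(500):               # 500 full re-fits: the expensive part
        idx = rng.integers(0, n, size=n)
        slopes.append(HuberRegressor().fit(x[idx], y[idx]).coef_[0])
    lo, hi = np.percentile(slopes, [2.5, 97.5])
    print(f"bootstrap 95% percentile interval for the slope: [{lo:.2f}, {hi:.2f}]")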
9.
Ana Kupresanin Hyejin Shin David King R.L. Eubank 《Journal of statistical planning and inference》2010
Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.
10.
Philippe Casin 《Journal of applied statistics》2018,45(8):1396-1409
Techniques of credit scoring have been developed in recent years in order to reduce the risk taken by banks and financial institutions in the loans that they grant. Credit scoring is a classification problem: individuals are assigned to one of two groups, defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new method of discrimination when the dependent variable is categorical and a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components which take into account both the relationships between explanatory categorical variables and the canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application to credit scoring data are provided.
11.
We propose a mixture of latent variables model for the model-based clustering, classification, and discriminant analysis of data comprising variables with mixed type. This approach is a generalization of latent variable analysis, and model fitting is carried out within the expectation-maximization framework. Our approach is outlined and a simulation study conducted to illustrate the effect of sample size and noise on the standard errors and the recovery probabilities for the number of groups. Our modelling methodology is then applied to two real data sets and their clustering and classification performance is discussed. We conclude with discussion and suggestions for future work.
12.
Nema Dean Thomas Brendan Murphy Gerard Downey 《Journal of the Royal Statistical Society. Series C, Applied statistics》2006,55(1):1-14
An authentic food is one that is what it purports to be. Food processors and consumers need to be assured that, when they pay for a specific product or ingredient, they are receiving exactly what they pay for. Classification methods are an important tool in food authenticity studies where they are used to assign food samples of unknown type to known types. A classification method is developed where the classification rule is estimated by using both the labelled and the unlabelled data, in contrast with many classical methods which use only the labelled data for estimation. This methodology models the data as arising from a Gaussian mixture model with parsimonious covariance structure, as is done in model-based clustering. A missing data formulation of the mixture model is used and the models are fitted by using the EM and classification EM algorithms. The methods are applied to the analysis of spectra of food-stuffs recorded over the visible and near infra-red wavelength range in food authenticity studies. A comparison of the performance of model-based discriminant analysis and the method of classification proposed is given. The classification method proposed is shown to yield very good misclassification rates. The correct classification rate was observed to be as much as 15% higher than the correct classification rate for model-based discriminant analysis.
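A stripped-down sketch of the idea of using labelled and unlabelled data together in an EM fit of a Gaussian mixture is given below, with synthetic two-dimensional data, one unrestricted covariance matrix per class, and no classification EM variant; the parsimonious covariance structures and spectral data of the paper are not reproduced.

    # Semi-supervised EM sketch: labelled points keep fixed one-hot
    # responsibilities, unlabelled points get soft responsibilities, and both
    # contribute to the weighted parameter updates.
    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(6)
    d, K = 2, 2
    mus_true = [np.array([0.0, 0.0]), np.array([2.5, 2.0])]
    X_lab = np.vstack([rng.normal(mus_true[k], 1.0, size=(30, d)) for k in range(K)])
    z_lab = np.repeat(np.arange(K), 30)
    X_unl = np.vstack([rng.normal(mus_true[k], 1.0, size=(300, d)) for k in range(K)])

    X = np.vstack([X_lab, X_unl])
    n = len(X)
    R = np.zeros((n, K))
    R[np.arange(len(z_lab)), z_lab] = 1.0       # fixed responsibilities, labelled rows

    # Initialise from the labelled data only.
    pi = np.full(K, 1.0 / K)
    mu = np.array([X_lab[z_lab == k].mean(axis=0) for k in range(K)])
    Sigma = np.array([np.cov(X_lab[z_lab == k].T) + 1e-6 * np.eye(d) for k in range(K)])

    for _ in range(50):
        # E-step: update responsibilities of the unlabelled rows only.
        dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                                for k in range(K)])
        R[len(z_lab):] = dens[len(z_lab):] / dens[len(z_lab):].sum(axis=1, keepdims=True)
        # M-step: weighted updates of proportions, means and covariances.
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (R[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)

    print("estimated class means:\n", mu)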
13.
14.
《商业与经济统计学杂志》2013,31(2):320-328
Default is a rare event, even in segments in the midrange of a bank’s portfolio. Inference about default rates is essential for risk management and for compliance with the requirements of Basel II. Most commercial loans are in the middle-risk categories and are to unrated companies. Expert information is crucial in inference about defaults. A Bayesian approach is proposed and illustrated using a prior distribution assessed from an industry expert. The binomial model, most common in applications, is extended to allow correlated defaults. A check of robustness is illustrated with an ε-mixture of priors.
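The basic conjugate step behind such an analysis can be sketched as below; the Beta prior here is illustrative rather than the expert assessment used in the article, and neither the correlated-default extension nor the ε-mixture robustness check is reproduced.

    # Beta-binomial sketch: a segment default rate combining an expert-style prior
    # with a small number of observed defaults. Prior and data are illustrative.
    from scipy.stats import beta

    a0, b0 = 2.0, 198.0                 # illustrative prior centred near a 1% default rate
    defaults, loans = 3, 400            # observed data for the segment

    a_post, b_post = a0 + defaults, b0 + (loans - defaults)
    post = beta(a_post, b_post)
    print(f"posterior mean default rate: {post.mean():.4f}")
    print(f"95% credible interval: {post.ppf(0.025):.4f} to {post.ppf(0.975):.4f}")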
15.
The risk of an individual woman having a pregnancy associated with Down's syndrome is estimated given her age, α-fetoprotein, human chorionic gonadotropin, and pregnancy-specific β1-glycoprotein levels. The classical estimation method is based on discriminant analysis under the assumption of lognormality of the marker values, but logistic regression is also applied for data classification. In the present work, we compare the performance of the two methods using a dataset containing the data of almost 89,000 unaffected and 333 affected pregnancies. Assuming lognormality of the marker values, we also calculate the theoretical detection and false positive rates for both the methods.
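The two approaches can be contrasted on synthetic data as in the sketch below: a likelihood ratio from class-conditional Gaussian densities on the log scale (the lognormal discriminant route) multiplied by an age-related prior odds, versus a logistic regression on the same log markers. Marker distributions, the prior odds, and the sample sizes are all illustrative.

    # (i) lognormal discriminant analysis: likelihood ratio x prior odds;
    # (ii) logistic regression on the log-scale markers. All numbers are synthetic.
    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    d = 3                                         # e.g. AFP, hCG, PS-beta1-G on the log scale
    mu_unaff, mu_aff = np.zeros(d), np.array([-0.4, 0.5, -0.3])
    cov = np.eye(d) * 0.25
    logX_unaff = rng.multivariate_normal(mu_unaff, cov, size=5000)
    logX_aff = rng.multivariate_normal(mu_aff, cov, size=50)     # affected cases are rare

    # (i) Discriminant route; in practice the densities are estimated from training data.
    f_aff = multivariate_normal(mu_aff, cov)
    f_unaff = multivariate_normal(mu_unaff, cov)
    prior_odds = 1 / 700                          # illustrative age-related prior odds

    def posterior_risk(logx):
        post_odds = (f_aff.pdf(logx) / f_unaff.pdf(logx)) * prior_odds
        return post_odds / (1 + post_odds)

    # (ii) Logistic regression on the same log-scale markers.
    X = np.vstack([logX_unaff, logX_aff])
    y = np.r_[np.zeros(len(logX_unaff)), np.ones(len(logX_aff))]
    logreg = LogisticRegression(max_iter=1000).fit(X, y)

    new = np.array([[-0.5, 0.6, -0.4]])
    print("discriminant-based risk:", posterior_risk(new[0]))
    print("logistic-regression risk:", logreg.predict_proba(new)[0, 1])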
16.
《Journal of Statistical Computation and Simulation》2012,82(1-4):321-336
Risk assessment of modeling predictions is becoming increasingly important as input to decision makers. Probabilistic risk analysis is typically expensive to perform since it generally requires the calculation of a model output Probability Distribution Function (PDF) followed by the integration of the risk portion of the PDF. Here we describe the new risk analysis Guided Monte Carlo (GMC) technique. It maintains the global coverage of Monte Carlo (MC) while judiciously combining model reruns with efficient sensitivity analysis predictions to accurately evaluate the integrated risk portion of the PDF. This GMC technique will facilitate risk analysis of complex models, where the expense was previously prohibitive. Two examples are presented to illustrate the technique, its computational savings and broad applicability. These are an ordinary differential equation based chemical kinetics model and an analytic dosimetry model. For any particular example, the degree of savings will depend on the relative risk being evaluated. In general, the highest fractional degree of savings with the GMC technique will occur for estimating risk levels that are specified in the far wing of the PDF. If no savings are possible, the GMC technique defaults to the true MC limit. In the illustrations presented here, the GMC analysis saved approximately a factor of four in computational effort relative to that of a full MC analysis. Furthermore, the GMC technique can also be implemented with other possible sampling strategies, such as Latin Hypercube, when appropriate.
17.
Paul R Rosenbaum 《Communications in Statistics - Theory and Methods》2013,42(11):2687-2698
In many experiments where data have been collected at two points in time (pre-treatment and post-treatment), investigators wish to determine if there is a difference between two treatment groups. In recent years it has been proposed that an appropriate statistical analysis to determine if treatment differences exist is to use the post-treatment values as the primary comparison variables and the pre-treatment values as covariates. When there are several outcome variables, we propose new tests based on residuals as alternatives to existing methods and investigate how the powers of the new and existing tests are affected by various choices of covariates. The limiting distribution of the test statistic of the new test based on residuals is given. Monte Carlo simulations are employed in the power comparisons.
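A bare-bones version of the residual idea, not the authors' test statistic or its limiting distribution, is sketched below: post-treatment outcomes are regressed on the pre-treatment covariates, and the residuals are compared between the two groups with a simple per-outcome t-test on synthetic data.

    # Residual-based pre/post comparison sketch: regress post on pre (pooled over
    # groups), then compare group means of the residuals outcome by outcome.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(8)
    n = 120
    group = np.repeat([0, 1], n // 2)                   # two treatment groups
    pre = rng.normal(size=(n, 2))                       # pre-treatment measurements
    effect = np.array([0.5, 0.0])                       # treatment shifts outcome 0 only
    post = (pre @ np.array([[0.8, 0.1], [0.0, 0.7]])
            + group[:, None] * effect
            + rng.normal(scale=0.6, size=(n, 2)))

    resid = post - LinearRegression().fit(pre, post).predict(pre)
    for j in range(post.shape[1]):
        t, p = stats.ttest_ind(resid[group == 1, j], resid[group == 0, j])
        print(f"outcome {j}: t = {t:.2f}, p = {p:.3f}")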
18.
Angela Vossmeyer 《Journal of Business &amp; Economic Statistics》2016,34(2):197-212
This article develops a framework for estimating multivariate treatment effect models in the presence of sample selection. The methodology deals with several important issues prevalent in policy and program evaluation, including application and approval stages, nonrandom treatment assignment, endogeneity, and discrete outcomes. This article presents a computationally efficient estimation algorithm and techniques for model comparison and treatment effects. The framework is applied to evaluate the effectiveness of bank recapitalization programs and their ability to resuscitate the financial system. The analysis of lender of last resort (LOLR) policies is not only complicated due to econometric challenges, but also because regulator data are not easily obtainable. Motivated by these difficulties, this article constructs a novel bank-level dataset and employs the new methodology to jointly model a bank’s decision to apply for assistance, the LOLR’s decision to approve or decline the assistance, and the bank’s performance following the disbursements. The article offers practical estimation tools to unveil new answers to important regulatory and policy questions.
19.
《Journal of Statistical Computation and Simulation》2012,82(1-4):175-196
The evaluation of hazards from complex, large scale, technologically advanced systems often requires the construction of computer implemented mathematical models. These models are used to evaluate the safety of the systems and to evaluate the consequences of modifications to the systems. These evaluations, however, are normally surrounded by significant uncertainties related to the uncertainty inherent in natural phenomena such as the weather and those related to uncertainties in the parameters and models used in the evaluation. Another use of these models is to evaluate strategies for improving information used in the modeling process itself. While sensitivity analysis is useful in defining variables in the model that are important, uncertainty analysis provides a tool for assessing the importance of uncertainty about these variables. A third, complementary technique is decision analysis. It provides a methodology for explicitly evaluating and ranking potential improvements to the model. Its use in the development of information gathering strategies for a nuclear waste repository is discussed in this paper.
20.
The primary purpose of sampling inspection is the protection of consumer’s interests. Although under simple cost models, sampling inspection never serves the producer’s interest, some form of sampling inspection can be beneficial to the consumer under the same assumptions. We consider the case of isolated lot inspection and examine the consumer risk, economic sample design, and errors in the inspection process. Acceptance sampling is shown to be cost-effective to the consumer whenever the lot quality is less than perfect, and even for perfect lot quality in the presence of inspection errors.
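For the isolated-lot setting, the consumer's risk of a single sampling plan can be sketched directly from the hypergeometric distribution, as below; the plan and lot figures are illustrative, and the inspection errors treated in the paper are ignored.

    # Consumer's risk for an isolated lot under plan (n, c): the probability of
    # accepting a lot containing D defectives, from the hypergeometric distribution.
    from scipy.stats import hypergeom

    N = 1000          # lot size
    D = 30            # defectives in the lot (3% defective, assumed unacceptable)
    n, c = 80, 1      # sample 80 items, accept if at most 1 defective is found

    # P(accept) = P(X <= c), X ~ Hypergeometric(N, D, n)
    consumer_risk = hypergeom.cdf(c, N, D, n)
    print(f"probability of accepting a 3% defective lot: {consumer_risk:.3f}")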