Similar literature
Found 20 similar articles (search time: 15 ms)
1.
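Fang Kuangnan, Zhao Mengluan, Statistical Research (《统计研究》), 2018, 35(12): 92-101
With the development of information technology, data sources keep multiplying. On the one hand, this allows an individual's credit status to be characterized more precisely and scientifically; on the other hand, the number of sources and the complexity of their structure challenge traditional credit-reporting techniques. This paper proposes a personal credit model based on multi-source data fusion that models several data sets and selects variables across them simultaneously, while accounting for both the similarity and the heterogeneity among data sets. Simulation experiments show that the proposed integrative model has clear advantages in both variable selection and classification performance. Finally, the integrative model is applied to personal credit scoring on two data sets, one urban and one rural.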

2.
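Fan Xinyan et al., Statistical Research (《统计研究》), 2021, 38(2): 99-113
Traditional credit scoring methods rely mainly on statistical classification, which can predict only whether a borrower will default, not when. The cure rate model is a mixture of binary classification and survival analysis: it predicts both whether default will occur and when, and so provides more information than traditional binary classification. Moreover, with the growth of big data, data sources are multiplying, and several data sets can be collected for the same or similar tasks. This paper proposes an integrative cure rate model that fuses multi-source data. It models multiple data sets and estimates their parameters simultaneously, performs bi-level (between-group and within-group) variable selection through a composite penalty function, and improves interpretability by encouraging the regression coefficients of the two sub-models to share the same sign. Numerical simulations show that the proposed method has clear advantages in both variable selection and parameter estimation. Finally, the method is applied to predicting the time of default on credit loans, where the model performs well.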

3.
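Wang Xiaoyan et al., Statistical Research (《统计研究》), 2014, 31(9): 107-112
Variable selection is an important step in statistical modeling; choosing suitable variables yields a robust model with a simple structure and accurate predictions. For logistic regression, this paper proposes a new bi-level variable selection penalty, the adaptive Sparse Group Lasso (adSGL). Its distinguishing feature is that it screens variables according to their group structure, achieving selection both within and between groups. Its advantage is that it penalizes individual coefficients and group coefficients to different degrees, which avoids over-penalizing large coefficients and thereby improves estimation and prediction accuracy. The difficulty in solving it is that the penalized likelihood function is not strictly convex, so the paper solves the model by group coordinate descent and establishes a criterion for choosing the tuning parameters. Simulations show that, compared with the representative existing methods Sparse Group Lasso, Group Lasso, and Lasso, adSGL both improves bi-level selection accuracy and reduces model error. Finally, adSGL is applied to credit card credit scoring, where it achieves higher classification accuracy and robustness than logistic regression.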
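For orientation, a weighted sparse-group-lasso objective for logistic regression can be written as below. This is a generic sketch, not the paper's formula: the abstract does not state the adSGL weights, so the group weights w_g and coefficient weights ξ_j are assumptions (adaptive penalties typically derive them from an initial estimate). Here ℓ(β) is the logistic log-likelihood and β^(g) is the coefficient sub-vector of group g, which has p_g members.

```latex
\min_{\beta}\; -\ell(\beta)
  + \lambda_1 \sum_{g=1}^{G} w_g \sqrt{p_g}\, \lVert \beta^{(g)} \rVert_2
  + \lambda_2 \sum_{j=1}^{p} \xi_j \lvert \beta_j \rvert
```

With all weights equal to 1 this reduces to the ordinary Sparse Group Lasso; additionally setting λ_2 = 0 gives the Group Lasso, and setting λ_1 = 0 gives the Lasso.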

4.
Credit scoring techniques have been developed in recent years to reduce the risk that banks and financial institutions take on the loans they grant. Credit scoring is the problem of classifying individuals into one of two groups: defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new discrimination method for the case where the dependent variable is categorical and a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components that take into account both the relationships between the explanatory categorical variables and the canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application to credit scoring data are provided.

5.
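Credit scoring is an effective tool for credit management in all kinds of institutions and has broad application prospects. As quantitative techniques have developed, credit scoring methods have been continually renewed, offering many options for practical use. Six models are selected: two statistical methods (logistic regression and classification trees) and four artificial intelligence methods that represent the direction in which credit scoring is developing (multilayer perceptron, radial basis function network, self-organizing feature map network, and support vector machine). They are tested under a consistent framework on a fairly large sample of data on individually owned businesses. The results show that logistic regression and the support vector machine are superior in misclassification rate, stability, and applicability; the support vector machine, one of the newest applications among artificial intelligence scoring methods, shows the stronger overall performance.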

6.
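Zhou Yi et al., Statistical Research (《统计研究》), 2014, 31(7): 58-62
A lack of statistical credit (integrity in statistical reporting) is a major cause of distorted statistical data, and establishing a statistical credit rating system is the most effective way to fundamentally safeguard integrity and improve data quality. This paper analyzes the lack of statistical credit using the principle of information asymmetry and game-theoretic models, and offers a preliminary outline of a statistical credit evaluation system.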

7.
The Kolmogorov–Smirnov statistic (KS) is a standard measure in credit scoring. Currently, there are three computational methods for KS: the method with equal-width binning, the method with equal-size binning, and the method without binning. This paper compares the three methods in three respects: values, rank ordering of scores, and geometric interpretation. The computational results on the German Credit Data show that only the method without binning produces a unique value of KS. It is further proved analytically that the method without binning yields the maximum value of KS among the three methods. The computational results also show that only the method with equal-size binning can be used to evaluate the rank ordering of scores. Moreover, it is proved that all three methods can be used to calculate KS geometrically.
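The difference between the unbinned and the equal-width-binned KS can be illustrated with a minimal sketch. The function names and the toy scores are hypothetical and are not taken from the paper; the code only computes the largest gap between the two empirical score distributions, either at every observed score or at bin edges only.

```python
from bisect import bisect_right


def ecdf(sorted_vals, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_vals, x) / len(sorted_vals)


def ks_no_binning(good, bad):
    """KS evaluated at every distinct observed score."""
    g, b = sorted(good), sorted(bad)
    return max(abs(ecdf(g, s) - ecdf(b, s)) for s in set(g) | set(b))


def ks_equal_width(good, bad, n_bins=5):
    """KS evaluated only at the upper edges of equal-width bins."""
    g, b = sorted(good), sorted(bad)
    lo, hi = min(g[0], b[0]), max(g[-1], b[-1])
    width = (hi - lo) / n_bins
    edges = [lo + width * k for k in range(1, n_bins + 1)]
    return max(abs(ecdf(g, e) - ecdf(b, e)) for e in edges)


# Hypothetical scores, for illustration only.
good_scores = [620, 640, 660, 700, 720, 750]
bad_scores = [500, 540, 580, 630, 650]
```

Because both empirical distributions are step functions that change only at observed scores, the gap at any bin edge equals the gap at some observed score (or zero), so the binned value can never exceed the unbinned one; it also changes with the number of bins, which is why it is not unique.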

8.
ABSTRACT

Traditional credit risk assessment models do not consider the time factor; they consider only whether a customer will default, not when. Their results therefore do not help a manager make the profit-maximizing decision. In fact, even if a customer defaults, the financial institution can still make a profit under some conditions. Most recent research applies the Cox proportional hazards model to credit scoring, predicting the time at which a customer is most likely to default, to solve the credit risk assessment problem. However, fully exploiting the dynamic capability of the Cox proportional hazards model requires time-varying macroeconomic variables, which involve more advanced data collection. Since short-term defaults are the ones that cause great losses for a financial institution, a loan manager approving applications is more interested in identifying those that may default within a short period than in predicting when a loan will default. This paper proposes a decision tree-based short-term default credit risk assessment model. The goal is to use the decision tree to filter out short-term defaults and so produce a highly accurate model that can distinguish defaulting loans. The paper integrates bootstrap aggregating (Bagging) with the synthetic minority over-sampling technique (SMOTE) into the credit risk model to improve the stability of the decision tree and its performance on unbalanced data. Finally, a real case of small and medium enterprise loan data drawn from a local financial institution in Taiwan is presented to illustrate the proposed approach. A comparison with logistic regression and Cox proportional hazards models found that the recall and precision of the proposed model were clearly superior to both.
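The over-sampling step mentioned in the abstract can be sketched as follows. This is only the basic SMOTE interpolation idea in plain Python, not the paper's pipeline (the decision tree and the Bagging stage are omitted); the function name, parameters, and toy points are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical minority-class (default) observations with two features.
defaults = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (3.0, 3.0)]
new_points = smote(defaults, n_new=5)
```

In a bagged setting one would typically rebalance each bootstrap sample in this way before fitting its tree, though whether the paper applies the two steps in that order is not stated in the abstract.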

9.
Book review     
This paper covers three aspects of validating risk models. The first part provides an understanding of the precision of the standard statistics used to validate risk models for varying sample sizes. The second part investigates jackknifing as a method for obtaining a confidence interval for the Gini coefficient and K–S statistic for small sample sizes. The third and final part investigates the odds at various cutoff points, assessing their efficiency and appropriateness relative to the K–S statistic and Gini coefficient in model validation. There are many parts to understanding the risk associated with the extension of credit. This paper focuses on obtaining a better understanding of the present methodology for validating existing risk models used for credit scoring, by investigating the three parts mentioned. The empirical investigation shows that the precision of the Gini coefficient and K–S statistic is driven by the size of the smaller group, whether successes or failures. In addition, a simple adaptation of the standard jackknifing formula can be used to gauge the variability of the Gini coefficient and K–S statistic. Finally, the odds are not a reliable statistic to use without a considerably large sample of both successes and failures.
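As a rough illustration of jackknifing the Gini coefficient, the sketch below uses the textbook leave-one-out jackknife, not the adapted formula the paper refers to (which the abstract does not spell out). The Gini coefficient is taken as 2·AUC − 1, and all function names are hypothetical.

```python
import math


def auc(goods, bads):
    """Probability that a random good outscores a random bad (ties count 1/2)."""
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    return wins / (len(goods) * len(bads))


def gini(goods, bads):
    """Gini coefficient of a scorecard, defined here as 2 * AUC - 1."""
    return 2.0 * auc(goods, bads) - 1.0


def jackknife_gini(goods, bads):
    """Return (Gini, jackknife standard error), leaving out one account at a time.
    Needs at least two goods and two bads."""
    n = len(goods) + len(bads)
    loo = [gini(goods[:i] + goods[i + 1:], bads) for i in range(len(goods))]
    loo += [gini(goods, bads[:j] + bads[j + 1:]) for j in range(len(bads))]
    mean = sum(loo) / n
    variance = (n - 1) / n * sum((t - mean) ** 2 for t in loo)
    return gini(goods, bads), math.sqrt(variance)
```

A normal-approximation interval would then be the estimate plus or minus about 1.96 standard errors, which for small samples is itself only a rough guide.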

10.
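Hong Xiangjun, Gong Lei, Statistical Research (《统计研究》), 2021, 38(8): 16-29
This paper focuses on credit risk premiums in the market for corporate asset-backed securities (ABS) and examines the effect of credit enhancement by related parties. The results show that ABS with related-party credit enhancement carry higher risk premiums; when the rating of an ABS previously issued by the same entity is upgraded, the risk premium of the newly issued ABS rises because of the related-party credit enhancement. The paper further investigates the causes. On the one hand, related-party credit enhancement concentrates risk within the group; on the other hand, corporate moral hazard turns an upgrade of the original ABS into a negative signal that the underlying assets of the new ABS have deteriorated, which affects investors' pricing. The study reveals the moral hazard that related-party credit enhancement may entail. By analyzing the rich statistical characteristics of ABS underlying assets, the paper finds that risk concentration within the group lowers the quality of the underlying assets and thereby raises firms' financing costs. The paper recommends that regulators safeguard the healthy and sustainable development of China's corporate asset securitization market by appropriately restricting related-party credit enhancement or by strengthening supervision.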

11.
The naïve Bayes rule (NBR) is a popular and often highly effective technique for constructing classification rules. This study examines the effectiveness of NBR as a method for constructing classification rules (credit scorecards) in the context of screening credit applicants (credit scoring). For this purpose, the study uses two real-world credit scoring data sets to benchmark NBR against linear discriminant analysis, logistic regression analysis, k-nearest neighbours, classification trees and neural networks. The first data set is taken from a major Greek bank, whereas the second is the Australian Credit Approval data set taken from the UCI Machine Learning Repository (available at http://www.ics.uci.edu/~mlearn/MLRepository.html). The predictive ability of scorecards is measured by the total percentage of correctly classified cases, the Gini coefficient and the bad rate amongst accepts. In each of the data sets, NBR is found to have a lower predictive ability than some of the other five methods under all measures used. Reasons that may negatively affect the predictive ability of NBR relative to that of alternative methods in the context of credit scoring are examined.

12.
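With the vigorous development of China's financial market, the reject inference problem in credit evaluation is receiving increasing attention. Credit scoring models suffer from a low proportion of labeled samples and an imbalanced class distribution. To address these problems, this paper builds on semi-supervised learning and ensemble learning theory to propose a new algorithm, the BCT algorithm. It improves on traditional semi-supervised co-training by using dynamic Bagging to generate multiple sub-classifiers, introducing a classification threshold parameter to handle class imbalance, and setting an early stopping condition to avoid the risk of overfitting during the iterations. An empirical analysis on 5 real data sets shows that, across data sets and rejection rates, BCT outperforms credit scoring models built on 6 other supervised and semi-supervised learning algorithms, indicating good generalization performance and higher model evaluation ability.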

13.
Variable selection is an important decision process in consumer credit scoring. With the rapid growth of the credit industry, especially after the rise of e-commerce, a huge amount of information on customer behavior has become available to inform consumer credit scoring. In this study, a hybrid quadratic programming model is proposed for consumer credit scoring problems with variable selection. The proposed model is then solved with a bisection method based on the Tabu search algorithm (BMTS), and the solution of this model provides alternative subsets of variables of different sizes. The final subset of variables used in the consumer credit scoring model is selected based on both its size (the number of variables in the subset) and its predictive (classification) accuracy rate. Simulation studies are used to measure the performance of the proposed model, illustrating its effectiveness for simultaneous variable selection and classification.

14.
The problem of treatment-related withdrawals has not been fully addressed in the statistical literature. Statistical procedures that compare the efficacies of treatments when withdrawals may occur prior to the conclusion of a study are discussed. A unified test statistic that incorporates all available data, not just the last values, and adjusts for withdrawal patterns, the proportion of withdrawals, and the level of response prior to withdrawal has been developed with the help of data-dependent scoring schemes. A randomization technique has been developed to compute an empirical significance level for each scoring system under both the Fulldata and Endpoint data for a specified parameter configuration. The proposed methods have been applied to a subset (Allen Park Hospital) of the VA study 127.

15.
Two statistical scoring procedures based on p-values have been developed to evaluate the overall performance of analytical laboratories performing environmental measurements. The overall scores of bias and standing are used to determine how consistently a laboratory is able to measure the true (unknown) value correctly over time. The overall scores of precision and standing are used to determine how well a laboratory is able to reproduce its measurements in the long run. Criteria are established for qualitatively labeling measurements as Acceptable, Warning, and Not Acceptable and for identifying areas where laboratories should re-evaluate their measurement procedures. These statistical scoring procedures are applied to two real environmental data sets.

16.
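Zhang Jing et al., Statistical Research (《统计研究》), 2020, 37(11): 57-67
In recent years consumer finance in China has grown rapidly, but it also faces increasingly complex fraud and credit risks. To better monitor the credit risk of borrowers in consumer finance, this paper proposes a risk-control method based on a sparse structured continuation ratio model. Compared with traditional binary classification models, this model can handle ordinal data in which borrowers fall into three or more classes. While estimating coefficients, it automatically selects the important variables from a large and complicated set of data, and the selection takes into account the structure of the coefficients across sub-models. Monte Carlo simulations show that the proposed model performs well in both classification generalization error and variable selection. Finally, the model is applied to a real consumer finance credit risk analysis: for borrowers with insufficient traditional credit records, introducing high-frequency e-commerce consumption data and applying the proposed high-dimensional ordinal multi-class model effectively identifies borrowers' credit risk and can compensate for the shortcomings of traditional credit-reporting methods.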

17.
Summary. A statistical analysis of a bank's credit card database is presented. The database is a snapshot of accounts whose holders have missed a payment in a given month but who do not subsequently default. The variables on which there is information are observable measures on the account (such as profit and activity), and whether actions that are available to the bank (such as letters and telephone calls) have been taken. A primary objective for the bank is to gain insight into the effect that collections activity has on on-going account usage. A neglog transformation that highlights features that are hidden on the original scale and improves the joint distribution of the covariates is introduced. Quantile regression, a novel methodology to the credit scoring industry, is used as it is relatively assumption free, and it is suspected that different relationships may be manifest in different parts of the response distribution. The large size is handled by selecting relatively small subsamples for training and then building empirical distributions from repeated samples for validation. In the application to the database of clients who have missed a single payment a substantive finding is that the predictor of the median of the target variable contains different variables from those of the predictor of the 30% quantile. This suggests that different mechanisms may be at play in different parts of the distribution.
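The neglog transformation is usually defined as sign(x)·log(1 + |x|), which behaves like a logarithm for large magnitudes while remaining defined at zero and for negative values (for example, negative profit). The sketch below assumes that definition; the function names are hypothetical and the abstract itself does not give the formula.

```python
import math


def neglog(x):
    """sign(x) * log(1 + |x|): a log-like transform that accepts zero and negatives."""
    return math.copysign(math.log1p(abs(x)), x)


def neglog_inverse(y):
    """Inverse transform: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)
```

The transform is odd and monotone, so it preserves the ordering and sign of the account measures while compressing long tails on both sides.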

18.
Abstract

Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning and credit scoring. The receiver operating characteristic (ROC) curve and surface are useful tools to assess the ability of diagnostic tests to discriminate between ordered classes or groups. To define these diagnostic tests, it is necessary to select the optimal thresholds that maximize their accuracy. One commonly used procedure for finding the optimal thresholds is to maximize what is known as Youden's index. This article presents nonparametric predictive inference (NPI) for selecting the optimal thresholds of a diagnostic test. NPI is a frequentist statistical method that is explicitly aimed at using few modeling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. Based on multiple future observations, the NPI approach is presented for selecting the optimal thresholds for two-group and three-group scenarios. In addition, a pairwise approach is presented for the three-group scenario. The article ends with an example to illustrate the proposed methods and a simulation study of the predictive performance of the proposed methods along with some classical methods such as Youden's index. The NPI-based methods show some interesting results that overcome some of the issues concerning the predictive performance of Youden's index.
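For reference, the classical two-group Youden threshold that the article compares against can be sketched as follows. This is the empirical Youden's index J = sensitivity + specificity − 1, not the NPI method; the function name and the convention that larger values indicate the diseased group are assumptions made for the example.

```python
def youden_threshold(healthy, diseased):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    classifying an observation as diseased when its value exceeds the threshold."""
    best_c, best_j = None, -1.0
    for c in sorted(set(healthy) | set(diseased)):
        specificity = sum(x <= c for x in healthy) / len(healthy)
        sensitivity = sum(x > c for x in diseased) / len(diseased)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical test results, for illustration only.
healthy_values = [1, 2, 3, 4, 6]
diseased_values = [5, 7, 8, 9]
threshold, j_index = youden_threshold(healthy_values, diseased_values)
```

Only observed values are tried as candidate thresholds, since the empirical sensitivity and specificity change nowhere else.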

19.
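A survey of the main models and methods for personal credit scoring   Total citations: 16 (self-citations: 1; citations by others: 15)
With China's rapid economic development, credit-based consumption has gradually emerged, and the volume of personal consumer loans such as home mortgages, car loans, education loans, and credit cards is expanding rapidly. As the consumer credit boom continues to heat up, commercial banks all regard developing retail business as an important part of their future strategy. However, domestic commercial banks' risk management for retail business is currently at a low level, and their management tools and methods are relatively backward; in particular, the lack of an effective personal credit scoring method is one of the main factors hindering the further development of personal consumer credit. The purpose of this paper is to survey the personal credit scoring models and methods commonly used by commercial banks abroad and to analyze and compare the performance of each method. I. A brief … of credit scoring…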

20.
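The rapid growth of China's credit card business has helped drive the rapid development of the consumer economy, but credit card delinquency cannot be ignored. Income represents a person's economic status and underpins timely credit card repayment. Using delinquency data on the credit card customers of a commercial bank, this paper analyzes, from the perspective of cardholders' economic status, how economic status affects delinquency. The results show that delinquency among cardholders of China's commercial banks has marked economic characteristics: the effect of income on delinquency is nonlinear and U-shaped, that is, cardholders with lower or higher incomes are more likely to be delinquent, while those with middle incomes are less likely. Further analysis finds that being middle-aged, having a stable employer, or owning a home weakens the nonlinear effect of economic status on delinquency, whereas a longer account age strengthens it. The study provides key evidence for building a sound credit card environment across society, for commercial banks to handle delinquency efficiently, and for improving commercial banks' credit card risk management.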
