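By continuously tracking and scraping data from the P2P online lending platform 人人贷 (Renrendai), this study quantitatively analyzes delinquency and herding behavior among the participating borrowers, platform, and lenders. The variables found to significantly affect delinquency include the loan amount, loan term, and interest rate, along with the borrower's gender, income, education level, mortgage status, marital status, credit rating, ratio of loans fully repaid, and ratio of successful loan applications. P2P online lending also exhibits significant herding behavior.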

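方匡南, 杨阳. 《统计研究》 (Statistical Research), 2018, 35(8): 104-115
For classification problems, this paper proposes the sparse group lasso support vector machine (SGL-SVM), which introduces the SGL penalty into the SVM loss function so that variables can be selected between groups and within groups at the same time. Because the SGL-SVM objective function is relatively difficult to optimize, the paper also proposes a fast two-layer coordinate descent algorithm. Simulation experiments show that SGL-SVM outperforms other methods in both prediction and variable selection: for data whose variables have a natural group structure and are sparse within groups, the method improves variable selection and predictive accuracy together. Finally, the proposed SGL-SVM is applied to predicting financial distress among listed Chinese manufacturing companies.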
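The abstract above does not give the SGL-SVM objective explicitly. As a rough illustration only, the sketch below evaluates an average hinge loss plus a commonly used form of the sparse group lasso penalty. The function name, the parameters `lam` and `alpha`, and the square-root group-size weighting are assumptions made for this example, not the paper's own formulation, and no optimizer (such as the paper's two-layer coordinate descent) is implemented.

```python
import math

def sgl_svm_objective(X, y, beta, intercept, groups, lam, alpha):
    """Hinge loss plus a sparse group lasso penalty (illustrative form).

    X: list of feature rows; y: labels in {-1, +1}; beta: coefficients;
    groups: list of index lists partitioning the coefficients;
    lam: overall penalty strength (assumed name);
    alpha in [0, 1]: mix between lasso and group lasso terms (assumed name).
    """
    n = len(X)
    # Average hinge loss: max(0, 1 - y_i * (x_i . beta + b)).
    hinge = 0.0
    for row, label in zip(X, y):
        margin = label * (sum(x * b for x, b in zip(row, beta)) + intercept)
        hinge += max(0.0, 1.0 - margin)
    hinge /= n

    # Within-group sparsity: L1 norm over all coefficients.
    l1 = sum(abs(b) for b in beta)
    # Between-group sparsity: size-weighted L2 norm of each group.
    group_l2 = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
        for g in groups
    )
    return hinge + lam * (alpha * l1 + (1.0 - alpha) * group_l2)
```

In this form, the L1 term drives individual coefficients to zero while the group term can zero out a whole group at once, which is what "selection between groups and within groups" refers to.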

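方匡南, 赵梦峦. 《统计研究》 (Statistical Research), 2018, 35(12): 92-101
As information technology develops, data sources keep multiplying. On the one hand, this allows an individual's credit status to be characterized more precisely and scientifically; on the other hand, the number of sources and their complex structure challenge traditional credit-reporting techniques. This paper proposes a personal credit model based on multi-source data fusion that models and selects variables across multiple data sets simultaneously while accounting for both the similarity and the heterogeneity among data sets. Simulation experiments show that the proposed integrative model has clear advantages in variable selection and classification performance. Finally, the integrative model is applied to personal credit scoring on two data sets, one urban and one rural.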

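Ordinal variables are a common form of qualitative data, and methods for their statistical analysis have long attracted attention from scholars in China and abroad. This paper reviews the main models and methods for handling ordinal variables, identifies shortcomings in current Chinese research on ordinal variables, and offers an outlook for future work.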

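When the independent variables in a classification model exhibit interaction effects, the additivity property of the traditional Shapley value method no longer holds, which degrades variable selection and lowers the model's predictive accuracy. To address this, the article proposes using robust independent component analysis to estimate a data set with independent components from the original data and then applying the Shapley value decomposition to it, thereby improving the accuracy of variable selection. Statistical simulation and empirical analysis show that the improved method outperforms the traditional Shapley value method in variable selection.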

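Faced with massive high-dimensional credit data, traditional Bayesian networks struggle to characterize complex structures and probabilistic relationships among variables. This study applies a sparse discrete Bayesian network method based on multi-logit regression to discover the structural relationships among the factors that influence personal credit, extracting the important structural relationships from complex multidimensional dependencies. Based on the solution path, it discusses model selection criteria for sparse Bayesian networks used in structure discovery and compares the structure-learning performance of sparse and classical Bayesian networks. It then refines the network structure using domain prior knowledge and qualitatively analyzes the main structural relationships among the variables. With the sparse network structure fixed, Bayesian posterior estimation is used to obtain the model parameters, and Bayesian network inference is used to quantify the direct and indirect effects of key variables on credit customer type.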

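高静. 《统计与决策》 (Statistics & Decision), 2012, (18): 19-21
Existing structural equation modeling (SEM) software includes LISREL, AMOS, EQS, and Mplus, but all of these packages require the variables to be continuous and to follow a multivariate normal distribution, a requirement that is hard to meet in practice. This article studies a happiness index based on ordinal categorical data. It converts the ordinal data into continuous form so that they satisfy the assumptions of existing SEM software, builds a structural equation model, and analyzes it with AMOS, SPSS, and other software. The results show that this treatment is feasible and improves on the original data.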

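Bank home mortgage data contain many indicators as well as a certain amount of defective data. Using real mortgage data, an indicator ω is constructed from the loan amount, the rights value (权利价值), and the floor area to characterize the difference between the actual loan amount and the rights value. A three-category ordinal logistic model is then built, and odds ratios (OR) are used to answer the following question: how do changes in the loan-ratio level affect the number of bank customers? From the customer's perspective, the analysis also shows at which bank a customer is likely to obtain a larger loan under the same loan conditions. Recommendations are made on the basis of these conclusions.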
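The abstract above does not state the model equations. For orientation, one standard way to write a three-category ordinal logistic (proportional odds) model is shown below. The symbols $\theta_j$, $\beta$, and $x$ and the sign convention are assumptions chosen for this illustration and may differ from the authors' parameterization.

```latex
% Proportional odds model for an ordinal response Y in {1, 2, 3}
% (two cutpoints, theta_1 < theta_2).
\[
  \log \frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \theta_j - x^{\top}\beta,
  \qquad j = 1, 2 .
\]
% Under this sign convention, a one-unit increase in covariate x_k
% multiplies the odds of falling in a higher category by
\[
  \mathrm{OR}_k
  = \frac{\operatorname{odds}(Y > j \mid x_k + 1)}{\operatorname{odds}(Y > j \mid x_k)}
  = e^{\beta_k},
\]
% and the ratio is the same for both cutpoints j.
```

Because the odds ratio does not depend on the cutpoint, a single number per covariate summarizes its effect across all three categories.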

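范新妍 et al. 《统计研究》 (Statistical Research), 2021, 38(2): 99-113
Traditional credit scoring relies mainly on statistical classification methods, which can predict only whether a borrower will default, not when. The cure rate model is a mixture of binary classification and survival analysis: it predicts both whether and when default will occur, so it provides more information than traditional binary classification. In addition, with the growth of big data, data sources keep multiplying, and multiple data sets can be collected for the same or similar tasks. This paper proposes an integrative cure rate model that fuses multi-source data. It models multiple data sets and estimates parameters simultaneously, performs bi-level variable selection between and within groups through a composite penalty function, and improves interpretability by encouraging the regression coefficients of the two submodels to share the same sign. Numerical simulations show that the proposed method has clear advantages in variable selection and parameter estimation. Finally, the method is applied to predicting the time of default on credit loans, where the model performs well.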
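The abstract above describes the cure rate model only in words. A standard mixture cure formulation, given here as an assumption about the general model class rather than as the authors' exact specification, combines a logistic "incidence" submodel with a survival "latency" submodel. The symbols $\pi$, $\gamma$, $\beta$, $z$, $x$, and $S_0$ are notation chosen for this sketch, and the proportional hazards form of the latency part is one common choice among several.

```latex
% Population survival: a borrower either never defaults ("cured",
% probability 1 - pi) or is susceptible, with default time governed by S_u.
\[
  S_{\mathrm{pop}}(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x),
\]
% Incidence submodel: probability of eventually defaulting.
\[
  \pi(z) = \frac{\exp(z^{\top}\gamma)}{1 + \exp(z^{\top}\gamma)},
\]
% Latency submodel (proportional hazards form): timing of default
% among susceptible borrowers.
\[
  S_u(t \mid x) = S_0(t)^{\exp(x^{\top}\beta)} .
\]
```

In this notation, a covariate with positive coefficients in both $\gamma$ and $\beta$ raises the probability of default and also brings default earlier, which is the kind of sign agreement between the two submodels that the abstract says the method encourages.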

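Through analysis of questionnaire data from a field survey of informal borrowing and lending among rural households in Guangdong, this study examines the supply of and demand for funds in rural informal lending, identifies the financing constraints that exist, and then empirically analyzes how interest rates on informal loans are priced under those constraints. It focuses on the extent to which rural informal lending rates are affected by public and private information, building separate pricing models from the borrower's and the lender's perspectives. The results show that the pricing model is significant at the 1% level in the F test, and the other models are statistically significant in terms of R²; the variable reflecting loan purpose is significant at the 10% level, and all other variables are significant at the 5% level. This indicates that the market interest rate reflects the influence of public information, and that borrowers' and lenders' pricing also reflects the risk and financial-capacity factors in their respective private information. The main empirical conclusions are: first, the rural informal lending market is a financial market with autonomous trading; second, its interest rate pricing is essentially market-based.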

In this contribution we aim at improving ordinal variable selection in the context of causal models for credit risk estimation. In this regard, we propose an approach that provides a formal inferential tool to compare the explanatory power of each covariate and, therefore, to select an effective model for classification purposes. Our proposed model is Bayesian nonparametric and thus keeps the amount of model specification to a minimum. We consider the case in which information from the covariates is at the ordinal level. A notable instance is the situation in which ordinal variables result from rankings of companies that are evaluated according to different macro- and microeconomic aspects, leading to ordinal covariates that correspond to various ratings and entail different magnitudes of the probability of default. For each given covariate, we suggest partitioning the statistical units into as many groups as there are observed levels of the covariate. We then assume individual defaults to be homogeneous within each group and heterogeneous across groups. Our aim is to compare, and therefore select among, the partition structures resulting from the consideration of different explanatory covariates. The metric we choose for variable comparison is the posterior probability of each partition. The application of our proposal to a European credit risk database shows that it performs well, leading to a coherent and clear method for variable averaging of the estimated default probabilities.

Variable selection is an important decision process in consumer credit scoring. With the rapid growth of the credit industry, especially after the rise of e-commerce, a huge amount of information on customer behavior has become available to inform consumer credit scoring. In this study, a hybrid quadratic programming model is proposed for consumer credit scoring problems with variable selection. The proposed model is then solved with a bisection method based on the Tabu search algorithm (BMTS), and the solution of this model provides alternative subsets of variables of different sizes. The final subset of variables used in the consumer credit scoring model is selected based on both its size (the number of variables in the subset) and its predictive (classification) accuracy rate. Simulation studies are used to measure the performance of the proposed model, illustrating its effectiveness for simultaneous variable selection and classification.

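王小燕 et al. 《统计研究》 (Statistical Research), 2014, 31(9): 107-112
Variable selection is an important step in statistical modeling; choosing appropriate variables yields robust models that are structurally simple and predict accurately. Under logistic regression, this paper proposes a new bi-level variable selection penalty, the adaptive Sparse Group Lasso (adSGL). Its distinguishing feature is that it screens variables based on their group structure, achieving selection both within and between groups. Its advantage is that it penalizes individual coefficients and group coefficients to different degrees, which avoids over-penalizing large coefficients and thereby improves estimation and prediction accuracy. The difficulty in solving it is that the penalized likelihood function is not strictly convex, so the paper solves the model by group coordinate descent and establishes a criterion for choosing the tuning parameters. Simulations show that, compared with the representative existing methods Sparse Group Lasso, Group Lasso, and Lasso, adSGL not only improves bi-level selection accuracy but also reduces model error. Finally, adSGL is applied to credit card credit scoring, where it achieves higher classification accuracy and robustness than logistic regression.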

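A shortage of default data and heterogeneity among obligors are major problems in measuring credit risk. Hierarchical prior information in Bayesian models, combined with Markov chain Monte Carlo (MCMC) simulation, can effectively mitigate missing-data and measurement-error problems and can evaluate and compare obligor heterogeneity, thereby avoiding underestimation of risk. Model fitting and model diagnostics on bank data both demonstrate the adaptability and flexibility of hierarchical estimation; the methods are concise and clear, which makes them easy for risk analysts in China to adopt. In addition, Bayesian hierarchical models that include macroeconomic covariates can be used for more complex risk analysis.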

Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of multiple mutually exclusive events, known as competing risks. The proportional subdistribution hazard (PSH) model proposed by Fine and Gray (J Am Stat Assoc 94:496–509, 1999) has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish the asymptotic properties of the proposed penalized estimators and modify the coordinate descent algorithm for implementation. Simulation studies are conducted to demonstrate the good performance of the proposed method. Data from deceased donor kidney transplants from the United Network of Organ Sharing illustrate the utility of the proposed method.

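张勇 et al. 《统计研究》 (Statistical Research), 2015, 32(5): 32-39
Under asymmetric information between the monetary authority and market participants, this article examines how the time-varying nature of market participants' expectations shapes the effect of an unanticipated easing policy on financing costs in the credit market. First, we measure financing costs by the external finance premium and hypothesize that, at different stages of the business cycle, unanticipated easing acts on the time-varying character of market participants' expectations and thereby has a nonlinear effect on the external finance premium. We then test this hypothesis by building a Markov regime-switching credit spread model that includes monetary policy variables, together with a model of how market participants form expectations. The study shows that unanticipated easing reveals private information that the economy is weak, leading market participants to form pessimistic expectations about the economic outlook. Moreover, in recessions the policy action worsens expectations only slightly and therefore reduces the external finance premium substantially, whereas in expansions it worsens expectations more and therefore reduces the premium less. In other words, time variation in how market participants form expectations affects how strongly unanticipated easing lowers financing costs in the credit market.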

Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response. Selecting a small number of relevant variables for classification has therefore received a great deal of attention. Generally, when the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are called for on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method in the multinomial probit regression model for multi-class classification. The performance of our proposed method is demonstrated with one simulated data set and three well-known gene expression profiling data sets: breast cancer data, leukemia data, and small round blue-cell tumor data. The results show that, compared with other methods, our method is able to select the relevant variables and can obtain competitive classification accuracy with a small subset of relevant genes.

Estimating the risk factors of a disease such as diabetic retinopathy (DR) is an important research problem for biomedical and statistical practitioners as well as epidemiologists. Many studies have focused on building models with binary outcomes, which may not exploit all of the available information. This article investigates the importance of retaining the ordinal nature of the response variable (e.g. the severity level of a disease) while determining the risk factors associated with DR. A generalized linear model approach with appropriate link functions is studied using both classical and Bayesian frameworks. The results indicate that ordinal logistic regression with a probit link function could be a more appropriate approach for determining the risk factors of DR. The study emphasizes ways to handle the ordinal nature of the response variable that give a better model fit than other link functions.
