期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A classification spline machine for building a credit scorecard

《Journal of Statistical Computation and Simulation》2012,82(5):681-689

In constructing a scorecard, we partition each characteristic variable into a few attributes and assign weights to those attributes. For the task, a simulated annealing algorithm has been proposed. A drawback of simulated annealing is that the number of cutpoints separating each characteristic variable into attributes is required as an input. We introduce a scoring method, called a classification spline machine (CSM), which determines cutpoints automatically via a stepwise basis selection. In this paper, we compare performances of CSM and simulated annealing on simulated datasets. The results indicate that the CSM can be useful in the construction of scorecards. 相似文献

2.

个人信用评分的主要模型与方法综述 总被引：15，自引：1，他引：15

下载免费PDF全文

石庆焱靳云汇《统计研究》2003,20(8):36-4

随着中国经济的快速发展 ,信用消费已逐步浮出水面 ,住房按揭、汽车贷款、教育贷款、信用卡等各种个人消费贷款的规模在迅速扩大。在消费信贷热不断升温的形势下 ,各商业银行均把发展零售业务作为未来发展战略的重要组成部分。但是由于目前国内商业银行对零售业务的风险管理水平较低 ,管理手段与方法均较落后 ,其中缺乏一套有效的个人信用评分方法是阻碍了个人消费信贷业务进一步开展的主要因素之一。本文的目的就是在对国外有关商业银行较常使用的个人信用评分模型与方法进行综述 ,并就各种方法的性能进行分析比较。　　一、信用评分的简要… 相似文献

3.

Fused least absolute shrinkage and selection operator for credit scoring

《Journal of Statistical Computation and Simulation》2012,82(11):2135-2147

Credit scoring can be defined as the set of statistical models and techniques that help financial institutions in their credit decision makings. In this paper, we consider a coarse classification method based on fused least absolute shrinkage and selection operator (LASSO) penalization. By adopting fused LASSO, one can deal continuous as well as discrete variables in a unified framework. For computational efficiency, we develop a penalization path algorithm. Through numerical examples, we compare the performances of fused LASSO and LASSO with dummy variable coding. 相似文献

4.

A zero-inflated non default rate regression model for credit scoring data

Francisco Louzada Fernando F. Moreira 《统计学通讯:理论与方法》2018,47(12):3002-3021

The aim of this paper is to propose a survival credit risk model that jointly accommodates three types of time-to-default found in bank loan portfolios. It leads to a new framework that extends the standard cure rate model introduced by Berkson and Gage (1952 Berkson, J., and R. P. Gage. 1952. Survival curve for cancer patients following treatment. Journal of the American Statistical Association 47 (259):501–15.[Taylor & Francis Online], [Web of Science ®] , [Google Scholar]) regarding the accommodation of zero-inflations. In other words, we propose a new survival model that takes into account three different types of individuals which have so far not been jointly accounted for: (i) an individual with an event at the starting time (zero time); (ii) non susceptible for the event, or (iii) susceptible for the event. Considering this, the zero-inflated Weibull non default rate regression models, which include a multinomial logistic link for the three classes, are presented using an application for credit scoring data. The parameter estimation is reached by the maximum-likelihood estimation procedure and Monte Carlo simulations are carried out to assess its finite sample performance. 相似文献

5.

On the confusion matrix in credit scoring and its analytical properties

Guoping Zeng 《统计学通讯:理论与方法》2020,49(9):2080-2093

Abstract

Confusion Matrix is an important measure to evaluate the accuracy of credit scoring models. However, the literature about Confusion Matrix is limited. The analytical properties of Confusion Matrix are ignored. Moreover, the concept of Confusion Matrix is confusing. In this article, we systematically study Confusion Matrix and its analytical properties. We enumerate 16 possible variants of Confusion Matrix and show that only 8 are reasonable. We study the relationship between Confusion Matrix and 2 other performance measures: the receiver operating characteristic curve (ROC) and Kolmogorov-Smirnov statistic (KS). We show that an optimal cutoff score can be attained by KS. 相似文献

6.

On the three-way equivalence of AUC in credit scoring with tied scores

Guoping Zeng Edward Zeng 《统计学通讯:理论与方法》2019,48(7):1635-1650

In credit scoring, it is well known that AUC (the area under curve) can be calculated geometrically, by the probability of a correct ranking of a good and bad pair, and by the Wilcoxon Rank-Sum statistic. This three-way equivalence was first present by Hanley and McNeil in 1982 without considering tied scores and without giving analytical proofs. In this paper, we extend the three-way equivalence to the case with tied scores and provide analytic proofs for the three-way equivalence. 相似文献

7.

Invariant properties of logistic regression model in credit scoring under monotonic transformations

Guoping Zeng 《统计学通讯:理论与方法》2017,46(17):8791-8807

Monotonic transformations of explanatory continuous variables are often used to improve the fit of the logistic regression model to the data. However, no analytic studies have been done to study the impact of such transformations. In this paper, we study invariant properties of the logistic regression model under monotonic transformations. We prove that the maximum likelihood estimates, information value, mutual information, Kolmogorov–Smirnov (KS) statistics, and lift table are all invariant under certain monotonic transformations. 相似文献

8.

The neglog transformation and quantile regression for the analysis of a large credit scoring database

Joe Whittaker Chris Whitehead Mark Somers 《Journal of the Royal Statistical Society. Series C, Applied statistics》2005,54(5):863-878

Summary. A statistical analysis of a bank's credit card database is presented. The database is a snapshot of accounts whose holders have missed a payment on a given month but who do not subsequently default. The variables on which there is information are observable measures on the account (such as profit and activity), and whether actions that are available to the bank (such as letters and telephone calls) have been taken. A primary objective for the bank is to gain insight into the effect that collections activity has on on-going account usage. A neglog transformation that highlights features that are hidden on the original scale and improves the joint distribution of the covariates is introduced. Quantile regression, a novel methodology to the credit scoring industry, is used as it is relatively assumption free, and it is suspected that different relationships may be manifest in different parts of the response distribution. The large size is handled by selecting relatively small subsamples for training and then building empirical distributions from repeated samples for validation. In the application to the database of clients who have missed a single payment a substantive finding is that the predictor of the median of the target variable contains different variables from those of the predictor of the 30% quantile. This suggests that different mechanisms may be at play in different parts of the distribution. 相似文献

9.

Link sampling for attributes

K. Harishchandra P. Sriveakataramana 《统计学通讯:理论与方法》2013,42(16):1855-1868

A few lot-by-lot acceptance sampling procedures for attributes are proposed as alternatives to the usual double sampling. In these schemes whenever a second sample is needed, the sample information from neighbouring lots is used. The new plans have the DC identical to that of the comparable double sampling plan. The primary advantage of these plans is a reduction in cost due to a smaller ASN. An empirical study which investigates the effect of sudden shifts in quality level on the probability of acceptance and ARL under the proposed plans is included 相似文献

10.

Double-sampling control charts for attributes

Aurélia Aparecida De Araújo Rodrigues Maysa Sacramento De Magalhães 《Journal of applied statistics》2011,38(1):87-112

In this article, we propose a double-sampling (DS) np control chart. We assume that the time interval between samples is fixed. The choice of the design parameters of the proposed chart and also comparisons between charts are based on statistical properties, such as the average number of samples until a signal. The optimal design parameters of the proposed control chart are obtained. During the optimization procedure, constraints are imposed on the in-control average sample size and on the in-control average run length. In this way, required statistical properties can be assured. Varying some input parameters, the proposed DS np chart is compared with the single-sampling np chart, variable sample size np chart, CUSUM np and EWMA np charts. The comparisons are carried out considering the optimal design for each chart. For the ranges of parameters considered, the DS scheme is the fastest one for the detection of increases of 100% or more in the fraction non-conforming and, moreover, the DS np chart is easy to operate. 相似文献

11.

Probability scoring for spelling correction

Kenneth W. Church William A. Gale 《Statistics and Computing》1991,1(2):93-103

This paper describes a new program, CORRECT, which takes words rejected by the Unix^® SPELL program, proposes a list of candidate corrections, and sorts them by probability score. The probability scores are the novel contribution of this work. They are based on a noisy channel model. It is assumed that the typist knows what words he or she wants to type but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in recognition applications, especially speech recognition (Jelinek, 1985), one can often recover the intended correction,c, from a typo,t, by finding the correctionc that maximizesPr(c) Pr(t/c). The first factor,Pr(c), is a prior model of word probabilities; the second factor,Pr(t/c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (insertions, deletions, substitutions and reversals). Both sets of probabilities were estimated using data collected from the Associated Press (AP) newswire over 1988 and 1989 as a training set. The AP generates about 1 million words and 500 typos per week.In evaluating the program, we found that human judges were extremely reluctant to cast a vote given only the information available to the program, and that they were much more comfortable when they could see a concordance line or two. The second half of this paper discusses some very simple methods of modeling the context usingn-gram statistics. Althoughn-gram methods are much too simple (compared with much more sophisticated methods used in artificial intelligence and natural language processing), we have found that even these very simple methods illustrate some very interesting estimation problems that will almost certainly come up when we consider more sophisticated models of contexts. The problem is how to estimate the probability of a context that we have not seen. We compare several estimation techniques and find that some are useless. Fortunately, we have found that the Good-Turing method provides an estimate of contextual probabilities that produces a significant improvement in program performance. Context is helpful in this application, but only if it is estimated very carefully.At this point, we have a number of different knowledge sources—the prior, the channel and the context—and there will certainly be more in the future. In general, performance will be improved as more and more knowledge sources are added to the system, as long as each additional knowledge source provides some new (independent) information. As we shall see, it is important to think more carefully about combination rules, especially when there are a large number of different knowledge sources. 相似文献

12.

Defining equations for two-level factorial designs

Neil A. Butler 《Journal of statistical planning and inference》2008

Defining equations are introduced in the context of two-level factorial designs and they are shown to provide a concise specification of both regular and nonregular designs. The equations are used to find orthogonal arrays of high strength and some optimal designs. The latter optimal designs are formed in a new way by augmenting notional orthogonal arrays which are allowed to have some runs with a negative number of replicates before augmentation. Defining equations are also shown to be useful when the factorial design is blocked. 相似文献

13.

Control chart limits for attributes and events

A. F. Bissell 《Journal of applied statistics》1988,15(1):97-105

相似文献

14.

Independent attributes control chart for Markov-dependent processes

M. P. Gadre S. B. Adnaik 《统计学通讯:模拟与计算》2019,48(4):985-997

Statistical process monitoring (SPM) is a very efficient tool to maintain and to improve the quality of a product. In many industrial processes, end product has two or more attribute-type quality characteristics. Some of them are independent, but the observations are Markovian dependent. It is essential to develop a control chart for such situations. In this article, we develop an Independent Attributes Control Chart for Markov Dependent Processes based on error probabilities criterion under the assumption of one-step Markov dependency. Implementation of the chart is similar to that of Shewhart-type chart. Performance of the chart has been studied using probability of detecting shift criterion. A procedure to identify the attribute(s) responsible for out-of-control status of the process is given. 相似文献

15.

Control chart for monitoring multivariate COM-Poisson attributes

Aamir Saghir Zhengyan Lin 《Journal of applied statistics》2014,41(1):200-214

Statistical process control of multi-attribute count data has received much attention with modern data-acquisition equipment and online computers. The multivariate Poisson distribution is often used to monitor multivariate attributes count data. However, little work has been done so far on under- or over-dispersed multivariate count data, which is common in many industrial processes, with positive or negative correlation. In this study, a Shewhart-type multivariate control chart is constructed to monitor such kind of data, namely the multivariate COM-Poisson (MCP) chart, based on the MCP distribution. The performance of the MCP chart is evaluated by the average run length in simulation. The proposed chart generalizes some existing multivariate attribute charts as its special cases. A real-life bivariate process and a simulated trivariate Poisson process are used to illustrate the application of the MCP chart. 相似文献

16.

Optimal designs for choice experiments with asymmetric attributes

《Journal of statistical planning and inference》2005,134(1):288-301

相似文献

17.

Control charts for attributes with maxima nominated samples

Mohammad Jafari Jozani Sayed Jamal Mirkamali 《Journal of statistical planning and inference》2011,141(7):2386-2398

We develop quality control charts for attributes using the maxima nomination sampling (MNS) method and compare them with the usual control charts based on simple random sampling (SRS) method, using average run length (ARL) performance, the required sample size in detecting quality improvement, and non-existence region for control limits. We study the effect of the sample size, the set size, and nonconformity proportion on the performance of MNS control charts using ARL curve. We show that MNS control chart can be used as a better benchmark for indicating quality improvement or quality deterioration relative to its SRS counterpart. We consider MNS charts from a cost perspective. We also develop MNS attribute control charts using randomized tests. A computer program is designed to determine the optimal control limits for an MNS p-chart such that, assuming known parameter values, the absolute deviation between the ARL and a specific nominal value is minimized. We provide good approximations for the optimal MNS control limits using regression analysis. Theoretical results are augmented with numerical evaluations. These show that MNS based control charts can yield substantial improvement over the usual control charts based on SRS. 相似文献

18.

An augmented data scoring algorithm for maximum likelihood

Jun Ma H. Malcolm Hudson 《统计学通讯:理论与方法》2013,42(11):2761-2776

The expectation-maximization (EM) method facilitates computation of max¬imum likelihood (ML) and maximum penalized likelihood (MPL) solutions. The procedure requires specification of unobservabie complete data which augment the measured or incomplete data. This specification defines a conditional expectation of the complete data log-likelihood function which is computed in the E-stcp. The EM algorithm is most effective when maximizing the iunction Q{0) denned in the F-stnp is easier than maximizing the likelihood function.

The Monte Carlo EM (MCEM) algorithm of Wei & Tanner (1990) was introduced for problems where computation of Q is difficult or intractable. However Monte Carlo can he computationally expensive, e.g. in signal processing applications involving large numbers of parameters. We provide another approach: a modification of thc standard EM algorithm avoiding computation of conditional expectations. 相似文献

19.

Properties and performance of the c-chart for attributes data

S. Chakraborti S. W. Human 《Journal of applied statistics》2008,35(1):89-100

The effects of parameter estimation are examined for the well-known c-chart for attributes data. The exact run length distribution is obtained for Phase II applications, when the true average number of non-conformities, c, is unknown, by conditioning on the observed number of non-conformities in a set of reference data (from Phase I). Expressions for various chart performance characteristics, such as the average run length (ARL), the standard deviation of the run length (SDRL) and the median run length (MDRL) are also obtained. Examples show that the actual performance of the chart, both in terms of the false alarm rate (FAR) and the in-control ARL, can be substantially different from what might be expected when c is known, in that an exceedingly large number of false alarms are observed, unless the number of inspection units (the size of the reference dataset) used to estimate c is very large, much larger than is commonly used or recommended in practice. In addition, the actual FAR and the in-control ARL values can be very different from the nominally expected values such as 0.0027 (or ARL₀=370), particularly when c is small, even with large amounts of reference data. A summary and conclusions are offered. 相似文献

20.

A Bayesian nonparametric model for Taguchi's on-line quality monitoring procedure for attributes

Miriam Harumi Tsunemi Thiago Feitosa Campos Luís Gustavo Esteves José Galvão Leite Sergio Wechsler 《Journal of statistical planning and inference》2012

A Bayesian nonparametric model for Taguchi's on-line quality monitoring procedure for attributes is introduced. The proposed model may accommodate the original single shift setting to the more realistic situation of gradual quality deterioration and allows the incorporation of an expert's opinion on the production process. Based on the number of inspections to be carried out until a defective item is found, the Bayesian operation for the distribution function that represents the increasing sequence of defective fractions during a cycle considering a mixture of Dirichlet processes as prior distribution is performed. Bayes estimates for relevant quantities are also obtained. 相似文献