1.

Robust linear discriminant analysis using S‐estimators

Christophe Croux Catherine Dehon 《Revue canadienne de statistique》2001,29(3):473-493

The authors consider a robust linear discriminant function based on high breakdown location and covariance matrix estimators. They derive influence functions for the estimators of the parameters of the discriminant function and for the associated classification error. The most B‐robust estimator is determined within the class of multivariate S‐estimators. This estimator, which minimizes the maximal influence that an outlier can have on the classification error, is also the most B‐robust location S‐estimator. A comparison of the most B‐robust estimator with the more familiar biweight S‐estimator is made.

2.

Estimation of error rate in multiple group logistic discrimination. The approximate leaving-one-out method

E. Lesaffre J.L. Willems A. Albert 《Communications in Statistics - Theory and Methods》2013,42(8):2989-3007

We present an approximate leaving-one-out technique for estimating the error rate in logistic discrimination. The new measure is based on the one-step approximation of a(i), the maximum likelihood estimate of the parameter vector based on the sample without the ith case. Some inequalities between the resubstitution error rate, the approximate and exact leaving-one-out error rates for the multiple group logistic model are investigated. Monte-Carlo simulations assess the adequacy of the approximate leaving-one-out method as an estimate of the actual error rate. The usefulness of this approach is demonstrated by means of two medical examples.
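
As a rough illustration of why leaving-one-out estimates are of interest, the sketch below compares the resubstitution error rate with the exact leaving-one-out error rate on a toy one-dimensional sample. It is not the authors' one-step approximation: a nearest-centroid rule stands in for logistic discrimination as a simplifying assumption, and the function names and data are hypothetical.

```python
def nearest_centroid_predict(train, x):
    """Assign x to the group whose training mean is closest.

    train is a list of (value, label) pairs. This simple rule is an
    assumed stand-in for a fitted logistic discriminant.
    """
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return min(sums, key=lambda g: abs(x - sums[g] / counts[g]))


def resubstitution_error(data):
    """Error rate when each case is classified by a rule fitted on all cases."""
    wrong = sum(nearest_centroid_predict(data, v) != g for v, g in data)
    return wrong / len(data)


def leave_one_out_error(data):
    """Error rate when case i is classified by a rule fitted without case i."""
    wrong = 0
    for i, (value, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        wrong += nearest_centroid_predict(rest, value) != label
    return wrong / len(data)


# Hypothetical toy sample: two groups with borderline cases at 2.4 and 2.6.
toy = [(0.0, "a"), (1.0, "a"), (2.4, "a"), (2.6, "b"), (4.0, "b"), (5.0, "b")]
print(resubstitution_error(toy), leave_one_out_error(toy))
```

On this sample the two borderline cases are classified correctly when they help to fit the rule and incorrectly when they are left out, so the resubstitution estimate is the more optimistic of the two.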

3.

A robust logistic discrimination model

COX TREVOR F. PEARCE KIM F. 《Statistics and Computing》1997,7(3):155-161

Logistic discrimination is a well documented method for classifying observations to two or more groups. However, estimation of the discriminant rule can be seriously affected by outliers. To overcome this, Cox and Ferry produced a robust logistic discrimination technique. Although their method worked in practice, parameter estimation was sometimes prone to convergence problems. This paper proposes a simplified robust logistic model which does not have any such problems and which takes a generalized linear model form. Misclassification rates calculated in a simulation exercise are used to compare the new method with ordinary logistic discrimination. Model diagnostics are also presented. The newly proposed model is then used on data collected from pregnant women at two district general hospitals. A robust logistic discriminant is calculated which can be used to predict accurately which method of feeding a woman will eventually use: breast feeding or bottle feeding.

4.

INFLUENCE FUNCTIONS IN FUNCTIONAL MEASUREMENT ERROR MODELS WITH REPLICATED DATA

A. R. RASEKH N. R. J. FIELLER 《Statistics》2013,47(2):169-178

We consider the construction and properties of influence functions in the context of functional measurement error models with replicated data. In these models, estimates of the parameters can be affected both by the individual observations and by the means of replicated observations. We show that the influence function of the means of replicates on the estimate of the regression coefficients can only be derived under the assumption that the variances of the errors are known, while the one for the individual observations can only be derived simultaneously with their influence function on the estimators of the variances of the errors.

5.

On discrimination procedure with mixtures of continuous and categorical variables

Gafar Matanmi Oyeyemi George Chinanu Mbaeyi Saheed Ishola Salawu Bernard Olagboyega Muse 《Journal of applied statistics》2016,43(10):1864-1873

A discrimination procedure based on the location model is described and suggested for use in situations where the discriminating variables are mixtures of continuous and binary variables. Some procedures previously employed in similar situations, such as Fisher's linear discriminant function and logistic regression, were compared with this method using the error rate (ER). Optimal ERs for these procedures are reported using real and simulated data for varying sample sizes and numbers of continuous and binary variables, and were used as a measure for assessing the performance of the various procedures. The suggested procedure performed considerably better in the cases considered and never produced a poor result when compared with the other procedures. Hence, the suggested procedure might be considered for such situations.

6.

Bayesian incorporation of repeated measurements in logistic discrimination

D. F. Andrews R. Brant M. E. Percy 《Revue canadienne de statistique》1986,14(3):263-266

A common problem in medical statistics is the discrimination between two groups on the basis of diagnostic information. Information on patient characteristics is used to classify individuals into one of two groups: diseased or disease-free. This classification is often with respect to a particular disease. This discrimination has two probabilistic components: (1) the discrimination is not without error, and (2) in many cases the a priori chance of disease can be estimated. Logistic models (Cox 1970; Anderson 1972) provide methods for incorporating both of these components. The a posteriori probability of disease may be estimated for a patient on the basis of both current measurement of patient characteristics and prior information. The parameters of the logistic model may be estimated on the basis of a calibration trial. In practice, not one but several sets of measurements of one characteristic of the patient may be made on a questionable case. These measurements typically are correlated; they are far from independent. How should these correlated measurements be used? This paper presents a method for incorporating several sets of measurements in the classification of a case.

7.

Spatial discrimination and classification maps

K.V. Mardia 《Communications in Statistics - Theory and Methods》2013,42(18):2181-2197

A method of constructing maps through spatial discrimination is given. The discrimination depends basically on the assumption of local spatial continuity and a factorized covariance matrix. In particular, given an autocovariance function, this formulation leads to a deeper insight into the pioneering work of Switzer (1980). Certain windows for the maps are examined, and the choice of window size is discussed in relation to the classification error when the variables are dependent versus independent. When training data are given, we give a method of estimating the parameters in the model. Some numerical examples are also given.

8.

Classification with discrete and continuous variables via general mixed-data models

A. R. de Leon A. Soo T. Williamson 《Journal of applied statistics》2011,38(5):1021-1032

We study the problem of classifying an individual into one of several populations based on mixed nominal, continuous, and ordinal data. Specifically, we obtain a classification procedure as an extension to the so-called location linear discriminant function, by specifying a general mixed-data model for the joint distribution of the mixed discrete and continuous variables. We outline methods for estimating misclassification error rates. Results of simulations of the performance of proposed classification rules in various settings vis-à-vis a robust mixed-data discrimination method are reported as well. We give an example utilizing data on croup in children.

9.

Prediction Error Estimation Under Bregman Divergence for Non-Parametric Regression and Classification

CHUNMING ZHANG 《Scandinavian Journal of Statistics》2008,35(3):496-523

Prediction error is critical to assess model fit and evaluate model prediction. We propose the cross-validation (CV) and approximated CV methods for estimating prediction error under the Bregman divergence (BD), which embeds nearly all of the commonly used loss functions in the regression, classification procedures and machine learning literature. The approximated CV formulas are analytically derived, which facilitates fast estimation of prediction error under BD. We then study a data-driven optimal bandwidth selector for local-likelihood estimation that minimizes the overall prediction error or equivalently the covariance penalty. It is shown that the covariance penalty and CV methods converge to the same mean-prediction-error-criterion. We also propose a lower-bound scheme for computing the local logistic regression estimates and demonstrate that the algorithm monotonically enhances the target local likelihood and converges. The idea and methods are extended to the generalized varying-coefficient models and additive models.
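
To make concrete the claim that the Bregman divergence embeds common loss functions, here is a minimal sketch using the textbook scalar form D(p, q) = φ(p) − φ(q) − φ′(q)(p − q) for a convex generating function φ. This convention is an assumption and may differ in sign or parametrization from the one used in the paper; the function names are hypothetical.

```python
import math


def bregman(phi, dphi, p, q):
    """Scalar Bregman divergence generated by a convex function phi.

    dphi is the derivative of phi, supplied analytically by the caller.
    """
    return phi(p) - phi(q) - dphi(q) * (p - q)


# Generating function x**2 recovers squared error loss.
def square(x):
    return x * x


def d_square(x):
    return 2.0 * x


# Negative Bernoulli entropy recovers the Kullback-Leibler (deviance-type)
# loss between two probabilities strictly inside (0, 1).
def neg_entropy(x):
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)


def d_neg_entropy(x):
    return math.log(x) - math.log(1.0 - x)


squared_error = bregman(square, d_square, 3.0, 1.0)            # (3 - 1)**2
kl_loss = bregman(neg_entropy, d_neg_entropy, 0.5, 0.25)
print(squared_error, kl_loss)
```

Swapping the generating function is the only change needed to move between the two losses, which is the sense in which a single divergence family covers both regression and classification criteria.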

10.

Flexible regression modeling

Peter M. Hooper 《Revue canadienne de statistique》2001,29(3):343-364

The author proposes a new method for flexible regression modeling of multi‐dimensional data, where the regression function is approximated by a linear combination of logistic basis functions. The method is adaptive, selecting simple or more complex models as appropriate. The number, location, and (to some extent) shape of the basis functions are automatically determined from the data. The method is also affine invariant, so accuracy of the fit is not affected by rotation or scaling of the covariates. Squared error and absolute error criteria are both available for estimation. The latter provides a robust estimator of the conditional median function. Computation is relatively fast, particularly for large data sets, so the method is well suited for data mining applications.

11.

Nonparametric Two-Group Classification: Concepts and a SAS-Based Software Package

A. Pedro Duarte Silva Antonie Stam 《The American statistician》2013,67(2):185-197

This article introduces BestClass, a set of SAS macros, available in the mainframe and workstation environment, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule use the mathematical programming routines of the SAS/OR software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors.

12.

Estimation and prediction for Chen distribution with bathtub shape under progressive censoring

Tanmay Kayal Devendra Pratap Singh Manoj Kumar Rastogi 《Journal of Statistical Computation and Simulation》2017,87(2):348-366

We consider estimation of the unknown parameters of the Chen distribution [Chen Z. A new two-parameter lifetime distribution with bathtub shape or increasing failure rate function. Statist Probab Lett. 2000;49:155–161] with bathtub shape using progressive-censored samples. We obtain maximum likelihood estimates by making use of an expectation–maximization algorithm. Different Bayes estimates are derived under squared error and balanced squared error loss functions. Because the associated posterior distribution appears in an intractable form, we use an approximation method to compute these estimates. A Metropolis–Hastings (MH) algorithm is also proposed and some more approximate Bayes estimates are obtained. An asymptotic confidence interval is constructed using the observed Fisher information matrix. Bootstrap intervals are proposed as well. Samples generated from the MH algorithm are further used in the construction of HPD intervals. We also obtain prediction intervals and estimates for future observations in one- and two-sample situations. A numerical study is conducted to compare the performance of the proposed methods using simulations. Finally, we analyse real data sets for illustration purposes.

13.

Influence functions applied to the estimation of mean rain rate

Donald E. K. Martin 《Journal of applied statistics》2001,28(2):247-258

In this paper we illustrate the usefulness of influence functions for studying properties of various statistical estimators of mean rain rate using space-borne radar data. In Martin (1999), estimators using censoring, minimum chi-square, and least squares are compared in terms of asymptotic variance. Here, we use influence functions to consider robustness properties of the same estimators. We also obtain formulas for the asymptotic variance of the estimators using influence functions, and thus show that they may also be used for studying relative efficiency. The least squares estimator, although less efficient, is shown to be more robust in the sense that it has the smallest gross-error sensitivity. In some cases, influence functions associated with the estimators reveal counterintuitive behaviour. For example, observations that are less than the mean rain rate may increase the estimated mean. The additional information gleaned from influence functions may be used to understand better and improve the estimation procedures themselves.
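
The idea of measuring how much one observation can move an estimate can be sketched with the finite-sample sensitivity curve, a common empirical analogue of the influence function. The example below contrasts the sample mean and median rather than the rain-rate estimators of the paper; the function name and data are hypothetical.

```python
import statistics


def sensitivity_curve(estimator, sample, x):
    """Scaled change in the estimate when one observation x is added.

    Computes n * (T(sample + [x]) - T(sample)), where n is the size of
    the enlarged sample.
    """
    n = len(sample) + 1
    return n * (estimator(sample + [x]) - estimator(sample))


base = [1.0, 2.0, 3.0, 4.0]  # hypothetical clean sample

# The mean responds without bound as the added point moves away...
mean_far = sensitivity_curve(statistics.mean, base, 100.0)
mean_farther = sensitivity_curve(statistics.mean, base, 1000.0)

# ...while the median's response stays the same once the point is extreme.
median_far = sensitivity_curve(statistics.median, base, 100.0)
median_farther = sensitivity_curve(statistics.median, base, 1000.0)

print(mean_far, mean_farther, median_far, median_farther)
```

An estimator whose curve stays bounded over all x has finite gross-error sensitivity in the finite-sample sense, which is the kind of robustness comparison the abstract describes.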

14.

Robust centroid based classification with minimum error rates for high dimension, low sample size data

Jiancheng Jiang J.S. Marron Xuejun Jiang 《Journal of statistical planning and inference》2009

A new method of statistical classification (discrimination) is proposed. The method is most effective for high dimension, low sample size data. It uses a robust mean difference as the direction vector and locates the classification boundary by minimizing the error rates. Asymptotic results for assessment and comparison to several popular methods are obtained by using a type of asymptotics of finite sample size and infinite dimensions. The value of the proposed approach is demonstrated by simulations. Real data examples are used to illustrate the performance of different classification methods.

15.

Sensitivity analysis of reliability functions of the exponential power series lifetime distribution

Mohammad Salehi Vaysi Saralees Nadarajah 《Communications in Statistics - Simulation and Computation》2013,42(10):2938-2952

Hazard rate functions are often used in modeling of lifetime data. The Exponential Power Series (EPS) family has a monotone hazard rate function. In this article, the influence of input factors such as time and parameters on the variability of the hazard rate function is assessed by local and global sensitivity analysis. Two different indices based on local and global sensitivity indices are presented. The simulation results for two datasets show that the hazard rate functions of the EPS family are sensitive to input parameters. The results also show that the hazard rate function of the EPS family is more sensitive to the exponential distribution than to the power series distributions.

16.

Bayesian and maximum likelihood estimations of the inverted exponentiated half logistic distribution under progressive Type II censoring

Kyeongjun Lee 《Journal of applied statistics》2017,44(5):811-832

In this paper, the estimation of the parameters, reliability and hazard functions of an inverted exponentiated half logistic distribution (IEHLD) from progressive Type II censored data is considered. The Bayes estimates for progressive Type II censored IEHLD under asymmetric and symmetric loss functions such as the squared error, general entropy and linex loss functions are provided. The Bayes estimates for progressive Type II censored IEHLD parameters, reliability and hazard functions are also obtained under the balanced loss functions. Since the Bayes estimates cannot be obtained explicitly, the Lindley approximation method and an importance sampling procedure are considered to obtain them. Furthermore, the asymptotic normality of the maximum likelihood estimates is used to obtain the approximate confidence intervals. The highest posterior density credible intervals of the parameters based on the importance sampling procedure are computed. Simulations are performed to assess the performance of the proposed estimates. For illustrative purposes, two data sets are analyzed.

17.

The use of Smooth Bootstrap Techniques for Estimating the Error Rate of a Prediction Rule

J.M. Prada Sánchez X.L. Otero Cepeda 《Communications in Statistics - Simulation and Computation》2013,42(3):1169-1186

In this paper we present a simulation study comparing different methods for estimating the prediction error rate in a discrimination problem. We consider the cross-validation, bootstrap and Bayesian bootstrap methods for this problem, and also extend both the simple and Bayesian bootstrap methods with smoothing techniques. We observe that the smoothing procedure leads to improvements in the estimation of the true error rate of the discrimination rule, especially in the case of the smooth Bayesian bootstrap estimator, whose reduction in M.S.E. results from the high positive correlation between the true error rate and its estimates based on this method.

18.

The Relationship Between the T² Statistic and the Influence Function

Robert L. Mason Youn-Min Chou John C. Young 《Communications in Statistics - Theory and Methods》2014,43(13):2844-2857

Hotelling's T² statistic has many applications in multivariate analysis. In particular, it can be used to measure the influence that a particular observation vector has on parameter estimation. For example, in the bivariate case, there exists a direct relationship between the ellipse generated using a T² statistic for individual observations and the hyperbolae generated using Hampel's influence function for the corresponding correlation coefficient. In this paper, we jointly use the components of an orthogonal decomposition of the T² statistic and some influence functions to identify outliers or influential observations. Since the conditional components in the T² statistic are related to the possible changes in the correlation between a variable and a group of other variables, we consider the theoretical influence functions of the correlations and multiple correlation coefficients. Finite-sample versions of these influence functions are used to find the estimated influence function values.
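
For readers unfamiliar with the T² statistic for individual observations, the sketch below computes it in the bivariate case as (x − x̄)ᵀ S⁻¹ (x − x̄), with S the sample covariance matrix using divisor n − 1. It does not reproduce the orthogonal decomposition or the influence functions discussed in the paper; the function name and data are hypothetical.

```python
def hotelling_t2(data):
    """Return the T-squared value of each bivariate observation.

    data is a list of (x, y) pairs. The 2x2 sample covariance matrix is
    inverted in closed form, so it must be non-singular.
    """
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    values = []
    for x, y in data:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse [[syy, -sxy], [-sxy, sxx]] / det.
        values.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return values


# Hypothetical data: four typical points and one far from the rest.
points = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (10.0, 9.0)]
t2_values = hotelling_t2(points)
print(t2_values)
```

Observations with the same T² value lie on a common ellipse centred at the sample mean, which is the ellipse referred to in the abstract.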

19.

New aspects of Bregman divergence in regression and classification with parametric and nonparametric estimation

Chunming Zhang Yuan Jiang Zuofeng Shang 《Revue canadienne de statistique》2009,37(1):119-139

In statistical learning, regression and classification concern different types of output variables, and the predictive accuracy is quantified by different loss functions. This article explores new aspects of Bregman divergence (BD), a notion which unifies nearly all of the commonly used loss functions in regression and classification. The authors investigate the duality between BD and its generating function. They further establish, under the framework of BD, asymptotic consistency and normality of parametric and nonparametric regression estimators, derive the lower bound of their asymptotic covariance matrices, and demonstrate the role that parametric and nonparametric regression estimation play in the performance of classification procedures and related machine learning techniques. These theoretical results and new numerical evidence show that the choice of loss function affects estimation procedures, whereas it has an asymptotically relatively negligible impact on classification performance. Applications of BD to statistical model building and selection with non‐Gaussian responses are also illustrated. The Canadian Journal of Statistics 37: 119‐139; 2009 © 2009 Statistical Society of Canada

20.

Robust penalized logistic regression with truncated loss functions

Park SY Liu Y 《Revue canadienne de statistique》2011,39(2):300-323

The penalized logistic regression (PLR) is a powerful statistical tool for classification. It has been commonly used in many practical problems. Despite its success, since the loss function of the PLR is unbounded, resulting classifiers can be sensitive to outliers. To build more robust classifiers, we propose the robust PLR (RPLR) which uses truncated logistic loss functions, and suggest three schemes to estimate conditional class probabilities. Connections of the RPLR with some other existing work on robust logistic regression are discussed. Our theoretical results indicate that the RPLR is Fisher consistent and more robust to outliers. Moreover, we develop estimated generalized approximate cross validation (EGACV) for the tuning parameter selection. Through numerical examples, we demonstrate that truncating the loss function indeed yields better performance in terms of classification accuracy and class probability estimation.
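
The effect of truncation on an unbounded loss can be sketched as follows. The abstract does not give the exact form of the truncated loss, so this example assumes one natural choice: the logistic loss log(1 + exp(−u)) of the margin u is capped at its value at a truncation location s ≤ 0. The function names and the default s = −1 are hypothetical.

```python
import math


def logistic_loss(u):
    """Logistic loss log(1 + exp(-u)) of a margin u, computed stably."""
    return max(-u, 0.0) + math.log1p(math.exp(-abs(u)))


def truncated_logistic_loss(u, s=-1.0):
    """Logistic loss capped at its value at the truncation location s.

    Assumed form: min(logistic_loss(u), logistic_loss(s)), so margins
    below s all incur the same bounded penalty.
    """
    return min(logistic_loss(u), logistic_loss(s))


# A badly misclassified point (margin -10) dominates the plain loss
# but contributes only the capped value to the truncated loss.
plain = logistic_loss(-10.0)
capped = truncated_logistic_loss(-10.0)
print(plain, capped)
```

Because the penalty no longer grows with the size of a negative margin, a single extreme outlier cannot pull the fitted boundary arbitrarily far, at the cost of making the objective non-convex.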