Similar Articles
1.
Abstract

Sliced average variance estimation (SAVE) is one of the best methods for estimating the central dimension-reduction (CDR) subspace in semiparametric regression models when the covariates are normal. Recently, SAVE has been used to analyze DNA microarray data, especially for tumor classification, but its most important drawback is the normality assumption on the covariates. In this article, the asymptotic behavior of estimates of the CDR space under varying slice size is studied through simulation, both when the covariates are non-normal but satisfy the linearity condition and when the covariates are slightly perturbed from the normal distribution; we observe that serious errors may occur when the normality assumption is violated.

2.
ABSTRACT

Fisher's linear discriminant analysis (FLDA) is known as a method to find a discriminative feature space for multi-class classification. As a theory extending FLDA to its ultimate nonlinear form, optimal nonlinear discriminant analysis (ONDA) has been proposed. ONDA shows that the best theoretical nonlinear map for maximizing Fisher's discriminant criterion is formulated using the Bayesian a posteriori probabilities. In addition, the theory proves that FLDA is equivalent to ONDA when the Bayesian a posteriori probabilities are approximated by linear regression (LR). Due to some limitations of the linear model, there is room to improve FLDA by using stronger approximation/estimation methods. For the purpose of probability estimation, multinomial logistic regression (MLR) is more suitable than LR. Along this line, in this paper we develop a nonlinear discriminant analysis (NDA) in which the posterior probabilities in ONDA are estimated by MLR. We also develop a way to introduce sparseness into discriminant analysis: by applying L1 or L2 regularization to LR or MLR, we can incorporate sparseness in FLDA and our NDA to increase generalization performance. The performance of these methods is evaluated by benchmark experiments on standard datasets and a face classification experiment.

3.
The purpose of this paper is to examine the multiple group (>2) discrimination problem in which the group sizes are unequal and the variables used in the classification are correlated with skewed distributions. Using statistical simulation based on data from a clinical study, we compare the performances, in terms of misclassification rates, of nine statistical discrimination methods. These methods are linear and quadratic discriminant analysis applied to untransformed data, rank transformed data, and inverse normal scores data, as well as fixed kernel discriminant analysis, variable kernel discriminant analysis, and variable kernel discriminant analysis applied to inverse normal scores data. It is found that the parametric methods with transformed data generally outperform the other methods, and the parametric methods applied to inverse normal scores usually outperform the parametric methods applied to rank transformed data. Although the kernel methods often have very biased estimates, the variable kernel method applied to inverse normal scores data provides considerable improvement in terms of total nonerror rate.
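As a rough sketch of the inverse normal scores transformation that several of the compared methods rely on: each observation is replaced by the standard normal quantile of its adjusted rank. The function name and the use of Blom's plotting position are illustrative assumptions, not details taken from the paper, and ties are broken naively by original order.

```python
from statistics import NormalDist

def inverse_normal_scores(x):
    """Replace each value by the normal quantile of its adjusted rank.

    Uses Blom's plotting position (r - 3/8) / (n + 1/4); ties are
    broken by original order for simplicity, whereas a careful
    implementation would use mid-ranks.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))
    return scores
```

The transformed data would then be fed to a standard LDA or QDA routine in place of the raw measurements.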

4.
ABSTRACT

Classification rules with a reserve judgment option provide a way to satisfy constraints on the misclassification probabilities when there is a high degree of overlap among the populations. Constructing rules which maximize the probability of correct classification while satisfying such constraints is a difficult optimization problem. This paper uses the form of the optimal solution to develop a relatively simple and computationally fast method for three populations which has a nonparametric quality in controlling the misclassification probabilities. Simulations demonstrate that this procedure performs well.
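The reserve judgment structure can be sketched in a few lines: classify only when the leading posterior probability clears a threshold, otherwise withhold judgment. This is a minimal illustration of the rule's shape, not the paper's constrained optimization; the function name and threshold value are arbitrary assumptions.

```python
def classify_with_reserve(posteriors, threshold=0.7):
    """Return the index of the most probable class, or None when no
    class is probable enough (reserve judgment).

    `posteriors` holds one posterior probability per class; the
    threshold controls how much overlap triggers the reserve option.
    """
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return best if posteriors[best] >= threshold else None
```

Raising the threshold shrinks the misclassification probabilities at the price of reserving judgment more often, which is the trade-off the constrained problem formalizes.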

5.
Abstract

Semi-functional linear regression models are important in practice. In this paper, their estimation is discussed when both the function-valued and the real-valued random variables are measured with additive error. By means of functional principal component analysis and kernel smoothing techniques, estimators of the slope function and the nonparametric component are obtained. To account for errors in variables, deconvolution is involved in the construction of a new class of kernel estimators. The convergence rates of the estimators of the unknown slope function and the nonparametric component are established under a suitable norm and regularity conditions. Simulation studies are conducted to illustrate the finite-sample performance of our method.

6.
The problem of two-group classification has implications in a number of fields, such as medicine, finance, and economics. This study aims to compare methods of two-group classification. The minimum sum of deviations and linear programming models, linear discriminant analysis, quadratic discriminant analysis and logistic regression, classification based on the multivariate analysis of variance (MANOVA) test and on the unpooled T-square test, support vector machines, k-nearest neighbor methods, and a combined classification method are compared for data structures exhibiting fat tails and/or skewness. The comparison is carried out using a simulation procedure designed for various stable distribution structures and sample sizes.

7.
Abstract

In this article, we consider a panel data partially linear regression model with fixed effects and a nonparametric time trend function. The data may be dependent across individuals through the linear regressors and the error components. Unlike methods based on nonparametric smoothing, a difference-based method is proposed to estimate the linear regression coefficients of the model, avoiding bandwidth selection. Here the difference technique is employed to completely eliminate the effect of the nonparametric function, rather than the fixed effects, on the estimation of the linear regression coefficients. Therefore, a more efficient estimator for the parametric part is anticipated, which is confirmed by the simulation results. For the nonparametric component, the polynomial spline technique is implemented. The asymptotic properties of the estimators for both the parametric and nonparametric parts are presented. We also show how to select informative covariates in the linear part by applying smoothly clipped absolute deviation (SCAD) penalized estimators to a difference-based least-squares objective function; the resulting estimators perform asymptotically as well as the oracle procedure in terms of selecting the correct model.

8.
This article introduces BestClass, a set of SAS macros, available in mainframe and workstation environments, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule use the mathematical programming routines of the SAS/OR software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors.

9.
There are many well-known methods for classification problems with linear data having both known and unknown distributions. Here, we deal with classification involving data on the torus and the cylinder. A new method involving a generalized likelihood ratio test is developed for classifying into one of two populations using directional data. The approach assumes that one of the probabilities of misclassification is known. The procedure is constructed by applying a Gibbs sampler to the conditionally specified distribution. A parametric bootstrap approach is also presented. An application is given to data involving linear and circular measurements on human skulls from two tribal populations.

10.
We propose a hybrid two-group classification method that integrates linear discriminant analysis, a polynomial expansion of the basis (or variable space), and a genetic algorithm with multiple crossover operations to select variables from the expanded basis. Using new product launch data from the biochemical industry, we found that the proposed algorithm offers mean percentage decreases in the misclassification error rate of 50%, 56%, 59%, 77%, and 78% in comparison to a support vector machine, artificial neural network, quadratic discriminant analysis, linear discriminant analysis, and logistic regression, respectively. These improvements correspond to annual cost savings of $4.40–$25.73 million.

11.
A classifier is constant if it classifies all examples into just one class. Call a training data set “(linearly) indiscriminate” if a constant classifier minimizes, among all linear classifiers, the misclassification rate on the training data set. General sufficient conditions are presented for the probability of getting an indiscriminate data set to be positive. Similarly, general sufficient conditions are also presented for the probability of getting an indiscriminate data set to be 0.

A small simulation study examines how our results are reflected in the behavior of logistic regression.
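To make the notion concrete, here is a toy check in one dimension, where linear classifiers reduce to threshold rules: a training set is "indiscriminate" if no threshold rule beats the better of the two constant classifiers on training error. The function is a hypothetical illustration, not the paper's construction, and it stands in for the general linear case only in this 1-D sense.

```python
def is_indiscriminate_1d(x, y):
    """Check whether a constant classifier is optimal among 1-D
    threshold rules on the training data.

    `y` holds 0/1 labels. Each cut c induces the rule "predict 1 iff
    x > c"; its mirror image misclassifies exactly the complementary
    set, so its error count is n minus the original rule's count.
    """
    n = len(y)
    const_err = min(sum(y), n - sum(y))  # best constant rule's errors
    best_err = const_err
    for c in sorted(set(x)):
        err = sum((xi > c) != (yi == 1) for xi, yi in zip(x, y))
        best_err = min(best_err, err, n - err)
    return best_err >= const_err
```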

12.
The method of target estimation developed by Cabrera and Fernholz [(1999). Target estimation for bias and mean square error reduction. The Annals of Statistics, 27(3), 1080–1104.] to reduce bias and variance is applied to logistic regression models of several parameters. The expectation functions of the maximum likelihood estimators for the coefficients in the logistic regression models of one and two parameters are analyzed and simulations are given to show a reduction in both bias and variability after targeting the maximum likelihood estimators. In addition to bias and variance reduction, it is found that targeting can also correct the skewness of the original statistic. An example based on real data is given to show the advantage of using target estimators for obtaining better confidence intervals of the corresponding parameters. The notion of the target median is also presented with some applications to the logistic models.

13.
Abstract

We propose to compare population means and variances under a semiparametric density ratio model. The proposed method is easy to implement using the logistic regression procedures available in many statistical software packages, and it often works very well when the data are not normal. In this paper, we construct semiparametric estimators of the differences of two population means and variances, and derive their asymptotic distributions. We prove that the proposed semiparametric estimators are asymptotically more efficient than the corresponding nonparametric ones. In addition, a simulation study and the analysis of two real data sets are presented. Finally, a short discussion is provided.

14.
ABSTRACT

As a compromise between parametric regression and nonparametric regression models, partially linear models are frequently used in statistical modelling. This paper is concerned with the estimation of the partially linear regression model in the presence of multicollinearity. Based on the profile least-squares approach, we propose a novel principal components regression (PCR) estimator for the parametric component. When some additional linear restrictions on the parametric component are available, we construct a corresponding restricted PCR estimator. Some simulations are conducted to examine the performance of our proposed estimators and the results are satisfactory. Finally, a real data example is analysed.

15.
Measurement error and misclassification arise commonly in various data collection processes. It is well known that ignoring these features in the data analysis usually leads to biased inference. Within the generalized linear model setting, Yi et al. [Functional and structural methods with mixed measurement error and misclassification in covariates. J Am Stat Assoc. 2015;110:681–696] developed inference methods that adjust for the effects of measurement error in continuous covariates and misclassification in discrete covariates simultaneously for the scenario where validation data are available. The augmented simulation-extrapolation (SIMEX) approach they developed generalizes the usual SIMEX method, which is only applicable to continuous error-prone covariates. To implement this method, we develop an R package, augSIMEX, for public use. Simulation studies are conducted to illustrate the use of the algorithm. This package is available at CRAN.
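The SIMEX idea that augSIMEX generalizes can be sketched for the simplest case, an attenuated regression slope: deliberately add extra measurement error at increasing multiples, watch the naive estimate degrade, and extrapolate the trend back to the error-free point λ = -1. Everything below (function names, the three-point λ grid, the exact quadratic extrapolation) is a bare-bones sketch under stated assumptions, not the package's algorithm.

```python
import random

def naive_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simex_slope(x_obs, y, sigma_u, B=300, seed=0):
    """Classical SIMEX for a simple-regression slope when x carries
    additive measurement error with known s.d. sigma_u.

    Extra noise of variance lam * sigma_u**2 is added for lam in
    {0, 1, 2}; the three average slopes are extrapolated quadratically
    back to lam = -1, the "no measurement error" point.
    """
    rng = random.Random(seed)
    theta = [naive_slope(x_obs, y)]  # lam = 0: the naive fit itself
    for lam in (1.0, 2.0):
        reps = []
        for _ in range(B):
            noisy = [xi + rng.gauss(0.0, lam ** 0.5 * sigma_u) for xi in x_obs]
            reps.append(naive_slope(noisy, y))
        theta.append(sum(reps) / B)
    # exact quadratic through lam = 0, 1, 2, evaluated at lam = -1
    return 3.0 * theta[0] - 3.0 * theta[1] + theta[2]
```

With a known error standard deviation, the extrapolated slope typically lands closer to the truth than the attenuated naive fit.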

16.
In this paper, we propose a new semiparametric heteroscedastic regression model allowing for positive and negative skewness and bimodal shapes, using a B-spline basis for the nonlinear effects. The proposed distribution is embedded in the generalized additive models for location, scale and shape (GAMLSS) framework, so that any or all parameters of the distribution can be modelled using parametric linear and/or nonparametric smooth functions of explanatory variables. We motivate the new model by means of Monte Carlo simulations, which show that ignoring the skewness and bimodality of the random errors in semiparametric regression models may introduce biases in the parameter estimates and/or in the estimation of the associated variability measures. An iterative estimation process and some diagnostic methods are investigated. Applications to two real data sets are presented and the method is compared to the usual regression methods.

17.

For comparing several logistic regression slopes to that of a control for small sample sizes, Dasgupta et al. (2001) proposed an "asymptotic" small-sample test and a "pivoted" version of that test statistic. Their results show both methods perform well in terms of Type I error control and marginal power when the response is related to the explanatory variable via a logistic regression model. This study finds, via Monte Carlo simulations, that when the underlying relationship is probit, complementary log-log, linear, or even non-monotonic, the "asymptotic" and the "pivoted" small-sample methods perform fairly well in terms of Type I error control and marginal power. Unlike their large sample competitors, they are generally robust to departures from the logistic regression model.

18.
ABSTRACT

Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, a Bayesian approach has been developed to model misclassified ordinal response data. Two regression models have been considered to incorporate misclassification in the categorical response. Specifically, probit and logit models have been developed. The computational difficulties have been avoided by using data augmentation. This idea is exploited to derive efficient Markov chain Monte Carlo methods. Although the method is proposed for ordered categories, it can also be implemented for unordered ones in a simple way. The model performance is shown through a simulation-based example and the analysis of the motivating study.

19.
This paper develops a method for handling two-class classification problems with highly unbalanced class sizes and misclassification costs. When the class sizes are highly unbalanced and the minority class represents a rare event, conventional classification methods tend to strongly favour the majority class, resulting in very low detection of the minority class. A method is proposed to determine the optimal cut-off for asymmetric misclassification costs and for unbalanced class sizes. Monte Carlo simulations show that this proposal performs better than the method based on the notion of classification accuracy. Finally, the proposed method is applied to empirical data on Italian small and medium enterprises to classify them into default and non-default groups.
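A minimal empirical version of the cut-off search described here, assuming scores are predicted probabilities of the rare (positive) class and the two misclassification costs are known; the plain grid search below stands in for the paper's derivation of the optimal cut-off, and all names are illustrative.

```python
def optimal_cutoff(scores, labels, cost_fn, cost_fp):
    """Grid-search the cutoff minimizing total misclassification cost.

    `labels` are 0/1 with 1 the rare class; an observation is
    predicted positive when its score is at or above the cutoff.
    """
    candidates = sorted(set(scores)) + [1.1]  # sentinel: predict nobody
    best_t, best_cost = None, float("inf")
    for t in candidates:
        fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

When missing a rare default is far costlier than a false alarm, the search naturally pushes the cutoff well below the accuracy-optimal 0.5.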

20.
Model selection methods are important for identifying the best approximating model. To identify the best meaningful model, the purpose of the model should be clearly stated in advance. The focus of this paper is model selection when the modelling purpose is classification. We propose a new model selection approach designed for logistic regression when the main modelling purpose is classification. The method is based on the distance between two clustering trees. We also question and evaluate the performance of conventional model selection methods, based on information-theoretic concepts, in determining the best logistic regression classifier. An extensive simulation study is used to assess the finite-sample performance of the cluster-tree-based and information-theoretic model selection methods. Simulations are adjusted for whether the true model is in the candidate set or not. Results show that the new approach is highly promising. Finally, the methods are applied to a real data set to select a binary model as a means of classifying subjects with respect to their risk of breast cancer.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号