1.

Sufficiency of variables in discrete discriminant analysis

Gerhard Tutz 《Statistical Papers》1988,29(1):257-269

In discrete discriminant analysis the high dimensional estimation problem makes it necessary to restrict oneself to the most effective variables. Conditions are derived to determine whether subsets of variables yield the same optimal allocation rule as the original set. In the discrete case the conditions turn out to be sufficient but not necessary. Tests are derived in the framework of log-linear models. Based on the concept of adequate (α) discriminant sets two variable selection procedures are considered.

2.

Classification with discrete and continuous variables via general mixed-data models

A. R. de Leon A. Soo T. Williamson 《Journal of applied statistics》2011,38(5):1021-1032

We study the problem of classifying an individual into one of several populations based on mixed nominal, continuous, and ordinal data. Specifically, we obtain a classification procedure as an extension to the so-called location linear discriminant function, by specifying a general mixed-data model for the joint distribution of the mixed discrete and continuous variables. We outline methods for estimating misclassification error rates. Results of simulations of the performance of proposed classification rules in various settings vis-à-vis a robust mixed-data discrimination method are reported as well. We give an example utilizing data on croup in children.

3.

Probit and logistic discriminant functions

A. Albert J. A. Anderson 《Communications in Statistics - Theory and Methods》2013,42(7):641-657

Most discriminant functions refer to qualitatively distinct groups. Tallis et al. (1975) introduced the probit discriminant function for distinguishing between two ordered groups. They showed how to estimate this function for mixture sampling and continuous predictor variables. Here an estimation system is given for the more common separate sampling which is applicable to continuous and/or discrete predictor variables. When used solely with continuous variables, this method of estimation is more robust than Tallis'. The relationship of probit and logistic discrimination is discussed.

4.

The optimal allocation combination for the two-sided sequential screening procedure based on the individual misclassification error

《Journal of Statistical Computation and Simulation》2012,82(9):2025-2043

In this paper, the expected total costs (ETCs) of three kinds of quality cost functions for the two-sided sequential screening procedure (SQSP) based on the individual misclassification error are obtained, where the ETC is the sum of the expected cost of inspection, the expected cost of rejection and the expected cost of quality. The general formulas for all the desired probabilities and three ETCs when k screening variables are allocated into r-stages are derived. The optimal allocation combination for each ETC is determined based on the criterion of minimum ETC. Finally, we give two examples to illustrate the selection of the optimal allocation combination for the SQSP.

5.

Variance ratio screening for ultrahigh dimensional discriminant analysis

Fengli Song Baohua Shen Guosheng Cheng 《Communications in Statistics - Theory and Methods》2018,47(24):6034-6051

This article is concerned with feature screening for ultrahigh dimensional discriminant analysis. A variance ratio screening method is proposed and the sure screening property of this screening procedure is proved. The proposed method has some additional desirable features. First, it is model-free: it does not require a specific discriminant model and can be directly applied to the multi-category situation. Second, it can effectively screen main effects and interaction effects simultaneously. Third, it is relatively inexpensive in computational cost because of its simple structure. The finite sample properties are examined through Monte Carlo simulation studies and two real-data analyses.
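The general idea of screening by a variance ratio can be sketched as follows. The abstract does not state the article's statistic, so this sketch assumes a generic marginal ratio of between-class to within-class sum of squares per feature; it is not necessarily the statistic the authors propose (in particular, a purely marginal ratio like this one would not capture interaction effects). The names `variance_ratio` and `screen` are hypothetical.

```python
from collections import defaultdict


def variance_ratio(x, y):
    """Between-class over within-class sum of squares for one feature.

    Assumed, generic statistic; not taken from the article.
    """
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    grand_mean = sum(x) / len(x)
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return between / within if within > 0 else float("inf")


def screen(X, y, d):
    """Return the indices of the d features with the largest ratio."""
    p = len(X[0])
    scores = [variance_ratio([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:d]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.1, 2.0]]
y = [0, 0, 1, 1]
top = screen(X, y, 1)
```

In an ultrahigh dimensional setting the same ranking would be applied to thousands of columns and only the top `d` kept before fitting a classifier.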

6.

Integrating linear discriminant analysis, polynomial basis expansion, and genetic search for two-group classification

Michael J. Brusco Clay M. Voorhees Roger J. Calantone Michael K. Brady Douglas Steinley 《Communications in Statistics - Simulation and Computation》2019,48(6):1623-1636

We propose a hybrid two-group classification method that integrates linear discriminant analysis, a polynomial expansion of the basis (or variable space), and a genetic algorithm with multiple crossover operations to select variables from the expanded basis. Using new product launch data from the biochemical industry, we found that the proposed algorithm offers mean percentage decreases in the misclassification error rate of 50%, 56%, 59%, 77%, and 78% in comparison to a support vector machine, artificial neural network, quadratic discriminant analysis, linear discriminant analysis, and logistic regression, respectively. These improvements correspond to annual cost savings of $4.40–$25.73 million.

7.

Nonlinear measures of association with kernel canonical correlation analysis and applications

Su-Yun Huang Mei-Hsien Lee Chuhsing Kate Hsiao 《Journal of statistical planning and inference》2009

Measures of association between two sets of random variables have long been of interest to statisticians. The classical canonical correlation analysis (LCCA) can characterize, but also is limited to, linear association. This article introduces a nonlinear and nonparametric kernel method for association study and proposes a new independence test for two sets of variables. This nonlinear kernel canonical correlation analysis (KCCA) can also be applied to the nonlinear discriminant analysis. Implementation issues are discussed. We place the implementation of KCCA in the framework of classical LCCA via a sequence of independent systems in the kernel associated Hilbert spaces. Such a placement provides an easy way to carry out the KCCA. Numerical experiments and comparison with other nonparametric methods are presented.

8.

The Optimal Allocation Combination for the One-Sided Sequential Screening Procedure Based on the Individual Misclassification Error

Shu-Fei Wu Ying-Po Lin 《Communications in Statistics - Simulation and Computation》2015,44(4):833-850

In this article, the expected total costs of three kinds of quality cost functions for the one-sided sequential screening procedure based on the individual misclassification error are obtained, where the expected total cost is the sum of the expected cost of inspection, the expected cost of rejection, and the expected cost of quality. The computational formulas for three kinds of expected total costs are derived when k screening variables are allocated into r stages. The optimal allocation combination is determined based on the criterion of minimum expected total cost. Finally, we give one example to illustrate the selection of the optimal allocation combination for the sequential screening procedure.

9.

Linear discriminant analysis for multiple functional data analysis

Sugnet Gardner-Lubbe 《Journal of applied statistics》2021,48(11):1917

In multivariate data analysis, Fisher linear discriminant analysis is useful to optimally separate two classes of observations by finding a linear combination of p variables. Functional data analysis deals with the analysis of continuous functions and thus can be seen as a generalisation of multivariate analysis where the dimension of the analysis space p tends to infinity. Several authors propose methods to perform discriminant analysis in this infinite dimensional space. Here, the methodology is introduced to perform discriminant analysis, not on single infinite dimensional functions, but to find a linear combination of p infinite dimensional continuous functions, providing a set of continuous canonical functions which are optimally separated in the canonical space. Keywords: functional data analysis, linear discriminant analysis, classification.

10.

Gaussian copula distributions for mixed data, with application in discrimination

《Journal of Statistical Computation and Simulation》2012,82(9):1643-1659

The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients.

11.

Variational discriminant analysis with variable selection

Weichang Yu John T. Ormerod Michael Stewart 《Statistics and Computing》2020,30(4):933-951

A fast Bayesian method that seamlessly fuses classification and hypothesis testing via discriminant analysis is developed. Building upon the original discriminant analysis classifier, modelling components are added to identify discriminative variables. A combination of cake priors and a novel form of variational Bayes we call reverse collapsed variational Bayes gives rise to variable selection that can be directly posed as a multiple hypothesis testing approach using likelihood ratio statistics. Some theoretical arguments are presented showing that Chernoff-consistency (asymptotically zero type I and type II error) is maintained across all hypotheses. We apply our method to some publicly available genomics datasets and show that it performs well in practice for its computational cost. An R package VaDA has also been made available on GitHub.

12.

A model-free feature screening approach based on kernel density estimation

Xiangjie Li Lei Wang 《Journal of Statistical Computation and Simulation》2017,87(12):2450-2468

In this article, a new model-free feature screening method named after probability density (mass) function distance (PDFD) correlation is presented for ultrahigh-dimensional data analysis. We improve the fused-Kolmogorov filter (F-KOL) screening procedure through probability density distribution. The proposed method is also fully nonparametric and can be applied to more general types of predictors and responses, including discrete and continuous random variables. Kernel density estimation and numerical integration are applied to obtain the proposed estimator. The results of simulation studies indicate that the fused-PDFD performs better than other existing screening methods, such as the F-KOL filter, sure-independent screening (SIS), sure independent ranking and screening (SIRS), distance correlation sure-independent screening (DCSIS) and robust ranking correlation screening (RRCS). Finally, we demonstrate the validity of fused-PDFD by a real data example.

13.

Chapter Notes

Frederick Mosteller 《The American statistician》2013,67(1):20-22

Tests for redundancy of variables in linear two-group discriminant analysis are well known and frequently used. We give a survey of similar tests, including the one-sample T² as a special case, in the situation in which only the mean vector (but no covariance matrix) is available in one sample. Then we show that a relation between linear regression and discriminant functions found by Fisher (1936) can be generalized to this situation. Relating regression and discriminant analysis to a multivariate linear model sheds more light on the relationship between them. Practical and didactical advantages of the regression approach to T² tests and discriminant analysis are outlined.

14.

MISSING VALUES, IMPUTATION AND ERROR RATE ESTIMATORS IN DISCRIMINANT ANALYSIS

A.L. Bello 《Australian & New Zealand Journal of Statistics》1995,37(1):95-104

Error rate is a popular criterion for assessing the performance of an allocation rule in discriminant analysis. Training samples which involve missing values cause problems for those error rate estimators that require all variables to be observed at all data points. This paper explores imputation algorithms, their effects on, and problems of implementing them with, eight commonly used error rate estimators (three parametric and five non-parametric) in linear discriminant analysis. The results indicate that imputation should not be based on the way error rate estimators are calculated, and that imputed values may underestimate error rates.

15.

Latent Variable Models for Mixed Discrete and Continuous Outcomes

Mary Dupuis Sammel Louise M. Ryan & Julie M. Legler 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1997,59(3):667-678

We propose a latent variable model for mixed discrete and continuous outcomes. The model accommodates any mixture of outcomes from an exponential family and allows for arbitrary covariate effects, as well as direct modelling of covariates on the latent variable. An EM algorithm is proposed for parameter estimation and estimates of the latent variables are produced as a by-product of the analysis. A generalized likelihood ratio test can be used to test the significance of covariates affecting the latent outcomes. This method is applied to birth defects data, where the outcomes of interest are continuous measures of size and binary indicators of minor physical anomalies. Infants who were exposed in utero to anticonvulsant medications are compared with controls.

16.

Discrete regularized discriminant analysis

Gilles Celeux Abdallah Mkhadri 《Statistics and Computing》1992,2(3):143-151

A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived in the small-sample, high-dimensional setting. This method has a median position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first, then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is compared with other classical methods through application to data.
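The role of the complexity parameter described above can be illustrated with a minimal sketch. It estimates class-conditional cell probabilities for one class as a convex combination of the full multinomial estimate (cell relative frequencies) and the first-order independence estimate (product of marginal relative frequencies). The weighting convention (`alpha` on the full multinomial model) and all function names are assumptions made for illustration; the smoothing step with the Aitchison and Aitken kernel and the cross-validated choice of parameters are omitted.

```python
from collections import Counter


def full_multinomial(samples):
    """Cell relative frequencies; unseen cells get probability 0."""
    counts = Counter(samples)
    n = len(samples)
    return lambda x: counts[x] / n


def independence(samples):
    """Product of marginal relative frequencies (first-order independence)."""
    n = len(samples)
    p = len(samples[0])
    margins = [Counter(s[j] for s in samples) for j in range(p)]

    def prob(x):
        out = 1.0
        for j, v in enumerate(x):
            out *= margins[j][v] / n
        return out

    return prob


def convex_combination(samples, alpha):
    """alpha * full multinomial + (1 - alpha) * independence (assumed convention)."""
    fm, ind = full_multinomial(samples), independence(samples)
    return lambda x: alpha * fm(x) + (1 - alpha) * ind(x)


# Training observations of two binary variables for a single class.
samples = [(0, 0), (0, 0), (1, 1), (0, 1)]
p_hat = convex_combination(samples, 0.5)
total = sum(p_hat((a, b)) for a in (0, 1) for b in (0, 1))
```

With `alpha = 1` the estimate reduces to the full multinomial model, which assigns zero probability to unobserved cells; with `alpha = 0` it reduces to the independence model. Intermediate values give an unobserved cell such as `(1, 0)` a positive probability.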

17.

A Modified One-Sided Sequential Screening Procedure Based on Individual Misclassification Error

Shu-Fei Wu Ying-Po Lin Huei-Jiuan Lin 《Communications in Statistics - Simulation and Computation》2013,42(9):1754-1778

In this article, a modified procedure is proposed by simplifying the procedure of Tsai and Wu (2002) so that the screening variables are weighted only once instead of twice. The numerical comparison of the modified procedure with the old procedure shows that the total inspection cost of the modified procedure is very close to the old one. A theorem is derived to simplify the calculation of all desired probabilities and the expected costs when k screening variables are allocated into r stages. Finally, an example of investigating the cycles of failure of silver-zinc batteries is given to illustrate the modified screening procedure.

18.

Functional linear discriminant analysis for irregularly sampled curves

Gareth M. James & Trevor J. Hastie 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2001,63(3):533-550

We introduce a technique for extending the classical method of linear discriminant analysis (LDA) to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis (FLDA), is particularly useful when only fragments of the curves are observed. All the techniques associated with LDA can be extended for use with FLDA. In particular FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one- or two-dimensional pictorial representation of a set of curves. We also extend this procedure to provide generalizations of quadratic and regularized discriminant analysis.

19.

Disaggregated spatial modelling for areal unit categorical data

Tassone EC Miranda ML Gelfand AE 《Journal of the Royal Statistical Society. Series C, Applied statistics》2010,59(1):175-190

Summary. We consider joint spatial modelling of areal multivariate categorical data assuming a multiway contingency table for the variables, modelled by using a log-linear model, and connected across units by using spatial random effects. With no distinction regarding whether variables are response or explanatory, we do not limit inference to conditional probabilities, as in customary spatial logistic regression. With joint probabilities we can calculate arbitrary marginal and conditional probabilities without having to refit models to investigate different hypotheses. Flexible aggregation allows us to investigate subgroups of interest; flexible conditioning enables not only the study of outcomes given risk factors but also retrospective study of risk factors given outcomes. A benefit of joint spatial modelling is the opportunity to reveal disparities in health in a richer fashion, e.g. across space for any particular group of cells, across groups of cells at a particular location, and, hence, potential space–group interaction. We illustrate with an analysis of birth records for the state of North Carolina and compare with spatial logistic regression.

20.

Minimum Sample Size Considerations for Two-Group Linear and Quadratic Discriminant Analysis with Rare Populations

Shannon Zavorka Jamis J. Perrett 《Communications in Statistics - Simulation and Computation》2013,42(7):1726-1739

Linear discriminant analysis and quadratic discriminant analysis are used to predict group membership. Rare populations present situations in which group sizes differ drastically. This article examined k = 2 and k = 4 predictor variables for groups with different levels of rarity and different levels of sensitivity and specificity. Sample size recommendations were generated for both minimum and maximum group overlap using the leave-one-out (L-O-O) method of estimation. Minimum sample size recommendations are provided in tables for immediate implementation by applied researchers.
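The leave-one-out method of estimation mentioned above can be sketched for a two-group linear discriminant rule with one rare group. For brevity this sketch assumes a single predictor (the article studies k = 2 and k = 4), a pooled variance, and priors estimated from group proportions; the function names and the toy data are hypothetical and not taken from the article.

```python
import math


def lda_fit(x, y):
    """Fit univariate LDA: class means, pooled variance, class priors."""
    classes = sorted(set(y))
    n = len(x)
    means, priors = {}, {}
    ss = 0.0
    for c in classes:
        xc = [xi for xi, yi in zip(x, y) if yi == c]
        means[c] = sum(xc) / len(xc)
        priors[c] = len(xc) / n
        ss += sum((v - means[c]) ** 2 for v in xc)
    var = ss / (n - len(classes))  # pooled within-class variance
    return means, var, priors


def lda_predict(model, x0):
    """Assign x0 to the class with the largest linear discriminant score."""
    means, var, priors = model

    def score(c):
        return x0 * means[c] / var - means[c] ** 2 / (2 * var) + math.log(priors[c])

    return max(means, key=score)


def loo_error(x, y):
    """Leave-one-out estimate of the misclassification rate."""
    errors = 0
    for i in range(len(x)):
        model = lda_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors += lda_predict(model, x[i]) != y[i]
    return errors / len(x)


# Toy data: group 1 is rare (2 of 8 observations) but well separated.
x = [0.0, 0.2, 0.4, 0.1, 0.3, 0.5, 2.0, 2.3]
y = [0, 0, 0, 0, 0, 0, 1, 1]
rate = loo_error(x, y)
```

Note that each refit needs at least one observation left in every group and a nonzero pooled variance, which is one practical reason minimum sample sizes matter when a group is rare.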