首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
Several mathematical programming approaches to the classification problem in discriminant analysis have recently been introduced. This paper empirically compares these newly introduced classification techniques with Fisher's linear discriminant analysis (FLDA), quadratic discriminant analysis (QDA), logit analysis, and several rank-based procedures for a variety of symmetric and skewed distributions. The percent of correctly classified observations by each procedure in a holdout sample indicate that while under some experimental conditions the linear programming approaches compete well with the classical procedures, overall, however, their performance lags behind that of the classical procedures.  相似文献   

2.
We introduce a technique for extending the classical method of linear discriminant analysis (LDA) to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis ( FLDA ), is particularly useful when only fragments of the curves are observed. All the techniques associated with LDA can be extended for use with FLDA. In particular FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one- or two-dimensional pictorial representation of a set of curves. We also extend this procedure to provide generalizations of quadratic and regularized discriminant analysis.  相似文献   

3.
The “bootstrap” approach of Efron is considered in its application to the estimation of error rates in discriminant analysis. Its efficiency relative to parametric estimation is investigated by simulation for Fisher's linear discriminant function in the context of two multivariate normal populations with a common covariance matrix.  相似文献   

4.
Linear discriminant analysis and quadratic discriminant analysis are used to predict group membership. Rare populations present situations in which group sizes differ drastically. This article examined k = 2 and k = 4 predictor variables for groups with different levels of rarity and different levels of sensitivity and specificity. Sample size recommendations were generated for both minimum and maximum group overlap using the leave-one-out (L-O-O) method of estimation. Minimum sample size recommendations are provided in tables for immediate implementation by applied researchers.  相似文献   

5.
For stepwise regression and discriminant analysis the parameters F in and F out govern the inclusion and deletion of variables. The candidate variable with the biggest F—ratio is included if this exceeds F inthe included variable with the smallest F—ratio is deleted if this is less than F in If F inF out; then return to a previous subset size implies improvement in the criterion measure. This result also holds for a generalization, stepwise multivariate analysis, which includes stepwise regression and discriminant analysis as special cases

Eliminations do not occur if forward regression and backward elimination yield the same sequence of subsets. Conversely, there is a more liberal stepping rule which always eliminates if the two sequences differ.  相似文献   

6.
In the classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of two groups. However, for a typical case‐control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t ‐statistic for two‐group classification. Biometrics 2015, 71: 404–416) proposed the generalized t ‐statistic to obtain a linear discriminant function, which allows for heterogeneity of case group. Their procedure has an optimality property in the class of consideration. We perform a further study of the problem and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in that no further improvement is possible to construct the linear discriminant function more efficiently. We conduct simulation studies and real data examples to illustrate the finite sample performance and the gain that it produces in comparison with existing methods.  相似文献   

7.
Magnetic resonance imaging techniques can be used to measure some biophysical properties of tissue. In this context, the T2 relaxation time is an important parameter for soft‐tissue contrast. The authors develop a new technique to estimate the integral of the distribution of T2 relaxation time without imposing any constraint other than the monotonicity of the underlying cumulative relaxation time distribution. They explore the properties of the estimation and its applications for the analysis of breast tissue data. As they show, an extension of linear discriminant analysis is found to distinguish well between two classes of breast tissue.  相似文献   

8.
A method is devised for performing multiple discriminant analysis subject to inequality constraints on the probabilities of misassignment of different subpopulations. This procedure is motivated by attempts to devise.fair means of selection of applicants for schools, jobs, and credit. An algorithm is developed and sample calculations are given.  相似文献   

9.
This article demonstrates the application of classification trees (decision trees), logistic regression (LR), and linear discriminant function (LDR) to classify data of water quality (i.e., whether the water is fit for drinking on not fit for drinking). The data on water quality were obtained from Pakistan Council of Research in Water Resources (PCRWR) for two cities of Pakistan—one representing industrial environment (Sialkot) and the other one representing non-industrial environment (Narowal). To classify data on water quality, three statistical tools were employed—the Decision Tree methodology using Gini Index, LR, and LDA—using R software library. The results obtained by the said three techniques were compared using misclassification rates (a model with minimum value of misclassification rate is better). It was witnessed that LR performed well than the other two techniques while the Decision trees and LDA performed equally well. But for illustration purposes decision trees technique is comparatively easy to draw and interpret.  相似文献   

10.
A novel framework is proposed for the estimation of multiple sinusoids from irregularly sampled time series. This spectral analysis problem is addressed as an under-determined inverse problem, where the spectrum is discretized on an arbitrarily thin frequency grid. As we focus on line spectra estimation, the solution must be sparse, i.e. the amplitude of the spectrum must be zero almost everywhere. Such prior information is taken into account within the Bayesian framework. Two models are used to account for the prior sparseness of the solution, namely a Laplace prior and a Bernoulli–Gaussian prior, associated to optimization and stochastic sampling algorithms, respectively. Such approaches are efficient alternatives to usual sequential prewhitening methods, especially in case of strong sampling aliases perturbating the Fourier spectrum. Both methods should be intensively tested on real data sets by physicists.  相似文献   

11.
In this paper, we propose an asymptotic approximation for the expected probabilities of misclassification (EPMC) in the linear discriminant function on the basis of k-step monotone missing training data for general k. We derive certain relations of the statistics in order to obtain the approximation. Finally, we perform Monte Carlo simulation to evaluate the accuracy of our result and to compare it with existing approximations.  相似文献   

12.
This study investigates the use of stratification to improve discrimination when prior probabilities vary across strata of a population of interest. Sources of heterogeneity in prior probabilities include differences in geographic locale, age differences in the population studied, or differences in the time component of the data collected. The article suggests using logistic regression both to identify the underlying stratification and to estimate prior probabilities. A simulation study compares misclassification rates under two alternative stratification schemes with the traditional discriminant approach that ignores stratification in favor of pooled prior estimates. The simulations show that large asymptotic gains can be realized by stratification, and that these gains can be realized in finite samples, given moderate differences in prior probabilities.  相似文献   

13.
Peanut allergy is one of the most prevalent food allergies. The possibility of a lethal accidental exposure and the persistence of the disease make it a public health problem. Evaluating the intensity of symptoms is accomplished with a double blind placebo-controlled food challenge (DBPCFC), which scores the severity of reactions and measures the dose of peanut that elicits the first reaction. Since DBPCFC can result in life-threatening responses, we propose an alternate procedure with the long-term goal of replacing invasive allergy tests. Discriminant analyses of DBPCFC score, the eliciting dose and the first accidental exposure score were performed in 76 allergic patients using 6 immunoassays and 28 skin prick tests. A multiple factorial analysis was performed to assign equal weights to both groups of variables, and predictive models were built by cross-validation with linear discriminant analysis, k-nearest neighbours, classification and regression trees, penalized support vector machine, stepwise logistic regression and AdaBoost methods. We developed an algorithm for simultaneously clustering eliciting dose values and selecting discriminant variables. Our main conclusion is that antibody measurements offer information on the allergy severity, especially those directed against rAra-h1 and rAra-h3. Further independent validation of these results and the use of new predictors will help extend this study to clinical practices.  相似文献   

14.
The quadratic discriminant function (QDF) with known parameters has been represented in terms of a weighted sum of independent noncentral chi-square variables. To approximate the density function of the QDF as m-dimensional exponential family, its moments in each order have been calculated. This is done using the recursive formula for the moments via the Stein's identity in the exponential family. We validate the performance of our method using simulation study and compare with other methods in the literature based on the real data. The finding results reveal better estimation of misclassification probabilities, and less computation time with our method.  相似文献   

15.
In this article, a sequential correction of two linear methods: linear discriminant analysis (LDA) and perceptron is proposed. This correction relies on sequential joining of additional features on which the classifier is trained. These new features are posterior probabilities determined by a basic classification method such as LDA and perceptron. In each step, we add the probabilities obtained on a slightly different data set, because the vector of added probabilities varies at each step. We therefore have many classifiers of the same type trained on slightly different data sets. Four different sequential correction methods are presented based on different combining schemas (e.g. mean rule and product rule). Experimental results on different data sets demonstrate that the improvements are efficient, and that this approach outperforms classical linear methods, providing a significant reduction in the mean classification error rate.  相似文献   

16.
We study the design problem for the optimal classification of functional data. The goal is to select sampling time points so that functional data observed at these time points can be classified accurately. We propose optimal designs that are applicable to either dense or sparse functional data. Using linear discriminant analysis, we formulate our design objectives as explicit functions of the sampling points. We study the theoretical properties of the proposed design objectives and provide a practical implementation. The performance of the proposed design is evaluated through simulations and real data applications. The Canadian Journal of Statistics 48: 285–307; 2020 © 2019 Statistical Society of Canada  相似文献   

17.
The distribution of the probabilities of misclassification is derived in this paper, which are reproduced by the use of the linear discriminant function. The statistical background is two independent doubly truncated t populations with distinct location parameters and common scale parameter and degrees of freedom. The behavior of the linear discriminant function is studied by comparing the distribution function of the errors of misclassification under the truncated t and truncated normal models.  相似文献   

18.
We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data as well as after deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. Populations compared were: (1) multivariate normal with covariance matrices ∑1=∑2, (2) multivariate normal with ∑1≠∑2 and (3) multivariate non-normal with ∑1=∑2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly but all were better than non - replacement.  相似文献   

19.
In this article, we highlight some interesting facts about Bayesian variable selection methods for linear regression models in settings where the design matrix exhibits strong collinearity. We first demonstrate via real data analysis and simulation studies that summaries of the posterior distribution based on marginal and joint distributions may give conflicting results for assessing the importance of strongly correlated covariates. The natural question is which one should be used in practice. The simulation studies suggest that posterior inclusion probabilities and Bayes factors that evaluate the importance of correlated covariates jointly are more appropriate, and some priors may be more adversely affected in such a setting. To obtain a better understanding behind the phenomenon, we study some toy examples with Zellner’s g-prior. The results show that strong collinearity may lead to a multimodal posterior distribution over models, in which joint summaries are more appropriate than marginal summaries. Thus, we recommend a routine examination of the correlation matrix and calculation of the joint inclusion probabilities for correlated covariates, in addition to marginal inclusion probabilities, for assessing the importance of covariates in Bayesian variable selection.  相似文献   

20.
An important problem in statistics is the study of longitudinal data taking into account the effect of other explanatory variables such as treatments and time. In this paper, a new Bayesian approach for analysing longitudinal data is proposed. This innovative approach takes into account the possibility of having nonlinear regression structures on the mean and linear regression structures on the variance–covariance matrix of normal observations, and it is based on the modelling strategy suggested by Pourahmadi [M. Pourahmadi, Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterizations, Biometrika, 87 (1999), pp. 667–690.]. We initially extend the classical methodology to accommodate the fitting of nonlinear mean models then we propose our Bayesian approach based on a generalization of the Metropolis–Hastings algorithm of Cepeda [E.C. Cepeda, Variability modeling in generalized linear models, Unpublished Ph.D. Thesis, Mathematics Institute, Universidade Federal do Rio de Janeiro, 2001]. Finally, we illustrate the proposed methodology by analysing one example, the cattle data set, that is used to study cattle growth.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号