1.

Classification and similarity analysis of fundamental frequency patterns in infant spoken language acquisition

Hiroko Kato Solvang Masanobu Taniguchi Tomohiro Nakatani Shigeaki Amano 《Statistical Methodology》2008,5(3):187-208

Fundamental frequency (F0) patterns, which indicate the vibration frequency of the vocal cords, reflect the developmental changes in infant spoken language. In previous studies of developmental psychology, however, F0 patterns were manually classified into subjectively specified categories. Furthermore, because F0 sequences contain runs of missing values and exhibit mean nonstationarity, classification that relies on subsequent partitioning and conventional discriminant analysis based on stationary and locally stationary processes is considered inadequate. Consequently, we propose a classification method based on discriminant analysis of time series data with mean nonstationarity and sequentially missing values, and a measurement technique for investigating the configuration similarities for classification. Using our proposed procedures, we analyse a longitudinal database of recorded conversations between infants and parents over a five-year period. Various F0 patterns were automatically classified into appropriate pattern groups, and the classification similarities were calculated. These similarities gradually decreased with the infant's age in months until a large change occurred around 20 months. The results suggest that our proposed methods are useful for analysing large-scale data and can contribute to studies of infant spoken language acquisition.

2.

Sparse discriminant analysis based on estimation of posterior probabilities

Akinori Hidaka Kenji Watanabe Takio Kurita 《Journal of applied statistics》2019,46(15):2761-2785

Fisher's linear discriminant analysis (FLDA) is known as a method to find a discriminative feature space for multi-class classification. As a theory of extending FLDA to an ultimate nonlinear form, optimal nonlinear discriminant analysis (ONDA) has been proposed. ONDA indicates that the best theoretical nonlinear map for maximizing Fisher's discriminant criterion is formulated by using the Bayesian a posteriori probabilities. In addition, the theory proves that FLDA is equivalent to ONDA when the Bayesian a posteriori probabilities are approximated by linear regression (LR). Due to some limitations of the linear model, there is room to modify FLDA by using stronger approximation/estimation methods. For the purpose of probability estimation, multinomial logistic regression (MLR) is more suitable than LR. Along this line, we develop a nonlinear discriminant analysis (NDA) in which the posterior probabilities in ONDA are estimated by MLR. We also develop a way to introduce sparseness into discriminant analysis. By applying L1 or L2 regularization to LR or MLR, we can incorporate sparseness in FLDA and our NDA to increase generalization performance. The performance of these methods is evaluated by benchmark experiments using standard datasets and a face classification experiment.

3.

A comparison of the classical and the linear programming approaches to the classification problem in discriminant analysis

《Journal of Statistical Computation and Simulation》2012,82(1-2):73-93

Several mathematical programming approaches to the classification problem in discriminant analysis have recently been introduced. This paper empirically compares these newly introduced classification techniques with Fisher's linear discriminant analysis (FLDA), quadratic discriminant analysis (QDA), logit analysis, and several rank-based procedures for a variety of symmetric and skewed distributions. The percentage of observations in a holdout sample correctly classified by each procedure indicates that, while the linear programming approaches compete well with the classical procedures under some experimental conditions, their overall performance lags behind that of the classical procedures.

4.

Nonparametric Two-Group Classification: Concepts and a SAS-Based Software Package

A. Pedro Duarte Silva Antonie Stam 《The American statistician》2013,67(2):185-197

This article introduces BestClass, a set of SAS macros, available in the mainframe and workstation environment, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule use the mathematical programming routines of the SAS/OR software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors.

5.

Variational discriminant analysis with variable selection

Weichang Yu John T. Ormerod Michael Stewart 《Statistics and Computing》2020,30(4):933-951

A fast Bayesian method that seamlessly fuses classification and hypothesis testing via discriminant analysis is developed. Building upon the original discriminant analysis classifier, modelling components are added to identify discriminative variables. A combination of cake priors and a novel form of variational Bayes we call reverse collapsed variational Bayes gives rise to variable selection that can be directly posed as a multiple hypothesis testing approach using likelihood ratio statistics. Some theoretical arguments are presented showing that Chernoff-consistency (asymptotically zero type I and type II error) is maintained across all hypotheses. We apply our method to some publicly available genomics datasets and show that it performs well in practice for its computational cost. An R package VaDA has also been made available on GitHub.

6.

A use of mixtures of two normal distributions in a classification problem

《Journal of Statistical Computation and Simulation》2012,82(3-4):281-294

An assumption made in the classification problem is that the distribution of the data being classified has the same parameters as the data used to obtain the discriminant functions. A method based on mixtures of two normal distributions is proposed as a method of checking this assumption and modifying the discriminant functions accordingly. As a first step, the case considered in this paper is that of a shift in the mean of one or two univariate normal distributions with all other parameters remaining fixed and known. Calculations based on asymptotic results suggest that the proposed method works well even for small shifts.

7.

A Comparison of Two Group Classification Approaches to Fat-tailed and Skewed Data

Filiz Kardiyen Hülya Olmuş 《Communications in Statistics - Simulation and Computation》2016,45(1):17-32

The problem of two-group classification has implications in a number of fields, such as medicine, finance, and economics. This study aims to compare methods of two-group classification. The minimum sum of deviations and linear programming model, linear discriminant analysis, quadratic discriminant analysis and logistic regression, multivariate analysis of variance (MANOVA) test-based classification and the unpooled T-square test-based classification methods, support vector machines and k-nearest neighbor methods, and a combined classification method are compared for data structures having fat tails and/or skewness. The comparison has been carried out by using a simulation procedure designed for various stable distribution structures and sample sizes.

8.

Cluster designs to assess the prevalence of acute malnutrition by lot quality assurance sampling: a validation study by computer simulation

Casey Olives Marcello Pagano Megan Deitchler Bethany L. Hedt Kari Egge Joseph J. Valadez 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(2):495-510

Summary. Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can dramatically affect the classification error that is associated with LQAS analysis.

9.

A discussion of the statistical testing system for discriminant analysis

Fu Deyin, Wang Jun 《Statistics & Information Forum》2008,23(5):9-14,18

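Discriminant analysis has received increasing attention and has produced important applied results, but in practice it is often applied mechanically, with too little attention paid to testing the applicability of discriminant analysis, the significance of the discriminant effect, the discriminating power of the discriminant variables, and the discriminating power of the discriminant functions. To apply discriminant analysis better, statistical tests should be carried out and a system of statistical tests established. This system should include a test of the applicability of discriminant analysis, a test of the significance of the discriminant effect, a test of the discriminating power of the discriminant variables, and a test of the discriminating power of the discriminant functions.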

10.

Creating the UK National Statistics 2001 output area classification

Dan Vickers Phil Rees 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(2):379-403

11.

A study on discriminant analysis techniques applied to multivariate lognormal data

《Journal of Statistical Computation and Simulation》2012,82(1-2):79-100

The purpose of this paper is to examine the multiple group (>2) discrimination problem in which the group sizes are unequal and the variables used in the classification are correlated with skewed distributions. Using statistical simulation based on data from a clinical study, we compare the performances, in terms of misclassification rates, of nine statistical discrimination methods. These methods are linear and quadratic discriminant analysis applied to untransformed data, rank transformed data, and inverse normal scores data, as well as fixed kernel discriminant analysis, variable kernel discriminant analysis, and variable kernel discriminant analysis applied to inverse normal scores data. It is found that the parametric methods with transformed data generally outperform the other methods, and the parametric methods applied to inverse normal scores usually outperform the parametric methods applied to rank transformed data. Although the kernel methods often have very biased estimates, the variable kernel method applied to inverse normal scores data provides considerable improvement in terms of total nonerror rate.
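The two data transformations named in this abstract can be illustrated with a minimal sketch. It assumes average ranks for ties and Blom's formula for the normal scores; the abstract does not say which variants the authors used, and the function names and sample values here are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def inverse_normal_scores(values):
    """Map ranks to standard normal quantiles.

    Uses Blom's formula (r - 3/8) / (n + 1/4); this choice is an
    assumption, not something stated in the abstract.
    """
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25))
            for r in average_ranks(values)]


# Hypothetical right-skewed measurements for one variable.
skewed = [0.2, 0.5, 1.1, 3.0, 25.0]
ranks = average_ranks(skewed)
scores = inverse_normal_scores(skewed)
```

Either transformed variable could then be passed to a linear or quadratic discriminant rule in place of the raw measurements.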

12.

Using unlabelled data to update classification rules with applications in food authenticity studies

Nema Dean Thomas Brendan Murphy Gerard Downey 《Journal of the Royal Statistical Society. Series C, Applied statistics》2006,55(1):1-14

Summary. An authentic food is one that is what it purports to be. Food processors and consumers need to be assured that, when they pay for a specific product or ingredient, they are receiving exactly what they pay for. Classification methods are an important tool in food authenticity studies where they are used to assign food samples of unknown type to known types. A classification method is developed where the classification rule is estimated by using both the labelled and the unlabelled data, in contrast with many classical methods which use only the labelled data for estimation. This methodology models the data as arising from a Gaussian mixture model with parsimonious covariance structure, as is done in model-based clustering. A missing data formulation of the mixture model is used and the models are fitted by using the EM and classification EM algorithms. The methods are applied to the analysis of spectra of foodstuffs recorded over the visible and near infra-red wavelength range in food authenticity studies. A comparison of the performance of model-based discriminant analysis and the method of classification proposed is given. The classification method proposed is shown to yield very good misclassification rates. The correct classification rate was observed to be as much as 15% higher than the correct classification rate for model-based discriminant analysis.

13.

Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions

Jeffrey L. Andrews Paul D. McNicholas 《Statistics and Computing》2012,22(5):1021-1029

The last decade has seen an explosion of work on the use of mixture models for clustering. The use of the Gaussian mixture model has been common practice, with constraints sometimes imposed upon the component covariance matrices to give families of mixture models. Similar approaches have also been applied, albeit with less fecundity, to classification and discriminant analysis. In this paper, we begin with an introduction to model-based clustering and a succinct account of the state-of-the-art. We then put forth a novel family of mixture models wherein each component is modeled using a multivariate t-distribution with an eigen-decomposed covariance structure. This family, which is largely a t-analogue of the well-known MCLUST family, is known as the tEIGEN family. The efficacy of this family for clustering, classification, and discriminant analysis is illustrated with both real and simulated data. The performance of this family is compared to its Gaussian counterpart on three real data sets.

14.

Asymptotic Optimality of Sparse Linear Discriminant Analysis with Arbitrary Number of Classes

Ruiyan Luo Xin Qi 《Scandinavian Journal of Statistics》2017,44(3):598-616

Many sparse linear discriminant analysis (LDA) methods have been proposed to overcome the major problems of the classic LDA in high‐dimensional settings. However, the asymptotic optimality results are limited to the case with only two classes. When there are more than two classes, the classification boundary is complicated and no explicit formulas for the classification errors exist. We consider the asymptotic optimality in high‐dimensional settings for a large family of linear classification rules with an arbitrary number of classes. Our main theorem provides easy‐to‐check criteria for the asymptotic optimality of a general classification rule in this family as dimensionality and sample size both go to infinity and the number of classes is arbitrary. We establish the corresponding convergence rates. The general theory is applied to the classic LDA and the extensions of two recently proposed sparse LDA methods to obtain the asymptotic optimality.

15.

An Extended Projection Data Depth and Its Applications to Discrimination

Xia Cui Lu Lin Guangren Yang 《Communications in Statistics - Theory and Methods》2013,42(14):2276-2290

This article investigates the possible use of our newly defined extended projection depth (abbreviated to EPD) in nonparametric discriminant analysis. We propose a robust nonparametric classifier, which relies on the intuitively simple notion of EPD. The EPD-based classifier assigns an observation to the population with respect to which it has the maximum EPD. Asymptotic properties of misclassification rates and robustness properties of the EPD-based classifier are discussed. A few simulated data sets are used to compare the performance of the EPD-based classifier with Fisher's linear discriminant rule, the quadratic discriminant rule, and the PD-based classifier. It is also found that when the underlying distributions are elliptically symmetric, the EPD-based classifier is asymptotically equivalent to the optimal Bayes classifier.
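The maximum-depth rule described in this abstract can be sketched as follows. The abstract does not define EPD, so this sketch substitutes ordinary projection depth in one dimension, 1 / (1 + |x - median| / MAD), as a stand-in; the function names and the two training samples are hypothetical, and each sample is assumed to have a nonzero MAD.

```python
from statistics import median


def projection_depth_1d(x, sample):
    """Univariate projection depth of x with respect to a sample.

    Stand-in for EPD, which the abstract does not define.
    Assumes the sample's median absolute deviation (MAD) is nonzero.
    """
    med = median(sample)
    mad = median([abs(v - med) for v in sample])
    outlyingness = abs(x - med) / mad
    return 1.0 / (1.0 + outlyingness)


def max_depth_classify(x, samples):
    """Assign x to the population in which it has the greatest depth."""
    return max(samples, key=lambda label: projection_depth_1d(x, samples[label]))


# Hypothetical training samples for two populations.
samples = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [8.0, 9.0, 10.0, 11.0, 12.0],
}
label_near_a = max_depth_classify(4.5, samples)
label_near_b = max_depth_classify(7.0, samples)
```

Because the depth is built from the median and the MAD, a single extreme training value changes the rule far less than it would change a rule based on means and variances.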

16.

A NEW APPROACH TO DISCRIMINATION AND CLASSIFICATION USING A HAUSDORFF TYPE DISTANCE

Sangit Chatterjee A. Narayanan 《Australian & New Zealand Journal of Statistics》1992,34(3):391-406

A new method of discrimination and classification based on a Hausdorff type distance is proposed. In two groups, the Hausdorff distance is defined as the sum of the furthest distance of the nearest elements of one set to another. This distance has some useful properties and is exploited in developing a discriminant criterion between individual objects belonging to two groups based on a finite number of classification variables. The discrimination criterion is generalized to more than two groups in a couple of ways. Several data sets are analysed and their classification accuracy is compared to that obtained from the linear discriminant function, and the results are encouraging. The method is simple, lends itself to parallel computation and imposes less stringent conditions on the data.
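One reading of the distance defined in this abstract is sketched below: for each direction, take the largest nearest-neighbour distance from one group to the other, then add the two directions. This interpretation, the use of Euclidean distance, and the function names are assumptions; the abstract does not give the exact formula or the discriminant criterion built on it.

```python
from math import dist


def directed_distance(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff_type_distance(a, b):
    """Sum of the two directed distances (assumed reading of the abstract)."""
    return directed_distance(a, b) + directed_distance(b, a)


# Hypothetical observations for two groups on two classification variables.
group_1 = [(0.0, 0.0), (1.0, 0.0)]
group_2 = [(4.0, 0.0), (6.0, 0.0)]
d = hausdorff_type_distance(group_1, group_2)
```

Each inner minimum is independent of the others, which is consistent with the abstract's remark that the method lends itself to parallel computation.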

17.

Discriminant analysis under parameter restrictions: statistical and computational aspects

Jürgen Läuter 《Statistics》2013,47(1):125-137

In the paper, it is shown that the error rate of discriminant analysis can be diminished when restrictions on the parameters are valid. To find suitable restrictions, the properties of hierarchical and other multivariate systems are investigated first. Then, in a practical section, a modification of discriminant analysis is offered that consists in decreasing the estimated partial correlations. Finally, in the theoretical section, it is verified that an improvement of discriminant analysis is attained by a suitable correction of the positive and negative signs of the discriminant function.

18.

Optimal design for classification of functional data

Cai Li Luo Xiao 《Revue canadienne de statistique》2020,48(2):285-307

We study the design problem for the optimal classification of functional data. The goal is to select sampling time points so that functional data observed at these time points can be classified accurately. We propose optimal designs that are applicable to either dense or sparse functional data. Using linear discriminant analysis, we formulate our design objectives as explicit functions of the sampling points. We study the theoretical properties of the proposed design objectives and provide a practical implementation. The performance of the proposed design is evaluated through simulations and real data applications.

19.

Robust linear discriminant analysis using S‐estimators

Christophe Croux Catherine Dehon 《Revue canadienne de statistique》2001,29(3):473-493

The authors consider a robust linear discriminant function based on high breakdown location and covariance matrix estimators. They derive influence functions for the estimators of the parameters of the discriminant function and for the associated classification error. The most B‐robust estimator is determined within the class of multivariate S‐estimators. This estimator, which minimizes the maximal influence that an outlier can have on the classification error, is also the most B‐robust location S‐estimator. A comparison of the most B‐robust estimator with the more familiar biweight S‐estimator is made.

20.

Identification of Influential Cases in Kernel Fisher Discriminant Analysis

Nelmarie Louw Morne M. C. Lamont 《Communications in Statistics - Simulation and Computation》2013,42(10):2050-2062

We study the influence of a single data case on the results of a statistical analysis. This problem has been addressed in several articles for linear discriminant analysis (LDA). Kernel Fisher discriminant analysis (KFDA) is a kernel based extension of LDA. In this article, we study the effect of atypical data points on KFDA and develop criteria for identification of cases having a detrimental effect on the classification performance of the KFDA classifier. We find that the criteria are successful in identifying cases whose omission from the training data prior to obtaining the KFDA classifier results in reduced error rates.