20 similar documents found.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the heavier-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
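As a rough illustration of how a Box-Cox transformation can remove skewness before a symmetric component is fitted, the sketch below applies the standard one-parameter Box-Cox form to right-skewed data. The function names, the toy data, and the choice of the textbook transform are assumptions made here for illustration; the abstract does not state which variant of the transformation the authors use or how the parameter is selected inside their EM algorithm.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y (assumed textbook form)."""
    if y <= 0:
        raise ValueError("the standard Box-Cox transform requires y > 0")
    if abs(lam) < 1e-12:
        return math.log(y)  # limiting case lam -> 0
    return (y ** lam - 1.0) / lam


def box_cox_inverse(z, lam):
    """Inverse of box_cox for the same parameter lam."""
    if abs(lam) < 1e-12:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


def skewness(xs):
    """Sample skewness based on central moments (population form)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data: symmetric on the log scale.
raw = [math.exp(k / 4.0) for k in range(-8, 9)]
transformed = [box_cox(y, 0.0) for y in raw]

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(transformed))
```

In this toy case the parameter value 0 (the log transform) is known in advance to symmetrize the data; in a mixture setting the parameter would have to be estimated together with the component parameters.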

The majority of the existing literature on model-based clustering deals with symmetric components. In some cases, especially when dealing with skewed subpopulations, the estimate of the number of groups can be misleading; if symmetric components are assumed we need more than one component to describe an asymmetric group. Existing mixture models, based on multivariate normal distributions and multivariate t distributions, try to fit symmetric distributions, i.e. they fit symmetric clusters. In the present paper, we propose the use of finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions). Such finite mixture models start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM-type algorithms are described for fitting the models. Real data examples are used to demonstrate the potential of the new model in comparison with existing ones.

A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.

Lin, Tsung I.; Lee, Jack C.; Ni, Huey F. Statistics and Computing, 2004, 14(2): 119-130
A finite mixture model using the multivariate t distribution has been shown to be a robust extension of normal mixtures. In this paper, we present a Bayesian approach for inference about parameters of t-mixture models. The specifications of prior distributions are weakly informative to avoid causing nonintegrable posterior distributions. We present two efficient EM-type algorithms for computing the joint posterior mode with the observed data and an incomplete future vector as the sample. Markov chain Monte Carlo sampling schemes are also developed to obtain the target posterior distribution of parameters. The advantages of the Bayesian approach over the maximum likelihood method are demonstrated via a set of real data.

Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

Mixtures of multivariate t distributions provide a robust parametric extension to the fitting of data with respect to normal mixtures. In the presence of some noise component, potential outliers or data with longer-than-normal tails, one way to broaden the model can be provided by considering t distributions. In this framework, the degrees of freedom can act as a robustness parameter, tuning the heaviness of the tails, and downweighting the effect of the outliers on the parameter estimation. The aim of this paper is to extend to mixtures of multivariate elliptical distributions some theoretical results about the likelihood maximization on constrained parameter spaces. Further, a constrained monotone algorithm implementing maximum likelihood mixture decomposition of multivariate t distributions is proposed, to achieve improved convergence capabilities and robustness. Monte Carlo numerical simulations and a real data study illustrate the better performance of the algorithm, comparing it to earlier proposals.
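The downweighting role of the degrees of freedom can be made concrete with the usual EM weight for a t component, u = (nu + p) / (nu + d), where d is the squared Mahalanobis distance of an observation and p is the dimension. The sketch below is a minimal single-component, univariate illustration with the scale held fixed; the function names and data are hypothetical, and it is not the constrained algorithm proposed in the paper.

```python
def t_weights(sq_dists, nu, p):
    """EM weights for a t component: large squared distances get small weights."""
    return [(nu + p) / (nu + d) for d in sq_dists]


def robust_location(xs, nu, scale=1.0, iters=50):
    """Iteratively reweighted mean for a univariate t model with fixed scale.

    Assumption for illustration: one component, known scale and known nu.
    """
    mu = sum(xs) / len(xs)  # start from the ordinary mean
    for _ in range(iters):
        sq_dists = [((x - mu) / scale) ** 2 for x in xs]
        w = t_weights(sq_dists, nu, p=1)
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return mu


# Hypothetical sample: five points around zero and one gross outlier.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 20.0]
plain_mean = sum(data) / len(data)
t_location = robust_location(data, nu=3.0)

print("ordinary mean:      ", plain_mean)
print("t-weighted location:", t_location)
```

A small nu gives strong downweighting of distant points, while letting nu grow makes every weight approach 1 and recovers the ordinary mean, which is the sense in which nu acts as a robustness parameter.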

We present a new class of models to fit longitudinal data, obtained with a suitable modification of the classical linear mixed-effects model. For each sample unit, the joint distribution of the random effect and the random error is a finite mixture of scale mixtures of multivariate skew-normal distributions. This extension allows us to model the data in a more flexible way, taking into account skewness, multimodality and discrepant observations at the same time. The scale mixtures of skew-normal distributions form an attractive class of asymmetric heavy-tailed distributions that includes the skew-normal, skew-Student-t, skew-slash and the skew-contaminated normal distributions as special cases, being a flexible alternative to the use of the corresponding symmetric distributions in this type of model. A simple efficient MCMC Gibbs-type algorithm for posterior Bayesian inference is employed. In order to illustrate the usefulness of the proposed methodology, two artificial and two real data sets are analyzed.

Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow a normal distribution, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using the multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, to achieve a flexible trade-off between estimation robustness and efficiency. Simulation studies and an application to the analysis of lung growth longitudinal data showcase the efficacy of the proposed approach.

In this paper we consider the Capital Asset Pricing Model under Elliptical (symmetric) Distributions. This class of distributions, which contains the normal distribution, t, contaminated normal and power exponential, among others, offers a more flexible framework for modelling asset prices or returns. In order to analyze the sensitivity of the maximum likelihood estimators to possible outliers and/or atypical returns, the local influence method was implemented. The results are illustrated by using a set of shares from companies that trade in the Chilean Stock Market. Our main conclusion is that symmetric distributions having heavier tails than those of the normal distribution, especially the t distribution with small degrees of freedom, show a better fit and allow the reduction of the influence of atypical returns in the maximum likelihood estimators.

We consider here a generalization of the skew-normal distribution, GSN(λ1,λ2,ρ), defined through a standard bivariate normal distribution with correlation ρ, which is a special case of the unified multivariate skew-normal distribution studied recently by Arellano-Valle and Azzalini [2006. On the unification of families of skew-normal distributions. Scand. J. Statist. 33, 561–574]. We then present some simple and useful properties of this distribution and also derive its moment generating function in an explicit form. Next, we show that distributions of order statistics from the trivariate normal distribution are mixtures of these generalized skew-normal distributions; thence, using the established properties of the generalized skew-normal distribution, we derive the moment generating functions of order statistics, and also present expressions for means and variances of these order statistics. Next, we introduce a generalized skew-tν distribution, which is a special case of the unified multivariate skew-elliptical distribution presented by Arellano-Valle and Azzalini [2006. On the unification of families of skew-normal distributions. Scand. J. Statist. 33, 561–574] and is in fact a three-parameter generalization of Azzalini and Capitanio's [2003. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. Roy. Statist. Soc. Ser. B 65, 367–389] univariate skew-tν form. We then use the relationship between the generalized skew-normal and skew-tν distributions to discuss some properties of generalized skew-tν as well as distributions of order statistics from bivariate and trivariate tν distributions. We show that these distributions of order statistics are indeed mixtures of generalized skew-tν distributions, and then use this property to derive explicit expressions for means and variances of these order statistics.

This article proposes a class of multivariate bilateral selection t distributions useful for analyzing non-normal (skewed and/or bimodal) multivariate data. The class is associated with a bilateral selection mechanism, and it is obtained from a marginal distribution of the centrally truncated multivariate t. It is flexible enough to include the multivariate t and multivariate skew-t distributions and mathematically tractable enough to account for central truncation of a hidden t variable. The class, closed under linear transformation, marginal, and conditional operations, is studied from several aspects such as shape of the probability density function, conditioning of a distribution, scale mixtures of multivariate normal, and a probabilistic representation. The relationships among these aspects are given, and various properties of the class are also discussed. Necessary theories and two applications are provided.

Skew-normal/independent distributions are a class of asymmetric thick-tailed distributions that include the skew-normal distribution as a special case. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis in multivariate measurement error models. We propose the use of skew-normal/independent distributions to model the unobserved value of the covariates (latent variable) and symmetric normal/independent distributions for the random error term, providing an appealing robust alternative to the usual symmetric process in multivariate measurement error models. Among the distributions that belong to this class, we examine univariate and multivariate versions of the skew-normal, skew-t, skew-slash and skew-contaminated normal distributions. The results and methods are applied to a real data set.

The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with Metropolis-within-Gibbs scheme is used to effectively draw the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.

This paper considers the multiple regression model with multivariate spherically symmetric errors to determine optimal β-expectation tolerance regions for the future regression vector (FRV) and future residual sum of squares (FRSS) by using the prediction distributions of some appropriate functions of future responses. The prediction distribution of the FRV, conditional on the observed responses, is a multivariate Student-t distribution. Similarly, the prediction distribution of the FRSS is a beta distribution. The optimal β-expectation tolerance regions for the FRV and FRSS have been obtained based on the F-distribution and beta distribution, respectively. The results in this paper are applicable to the multiple regression model with normal and Student-t errors.


The multivariate elliptically contoured distributions provide a viable framework for modeling time-series data. This family includes the multivariate normal, power exponential, t, and Cauchy distributions as special cases. For multivariate elliptically contoured autoregressive models, we derive the exact likelihood equations for the model parameters. They are closely related to the Yule-Walker equations and involve simple functions of the data. The maximum likelihood estimators are obtained by alternately solving two linear systems and are illustrated using simulated data.

This paper presents a robust mixture modeling framework using the multivariate skew t distributions, an extension of the multivariate Student’s t family with additional shape parameters to regulate skewness. The proposed model results in a very complicated likelihood. Two variants of Monte Carlo EM algorithms are developed to carry out maximum likelihood estimation of mixture parameters. In addition, we offer a general information-based method for obtaining the asymptotic covariance matrix of maximum likelihood estimates. Some practical issues including the selection of starting values as well as the stopping criterion are also discussed. The proposed methodology is applied to a subset of the Australian Institute of Sport data for illustration.

This paper presents a robust probabilistic mixture model based on the multivariate skew-t-normal distribution, a skew extension of the multivariate Student’s t distribution with more powerful abilities in modelling data whose distribution seriously deviates from normality. The proposed model includes mixtures of normal, t and skew-normal distributions as special cases and provides a flexible alternative to recently proposed skew t mixtures. We develop two analytically tractable EM-type algorithms for computing maximum likelihood estimates of model parameters in which the skewness parameters and degrees of freedom are asymptotically uncorrelated. Standard errors for the parameter estimates can be obtained via a general information-based method. We also present a procedure of merging mixture components to automatically identify the number of clusters by fitting piecewise linear regression to the rescaled entropy plot. The effectiveness and performance of the proposed methodology are illustrated by two real-life examples.

This paper aims at introducing a Bayesian robust error-in-variable regression model in which the dependent variable is censored. We extend previous works by assuming a multivariate t distribution for jointly modelling the behaviour of the errors and the latent explanatory variable. Inference is done under the Bayesian paradigm. We use a data augmentation approach and develop a Markov chain Monte Carlo algorithm to sample from the posterior distributions. We run a Monte Carlo study to evaluate the efficiency of the posterior estimators in different settings. We compare the proposed model to three other models previously discussed in the literature. As a by-product we also provide a Bayesian analysis of the t-tobit model. We fit all four models to analyse the 2001 Medical Expenditure Panel Survey data.

A method is proposed in this paper to assess the local influence of minor perturbations for the Sharpe model when the normal distribution is replaced by normal/independent (NI) distributions. The family of NI distributions is an attractive class of symmetric heavy-tailed densities that includes as special cases the normal, Student's t, slash, and contaminated normal distributions. Since the returns of the market are not observable, the statistical analysis is carried out in the context of an errors-in-variables model. An influence analysis for detecting influential observations (atypical returns) is developed to investigate the sensitivity of the maximum likelihood estimators. Diagnostic measures are obtained based on the conditional expectation of the complete-data log-likelihood function. The results are illustrated by using a set of shares of companies traded in the Chilean stock market.

Mahalanobis squared distances (MSDs) based on robust estimators improve outlier detection performance in multivariate data. However, the unbiasedness of robust estimators is not guaranteed when the sample size is small, and this reduces their performance in outlier detection. In this study, we propose a framework that uses MSDs with an incorporated small-sample correction factor (c) and show its impact on performance when the sample size is small. This is achieved by using two prototypes, the minimum covariance determinant estimator and S-estimators with bi-weight and t-biweight functions. The results from simulations show that, when c is incorporated into the model, the distribution of MSDs for non-extreme observations is more likely to fit a chi-square distribution with p degrees of freedom, and the MSDs of the extreme observations fit an F distribution. Without c, however, the distributions deviate significantly from the chi-square and F distributions observed when c is incorporated. These results are even more prominent for S-estimators. We present seven distinct comparison methods with robust estimators and various cut-off values and test their outlier detection performance with simulated data. We also present an application of some of these methods to real data.
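The basic mechanics of MSD-based outlier flagging with a chi-square cut-off can be sketched as follows for bivariate data (p = 2), where the chi-square quantile has the closed form -2 ln(1 - q). This is only an illustration under stated assumptions: it uses the classical mean and covariance of a hypothetical clean reference sample in place of the robust MCD and S-estimators, it omits the small-sample correction factor c, and all function names and data are invented here.

```python
import math


def mean_and_cov_2d(points):
    """Classical mean and unbiased covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance using the explicit 2x2 matrix inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_quantile_2df(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - q)


# Hypothetical clean reference sample with positive correlation.
clean = [(0, 0), (1, 1), (-1, -1), (1, 0), (-1, 0),
         (0, 1), (0, -1), (2, 2), (-2, -2)]
center, cov = mean_and_cov_2d(clean)
cutoff = chi2_quantile_2df(0.975)

for candidate in [(2, 2), (3, -3)]:
    d2 = mahalanobis_sq(candidate, center, cov)
    print(candidate, "MSD =", round(d2, 3), "outlier" if d2 > cutoff else "ok")
```

The point (3, -3) is no farther from the center than (2, 2) in Euclidean terms by a large margin, but it runs against the correlation structure and therefore receives a much larger MSD.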
