首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow normal distribution, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, for achieving flexible trade-off between estimation robustness and efficiency. Simulation studies and an application on analysing lung growth longitudinal data showcase the efficacy of the proposed approach.  相似文献   

2.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.  相似文献   

3.
Nonlinear mixed-effects models are very useful to analyze repeated measures data and are used in a variety of applications. Normal distributions for random effects and residual errors are usually assumed, but such assumptions make inferences vulnerable to the presence of outliers. In this work, we introduce an extension of a normal nonlinear mixed-effects model considering a subclass of elliptical contoured distributions for both random effects and residual errors. This elliptical subclass, the scale mixtures of normal (SMN) distributions, includes heavy-tailed multivariate distributions, such as Student-t, the contaminated normal and slash, among others, and represents an interesting alternative to outliers accommodation maintaining the elegance and simplicity of the maximum likelihood theory. We propose an exact estimation procedure to obtain the maximum likelihood estimates of the fixed-effects and variance components, using a stochastic approximation of the EM algorithm. We compare the performance of the normal and the SMN models with two real data sets.  相似文献   

4.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

5.
Finite mixtures of multivariate skew t (MST) distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy tail behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. A number of algorithms for the computation of the maximum likelihood (ML) estimates for the model parameters of mixtures of MST distributions have been put forward in recent years. These implementations use various characterizations of the MST distribution, which are similar but not identical. While exact implementation of the expectation-maximization (EM) algorithm can be achieved for ‘restricted’ characterizations of the component skew t-distributions, Monte Carlo (MC) methods have been used to fit the ‘unrestricted’ models. In this paper, we review several recent fitting algorithms for finite mixtures of multivariate skew t-distributions, at the same time clarifying some of the connections between the various existing proposals. In particular, recent results have shown that the EM algorithm can be implemented exactly for faster computation of ML estimates for mixtures with unrestricted MST components. The gain in computational time is effected by noting that the semi-infinite integrals on the E-step of the EM algorithm can be put in the form of moments of the truncated multivariate non-central t-distribution, similar to the restricted case, which subsequently can be expressed in terms of the non-truncated form of the central t-distribution function for which fast algorithms are available. We present comparisons to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture, by some applications to three real datasets.  相似文献   

6.
The main object of this article is to discuss maximum likelihood inference for the epsilon-skew-t distribution. Special cases of this distribution include the epsilon-skew-Cauchy and the epsilon-skew-normal distributions. We derive the information matrix for the maximum likelihood estimators. The approach is applied to a data set presenting significant amount of skewness and heavy tails. In the application we consider the epsilon-skew-t distribution with known and unknown degrees of freedom parameter, showing great flexibility in adjusting to skew data with heavy tails.  相似文献   

7.
In this paper, we consider the family of skew generalized t (SGT) distributions originally introduced by Theodossiou [P. Theodossiou, Financial data and the skewed generalized t distribution, Manage. Sci. Part 1 44 (12) ( 1998), pp. 1650–1661] as a skew extension of the generalized t (GT) distribution. The SGT distribution family warrants special attention, because it encompasses distributions having both heavy tails and skewness, and many of the widely used distributions such as Student's t, normal, Hansen's skew t, exponential power, and skew exponential power (SEP) distributions are included as limiting or special cases in the SGT family. We show that the SGT distribution can be obtained as the scale mixture of the SEP and generalized gamma distributions. We investigate several properties of the SGT distribution and consider the maximum likelihood estimation of the location, scale, and skewness parameters under the assumption that the shape parameters are known. We show that if the shape parameters are estimated along with the location, scale, and skewness parameters, the influence function for the maximum likelihood estimators becomes unbounded. We obtain the necessary conditions to ensure the uniqueness of the maximum likelihood estimators for the location, scale, and skewness parameters, with known shape parameters. We provide a simple iterative re-weighting algorithm to compute the maximum likelihood estimates for the location, scale, and skewness parameters and show that this simple algorithm can be identified as an EM-type algorithm. We finally present two applications of the SGT distributions in robust estimation.  相似文献   

8.
This paper presents a novel framework for maximum likelihood (ML) estimation in skew-t factor analysis (STFA) models in the presence of missing values or nonresponses. As a robust extension of the ordinary factor analysis model, the STFA model assumes a restricted version of the multivariate skew-t distribution for the latent factors and the unobservable errors to accommodate non-normal features such as asymmetry and heavy tails or outliers. An EM-type algorithm is developed to carry out ML estimation and imputation of missing values under a missing at random mechanism. The practical utility of the proposed methodology is illustrated through real and synthetic data examples.  相似文献   

9.
In this paper we consider the Capital Asset Pricing Model under Elliptical (symmetric) Distributions. This class of distributions, which contains the normal distribution, t, contaminated normal and power exponential, among others, offers a more flexible framework for modelling asset prices or returns. In order to analyze the sensibility to possible outliers and/or atypical returns of the maximum likelihood estimators, the local influence method was implemented. The results are illustrated by using a set of shares from companies who trade in the Chilean Stock Market. Our main conclusion is that symmetric distributions having heavier tails than those of the normal distribution, especially the t distribution with small degrees of freedom, show a better fit and allow the reduction of the influence of atypical returns in the maximum likelihood estimators.  相似文献   

10.
A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.  相似文献   

11.
Abstract. We study the Jeffreys prior and its properties for the shape parameter of univariate skew‐t distributions with linear and nonlinear Student's t skewing functions. In both cases, we show that the resulting priors for the shape parameter are symmetric around zero and proper. Moreover, we propose a Student's t approximation of the Jeffreys prior that makes an objective Bayesian analysis easy to perform. We carry out a Monte Carlo simulation study that demonstrates an overall better behaviour of the maximum a posteriori estimator compared with the maximum likelihood estimator. We also compare the frequentist coverage of the credible intervals based on the Jeffreys prior and its approximation and show that they are similar. We further discuss location‐scale models under scale mixtures of skew‐normal distributions and show some conditions for the existence of the posterior distribution and its moments. Finally, we present three numerical examples to illustrate the implications of our results on inference for skew‐t distributions.  相似文献   

12.
Robust mixture modelling using the t distribution   总被引:2,自引:0,他引:2  
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.  相似文献   

13.
The majority of the existing literature on model-based clustering deals with symmetric components. In some cases, especially when dealing with skewed subpopulations, the estimate of the number of groups can be misleading; if symmetric components are assumed we need more than one component to describe an asymmetric group. Existing mixture models, based on multivariate normal distributions and multivariate t distributions, try to fit symmetric distributions, i.e. they fit symmetric clusters. In the present paper, we propose the use of finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions). Such finite mixture models start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM type algorithms are described for fitting the models. Real data examples are used to demonstrate the potential of the new model in comparison with existing ones.  相似文献   

14.
In many studies, the data collected are subject to some upper and lower detection limits. Hence, the responses are either left or right censored. A complication arises when these continuous measures present heavy tails and asymmetrical behavior; simultaneously. For such data structures, we propose a robust-censored linear model based on the scale mixtures of skew-normal (SMSN) distributions. The SMSN is an attractive class of asymmetrical heavy-tailed densities that includes the skew-normal, skew-t, skew-slash, skew-contaminated normal and the entire family of scale mixtures of normal (SMN) distributions as special cases. We propose a fast estimation procedure to obtain the maximum likelihood (ML) estimates of the parameters, using a stochastic approximation of the EM (SAEM) algorithm. This approach allows us to estimate the parameters of interest easily and quickly, obtaining as a byproducts the standard errors, predictions of unobservable values of the response and the log-likelihood function. The proposed methods are illustrated through real data applications and several simulation studies.  相似文献   

15.
This paper presents a robust probabilistic mixture model based on the multivariate skew-t-normal distribution, a skew extension of the multivariate Student’s t distribution with more powerful abilities in modelling data whose distribution seriously deviates from normality. The proposed model includes mixtures of normal, t and skew-normal distributions as special cases and provides a flexible alternative to recently proposed skew t mixtures. We develop two analytically tractable EM-type algorithms for computing maximum likelihood estimates of model parameters in which the skewness parameters and degrees of freedom are asymptotically uncorrelated. Standard errors for the parameter estimates can be obtained via a general information-based method. We also present a procedure of merging mixture components to automatically identify the number of clusters by fitting piecewise linear regression to the rescaled entropy plot. The effectiveness and performance of the proposed methodology are illustrated by two real-life examples.  相似文献   

16.
Lin  Tsung I.  Lee  Jack C.  Ni  Huey F. 《Statistics and Computing》2004,14(2):119-130
A finite mixture model using the multivariate t distribution has been shown as a robust extension of normal mixtures. In this paper, we present a Bayesian approach for inference about parameters of t-mixture models. The specifications of prior distributions are weakly informative to avoid causing nonintegrable posterior distributions. We present two efficient EM-type algorithms for computing the joint posterior mode with the observed data and an incomplete future vector as the sample. Markov chain Monte Carlo sampling schemes are also developed to obtain the target posterior distribution of parameters. The advantages of Bayesian approach over the maximum likelihood method are demonstrated via a set of real data.  相似文献   

17.
Abstract. For probability distributions on ? q, a detailed study of the breakdown properties of some multivariate M‐functionals related to Tyler's [Ann. Statist. 15 (1987) 234] ‘distribution‐free’ M‐functional of scatter is given. These include a symmetrized version of Tyler's M‐functional of scatter, and the multivariate t M‐functionals of location and scatter. It is shown that for ‘smooth’ distributions, the (contamination) breakdown point of Tyler's M‐functional of scatter and of its symmetrized version are 1/q and , respectively. For the multivariate t M‐functional which arises from the maximum likelihood estimate for the parameters of an elliptical t distribution on ν ≥ 1 degrees of freedom the breakdown point at smooth distributions is 1/( q + ν). Breakdown points are also obtained for general distributions, including empirical distributions. Finally, the sources of breakdown are investigated. It turns out that breakdown can only be caused by contaminating distributions that are concentrated near low‐dimensional subspaces.  相似文献   

18.
Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices. The result is a family of six mixture models, including parsimonious models. Parameter estimates for this family of models are derived using an alternating expectation-conditional maximization algorithm and convergence is determined based on Aitken’s acceleration. Model selection is carried out using the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL). This novel family of mixture models is then applied to simulated and real data where clustering performance meets or exceeds that of established model-based clustering methods. The simulation studies include a comparison of the BIC and the ICL as model selection techniques for this novel family of models. Application to simulated data with larger dimensionality is also explored.  相似文献   

19.
As is the case of many studies, the data collected are limited and an exact value is recorded only if it falls within an interval range. Hence, the responses can be either left, interval or right censored. Linear (and nonlinear) regression models are routinely used to analyze these types of data and are based on normality assumptions for the errors terms. However, those analyzes might not provide robust inference when the normality assumptions are questionable. In this article, we develop a Bayesian framework for censored linear regression models by replacing the Gaussian assumptions for the random errors with scale mixtures of normal (SMN) distributions. The SMN is an attractive class of symmetric heavy-tailed densities that includes the normal, Student-t, Pearson type VII, slash and the contaminated normal distributions, as special cases. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo algorithm is introduced to carry out posterior inference. A new hierarchical prior distribution is suggested for the degrees of freedom parameter in the Student-t distribution. The likelihood function is utilized to compute not only some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measure. The proposed Bayesian methods are implemented in the R package BayesCR. The newly developed procedures are illustrated with applications using real and simulated data.  相似文献   

20.
A critical issue in modeling binary response data is the choice of the links. We introduce a new link based on the Student’s t-distribution (t-link) for correlated binary data. The t-link relates to the common probit-normal link adding one additional parameter which controls the heaviness of the tails of the link. We propose an interesting EM algorithm for computing the maximum likelihood for generalized linear mixed t-link models for correlated binary data. In contrast with recent developments (Tan et al. in J. Stat. Comput. Simul. 77:929–943, 2007; Meza et al. in Comput. Stat. Data Anal. 53:1350–1360, 2009), this algorithm uses closed-form expressions at the E-step, as opposed to Monte Carlo simulation. Our proposed algorithm relies on available formulas for the mean and variance of a truncated multivariate t-distribution. To illustrate the new method, a real data set on respiratory infection in children and a simulation study are presented.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号