首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 734 毫秒
1.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

2.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.  相似文献   

3.
The authors propose a robust transformation linear mixed‐effects model for longitudinal continuous proportional data when some of the subjects exhibit outlying trajectories over time. It becomes troublesome when including or excluding such subjects in the data analysis results in different statistical conclusions. To robustify the longitudinal analysis using the mixed‐effects model, they utilize the multivariate t distribution for random effects or/and error terms. Estimation and inference in the proposed model are established and illustrated by a real data example from an ophthalmology study. Simulation studies show a substantial robustness gain by the proposed model in comparison to the mixed‐effects model based on Aitchison's logit‐normal approach. As a result, the data analysis benefits from the robustness of making consistent conclusions in the presence of influential outliers. The Canadian Journal of Statistics © 2009 Statistical Society of Canada  相似文献   

4.
The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with Metropolis-within-Gibbs scheme is used to effectively draw the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.  相似文献   

5.
Robust mixture modelling using the t distribution   总被引:2,自引:0,他引:2  
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.  相似文献   

6.
The demand for reliable statistics in subpopulations, when only reduced sample sizes are available, has promoted the development of small area estimation methods. In particular, an approach that is now widely used is based on the seminal work by Battese et al. [An error-components model for prediction of county crop areas using survey and satellite data, J. Am. Statist. Assoc. 83 (1988), pp. 28–36] that uses linear mixed models (MM). We investigate alternatives when a linear MM does not hold because, on one side, linearity may not be assumed and/or, on the other, normality of the random effects may not be assumed. In particular, Opsomer et al. [Nonparametric small area estimation using penalized spline regression, J. R. Statist. Soc. Ser. B 70 (2008), pp. 265–283] propose an estimator that extends the linear MM approach to the case in which a linear relationship may not be assumed using penalized splines regression. From a very different perspective, Chambers and Tzavidis [M-quantile models for small area estimation, Biometrika 93 (2006), pp. 255–268] have recently proposed an approach for small-area estimation that is based on M-quantile (MQ) regression. This allows for models robust to outliers and to distributional assumptions on the errors and the area effects. However, when the functional form of the relationship between the qth MQ and the covariates is not linear, it can lead to biased estimates of the small area parameters. Pratesi et al. [Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US, Environmetrics 19(7) (2008), pp. 687–701] apply an extended version of this approach for the estimation of the small area distribution function using a non-parametric specification of the conditional MQ of the response variable given the covariates [M. Pratesi, M.G. Ranalli, and N. Salvati, Nonparametric m-quantile regression using penalized splines, J. Nonparametric Stat. 21 (2009), pp. 287–304]. We will derive the small area estimator of the mean under this model, together with its mean-squared error estimator and compare its performance to the other estimators via simulations on both real and simulated data.  相似文献   

7.
Mixtures of multivariate t distributions provide a robust parametric extension to the fitting of data with respect to normal mixtures. In presence of some noise component, potential outliers or data with longer-than-normal tails, one way to broaden the model can be provided by considering t distributions. In this framework, the degrees of freedom can act as a robustness parameter, tuning the heaviness of the tails, and downweighting the effect of the outliers on the parameters estimation. The aim of this paper is to extend to mixtures of multivariate elliptical distributions some theoretical results about the likelihood maximization on constrained parameter spaces. Further, a constrained monotone algorithm implementing maximum likelihood mixture decomposition of multivariate t distributions is proposed, to achieve improved convergence capabilities and robustness. Monte Carlo numerical simulations and a real data study illustrate the better performance of the algorithm, comparing it to earlier proposals.  相似文献   

8.
Measurement error models constitute a wide class of models that include linear and nonlinear regression models. They are very useful to model many real-life phenomena, particularly in the medical and biological areas. The great advantage of these models is that, in some sense, they can be represented as mixed effects models, allowing us to implement well-known techniques, like the EM-algorithm for the parameter estimation. In this paper, we consider a class of multivariate measurement error models where the observed response and/or covariate are not fully observed, i.e., the observations are subject to certain threshold values below or above which the measurements are not quantifiable. Consequently, these observations are considered censored. We assume a Student-t distribution for the unobserved true values of the mismeasured covariate and the error term of the model, providing a robust alternative for parameter estimation. Our approach relies on a likelihood-based inference using an EM-type algorithm. The proposed method is illustrated through some simulation studies and the analysis of an AIDS clinical trial dataset.  相似文献   

9.
Linear mixed models have been widely used to analyze repeated measures data which arise in many studies. In most applications, it is assumed that both the random effects and the within-subjects errors are normally distributed. This can be extremely restrictive, obscuring important features of within-and among-subject variations. Here, quantile regression in the Bayesian framework for the linear mixed models is described to carry out the robust inferences. We also relax the normality assumption for the random effects by using a multivariate skew-normal distribution, which includes the normal ones as a special case and provides robust estimation in the linear mixed models. For posterior inference, we propose a Gibbs sampling algorithm based on a mixture representation of the asymmetric Laplace distribution and multivariate skew-normal distribution. The procedures are demonstrated by both simulated and real data examples.  相似文献   

10.
Quantitative traits measured over pedigrees of individuals may be analysed using maximum likelihood estimation, assuming that the trait has a multivariate normal distribution. This approach is often used in the analysis of mixed linear models. In this paper a robust version of the log likelihood for multivariate normal data is used to construct M-estimators which are resistant to contamination by outliers. The robust estimators are found using a minimisation routine which retains the flexible parameterisations of the multivariate normal approach. Asymptotic properties of the estimators are derived, computation of the estimates and their use in outlier detection tests are discussed, and a small simulation study is conducted.  相似文献   

11.
The subject of this paper is Bayesian inference about the fixed and random effects of a mixed-effects linear statistical model with two variance components. It is assumed that a priori the fixed effects have a noninformative distribution and that the reciprocals of the variance components are distributed independently (of each other and of the fixed effects) as gamma random variables. It is shown that techniques similar to those employed in a ridge analysis of a response surface can be used to construct a one-dimensional curve that contains all of the stationary points of the posterior density of the random effects. The “ridge analysis” (of the posterior density) can be useful (from a computational standpoint) in finding the number and the locations of the stationary points and can be very informative about various features of the posterior density. Depending on what is revealed by the ridge analysis, a multivariate normal or multivariate-t distribution that is centered at a posterior mode may provide a satisfactory approximation to the posterior distribution of the random effects (which is of the poly-t form).  相似文献   

12.
We propose a strongly root-n consistent simulation-based estimator for the generalized linear mixed models. This estimator is constructed based on the first two marginal moments of the response variables, and it allows the random effects to have any parametric distribution (not necessarily normal). Consistency and asymptotic normality for the proposed estimator are derived under fairly general regularity conditions. We also demonstrate that this estimator has a bounded influence function and that it is robust against data outliers. A bias correction technique is proposed to reduce the finite sample bias in the estimation of variance components. The methodology is illustrated through an application to the famed seizure count data and some simulation studies.  相似文献   

13.
In this article, an alternative estimation approach is proposed to fit linear mixed effects models where the random effects follow a finite mixture of normal distributions. This heterogeneity linear mixed model is an interesting tool since it relaxes the classical normality assumption and is also perfectly suitable for classification purposes, based on longitudinal profiles. Instead of fitting directly the heterogeneity linear mixed model, we propose to fit an equivalent mixture of linear mixed models under some restrictions which is computationally simpler. Unlike the former model, the latter can be maximized analytically using an EM-algorithm and the obtained parameter estimates can be easily used to compute the parameter estimates of interest.  相似文献   

14.
Maximum likelihood approach is the most frequently employed approach for the inference of linear mixed models. However, it relies on the normal distributional assumption of the random effects and the within-subject errors, and it is lack of robustness against outliers. This article proposes a semiparametric estimation approach for linear mixed models. This approach is based on the first two marginal moments of the response variable, and does not require any parametric distributional assumptions of random effects or error terms. The consistency and asymptotically normality of the estimator are derived under fairly general conditions. In addition, we show that the proposed estimator has a bounded influence function and a redescending property so it is robust to outliers. The methodology is illustrated through an application to the famed Framingham cholesterol data. The finite sample behavior and the robustness properties of the proposed estimator are evaluated through extensive simulation studies.  相似文献   

15.
In this paper, we consider inferences in a binary dynamic mixed model. The existing estimation approaches mainly estimate the regression effects and the dynamic dependence parameters either through the estimation of the random effects or by avoiding the random effects technically. Under the assumption that the random effects follow a Gaussian distribution, we propose a generalized quasilikelihood (GQL) approach for the estimation of the parameters of the dynamic mixed models. The proposed approach is computationally less cumbersome than the exact maximum likelihood (ML) approach. We also carry out the GQL estimation under two competitive, namely, probit and logit mixed models, and discuss both the asymptotic and small-sample behaviour of their estimators.  相似文献   

16.
When a generalized linear mixed model with multiple (two or more) sources of random effects is considered, the inferences may vary depending on the nature of the random effects. In this paper, we consider a familial Poisson mixed model where each of the count responses of a family are influenced by two independent unobservable familial random effects with two distinct components of dispersion. A generalized quasilikelihood (GQL) approach is discussed for the estimation of the dispersion components as well as the regression effects of the model. A simulation study is conducted to examine the relative performance of the GQL approach as opposed to a simpler method of moments. Furthermore, the GQL estimation methodology is illustrated by using health care utilization data that follow a Poisson mixed model with one component of dispersion and by using simulated asthma data that follow a Poisson mixed model with two sources of random effects with two distinct components of dispersion.  相似文献   

17.
It is well known that in a traditional outlier-free situation, the generalized quasi-likelihood (GQL) approach [B.C. Sutradhar, On exact quasilikelihood inference in generalized linear mixed models, Sankhya: Indian J. Statist. 66 (2004), pp. 261–289] performs very well to obtain the consistent as well as the efficient estimates for the parameters involved in the generalized linear mixed models (GLMMs). In this paper, we first examine the effect of the presence of one or more outliers on the GQL estimation for the parameters in such GLMMs, especially in two important models such as count and binary mixed models. The outliers appear to cause serious biases and hence inconsistency in the estimation. As a remedy, we then propose a robust GQL (RGQL) approach in order to obtain the consistent estimates for the parameters in the GLMMs in the presence of one or more outliers. An extensive simulation study is conducted to examine the consistency performance of the proposed RGQL approach.  相似文献   

18.
This paper is concerned with a semiparametric partially linear regression model with unknown regression coefficients, an unknown nonparametric function for the non-linear component, and unobservable Gaussian distributed random errors. We present a wavelet thresholding based estimation procedure to estimate the components of the partial linear model by establishing a connection between an l 1-penalty based wavelet estimator of the nonparametric component and Huber’s M-estimation of a standard linear model with outliers. Some general results on the large sample properties of the estimates of both the parametric and the nonparametric part of the model are established. Simulations are used to illustrate the general results and to compare the proposed methodology with other methods available in the recent literature.  相似文献   

19.
Maximum likelihood is a widely used estimation method in statistics. This method is model dependent and as such is criticized as being non robust. In this article, we consider using weighted likelihood method to make robust inferences for linear mixed models where weights are determined at both the subject level and the observation level. This approach is appropriate for problems where maximum likelihood is the basic fitting technique, but a subset of data points is discrepant with the model. It allows us to reduce the impact of outliers without complicating the basic linear mixed model with normally distributed random effects and errors. The weighted likelihood estimators are shown to be robust and asymptotically normal. Our simulation study demonstrates that the weighted estimates are much better than the unweighted ones when a subset of data points is far away from the rest. Its application to the analysis of deglutition apnea duration in normal swallows shows that the differences between the weighted and unweighted estimates are due to large amount of outliers in the data set.  相似文献   

20.
Nonlinear mixed-effects models are very useful to analyze repeated measures data and are used in a variety of applications. Normal distributions for random effects and residual errors are usually assumed, but such assumptions make inferences vulnerable to the presence of outliers. In this work, we introduce an extension of a normal nonlinear mixed-effects model considering a subclass of elliptical contoured distributions for both random effects and residual errors. This elliptical subclass, the scale mixtures of normal (SMN) distributions, includes heavy-tailed multivariate distributions, such as Student-t, the contaminated normal and slash, among others, and represents an interesting alternative to outliers accommodation maintaining the elegance and simplicity of the maximum likelihood theory. We propose an exact estimation procedure to obtain the maximum likelihood estimates of the fixed-effects and variance components, using a stochastic approximation of the EM algorithm. We compare the performance of the normal and the SMN models with two real data sets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号