Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

When a two-level multilevel model (MLM) is used for repeated growth data, the individuals constitute level 2 and the successive measurements constitute level 1, which is nested within the individuals that make up level 2. The heterogeneity among individuals is represented by either the random-intercept or random-coefficient (slope) model. The variance components at level 1 involve serial effects and measurement errors under constant variance or heteroscedasticity. This study hypothesizes that missing serial effects and/or heteroscedasticity may bias the results obtained from two-level models. To illustrate this effect, we conducted two simulation studies, where the simulated data were based on the characteristics of an empirical mouse tumour data set. The results suggest that for repeated growth data with constant variance (measurement error) and misspecified serial effects (ρ > 0.3), the proportion of level-2 variation (intra-class correlation coefficient) increases with ρ and the two-level random-coefficient model is the minimum AIC (or AICc) model when compared with the fixed model, heteroscedasticity model, and random-intercept model. In addition, when the serial effect (ρ > 0.1) and heteroscedasticity are both misspecified, the two-level random-coefficient model is the minimum AIC (or AICc) model when compared with the fixed model and random-intercept model. This study demonstrates that missing serial effects and/or heteroscedasticity may indicate heterogeneity among individuals in repeated growth data (mixed or two-level MLM). This issue is critical in biomedical research.

Approaches that use the pseudolikelihood to perform multilevel modelling on survey data have been presented in the literature. To avoid biased estimates due to unequal selection probabilities, conditional weights can be introduced at each level. Less-biased estimators can also be obtained in a two-level linear model if the level-1 weights are scaled. In this paper, we studied several level-2 weights that can be introduced into the pseudolikelihood when the sampling design and the hierarchical structure of the multilevel model do not match. Two-level and three-level models were studied. The present work was motivated by a study that aims to estimate the contributions of lead sources to polluting the interior floor dust of the rooms within dwellings. We performed a simulation study using the real data collected from a French survey to achieve our objective. We conclude that it is preferable to use unweighted analyses or, at most, to use conditional level-2 weights in a two-level or a three-level model. We state some warnings and make some recommendations.

Count data with excess zeros are widely encountered in biomedical, medical, public health and social survey research. Zero-inflated Poisson (ZIP) regression models with mixed effects are useful tools for analyzing such data, in which covariates are usually incorporated in the model to explain inter-subject variation and a normal distribution is assumed for both random effects and random errors. However, in many practical applications, such assumptions may be violated as the data often exhibit skewness and some covariates may be measured with error. In this paper, we deal with these issues simultaneously by developing a Bayesian joint hierarchical modeling approach. Specifically, by treating intercepts and slopes in logistic and Poisson regression as random, a flexible two-level ZIP regression model is proposed, where a covariate process with measurement errors is established and a skew-t-distribution is considered for both random errors and random effects. Under the Bayesian framework, model selection is carried out using the deviance information criterion (DIC) and a goodness-of-fit statistic is also developed for assessing the plausibility of the posited model. The main advantage of our method is that it allows for more robustness and correctness in investigating heterogeneity from different levels, while accommodating the skewness and measurement errors simultaneously. An application to the Shanghai Youth Fitness Survey is used as an illustrative example. Through this real example, it is shown that our approach is of interest and useful for applications.
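
To make the notion of zero inflation concrete, the following Python sketch evaluates the textbook ZIP probability mass function, in which a structural zero occurs with probability `pi_zero` and the count is otherwise Poisson with mean `lam`. It is not the two-level Bayesian model of the abstract above: random effects, covariate measurement error and the skew-t distribution are all left out, and the names `poisson_pmf`, `zip_pmf`, `pi_zero` and `lam` are assumptions made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing the count k when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson pmf (illustrative, no covariates).

    With probability pi_zero the observation is a structural zero;
    otherwise it is drawn from a Poisson distribution with mean lam.
    """
    poisson_part = (1.0 - pi_zero) * poisson_pmf(k, lam)
    # Only the zero count receives the extra structural-zero mass.
    return pi_zero + poisson_part if k == 0 else poisson_part


if __name__ == "__main__":
    # Compare the zero probability with and without inflation.
    print(poisson_pmf(0, 2.0), zip_pmf(0, 0.3, 2.0))
```

Setting `pi_zero` to 0 recovers the ordinary Poisson distribution, which is one simple way to check an implementation of this kind.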

Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.

Random effects regression mixture models are a way to classify longitudinal data (or trajectories) having possibly varying lengths. The mixture structure of the traditional random effects regression mixture model arises through the distribution of the random regression coefficients, which is assumed to be a mixture of multivariate normals. An extension of this standard model is presented that accounts for various levels of heterogeneity among the trajectories, depending on their assumed error structure. A standard likelihood ratio test is presented for testing this error structure assumption. Full details of an expectation-conditional maximization algorithm for maximum likelihood estimation are also presented. This model is used to analyze data from an infant habituation experiment, where it is desirable to assess whether infants comprise different populations in terms of their habituation time.

Global regression assumes that a single model adequately describes all parts of a study region. However, the heterogeneity in the data may be sufficiently strong that relationships between variables cannot be spatially constant. In addition, the factors involved are often sufficiently complex that it is difficult to identify them in the form of explanatory variables. As a result, Geographically Weighted Regression (GWR) was introduced as a tool for the modeling of non-stationary spatial data. Using kernel functions, the GWR methodology allows the model parameters to vary spatially and produces non-parametric surfaces of their estimates. To model count data with overdispersion, it is more appropriate to use a negative binomial distribution instead of a Poisson distribution. Therefore, we propose the Geographically Weighted Negative Binomial Regression (GWNBR) method for the modeling of data with overdispersion. The results obtained using simulated and real data show the superiority of this method for the modeling of non-stationary count data with overdispersion compared with competing models, such as global regressions (e.g., Poisson and negative binomial) and Geographically Weighted Poisson Regression (GWPR). Moreover, we illustrate that these competing models are special cases of the more robust model GWNBR.

Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships, and modelling covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follows a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but here the focus is on the problem of clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

Jointly modelling skewness and heterogeneity is challenging in data analysis, particularly in regression analysis which allows a random probability distribution to change flexibly with covariates. This paper, based on a skew Laplace normal (SLN) mixture of location, scale, and skewness, introduces a new regression model which provides a flexible modelling of location, scale and skewness parameters simultaneously. The maximum likelihood (ML) estimators of all parameters of the proposed model via the expectation-maximization (EM) algorithm as well as their asymptotic properties are derived. Numerical analyses via a simulation study and a real data example are used to illustrate the performance of the proposed model.

We define the exponentiated power exponential distribution and propose a regression model with different systematic structures based on the new distribution. We show that the new regression model can be applied to dispersion data since it represents a parametric family of models that includes as sub-models some widely-known regression models. It then can be used more effectively in the analysis of real data. We use maximum likelihood estimation and derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes. Some global-influence measurements are also investigated and simulation studies are performed to evaluate the accuracy of the estimates. We provide an application of the regression model with four systematic structures to nursing activities score data in the Unit of the Medical Clinic of the University of São Paulo (USP) Hospital.

In many financial applications, Poisson mixture regression models are commonly used to analyze heterogeneous count data. When fitting these models, the observed counts are supposed to come from two or more subpopulations and parameter estimation is typically performed by means of maximum likelihood via the Expectation–Maximization algorithm. In this study, we briefly discuss the procedure for fitting Poisson mixture regression models by means of maximum likelihood, model selection and goodness-of-fit tests. These models are applied to a real data set for credit-scoring purposes. We aim to reveal the impact of demographic and financial variables in creating different groups of clients and to predict the group to which each client belongs, as well as the client's expected number of defaulted payments. The fitted models reveal that the population consists of three groups, in contrast with the traditional good versus bad categorization approach of credit-scoring systems.
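
The Expectation–Maximization fit mentioned above can be sketched in a few lines for the simplest case. The Python example below assumes an intercept-only Poisson mixture (each group has a single mean and no covariates), so it is a deliberate simplification of the mixture regression models in the abstract, not the authors' procedure. The function name `em_poisson_mixture`, the starting values and the toy counts are all hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of count k with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_poisson_mixture(counts, lams, weights, n_iter=200):
    """EM for a finite Poisson mixture without covariates (illustrative).

    Returns the mixing weights, the component means and the trace of the
    observed-data log-likelihood evaluated before each M-step.
    """
    lams, weights = list(lams), list(weights)
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each count belongs to each group.
        resp, loglik = [], 0.0
        for k in counts:
            joint = [w * math.exp(poisson_logpmf(k, lam))
                     for w, lam in zip(weights, lams)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        loglik_trace.append(loglik)
        # M-step: weighted proportions and weighted means.
        for g in range(len(lams)):
            mass = sum(r[g] for r in resp)
            weights[g] = mass / len(counts)
            lams[g] = sum(r[g] * k for r, k in zip(resp, counts)) / mass
    return weights, lams, loglik_trace


if __name__ == "__main__":
    toy_counts = [0, 1, 0, 2, 1, 0, 1, 9, 11, 10, 8, 12, 10]  # made-up data
    print(em_poisson_mixture(toy_counts, [1.0, 5.0], [0.5, 0.5])[:2])
```

In a regression setting the M-step for each group would instead be a weighted Poisson regression, with the E-step responsibilities serving as the weights.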

Finite mixture of regression (FMR) models are aimed at characterizing subpopulation heterogeneity stemming from different sets of covariates that impact different groups in a population. We address the contemporary problem of simultaneously conducting covariate selection and determining the number of mixture components from a Bayesian perspective that can incorporate prior information. We propose a Gibbs sampling algorithm with reversible jump Markov chain Monte Carlo implementation to accomplish concurrent covariate selection and mixture component determination in FMR models. Our Bayesian approach contains innovative features compared to previously developed reversible jump algorithms. In addition, we introduce component-adaptive weighted g priors for regression coefficients, and illustrate their improved performance in covariate selection. Numerical studies show that the Gibbs sampler with reversible jump implementation performs well, and that the proposed weighted priors can be superior to non-adaptive unweighted priors.

Focusing on the model selection problems in the family of Poisson mixture models (including the Poisson mixture regression model with random effects and zero‐inflated Poisson regression model with random effects), the current paper derives two conditional Akaike information criteria. The criteria are the unbiased estimators of the conditional Akaike information based on the conditional log‐likelihood and the conditional Akaike information based on the joint log‐likelihood, respectively. The derivation is free from the specific parametric assumptions about the conditional mean of the true data‐generating model and applies to different types of estimation methods. Additionally, the derivation is not based on the asymptotic argument. Simulations show that the proposed criteria have promising estimation accuracy. In addition, it is found that the criterion based on the conditional log‐likelihood demonstrates good model selection performance under different scenarios. Two sets of real data are used to illustrate the proposed method.

We study how different prior assumptions on the spatially structured heterogeneity term of the convolution hierarchical Bayesian model for spatial disease data could affect the results of an ecological analysis when response and exposure exhibit a strong spatial pattern. We show that in this case the estimate of the regression parameter could be strongly biased, both by analyzing the association between lung cancer mortality and education level on a real dataset and by a simulation experiment. The analysis is based on a hierarchical Bayesian model with a time dependent covariate in which we allow for a latency period between exposure and mortality, with time and space random terms and misaligned exposure-disease data.

We propose quantile regression (QR) in the Bayesian framework for a class of nonlinear mixed effects models with a known, parametric model form for longitudinal data. Estimation of the regression quantiles is based on a likelihood-based approach using the asymmetric Laplace density. Posterior computations are carried out via Gibbs sampling and the adaptive rejection Metropolis algorithm. To assess the performance of the Bayesian QR estimator, we compare it with the mean regression estimator using real and simulated data. Results show that the Bayesian QR estimator provides a fuller examination of the shape of the conditional distribution of the response variable. Our approach is proposed for parametric nonlinear mixed effects models, and therefore may not be generalized to models without a given model form.
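
The link between the asymmetric Laplace density and quantile estimation can be illustrated without any mixed-effects machinery. The Python sketch below assumes the standard three-parameter asymmetric Laplace density with location `mu`, scale `sigma` and quantile level `tau`, and shows that maximising its likelihood over the location is the same as minimising the quantile check loss. It covers only a single sample with no covariates, random effects or Gibbs sampling, and all function names are illustrative.

```python
import math


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution:
    f(y) = tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma)).
    """
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def quantile_by_check_loss(values, tau):
    """Sample tau-quantile found by minimising the total check loss.

    A minimiser always exists among the data points, so only those are searched.
    """
    return min(sorted(values),
               key=lambda q: sum(check_loss(y - q, tau) for y in values))


def quantile_by_ald_likelihood(values, tau, sigma=1.0):
    """The same quantile, found by maximising the ALD log-likelihood in mu."""
    return max(sorted(values),
               key=lambda q: sum(ald_logpdf(y, q, sigma, tau) for y in values))
```

Because the log-likelihood equals a constant minus the summed check loss (for fixed `sigma`), the two search functions pick the same location, which is why a working likelihood of this form yields quantile estimates.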

Quantile regression methods have been widely used in many research areas in recent years. However, conventional estimation methods for quantile regression models do not guarantee that the estimated quantile curves will be non‐crossing. While there are various methods in the literature to deal with this problem, many of these methods force the model parameters to lie within a subset of the parameter space in order for the required monotonicity to be satisfied. Note that different methods may use different subspaces of the space of model parameters. This paper establishes a relationship between the monotonicity of the estimated conditional quantiles and the comonotonicity of the model parameters. We develop a novel quasi‐Bayesian method for parameter estimation which can be used to deal with both time series and independent statistical data. Simulation studies and an application to real financial returns show that the proposed method has the potential to be very useful in practice.


In this article, a finite mixture model of hurdle Poisson distribution with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of model parameters and mixing proportions. Specifically, missing data are assumed to be missing not at random (MNAR), i.e. non-ignorable missing (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm efficiency, a stochastic step is incorporated into the E-step based on data augmentation, whereas the M-step is solved by the method of conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components with missing values. The considered model is a general model framework and it captures the important characteristics of count data analysis such as zero inflation/deflation, heterogeneity as well as missingness, providing us with more insight into the data features and allowing for dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed to replace the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.
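
The remark about zero inflation and deflation follows from how a hurdle distribution is built: the probability of a zero is a free parameter, and positive counts come from a zero-truncated Poisson. The Python sketch below shows a single hurdle Poisson component under that standard definition. The mixture over components, the probit missingness model and the stochastic EM algorithm from the abstract are not reproduced, and the names `hurdle_poisson_pmf`, `p_zero` and `lam` are assumptions.

```python
import math


def hurdle_poisson_pmf(k, p_zero, lam):
    """Hurdle Poisson pmf for a single component (illustrative).

    P(Y = 0) = p_zero, and for k >= 1 the remaining mass (1 - p_zero)
    is spread according to a Poisson(lam) truncated at zero.
    """
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    # -expm1(-lam) equals 1 - exp(-lam), the Poisson mass on k >= 1.
    return (1.0 - p_zero) * poisson_k / -math.expm1(-lam)


if __name__ == "__main__":
    lam = 1.0
    # p_zero below exp(-lam) gives fewer zeros than a plain Poisson
    # (deflation); p_zero above it gives more zeros (inflation).
    print(math.exp(-lam), hurdle_poisson_pmf(0, 0.05, lam),
          hurdle_poisson_pmf(0, 0.60, lam))
```

Unlike the zero-inflated Poisson, where extra zeros can only be added, the hurdle form can represent both directions because `p_zero` is not tied to the Poisson mean.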

Expectile regression [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819–847] is a nice tool for estimating the conditional expectiles of a response variable given a set of covariates. Expectile regression at the 50% level is the classical conditional mean regression. In many real applications, having multiple expectiles at different levels provides a more complete picture of the conditional distribution of the response variable. The multiple linear expectile regression model has been well studied [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819–847; Efron B. Regression percentiles using asymmetric squared error loss, Stat Sin. 1991;1:93–125], but it can be too restrictive for many real applications. In this paper, we derive a regression tree-based gradient boosting estimator for nonparametric multiple expectile regression. The new estimator, referred to as ER-Boost, is implemented in an R package erboost publicly available at http://cran.r-project.org/web/packages/erboost/index.html. We use two homoscedastic/heteroscedastic random-function-generator models in simulation to show the high predictive accuracy of ER-Boost. As an application, we apply ER-Boost to analyse North Carolina County crime data. From the nonparametric expectile regression analysis of this dataset, we draw several interesting conclusions that are consistent with the previous study using the economic model of crime. This real data example also provides a good demonstration of some nice features of ER-Boost, such as its ability to handle different types of covariates and its model interpretation tools.
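
The statement that the 50% expectile is the mean can be checked with a small asymmetric least squares computation. The Python sketch below estimates a sample expectile with no covariates by iterating a weighted mean, which is a fixed-point form of the first-order condition of the asymmetric squared error loss. It is not ER-Boost and does not use the erboost package; the function name `expectile` and the toy data are hypothetical.

```python
def expectile(values, tau, tol=1e-12, max_iter=1000):
    """Sample tau-expectile via iteratively reweighted means (illustrative).

    Minimises sum_i |tau - 1{y_i <= m}| * (y_i - m)^2 over m: observations
    above the current estimate get weight tau, the others weight 1 - tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(values) / len(values)  # start from the ordinary mean
    for _ in range(max_iter):
        weights = [tau if y > m else 1.0 - tau for y in values]
        new_m = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


if __name__ == "__main__":
    toy = [1.0, 2.0, 3.0, 10.0]  # made-up data with one large value
    for level in (0.1, 0.5, 0.9):
        print(level, expectile(toy, level))
```

At `tau = 0.5` every observation receives the same weight, so the iteration stops at the mean; higher levels shift the estimate towards the upper tail.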

Mixture regression models are used to investigate the relationship between variables that come from unknown latent groups and to model heterogeneous datasets. In general, the error terms are assumed to be normal in the mixture regression model. However, the estimators under the normality assumption are sensitive to outliers. In this article, we introduce a robust mixture regression procedure based on the LTS-estimation method to combat outliers in the data. We give a simulation study and a real data example to illustrate the performance of the proposed estimators relative to their counterparts in terms of dealing with outliers.

Finite mixture models are currently used to analyze heterogeneous longitudinal data. By releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, finite mixture models not only can estimate model parameters but also cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, which might be associated with a clinically important binary outcome. This article develops a joint modeling of a finite mixture of NLME models for longitudinal data in the presence of covariate measurement errors and a logistic regression for a binary outcome, linked by individual latent class indicators, under a Bayesian framework. Simulation studies are conducted to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and logistic regression are fitted separately, followed by an application to a real data set from an AIDS clinical trial, in which the viral dynamics and dichotomized time to the first decline of CD4/CD8 ratio are analyzed jointly.

