首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 789 毫秒
1.
Mixture of linear regression models provide a popular treatment for modeling nonlinear regression relationship. The traditional estimation of mixture of regression models is based on Gaussian error assumption. It is well known that such assumption is sensitive to outliers and extreme values. To overcome this issue, a new class of finite mixture of quantile regressions (FMQR) is proposed in this article. Compared with the existing Gaussian mixture regression models, the proposed FMQR model can provide a complete specification on the conditional distribution of response variable for each component. From the likelihood point of view, the FMQR model is equivalent to the finite mixture of regression models based on errors following asymmetric Laplace distribution (ALD), which can be regarded as an extension to the traditional mixture of regression models with normal error terms. An EM algorithm is proposed to obtain the parameter estimates of the FMQR model by combining a hierarchical representation of the ALD. Finally, the iterated weighted least square estimation for each mixture component of the FMQR model is derived. Simulation studies are conducted to illustrate the finite sample performance of the estimation procedure. Analysis of an aphid data set is used to illustrate our methodologies.  相似文献   

2.
Unity measure errors (UME) in numerical survey data can determine serious bias in the estimates of interest. In this paper, a finite Gaussian mixture model is used to identify observations affected by UME and to robustly estimate the target parameters in presence of this type of error. In the proposed model, the mixture components are associated to the different error patterns across the variables. We follow a multiple imputation approach in a Bayesian setting that allows us to handle missing values in data. In this framework, the assessment of the uncertainty associated with both errors and missingness is based on repeatedly drawing from the predictive distribution of the true non contaminated data given the observed data. The draws are obtained through a suitable version of the data augmentation algorithm. Applications to both simulated and real data are presented.  相似文献   

3.
This research was motivated by our goal to design an efficient clinical trial to compare two doses of docosahexaenoic acid supplementation for reducing the rate of earliest preterm births (ePTB) and/or preterm births (PTB). Dichotomizing continuous gestational age (GA) data using a classic binomial distribution will result in a loss of information and reduced power. A distributional approach is an improved strategy to retain statistical power from the continuous distribution. However, appropriate distributions that fit the data properly, particularly in the tails, must be chosen, especially when the data are skewed. A recent study proposed a skew-normal method. We propose a three-component normal mixture model and introduce separate treatment effects at different components of GA. We evaluate operating characteristics of mixture model, beta-binomial model, and skew-normal model through simulation. We also apply these three methods to data from two completed clinical trials from the USA and Australia. Finite mixture models are shown to have favorable properties in PTB analysis but minimal benefit for ePTB analysis. Normal models on log-transformed data have the largest bias. Therefore we recommend finite mixture model for PTB study. Either finite mixture model or beta-binomial model is acceptable for ePTB study.  相似文献   

4.
Summary.  We consider a finite mixture model with k components and a kernel distribution from a general one-parameter family. The problem of testing the hypothesis k =2 versus k 3 is studied. There has been no general statistical testing procedure for this problem. We propose a modified likelihood ratio statistic where under the null and the alternative hypotheses the estimates of the parameters are obtained from a modified likelihood function. It is shown that estimators of the support points are consistent. The asymptotic null distribution of the modified likelihood ratio test proposed is derived and found to be relatively simple and easily applied. Simulation studies for the asymptotic modified likelihood ratio test based on finite mixture models with normal, binomial and Poisson kernels suggest that the test proposed performs well. Simulation studies are also conducted for a bootstrap method with normal kernels. An example involving foetal movement data from a medical study illustrates the testing procedure.  相似文献   

5.
A mixture measurement error model built upon skew normal distributions and normal distributions is developed to evaluate various impacts of measurement errors to parameter inferences in logistic regressions. Data generated from survey questionnaires are usually error contaminated. We consider two types of errors: person-specific bias and random errors. Person-specific bias is modelled using skew normal distribution, and the distribution of random errors is described by a normal distribution. Intensive simulations are conducted to evaluate the contribution of each component in the mixture to outcomes of interest. The proposed method is then applied to a questionnaire data set generated from a neural tube defect study. Simulation results and real data application indicate that ignoring measurement errors or misspecifying measurement error components can both produce misleading results, especially when measurement errors are actually skew distributed. The inferred parameters can be attenuated or inflated depending on how the measurement error components are specified. We expect the findings will self-explain the importance of adjusting measurement errors and thus benefit future data collection effort.  相似文献   

6.
Model-based clustering is a method that clusters data with an assumption of a statistical model structure. In this paper, we propose a novel model-based hierarchical clustering method for a finite statistical mixture model based on the Fisher distribution. The main foci of the proposed method are: (a) provide efficient solution to estimate the parameters of a Fisher mixture model (FMM); (b) generate a hierarchy of FMMs and (c) select the optimal model. To this aim, we develop a Bregman soft clustering method for FMM. Our model estimation strategy exploits Bregman divergence and hierarchical agglomerative clustering. Whereas, our model selection strategy comprises a parsimony-based approach and an evaluation graph-based approach. We empirically validate our proposed method by applying it on simulated data. Next, we apply the method on real data to perform depth image analysis. We demonstrate that the proposed clustering method can be used as a potential tool for unsupervised depth image analysis.  相似文献   

7.
Finite mixture models are currently used to analyze heterogeneous longitudinal data. By releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, finite mixture models not only can estimate model parameters but also cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, which might be associated with a clinically important binary outcome. This article develops a joint modeling of a finite mixture of NLME models for longitudinal data in the presence of covariate measurement errors and a logistic regression for a binary outcome, linked by individual latent class indicators, under a Bayesian framework. Simulation studies are conducted to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and logistic regression are fitted separately, followed by an application to a real data set from an AIDS clinical trial, in which the viral dynamics and dichotomized time to the first decline of CD4/CD8 ratio are analyzed jointly.  相似文献   

8.
A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.  相似文献   

9.
Mixture separation for mixed-mode data   总被引:3,自引:0,他引:3  
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.  相似文献   

10.
The measurement error model (MEM) is an important model in statistics because in a regression problem, the measurement error of the explanatory variable will seriously affect the statistical inferences if measurement errors are ignored. In this paper, we revisit the MEM when both the response and explanatory variables are further involved with rounding errors. Additionally, the use of a normal mixture distribution to increase the robustness of model misspecification for the distribution of the explanatory variables in measurement error regression is in line with recent developments. This paper proposes a new method for estimating the model parameters. It can be proved that the estimates obtained by the new method possess the properties of consistency and asymptotic normality.  相似文献   

11.
A stochastic volatility in mean model with correlated errors using the symmetrical class of scale mixtures of normal distributions is introduced in this article. The scale mixture of normal distributions is an attractive class of symmetric distributions that includes the normal, Student-t, slash and contaminated normal distributions as special cases, providing a robust alternative to estimation in stochastic volatility in mean models in the absence of normality. Using a Bayesian paradigm, an efficient method based on Markov chain Monte Carlo (MCMC) is developed for parameter estimation. The methods developed are applied to analyze daily stock return data from the São Paulo Stock, Mercantile & Futures Exchange index (IBOVESPA). The Bayesian predictive information criteria (BPIC) and the logarithm of the marginal likelihood are used as model selection criteria. The results reveal that the stochastic volatility in mean model with correlated errors and slash distribution provides a significant improvement in model fit for the IBOVESPA data over the usual normal model.  相似文献   

12.
In this work, we develop modeling and estimation approach for the analysis of cross-sectional clustered data with multimodal conditional distributions where the main interest is in analysis of subpopulations. It is proposed to model such data in a hierarchical model with conditional distributions viewed as finite mixtures of normal components. With a large number of observations in the lowest level clusters, a two-stage estimation approach is used. In the first stage, the normal mixture parameters in each lowest level cluster are estimated using robust methods. Robust alternatives to the maximum likelihood estimation are used to provide stable results even for data with conditional distributions such that their components may not quite meet normality assumptions. Then the lowest level cluster-specific means and standard deviations are modeled in a mixed effects model in the second stage. A small simulation study was conducted to compare performance of finite normal mixture population parameter estimates based on robust and maximum likelihood estimation in stage 1. The proposed modeling approach is illustrated through the analysis of mice tendon fibril diameters data. Analyses results address genotype differences between corresponding components in the mixtures and demonstrate advantages of robust estimation in stage 1.  相似文献   

13.
Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with asymmetric behavior. In this paper, we introduce a variable selection procedure for FMR models using the skew-normal distribution. With appropriate choice of the tuning parameters, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. To estimate the parameters of the model, a modified EM algorithm for numerical computations is developed. The methodology is illustrated through numerical experiments and a real data example.  相似文献   

14.
This paper develops Bayesian inference of extreme value models with a flexible time-dependent latent structure. The generalized extreme value distribution is utilized to incorporate state variables that follow an autoregressive moving average (ARMA) process with Gumbel-distributed innovations. The time-dependent extreme value distribution is combined with heavy-tailed error terms. An efficient Markov chain Monte Carlo algorithm is proposed using a state-space representation with a finite mixture of normal distributions to approximate the Gumbel distribution. The methodology is illustrated by simulated data and two different sets of real data. Monthly minima of daily returns of stock price index, and monthly maxima of hourly electricity demand are fit to the proposed model and used for model comparison. Estimation results show the usefulness of the proposed model and methodology, and provide evidence that the latent autoregressive process and heavy-tailed errors play an important role to describe the monthly series of minimum stock returns and maximum electricity demand.  相似文献   

15.
Longitudinal data are commonly modeled with the normal mixed-effects models. Most modeling methods are based on traditional mean regression, which results in non robust estimation when suffering extreme values or outliers. Median regression is also not a best choice to estimation especially for non normal errors. Compared to conventional modeling methods, composite quantile regression can provide robust estimation results even for non normal errors. In this paper, based on a so-called pseudo composite asymmetric Laplace distribution (PCALD), we develop a Bayesian treatment to composite quantile regression for mixed-effects models. Furthermore, with the location-scale mixture representation of the PCALD, we establish a Bayesian hierarchical model and achieve the posterior inference of all unknown parameters and latent variables using Markov Chain Monte Carlo (MCMC) method. Finally, this newly developed procedure is illustrated by some Monte Carlo simulations and a case analysis of HIV/AIDS clinical data set.  相似文献   

16.
Abstract

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.  相似文献   

17.
In mixture experiments the properties of mixtures are usually studied by mixing the amounts of the mixture components that are required to obtain the necessary proportions. This paper considers the impact of inaccuracies in discharging the required amounts of the mixture components on the statistical analysis of the data. It shows how the regression calibration approach can be used to minimize the resulting bias in the model and in the estimates of the model parameters, as well as to find correct estimates of the corresponding variances. Its application is made difficult by the complex structure of these errors. We also show how knowledge of the form of the model bias allows for choosing a manufacturing setting for a mixture product that is not biased and has smaller signal to noise ratio.  相似文献   

18.
The World Health Organization (WHO) diagnostic criteria for diabetes mellitus were determined in part by evidence that in some populations the plasma glucose level 2 h after an oral glucose load is a mixture of two distinct distributions. We present a finite mixture model that allows the two component densities to be generalized linear models and the mixture probability to be a logistic regression model. The model allows us to estimate the prevalence of diabetes and sensitivity and specificity of the diagnostic criteria as a function of covariates and to estimate them in the absence of an external standard. Sensitivity is the probability that a test indicates disease conditionally on disease being present. Specificity is the probability that a test indicates no disease conditionally on no disease being present. We obtained maximum likelihood estimates via the EM algorithm and derived the standard errors from the information matrix and by the bootstrap. In the application to data from the diabetes in Egypt project a two-component mixture model fits well and the two components are interpreted as normal and diabetes. The means and variances are similar to results found in other populations. The minimum misclassification cutpoints decrease with age, are lower in urban areas and are higher in rural areas than the 200 mg dl-1 cutpoint recommended by the WHO. These differences are modest and our results generally support the WHO criterion. Our methods allow the direct inclusion of concomitant data whereas past analyses were based on partitioning the data.  相似文献   

19.
A faster alternative to the EM algorithm in finite mixture distributions is described, which alternates EM iterations with Gauss-Newton iterations using the observed information matrix. At the expense of modest additional analytical effort in obtaining the observed information, the hybrid algorithm reduces the computing time required and provides asymptotic standard errors at convergence. The algorithm is illustrated on the two-component normal mixture.  相似文献   

20.
In this paper, we consider the finite mixture of quantile regression model from a Bayesian perspective by assuming the errors have the asymmetric Laplace distribution (ALD), and develop the Gibbs sampling algorithm to estimate various quantile conditional on covariate in different groups using the Normal-Exponential representation of the ALD. We conduct several simulations under different error distributions to demonstrate the performance of the algorithm, and finally apply it to analyse a real data set, finding that the procedure has good performance.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号