Similar Literature
20 similar documents found.
1.
The majority of the existing literature on model-based clustering deals with symmetric components. In some cases, especially when dealing with skewed subpopulations, the estimate of the number of groups can be misleading: if symmetric components are assumed, more than one component is needed to describe an asymmetric group. Existing mixture models, based on multivariate normal and multivariate t distributions, fit symmetric distributions, i.e. they fit symmetric clusters. In the present paper, we propose the use of finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions). Such finite mixture models start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM-type algorithms are described for fitting the models. Real data examples are used to demonstrate the potential of the new model in comparison with existing ones.
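
The skewness that the normal inverse Gaussian (NIG) distribution provides can be seen from its representation as a normal variance-mean mixture: X = μ + βV + √V·Z, with Z standard normal and V inverse Gaussian with mean δ/γ and shape δ², where γ = √(α² − β²). The minimal Python sketch below (an illustration, not the authors' code) samples from a finite mixture of univariate NIG components in the usual (α, β, μ, δ) parametrization, drawing V by the Michael-Schucany-Haas method. The function names and parameter values are illustrative assumptions.

```python
import math
import random


def sample_inverse_gaussian(mean, shape, rng):
    """Draw from IG(mean, shape) via the Michael-Schucany-Haas transformation."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mean + mean * mean * y / (2.0 * shape)
         - (mean / (2.0 * shape)) * math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2))
    # Keep the smaller root x with probability mean / (mean + x), else use mean^2 / x.
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x


def sample_nig(alpha, beta, mu, delta, rng):
    """One NIG(alpha, beta, mu, delta) draw as a normal variance-mean mixture."""
    gamma = math.sqrt(alpha * alpha - beta * beta)  # requires |beta| < alpha
    v = sample_inverse_gaussian(delta / gamma, delta * delta, rng)
    return mu + beta * v + math.sqrt(v) * rng.gauss(0.0, 1.0)


def sample_nig_mixture(weights, params, n, seed=0):
    """Draw n points from a finite mixture of univariate NIG components.

    weights: mixing proportions; params: list of (alpha, beta, mu, delta) tuples.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(sample_nig(*params[k], rng))
    return out


def nig_mean(alpha, beta, mu, delta):
    """Theoretical mean mu + delta * beta / gamma of one NIG component."""
    return mu + delta * beta / math.sqrt(alpha * alpha - beta * beta)
```

The sign of β controls the direction of skew, so a two-component mixture such as `[(2.0, 1.0, 0.0, 1.0), (2.0, -1.0, 5.0, 1.0)]` produces two groups skewed in opposite directions, each of which a symmetric mixture would tend to split into several components.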

2.
P. Economou. Statistics, 2013, 47(2): 453-464
Frailty models are often used to describe the extra heterogeneity in survival data by introducing an individual random, unobserved effect. The frailty term is usually assumed to act multiplicatively on a baseline hazard function common to all individuals. In order to apply the frailty model, a specific frailty distribution has to be assumed. If at least one of the latent variables is continuous, the frailty must follow a continuous distribution. In this paper, a finite mixture of continuous frailty distributions is used to describe situations in which one (or more) of the latent variables separates the population under study into two (or more) subpopulations. Closure properties of the unobserved quantity are given along with the maximum-likelihood estimates under the most common choices of frailty distributions. The model is illustrated on a set of lifetime data.
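
Under a multiplicative frailty model, the population survival function is the Laplace transform of the frailty distribution evaluated at the cumulative baseline hazard, S(t) = E[exp(−Z·H0(t))]. For a finite mixture of frailty distributions this transform is simply the weighted sum of the component transforms. The Python sketch below assumes gamma components (shape k, rate r, Laplace transform (r/(r + s))^k) and a Weibull cumulative baseline hazard; both choices and all names are illustrative assumptions rather than the paper's specification.

```python
def gamma_laplace(s, shape, rate):
    """Laplace transform E[exp(-s Z)] of a Gamma(shape, rate) frailty Z."""
    return (rate / (rate + s)) ** shape


def weibull_cum_hazard(t, scale=1.0, k=1.5):
    """Assumed cumulative baseline hazard H0(t) = (t / scale) ** k."""
    return (t / scale) ** k


def population_survival(t, weights, components, scale=1.0, k=1.5):
    """S(t) = sum_j w_j * E_j[exp(-Z H0(t))] for a finite mixture of gamma frailties.

    components: list of (shape, rate) pairs, one per latent subpopulation.
    """
    h0 = weibull_cum_hazard(t, scale, k)
    return sum(w * gamma_laplace(h0, shape, rate)
               for w, (shape, rate) in zip(weights, components))
```

Each (shape, rate) pair plays the role of the frailty distribution within one latent subpopulation, and the weights are the subpopulation proportions.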

3.
We propose a semiparametric modeling approach for mixtures of symmetric distributions. The mixture model is built from a common symmetric density with different components arising through different location parameters. This structure ensures identifiability for mixture components, which is a key feature of the model as it allows applications to settings where primary interest is inference for the subpopulations comprising the mixture. We focus on the two-component mixture setting and develop a Bayesian model using parametric priors for the location parameters and for the mixture proportion, and a nonparametric prior probability model, based on Dirichlet process mixtures, for the random symmetric density. We present an approach to inference using Markov chain Monte Carlo posterior simulation. The performance of the model is studied with a simulation experiment and through analysis of a rainfall precipitation data set as well as with data on eruptions of the Old Faithful geyser.

4.
We developed a flexible non-parametric Bayesian model for regional disease-prevalence estimation based on cross-sectional data that are obtained from several subpopulations or clusters such as villages, cities, or herds. The subpopulation prevalences are modeled with a mixture distribution that allows for zero prevalence. The distribution of prevalences among diseased subpopulations is modeled as a mixture of finite Polya trees. Inferences can be obtained for (1) the proportion of diseased subpopulations in a region, (2) the distribution of regional prevalences, (3) the mean and median prevalence in the region, (4) the prevalence of any sampled subpopulation, and (5) predictive distributions of prevalences for regional subpopulations not included in the study, including the predictive probability of zero prevalence. We focus on prevalence estimation using data from a single diagnostic test, but we also briefly discuss the scenario where two conditionally dependent (or independent) diagnostic tests are used. Simulated data demonstrate the utility of our non-parametric model over parametric analysis. An example involving brucellosis in cattle is presented.
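
One practical consequence of allowing zero prevalence is that observing no positives in a sampled subpopulation does not by itself show that it is disease-free. As a parametric stand-in (an assumption: the paper uses mixtures of finite Polya trees, not the beta distribution used here), suppose a subpopulation is diseased with probability π and, if diseased, has prevalence p ~ Beta(a, b), and assume a perfect test. Then P(0 positives out of n) = (1 − π) + π·B(a, b + n)/B(a, b), and Bayes' rule gives the posterior probability of zero prevalence. The Python sketch below computes both; the names are hypothetical.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_zero_positives(n, pi_diseased, a, b):
    """P(no positives among n sampled units) under the zero-inflated beta model."""
    # E[(1 - p)^n] for p ~ Beta(a, b) equals B(a, b + n) / B(a, b).
    miss_if_diseased = math.exp(log_beta(a, b + n) - log_beta(a, b))
    return (1.0 - pi_diseased) + pi_diseased * miss_if_diseased


def posterior_zero_prevalence(n, pi_diseased, a, b):
    """P(zero prevalence | 0 positives out of n), assuming a perfect test."""
    return (1.0 - pi_diseased) / prob_zero_positives(n, pi_diseased, a, b)


def regional_mean_prevalence(pi_diseased, a, b):
    """Mean prevalence across subpopulations, counting disease-free ones as 0."""
    return pi_diseased * a / (a + b)
```

With π = 0.5 and a uniform Beta(1, 1), a herd with 0 positives out of 9 tested has posterior probability 0.5/0.55 ≈ 0.91 of being truly disease-free; larger samples push this probability higher.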

5.
In this work, we develop a modeling and estimation approach for the analysis of cross-sectional clustered data with multimodal conditional distributions, where the main interest is in the analysis of subpopulations. We propose to model such data in a hierarchical model with conditional distributions viewed as finite mixtures of normal components. With a large number of observations in the lowest-level clusters, a two-stage estimation approach is used. In the first stage, the normal mixture parameters in each lowest-level cluster are estimated using robust methods; these robust alternatives to maximum likelihood estimation provide stable results even when the components of the conditional distributions do not quite meet normality assumptions. In the second stage, the cluster-specific means and standard deviations are modeled in a mixed effects model. A small simulation study was conducted to compare the performance of finite normal mixture population parameter estimates based on robust and maximum likelihood estimation in stage 1. The proposed approach is illustrated through the analysis of mouse tendon fibril diameter data. The results address genotype differences between corresponding components in the mixtures and demonstrate advantages of robust estimation in stage 1.
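
Stage 1 fits a finite normal mixture within each lowest-level cluster. The paper favors robust alternatives to maximum likelihood; for reference, the plain maximum-likelihood EM iteration that those methods are compared against can be sketched in Python for two univariate components as below. The median-split initialization, fixed iteration count and variance floor are illustrative choices, not the authors'.

```python
import math
from statistics import NormalDist


def em_two_normals(x, iters=200, var_floor=1e-6):
    """Maximum-likelihood EM for a two-component univariate normal mixture.

    Returns (p, (mu1, sd1), (mu2, sd2)), where p is the weight of component 1.
    """
    xs = sorted(x)
    n = len(xs)
    mean = sum(xs) / n
    # Crude start: split the sorted data at the median, shared overall variance.
    lo, hi = xs[: n // 2], xs[n // 2:]
    mu1, mu2 = sum(lo) / len(lo), sum(hi) / len(hi)
    var1 = var2 = max(sum((v - mean) ** 2 for v in xs) / n, var_floor)
    p = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1.
        d1 = NormalDist(mu1, math.sqrt(var1))
        d2 = NormalDist(mu2, math.sqrt(var2))
        r = []
        for v in xs:
            a, b = p * d1.pdf(v), (1 - p) * d2.pdf(v)
            r.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted proportion, means and variances (variances floored).
        s1 = sum(r)
        s2 = n - s1
        p = s1 / n
        mu1 = sum(ri * v for ri, v in zip(r, xs)) / s1
        mu2 = sum((1 - ri) * v for ri, v in zip(r, xs)) / s2
        var1 = max(sum(ri * (v - mu1) ** 2 for ri, v in zip(r, xs)) / s1, var_floor)
        var2 = max(sum((1 - ri) * (v - mu2) ** 2 for ri, v in zip(r, xs)) / s2, var_floor)
    return p, (mu1, math.sqrt(var1)), (mu2, math.sqrt(var2))
```

Because every point enters the M-step with full weight, a few atypical values can pull the means and inflate the variances; that sensitivity is what robust stage-1 estimators are meant to reduce.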

6.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the heavier-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, it provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
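
The two ingredients combined here are the Box-Cox transformation, y(λ) = (y^λ − 1)/λ for λ ≠ 0 and log y for λ = 0 (positive y only), and a heavy-tailed t density on the transformed scale. When the likelihood is written on the original scale, the Jacobian of the transformation contributes (λ − 1)·Σ log y. The univariate Python sketch below evaluates that log-likelihood for a single component with given parameters; it is not the paper's multivariate EM, and the function names are illustrative.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def t_logpdf(z, nu):
    """Log density of the standard Student t distribution with nu degrees of freedom."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def box_cox_t_loglik(ys, lam, mu, sigma, nu):
    """Original-scale log-likelihood of ys when box_cox(y, lam) ~ mu + sigma * t_nu."""
    total = 0.0
    for y in ys:
        z = (box_cox(y, lam) - mu) / sigma
        # t density on the transformed scale, rescaled, plus log-Jacobian (lam - 1) log y.
        total += t_logpdf(z, nu) - math.log(sigma) + (lam - 1.0) * math.log(y)
    return total
```

Including the Jacobian term is what makes log-likelihoods for different λ comparable, so that transformation selection can be done by comparing (or maximizing) this quantity across λ.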

7.
With rapid improvements in medical treatment and health care, many datasets dealing with time to relapse or death now reveal a substantial portion of patients who are cured (i.e., who never experience the event). Extended survival models called cure rate models account for the probability of a subject being cured and can be broadly classified into the classical mixture models of Berkson and Gage (BG type) or the stochastic tumor models pioneered by Yakovlev and extended to a hierarchical framework by Chen, Ibrahim, and Sinha (YCIS type). Recent developments in Bayesian hierarchical cure models have evoked significant interest regarding relationships and preferences between these two classes of models. Our present work proposes a unifying class of cure rate models that facilitates flexible hierarchical model-building while including both existing cure model classes as special cases. This unifying class enables robust modeling by accounting for uncertainty in underlying mechanisms leading to cure. Issues such as regressing on the cure fraction and propriety of the associated posterior distributions under different modeling assumptions are also discussed. Finally, we offer a simulation study and also illustrate with two datasets (on melanoma and breast cancer) that reveal our framework's ability to distinguish among underlying mechanisms that lead to relapse and cure.
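
The two classical classes can be written down concretely. In the Berkson and Gage mixture model the population survival is S_pop(t) = π + (1 − π)·S(t), with cure fraction π. In the promotion-time (Yakovlev-type) model, the number of latent malignant cells is Poisson with mean θ and each cell's promotion time has distribution F, giving S_pop(t) = exp(−θ·F(t)) and cure fraction exp(−θ). The Python sketch below evaluates both with an exponential F as an assumed example; it does not implement the paper's unifying class.

```python
import math


def exp_cdf(t, rate):
    """Assumed exponential event/promotion-time distribution F(t)."""
    return 1.0 - math.exp(-rate * t)


def bg_mixture_survival(t, cure_fraction, rate):
    """Berkson-Gage mixture cure model: pi + (1 - pi) * S(t)."""
    return cure_fraction + (1.0 - cure_fraction) * (1.0 - exp_cdf(t, rate))


def promotion_time_survival(t, theta, rate):
    """Promotion-time cure model: exp(-theta * F(t)); cure fraction is exp(-theta)."""
    return math.exp(-theta * exp_cdf(t, rate))
```

Setting θ = −log π gives both models the same long-run plateau (the cure fraction), while their survival curves differ at finite times, which is one way the two mechanisms can lead to different fits to the same data.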

8.
Some properties of discrete mixture failure rates are studied. Specifically, as in the continuous case, it is shown that the population mixture failure rate is always smaller than the unconditional expectation over the family of subpopulation failure rates. Analogs of the multiplicative and additive frailty models are introduced via the corresponding survival functions. Another approach via the alternative discrete failure rate is also discussed. Stochastic comparisons for two mixed distributions with equal and different mixing distributions are studied.
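
For a discrete lifetime on {1, 2, ...} the failure rate is h(t) = P(T = t)/P(T ≥ t), and the mixture failure rate is a survival-weighted average of the subpopulation rates, h_m(t) = Σ π_i P_i(T ≥ t)·h_i(t) / Σ π_i P_i(T ≥ t). The Python sketch below uses geometric subpopulations (constant failure rates q_i, an assumed example) to show the effect numerically: the weights drift toward the more reliable subpopulations, so after the first period h_m(t) falls below the unconditional average Σ π_i q_i.

```python
def geometric_mixture_failure_rate(t, weights, qs):
    """Failure rate at time t (t = 1, 2, ...) of a mixture of geometric lifetimes.

    Subpopulation i has P(T = t) = q_i (1 - q_i)^(t - 1), so P(T >= t) = (1 - q_i)^(t - 1)
    and its own failure rate is the constant q_i.
    """
    surv = [w * (1.0 - q) ** (t - 1) for w, q in zip(weights, qs)]
    return sum(s * q for s, q in zip(surv, qs)) / sum(surv)


def unconditional_mean_rate(weights, qs):
    """Expectation of the subpopulation failure rates under the mixing distribution."""
    return sum(w * q for w, q in zip(weights, qs))
```

With equal weights and q = (0.1, 0.5), the mixture rate is 0.3 at t = 1, drops to 0.17/0.7 ≈ 0.243 at t = 2, and decreases toward 0.1 as the frailer subpopulation dies out.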

9.
Consider a subject entered on a clinical trial in which the major endpoint is a time metric such as death or time to reach a well-defined event. During the observational period the subject may experience an intermediate clinical event, which may induce a change in the survival distribution. We consider models for the one- and two-sample problems. The model for the one-sample problem enables one to test whether the occurrence of the intermediate event changed the survival distribution. This model provides a way of carrying out a non-randomized clinical trial to determine whether a therapy has benefit. The two-sample problem considers testing whether the probability distributions, with and without an intermediate event, are the same. Statistical tests are derived using a semi-Markov or a time-dependent mixture model. Simulation studies are carried out to compare these new procedures with the log-rank, stratified log-rank and landmark tests. The new tests appear to have uniformly greater power than these competitor tests. The methods are applied to a randomized clinical trial carried out by the AIDS Clinical Trials Group (ACTG), which compared low versus high doses of zidovudine (AZT).
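
The log-rank test used as a comparator contrasts, at each distinct event time, the observed number of events in one group with the number expected if both groups shared the same hazard. The Python sketch below computes the standard two-sample log-rank chi-square statistic (1 degree of freedom) from right-censored data; it is a generic textbook version for orientation, not the semi-Markov or time-dependent mixture tests proposed in the paper.

```python
def log_rank_statistic(times1, events1, times2, events2):
    """Two-sample log-rank chi-square statistic.

    times*: follow-up times; events*: 1 if the event was observed, 0 if censored.
    """
    data = ([(t, e, 1) for t, e in zip(times1, events1)]
            + [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    var = 0.0
    for s in event_times:
        at_risk = [(t, e, g) for t, e, g in data if t >= s]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for t, e, _ in at_risk if t == s and e)
        d1 = sum(1 for t, e, g in at_risk if t == s and e and g == 1)
        # Observed minus expected events in group 1 at time s.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the risk-set margins at time s.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0
```

Values above about 3.84 reject equal hazards at the 5% level. Because the test compares whole survival curves between fixed groups, it cannot by itself account for an intermediate event that changes a subject's hazard partway through follow-up.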

10.
Summary. Suppose that we have m repeated measures on each subject, and we model the observation vectors with a finite mixture model. We further assume that the repeated measures are conditionally independent. We present methods to estimate the shape of the component distributions along with various features of the component distributions such as the medians, means and variances. We make no distributional assumptions on the components; indeed, we allow different shapes for different components.
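
The conditional-independence assumption makes the joint density of a subject's m measurements a mixture of products, f(x_1, ..., x_m) = Σ_j λ_j Π_k f_j(x_k). The Python sketch below evaluates that density on the log scale and the posterior component probabilities for one subject. The paper leaves the f_j unspecified; the normal components here are purely an illustrative assumption.

```python
import math


def normal_logpdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def subject_log_terms(xs, weights, components):
    """log(lambda_j) + sum_k log f_j(x_k) for each component j = (mu_j, sd_j)."""
    return [math.log(w) + sum(normal_logpdf(x, mu, sd) for x in xs)
            for w, (mu, sd) in zip(weights, components)]


def subject_log_density(xs, weights, components):
    """Log of sum_j lambda_j prod_k f_j(x_k), computed with log-sum-exp."""
    logs = subject_log_terms(xs, weights, components)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def subject_posterior(xs, weights, components):
    """Posterior probability that the subject's repeated measures came from each component."""
    logs = subject_log_terms(xs, weights, components)
    total = subject_log_density(xs, weights, components)
    return [math.exp(v - total) for v in logs]
```

Since every measurement of a subject shares the same component label, evidence accumulates across the m measures, and the posterior becomes sharp much faster than it would for a single observation.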

11.
Robust mixture modelling using the t distribution
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
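
What makes the t mixture robust is visible in the E-step: besides the membership probabilities, each observation receives a weight u = (ν + 1)/(ν + δ²), where δ is its standardized distance from the component centre, so outlying points are down-weighted in the location and scale updates. The Python sketch below is a univariate EM with the degrees of freedom ν held fixed (with ν fixed, no extra conditional maximization step is needed); the ECM algorithm described in the paper also updates ν and handles multivariate data. Initialization and names are assumptions.

```python
import math


def t_pdf(x, mu, sigma, nu):
    """Density of a location-scale Student t distribution."""
    z = (x - mu) / sigma
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu)) / sigma


def em_t_mixture(x, init, nu=4.0, iters=200):
    """EM for a univariate mixture of t distributions with fixed degrees of freedom.

    init: list of (weight, mu, sigma) starting values, one per component.
    """
    comps = [list(c) for c in init]
    n = len(x)
    for _ in range(iters):
        # E-step: responsibilities tau[i][j] and robustness weights u[i][j].
        tau, u = [], []
        for xi in x:
            dens = [w * t_pdf(xi, m, s, nu) for w, m, s in comps]
            total = sum(dens)
            tau.append([d / total for d in dens])
            u.append([(nu + 1) / (nu + ((xi - m) / s) ** 2) for _, m, s in comps])
        # M-step: proportions, u-weighted locations and scales.
        for j, comp in enumerate(comps):
            t_j = [tau[i][j] for i in range(n)]
            tu = [tau[i][j] * u[i][j] for i in range(n)]
            comp[0] = sum(t_j) / n
            comp[1] = sum(w * xi for w, xi in zip(tu, x)) / sum(tu)
            comp[2] = math.sqrt(sum(w * (xi - comp[1]) ** 2 for w, xi in zip(tu, x)) / sum(t_j))
    return [tuple(c) for c in comps]
```

A point far from every centre gets u close to zero, so it barely moves the estimated centres, whereas in a normal mixture (where u = 1 for every point) it would pull the nearest mean toward itself or demand an extra component.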

12.
Mixture distributions have recently become increasingly popular in many scientific fields. Statistical computation and analysis of mixture models, however, are extremely complex due to the large number of parameters involved. Both EM algorithms for likelihood inference and MCMC procedures for Bayesian analysis have various difficulties in dealing with mixtures with an unknown number of components. In this paper, we propose a direct sampling approach to the computation of Bayesian finite mixture models with a varying number of components. This approach requires only knowledge of the density function up to a multiplicative constant. It is easy to implement, numerically efficient and very practical in real applications. A simulation study shows that it performs quite satisfactorily on relatively high-dimensional distributions. A well-known genetic data set is used to demonstrate the simplicity of this method and its power for the computation of high-dimensional Bayesian mixture models.
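
The property highlighted here, needing the density only up to a multiplicative constant, is shared by classical direct (non-Markov-chain) samplers. As a generic illustration of that property only (this is ordinary rejection sampling, not the authors' algorithm), the Python sketch below draws independent samples from an unnormalized target using a uniform proposal on a bounded interval and a known upper bound on the unnormalized density; the target, interval and bound are assumptions of the example.

```python
import math
import random


def rejection_sample(unnorm_density, lower, upper, bound, n, seed=0):
    """Independent draws from a density known only up to a constant.

    Requires unnorm_density(x) <= bound on [lower, upper] and negligible mass outside.
    """
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        x = rng.uniform(lower, upper)
        # Accept x with probability unnorm_density(x) / bound.
        if rng.random() * bound <= unnorm_density(x):
            draws.append(x)
    return draws


def unnormalized_mixture(x):
    """0.3 N(-2, 1) + 0.7 N(3, 1) with the common constant 1/sqrt(2*pi) dropped."""
    return 0.3 * math.exp(-0.5 * (x + 2.0) ** 2) + 0.7 * math.exp(-0.5 * (x - 3.0) ** 2)
```

The normalizing constant cancels in the acceptance ratio, so it is never needed; the practical limitation is that a good envelope becomes hard to find as dimension grows, which is the regime the paper targets.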

13.
Two classes of semiparametric and nonparametric mixture models are defined to represent general kinds of prior information. For these models, the nonparametric maximum likelihood estimator (NPMLE) of an unknown probability distribution is derived and is shown to be consistent and relatively efficient. Linear functionals are used for the estimation of parameters. Their consistency is proved, the gain in efficiency is derived, and their asymptotic distributions are given.
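
A common way to compute an NPMLE of a mixing distribution in practice is to fix a fine grid of candidate support points and iterate the EM fixed-point update on the grid weights, w_j ← (1/n) Σ_i w_j f(x_i | θ_j) / Σ_k w_k f(x_i | θ_k). The Python sketch below does this for a Poisson mixture; the grid approximation and the Poisson kernel are assumptions of the example and do not reflect the specific semiparametric models or linear-functional estimators of the paper.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def npmle_grid(counts, grid, iters=500):
    """Approximate NPMLE of a Poisson mixing distribution on a fixed grid of positive rates.

    Returns the grid weights and the log-likelihood recorded at each iteration.
    """
    m = len(grid)
    w = [1.0 / m] * m
    lik = [[poisson_pmf(k, lam) for lam in grid] for k in counts]
    history = []
    for _ in range(iters):
        new_w = [0.0] * m
        loglik = 0.0
        for row in lik:
            mix = sum(wj * fj for wj, fj in zip(w, row))
            loglik += math.log(mix)
            # Posterior mass that this observation assigns to each grid point.
            for j in range(m):
                new_w[j] += w[j] * row[j] / mix
        history.append(loglik)
        w = [v / len(counts) for v in new_w]
    return w, history
```

Each update is an EM step, so the recorded log-likelihood never decreases; convergence can be slow, and the estimate tends to concentrate on a few support points.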

14.
Parametric mixed-effects logistic models can provide effective analysis of binary matched-pairs data. Responses are assumed to follow a logistic model within pairs, with an intercept that varies across pairs according to a specified family of probability distributions G. In this paper we give necessary and sufficient conditions for consistent covariate effect estimation and present a geometric view of estimation which shows that when the assumed family of mixture distributions is rich enough, estimates of the effect of the binary covariate are typically consistent. The geometric view also shows that under the conditions for consistent estimation, the mixed-model estimator is identical to the familiar conditional-likelihood estimator for matched pairs. We illustrate the findings with some examples.
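
For matched pairs with a binary covariate (one member exposed, one not), the conditional-likelihood estimator mentioned here has a closed form: only discordant pairs carry information, and the log odds ratio estimate is log(n10/n01), where n10 counts pairs in which only the exposed member responds and n01 pairs in which only the unexposed member does. The Python sketch below computes it with the usual large-sample standard error √(1/n10 + 1/n01); the (y_exposed, y_unexposed) data layout is an assumption.

```python
import math


def conditional_mle_matched_pairs(pairs):
    """Conditional-likelihood log odds ratio for binary matched pairs.

    pairs: iterable of (y_exposed, y_unexposed) with 0/1 responses.
    Returns (beta_hat, standard_error).
    """
    pairs = list(pairs)
    n10 = sum(1 for ye, yu in pairs if ye == 1 and yu == 0)
    n01 = sum(1 for ye, yu in pairs if ye == 0 and yu == 1)
    if n10 == 0 or n01 == 0:
        raise ValueError("estimate is infinite without discordant pairs in both directions")
    beta_hat = math.log(n10 / n01)
    return beta_hat, math.sqrt(1.0 / n10 + 1.0 / n01)
```

For example, 20 pairs of type (1, 0) and 10 of type (0, 1) give an estimate of log 2 regardless of how many concordant pairs are present, because the pair-specific intercepts cancel in the conditional likelihood.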

15.
16.
The distribution of the estimated mean of the nonstandard mixture of distributions that has a discrete probability mass at zero and a gamma distribution for positive values is derived. Furthermore, for the studied nonstandard mixture of distributions, the distribution of the standardized statistic (estimator − true mean)/(standard deviation of the estimator) is derived. The results are used to study the accuracy of the confidence interval for the mean based on a large-sample approximation. Quantiles for the standardized statistic are also calculated.
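
For this zero-and-gamma mixture the moments are explicit. If a value is positive with probability p and positive values follow a gamma distribution with shape k and scale θ, then the mean is μ = pkθ and the variance is pkθ²(1 + k) − (pkθ)². The Python sketch below computes these and estimates, by simulation, the coverage of the usual large-sample interval x̄ ± 1.96·s/√n whose accuracy the paper studies; the parameter values, sample size and number of replicates are arbitrary choices for illustration.

```python
import math
import random


def zero_gamma_moments(p, shape, scale):
    """Mean and variance of X = 0 w.p. 1 - p, X ~ Gamma(shape, scale) w.p. p."""
    mean = p * shape * scale
    second = p * shape * scale ** 2 * (1.0 + shape)  # E[X^2]
    return mean, second - mean ** 2


def wald_coverage(p, shape, scale, n, reps=2000, seed=0):
    """Monte Carlo coverage of xbar +/- 1.96 s / sqrt(n) for the mixture mean (n >= 2)."""
    rng = random.Random(seed)
    mu, _ = zero_gamma_moments(p, shape, scale)
    hits = 0
    for _ in range(reps):
        xs = [rng.gammavariate(shape, scale) if rng.random() < p else 0.0 for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half = 1.96 * s / math.sqrt(n)
        hits += xbar - half <= mu <= xbar + half
    return hits / reps
```

The point mass at zero and the right-skewed gamma part make the sampling distribution of x̄ skewed for small n, so the simulated coverage typically falls somewhat short of the nominal 95%.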

17.
The main objective of the present paper is an evaluation of the FBST (Fully Bayesian Significance Test) restricted to survival models. A survival distribution should be chosen among the three celebrated ones: lognormal, gamma and Weibull. For this discrimination, a linear mixture of the three distributions is an important tool: the FBST is used to test hypotheses defined on the space of mixture weights. Another feature of the paper is that all three distributions are reparametrized so that all six parameters are written as functions of the mean and the variance of the population being studied. Some numerical results from simulations with right-censored data are considered.
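
Writing all three candidate distributions in terms of the population mean m and variance v is a matter of moment matching. For the lognormal, σ² = log(1 + v/m²) and μ = log m − σ²/2; for the gamma, shape m²/v and scale v/m; for the Weibull, the shape k solves Γ(1 + 2/k)/Γ(1 + 1/k)² − 1 = v/m², after which the scale is m/Γ(1 + 1/k). The Python sketch below implements these conversions (the Weibull shape by bisection) and the three-component mixture density; the function names, bisection bracket and tolerance are assumptions, and the FBST itself is not shown.

```python
import math


def lognormal_params(m, v):
    """(mu, sigma) of log X for a lognormal with mean m and variance v."""
    sigma2 = math.log1p(v / m ** 2)
    return math.log(m) - sigma2 / 2, math.sqrt(sigma2)


def gamma_params(m, v):
    """(shape, scale) of a gamma with mean m and variance v."""
    return m * m / v, v / m


def weibull_params(m, v, lo=0.05, hi=100.0, tol=1e-12):
    """Weibull (shape, scale) matching mean m and variance v; shape found by bisection."""
    target = v / m ** 2

    def cv2(k):  # squared coefficient of variation, decreasing in k
        return math.exp(math.lgamma(1 + 2 / k) - 2 * math.lgamma(1 + 1 / k)) - 1

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    return k, m / math.gamma(1 + 1 / k)


def mixture_pdf(x, weights, m, v):
    """w1 * lognormal + w2 * gamma + w3 * Weibull density at x > 0, all with mean m, variance v."""
    mu, s = lognormal_params(m, v)
    a, b = gamma_params(m, v)
    k, lam = weibull_params(m, v)
    f_ln = math.exp(-(math.log(x) - mu) ** 2 / (2 * s * s)) / (x * s * math.sqrt(2 * math.pi))
    f_ga = math.exp((a - 1) * math.log(x) - x / b - math.lgamma(a) - a * math.log(b))
    f_wb = (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))
    return weights[0] * f_ln + weights[1] * f_ga + weights[2] * f_wb
```

Because every component shares the same mean and variance, the mixture weights are left to describe only differences in shape, which is what hypotheses on the weight space are meant to discriminate.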

18.
Abstract

We study alternative models for capturing abrupt structural changes (level shifts) in a time series. The problem is confounded by the presence of transient outliers. We compare the performance of non-Gaussian time-varying parameter models and multiprocess mixture models within a Monte Carlo experimental setup. Our findings suggest that once we incorporate shocks with thick-tailed probability distributions, the superiority of the multiprocess mixture models over the time-varying parameter models, reported in an earlier study, disappears. The behavior of the two models, both in adapting to level shifts and in reacting to transient outliers, is very similar.

19.
Bone mineral density decreases naturally as we age because existing bone tissue is reabsorbed by the body faster than new bone tissue is synthesized. When this occurs, bones lose calcium and other minerals. What is normal bone mineral density for men 50 years and older? Suitable diagnostic cutoff values for men are less well defined than for women. In this paper, we propose using normal mixture models to estimate the prevalence of low lumbar spine bone mineral density in men 50 years and older with or at risk for human immunodeficiency virus infection, when normal values of bone mineral density are not generally known. The Box–Cox power transformation is used to determine which transformation best suits normal mixture distributions. Parametric bootstrap tests are used to determine the number of mixture components and whether the mixture components are homoscedastic or heteroscedastic.
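
Once a two-component normal mixture has been fitted (on the transformed scale, if a Box-Cox transformation was chosen), the estimated prevalence of the low-density group is that component's mixing proportion. A natural data-driven cutoff is the value at which the two weighted component densities are equal, i.e. where the posterior probability of belonging to the low component is 0.5. The Python sketch below finds that point by bisection between the two component means; using this cutoff as a diagnostic threshold is an illustrative assumption, not a procedure stated in the abstract, and the parameter values are hypothetical.

```python
from statistics import NormalDist


def posterior_low(x, p_low, low, high):
    """Posterior probability that x comes from the low component (low, high: NormalDist)."""
    a = p_low * low.pdf(x)
    b = (1.0 - p_low) * high.pdf(x)
    return a / (a + b)


def mixture_cutoff(p_low, low, high, tol=1e-10):
    """Point between the component means where posterior_low crosses 0.5.

    Assumes a single crossing lies between low.mean and high.mean.
    """
    lo, hi = low.mean, high.mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_low(mid, p_low, low, high) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With equal variances and equal weights the cutoff is the midpoint of the two means; a smaller low-group proportion moves it toward the low mean, so that fewer individuals are classified as having low density.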

20.
The negative effects of age on the life length of a device, or positive ageing, are commonly used criteria for classifying life distributions. In this paper, two notions of positive ageing are considered: new better (worse) than used in average conditional survival probability, and harmonic new better (worse) than used in upper tail. Closure of these notions under mixture and convolution is studied. The survival of a device subject to discrete shocks occurring according to a homogeneous Poisson process is studied under these notions of ageing. A cumulative damage model is considered. Two test statistics are proposed for the two notions of ageing.
