Similar literature
 20 similar documents found (search time: 828 ms)
1.
Summary.  As biological knowledge accumulates rapidly, gene networks encoding genomewide gene–gene interactions have been constructed. As an improvement over the standard mixture model, which treats all genes as identically and independently distributed a priori, Wei and co-workers have proposed modelling a gene network as a discrete or Gaussian Markov random field (MRF) in a mixture model to analyse genomic data. However, how these methods compare in practical applications is not well understood; addressing this is the aim here. We also propose two novel constraints in prior specifications for the Gaussian MRF model and a fully Bayesian approach to the discrete MRF model. We assess the accuracy of estimating the false discovery rate by posterior probabilities in the context of MRF models. Applications to a chromatin immuno-precipitation–chip data set and simulated data show that the modified Gaussian MRF models have superior performance compared with other models, and both MRF-based mixture models, with reasonable robustness to misspecified gene networks, outperform the standard mixture model.

2.
We propose a mixture modelling framework for both identifying and exploring the nature of genotype-trait associations. This framework extends the classical mixed effects modelling approach for this setting by incorporating a Gaussian mixture distribution for random genotype effects. The primary advantages of this paradigm over existing approaches are that the mixture modelling framework addresses the degrees-of-freedom challenge that is inherent in application of the usual fixed effects analysis of covariance, relaxes the restrictive single normal distribution assumption of the classical mixed effects models and offers an exploratory framework for discovery of underlying structure across multiple genetic loci. An application to data arising from a study of antiretroviral-associated dyslipidaemia in human immunodeficiency virus infection is presented. Extensive simulation studies are also implemented to investigate the performance of this approach.

3.
Model-based clustering of Gaussian copulas for mixed data   (Cited by: 1 in total; self-citations: 0; citations by others: 1)
Clustering of mixed data is important yet challenging due to a shortage of conventional distributions for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Indeed copulas, and Gaussian copulas in particular, are powerful tools for easily modeling the distribution of multivariate variables. This model clusters data sets with continuous, integer, and ordinal variables (all having a cumulative distribution function) by considering the intra-component dependencies in a similar way to the Gaussian mixture. Indeed, each component of the Gaussian copula mixture produces a correlation coefficient for each pair of variables and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As an interesting by-product, this model generalizes many well-known approaches and provides tools for visualization based on its parameters. The Bayesian inference is achieved with a Metropolis-within-Gibbs sampler. The numerical experiments, on simulated and real data, illustrate the benefits of the proposed model: flexible and meaningful parameterization combined with visualization features.

4.
In some situations, the distribution of the error terms of a multivariate linear regression model may depart from normality. This problem has been addressed, for example, by specifying a different parametric distribution family for the error terms, such as multivariate skewed and/or heavy-tailed distributions. A new solution is proposed, which is obtained by modelling the error term distribution through a finite mixture of multi-dimensional Gaussian components. The multivariate linear regression model is studied under this assumption. Identifiability conditions are proved and maximum likelihood estimation of the model parameters is performed using the EM algorithm. The number of mixture components is chosen through model selection criteria; when this number is equal to one, the proposal reduces to the classical approach. The performance of the proposed approach is evaluated through Monte Carlo experiments and compared with that of other approaches. Finally, the results obtained from the analysis of a real dataset are presented.

5.
We propose a new type of multivariate statistical model that permits non‐Gaussian distributions as well as the inclusion of conditional independence assumptions specified by a directed acyclic graph. These models feature a specific factorisation of the likelihood that is based on pair‐copula constructions and hence involves only univariate distributions and bivariate copulas, of which some may be conditional. We demonstrate maximum‐likelihood estimation of the parameters of such models and compare them to various competing models from the literature. A simulation study investigates the effects of model misspecification and highlights the need for non‐Gaussian conditional independence models. The proposed methods are finally applied to modeling financial return data. The Canadian Journal of Statistics 40: 86–109; 2012 © 2012 Statistical Society of Canada

6.
Model-based classification using latent Gaussian mixture models   (Cited by: 1 in total; self-citations: 0; citations by others: 1)
A novel model-based classification technique is introduced based on parsimonious Gaussian mixture models (PGMMs). PGMMs, which were introduced recently as a model-based clustering technique, arise from a generalization of the mixtures of factor analyzers model and are based on a latent Gaussian mixture model. In this paper, this mixture modelling structure is used for model-based classification and the particular area of application is food authenticity. Model-based classification is performed by jointly modelling data with known and unknown group memberships within a likelihood framework and then estimating parameters, including the unknown group memberships, within an alternating expectation-conditional maximization framework. Model selection is carried out using the Bayesian information criterion and the quality of the maximum a posteriori classifications is summarized using the misclassification rate and the adjusted Rand index. This new model-based classification technique gives excellent classification performance when applied to real food authenticity data on the chemical properties of olive oils from nine areas of Italy.

7.
In this paper, we address the problem of simulating from a data-generating process for which the observed data do not follow a regular probability distribution. One existing method for doing this is bootstrapping, but it is incapable of interpolating between observed data. For univariate or bivariate data, in which a mixture structure can easily be identified, we could instead simulate from a Gaussian mixture model. In general, though, we would have the problem of identifying and estimating the mixture model. Instead of these, we introduce a non-parametric method for simulating datasets like this: Kernel Carlo Simulation. Our algorithm begins by using kernel density estimation to build a target probability distribution. Then, an envelope function that is guaranteed to be higher than the target distribution is created. We then use simple accept–reject sampling. Our approach is more flexible than others, can simulate intelligently across gaps in the data, and requires no subjective modelling decisions. With several univariate and multivariate examples, we show that our method returns simulated datasets that, compared with the observed data, retain the covariance structures and have distributional characteristics that are remarkably similar.

8.
Random effects models have played a critical role in modelling longitudinal data. However, there are few studies on the kernel-based maximum likelihood method for semiparametric random effects models. In this paper, based on kernel and likelihood methods, we propose a pooled global maximum likelihood method for the partial linear random effects models. The pooled global maximum likelihood method employs the local approximations of the nonparametric function at a group of grid points simultaneously, instead of one point. Gaussian quadrature is used to approximate the integration of likelihood with respect to random effects. The asymptotic properties of the proposed estimators are rigorously studied. Simulation studies are conducted to demonstrate the performance of the proposed approach. We also apply the proposed method to analyse correlated medical costs in the Medical Expenditure Panel Survey data set.

9.
The majority of the existing literature on model-based clustering deals with symmetric components. In some cases, especially when dealing with skewed subpopulations, the estimate of the number of groups can be misleading; if symmetric components are assumed we need more than one component to describe an asymmetric group. Existing mixture models, based on multivariate normal distributions and multivariate t distributions, try to fit symmetric distributions, i.e. they fit symmetric clusters. In the present paper, we propose the use of finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions). Such finite mixture models start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM type algorithms are described for fitting the models. Real data examples are used to demonstrate the potential of the new model in comparison with existing ones.

10.
In this paper we introduce a new class of multivariate unimodal distributions, motivated by Khintchine's representation for unimodal densities on the real line. We start by introducing a new class of unimodal distributions which can then be naturally extended to higher dimensions, using the multivariate Gaussian copula. Under both univariate and multivariate settings, we provide MCMC algorithms to perform inference about the model parameters and predictive densities. The methodology is illustrated with univariate and bivariate examples, and with variables taken from a real data set.

11.
Although generalized linear mixed models are recognized to be of major practical importance, it is also known that they can be computationally demanding. The problem is the evaluation of the integral in calculating the marginalized likelihood. The straightforward method is based on the Gauss–Hermite technique, based on Gaussian quadrature points. Another approach is provided by the class of penalized quasi-likelihood methods. It is commonly believed that the Gauss–Hermite method works relatively well in simple situations but fails in more complicated structures. However, we present here a strikingly simple example of a logistic random-intercepts model in the context of a longitudinal clinical trial where the method gives valid results only for a high number of quadrature points (Q). As a consequence, this result warns the practitioner to examine routinely the dependence of the results on Q. The adaptive Gaussian quadrature, as implemented in the new SAS procedure NLMIXED, offered the solution to our problem. However, even the adaptive version of Gaussian quadrature needs careful handling to ensure convergence.

12.
Summary.  This paper presents a selective survey on panel data methods. The focus is on new developments. In particular, linear multilevel models and specific nonlinear, nonparametric and semiparametric models are at the center of the survey. In contrast to linear models, no unified methods exist for nonlinear approaches. In this case conditional maximum likelihood methods dominate for fixed effects models. Under random effects assumptions it is sometimes possible to employ conventional maximum likelihood methods using Gaussian quadrature to reduce a T-dimensional integral. Alternatives are generalized methods of moments and simulated estimators. If the nonlinear function is not exactly known, nonparametric or semiparametric methods should be preferred. Helpful comments and suggestions from an unknown referee are gratefully acknowledged.

13.
Summary.  The literature on multivariate linear regression includes multivariate normal models, models that are used in survival analysis and a variety of models that are used in other areas such as econometrics. The paper considers the class of location–scale models, which includes a large proportion of the preceding models. It is shown that, for complete data, the maximum likelihood estimators for regression coefficients in a linear location–scale framework are consistent even when the joint distribution is misspecified. In addition, gains in efficiency arising from the use of a bivariate model, as opposed to separate univariate models, are studied. A major area of application for multivariate regression models is to clustered, 'parallel' lifetime data, so we also study the case of censored responses. Estimators of regression coefficients are no longer consistent under model misspecification, but we give simulation results that show that the bias is small in many practical situations. Gains in efficiency from bivariate models are also examined in the censored data setting. The methodology in the paper is illustrated by using lifetime data from the Diabetic Retinopathy Study.

14.
To model extreme spatial events, a general approach is to use the generalized extreme value (GEV) distribution with spatially varying parameters such as spatial GEV models and latent variable models. In the literature, this approach is mostly used to capture spatial dependence for only one type of event. This limits the applications to air pollutants data as different pollutants may chemically interact with each other. A recent advancement in spatial extremes modelling for multiple variables is the multivariate max-stable processes. Similarly to univariate max-stable processes, the multivariate version also assumes standard distributions such as unit-Fréchet as margins. Additional modelling is required for applications such as spatial prediction. In this paper, we extend the marginal methods such as spatial GEV models and latent variable models into a multivariate setting based on copulas so that it is capable of handling both the spatial dependence and the dependence among multiple pollutants. We apply our proposed model to analyse weekly maxima of nitrogen dioxide, sulphur dioxide, respirable suspended particles, fine suspended particles, and ozone collected in Pearl River Delta in China.

15.
By means of a fractional factorial simulation experiment, we compare the performance of penalised quasi-likelihood (PQL), non-adaptive Gaussian quadrature and adaptive Gaussian quadrature in estimating parameters for multilevel logistic regression models. The comparison is done in terms of bias, mean-squared error (MSE), numerical convergence and computational efficiency. It turns out that in terms of MSE, standard versions of the quadrature methods perform relatively poorly in comparison with PQL.

16.
Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow a normal distribution, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using the multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, to achieve a flexible trade-off between estimation robustness and efficiency. Simulation studies and an application to the analysis of longitudinal lung growth data showcase the efficacy of the proposed approach.

17.
In treating dynamic systems, sequential Monte Carlo methods use discrete samples to represent a complicated probability distribution and use rejection sampling, importance sampling and weighted resampling to complete the on-line 'filtering' task. We propose a special sequential Monte Carlo method, the mixture Kalman filter, which uses a random mixture of the Gaussian distributions to approximate a target distribution. It is designed for on-line estimation and prediction of conditional and partial conditional dynamic linear models, which are themselves a class of widely used non-linear systems and also serve to approximate many others. Compared with a few available filtering methods including Monte Carlo methods, the gain in efficiency that is provided by the mixture Kalman filter can be very substantial. Another contribution of the paper is the formulation of many non-linear systems into conditional or partial conditional linear form, to which the mixture Kalman filter can be applied. Examples in target tracking and digital communications are given to demonstrate the procedures proposed.

18.
Abstract

In this paper we introduce the continuous tree mixture model, which is a mixture of undirected graphical models with tree-structured graphs and can be regarded as a nonparametric approach to multivariate analysis. We estimate its parameters, the component edge sets and the mixture proportions through a regularized maximum likelihood procedure. Our new algorithm, which uses the expectation maximization algorithm and a modified version of Kruskal's algorithm, simultaneously estimates and prunes the mixture component trees. Simulation studies indicate that this method performs better than the alternative Gaussian graphical mixture model. The proposed method is also applied to a water-level data set and compared with the results of the Gaussian mixture model.

19.
Motivated by problems of modelling torsional angles in molecules, Singh, Hnizdo & Demchuk (2002) proposed a bivariate circular model which is a natural torus analogue of the bivariate normal distribution and a natural extension of the univariate von Mises distribution to the bivariate case. The authors present here a multivariate extension of the bivariate model of Singh, Hnizdo & Demchuk (2002). They study the conditional distributions and investigate the shapes of marginal distributions for a special case. The methods of moments and pseudo‐likelihood are considered for the estimation of parameters of the new distribution. The authors investigate the efficiency of the pseudo‐likelihood approach in three dimensions. They illustrate their methods with protein data of conformational angles.

20.
It is well known that there exist multiple roots of the likelihood equations for finite normal mixture models. Selecting a consistent root for finite normal mixture models has long been a challenging problem. Simply using the root with the largest likelihood will not work because of the spurious roots. In addition, the likelihood of normal mixture models with unequal variance is unbounded and thus its maximum likelihood estimate (MLE) is not well defined. In this paper, we propose a simple root selection method for univariate normal mixture models by incorporating the idea of goodness of fit test. Our new method inherits both the consistency properties of distance estimators and the efficiency of the MLE. The new method is simple to use and its computation can be easily done using existing R packages for mixture models. In addition, the proposed root selection method is very general and can be also applied to other univariate mixture models. We demonstrate the effectiveness of the proposed method and compare it with some other existing methods through simulation studies and a real data application.
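
Illustrative note on entry 7: the three steps named in that abstract (build a kernel density estimate, construct an envelope above it, then accept–reject) can be sketched in a few lines of Python. This is a minimal univariate sketch under stated assumptions, not the authors' Kernel Carlo Simulation algorithm. The Gaussian kernel, the fixed bandwidth h, the flat envelope, the truncation of the sampling range to 5 bandwidths beyond the data, and the names kde_pdf and sample_from_kde are all choices made here for illustration.

```python
import math
import random


def kde_pdf(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h (assumed kernel)."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


def sample_from_kde(data, h, n, rng):
    """Draw n values from the KDE by accept-reject under a flat envelope."""
    # Assumption: proposals are restricted to 5 bandwidths beyond the data,
    # so the far tails of the KDE are truncated.
    lo, hi = min(data) - 5.0 * h, max(data) + 5.0 * h
    # Each kernel peaks at 1/(h*sqrt(2*pi)); the KDE is an average of kernels,
    # so this constant is never below the target density.
    envelope = 1.0 / (h * math.sqrt(2.0 * math.pi))
    draws = []
    while len(draws) < n:
        x = rng.uniform(lo, hi)            # propose uniformly over the range
        u = rng.uniform(0.0, envelope)     # uniform height under the envelope
        if u <= kde_pdf(x, data, h):       # keep points that fall under the KDE
            draws.append(x)
    return draws


# Two well-separated clumps of observations with a gap between them.
observed = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
draws = sample_from_kde(observed, h=0.4, n=200, rng=random.Random(0))
```

A flat envelope is the simplest valid choice but wastes proposals in the gap between the clumps; a tighter, piecewise envelope would raise the acceptance rate at the cost of extra bookkeeping.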