Model-based clustering of Gaussian copulas for mixed data
Clustering of mixed data is important yet challenging due to a shortage of conventional distributions for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Indeed, copulas, and Gaussian copulas in particular, are powerful tools for easily modeling the distribution of multivariate variables. This model clusters data sets with continuous, integer, and ordinal variables (all having a cumulative distribution function) by considering the intra-component dependencies in a similar way to the Gaussian mixture. Indeed, each component of the Gaussian copula mixture produces a correlation coefficient for each pair of variables and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As an interesting by-product, this model generalizes many well-known approaches and provides tools for visualization based on its parameters. The Bayesian inference is achieved with a Metropolis-within-Gibbs sampler. The numerical experiments, on simulated and real data, illustrate the benefits of the proposed model: flexible and meaningful parameterization combined with visualization features.
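The abstract above describes components whose dependence comes from a Gaussian copula while each margin follows a standard distribution. As a rough illustration only, the sketch below draws from a single such component with one Gaussian margin and one Poisson margin. The function names (`poisson_quantile`, `std_normal_cdf`, `sample_pair`) and all parameter values are hypothetical choices made for this example, not taken from the paper, and no mixture or inference step is shown.

```python
import math
import random


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam) (sketch for small lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # Walk up the support until the cumulative probability reaches u.
    # The pmf > 0 guard stops the loop if rounding keeps cdf just below u.
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_pair(rho, mu, sigma, lam, rng):
    """Draw (continuous, count) from one Gaussian-copula component.

    Assumed setup: latent correlation rho, a Gaussian(mu, sigma) margin
    and a Poisson(lam) margin.
    """
    # Correlated standard normal latent pair.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Gaussian margin: the inverse CDF applied to Phi(z1) is mu + sigma * z1.
    x_cont = mu + sigma * z1
    # Poisson margin: push Phi(z2) through the Poisson quantile function.
    x_count = poisson_quantile(std_normal_cdf(z2), lam)
    return x_cont, x_count


rng = random.Random(0)
draws = [sample_pair(0.8, 10.0, 2.0, 4.0, rng) for _ in range(2000)]
```

With a positive latent correlation, large values of the continuous variable tend to occur together with large counts, even though the two margins come from different families.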

Summary. Structured additive regression models are perhaps the most commonly used class of models in statistical applications. This class includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.

Summary. Gaussian Markov random-field (GMRF) models are frequently used in a wide variety of applications. In most cases parts of the GMRF are observed through mutually independent data; hence the full conditional of the GMRF, a hidden GMRF (HGMRF), is of interest. We are concerned with the case where the likelihood is non-Gaussian, leading to non-Gaussian HGMRF models. Several researchers have constructed block sampling Markov chain Monte Carlo schemes based on approximations of the HGMRF by a GMRF, using a second-order expansion of the log-density at or near the mode. This is possible as the GMRF approximation can be sampled exactly with a known normalizing constant. The Markov property of the GMRF approximation yields computational efficiency. The main contribution in the paper is to go beyond the GMRF approximation and to construct a class of non-Gaussian approximations which adapt automatically to the particular HGMRF that is under study. The accuracy can be tuned by intuitive parameters to nearly any precision. These non-Gaussian approximations share the same computational complexity as those which are based on GMRFs and can be sampled exactly with computable normalizing constants. We apply our approximations in spatial disease mapping and model-based geostatistical models with different likelihoods, obtain procedures for block updating and construct Metropolized independence samplers.

Following the extension from linear mixed models to additive mixed models, an extension from generalized linear mixed models to generalized additive mixed models is made. Algorithms are developed to compute the MLEs of the nonlinear effects and the covariance structures based on the penalized marginal likelihood. Convergence of the algorithms and selection of the smoothing parameters are discussed.

We consider the problem of learning a Gaussian variational approximation to the posterior distribution for a high-dimensional parameter, where we impose sparsity in the precision matrix to reflect appropriate conditional independence structure in the model. Incorporating sparsity in the precision matrix allows the Gaussian variational distribution to be both flexible and parsimonious, and the sparsity is achieved through parameterization in terms of the Cholesky factor. Efficient stochastic gradient methods that make appropriate use of gradient information for the target distribution are developed for the optimization. We consider alternative estimators of the stochastic gradients, which have lower variation and are more stable. Our approach is illustrated using generalized linear mixed models and state-space models for time series.

Estimation and prediction in generalized linear mixed models are often hampered by intractable high-dimensional integrals. This paper provides a framework to solve this intractability, using asymptotic expansions when the number of random effects is large. To that end, we first derive a modified Laplace approximation when the number of random effects is increasing at a lower rate than the sample size. Second, we propose an approximate likelihood method based on the asymptotic expansion of the log-likelihood using the modified Laplace approximation, which is maximized using a quasi-Newton algorithm. Finally, we define the second-order plug-in predictive density based on a similar expansion to the plug-in predictive density and show that it is a normal density. Our simulations show that in comparison to other approximations, our method has better performance. Our methods are readily applied to non-Gaussian spatial data and, as an example, the analysis of the rhizoctonia root rot data is presented.
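To make the kind of integral mentioned in the abstract above concrete, the sketch below applies the standard (first-order) Laplace approximation, not the modified version derived in the paper, to a one-dimensional integral that arises for a single cluster in a Poisson model with a Gaussian random intercept. The model, the function names and the numbers (`y_sum=30`, `n=10`, `tau=1.0`) are assumptions made for illustration; constant factors of the likelihood and of the random-effect density are omitted.

```python
import math


def log_integrand(u, y_sum, n, tau):
    """Unnormalized log integrand for one cluster (assumed model).

    n Poisson counts with total y_sum and log-mean u, with u ~ N(0, tau^2).
    """
    return y_sum * u - n * math.exp(u) - u * u / (2.0 * tau * tau)


def laplace_integral(y_sum, n, tau, iters=50):
    """Standard Laplace approximation of the integral of exp(log_integrand)."""
    u = 0.0
    # Newton iterations for the mode; the log integrand is strictly concave.
    for _ in range(iters):
        grad = y_sum - n * math.exp(u) - u / tau**2
        hess = -n * math.exp(u) - 1.0 / tau**2
        step = grad / hess
        u -= step
        if abs(step) < 1e-12:
            break
    hess = -n * math.exp(u) - 1.0 / tau**2
    # exp(h(mode)) * sqrt(2 * pi / -h''(mode))
    peak = math.exp(log_integrand(u, y_sum, n, tau))
    return peak * math.sqrt(2.0 * math.pi / -hess)


def trapezoid_integral(y_sum, n, tau, lo=-10.0, hi=10.0, steps=20000):
    """Brute-force trapezoid rule, used here only as a reference value."""
    width = (hi - lo) / steps
    total = 0.5 * (
        math.exp(log_integrand(lo, y_sum, n, tau))
        + math.exp(log_integrand(hi, y_sum, n, tau))
    )
    for i in range(1, steps):
        total += math.exp(log_integrand(lo + i * width, y_sum, n, tau))
    return total * width


approx = laplace_integral(30, 10, 1.0)
reference = trapezoid_integral(30, 10, 1.0)
relative_error = abs(approx / reference - 1.0)
```

In one dimension the brute-force reference is cheap, which is what allows the comparison. The approach becomes useful when the integral runs over many random effects and quadrature is no longer feasible.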

This paper examines strategies for simulating exactly from large Gaussian linear models conditional on some Gaussian observations. Local computation strategies based on the conditional independence structure of the model are developed in order to reduce costs associated with storage and computation. Application of these algorithms to simulation from nested hierarchical linear models is considered, and the construction of efficient MCMC schemes for Bayesian inference in high-dimensional linear models is outlined.

Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting of a linear mixed model within steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program.

Summary. Functional magnetic resonance imaging has become a standard technology in human brain mapping. Analyses of the massive spatiotemporal functional magnetic resonance imaging data sets often focus on parametric or non-parametric modelling of the temporal component, whereas spatial smoothing is based on Gaussian kernels or random fields. A weakness of Gaussian spatial smoothing is underestimation of activation peaks or blurring of high curvature transitions between activated and non-activated regions of the brain. To improve spatial adaptivity, we introduce a class of inhomogeneous Markov random fields with stochastic interaction weights in a space-varying coefficient model. For given weights, the random field is conditionally Gaussian, but marginally it is non-Gaussian. Fully Bayesian inference, including estimation of weights and variance parameters, can be carried out through efficient Markov chain Monte Carlo simulation. Although motivated by the analysis of functional magnetic resonance imaging data, the methodological development is general and can also be used for spatial smoothing and regression analysis of areal data on irregular lattices. An application to stylized artificial data and to real functional magnetic resonance imaging data from a visual stimulation experiment demonstrates the performance of our approach in comparison with Gaussian and robustified non-Gaussian Markov random-field models.

Often, the response variables on sampling units are observed repeatedly over time. The sampling units may come from different populations, such as treatment groups. This setting is routinely modeled by a random coefficients growth curve model, and the techniques of general linear mixed models are applied to address the primary research aim. An alternative approach is to reduce each subject’s data to summary measures, such as within-subject averages or regression coefficients. One may then test for equality of means of the summary measures (or functions of them) among treatment groups. Here, we compare by simulation the performance characteristics of three approximate tests based on summary measures and one based on the full data, focusing mainly on accuracy of p-values. We find that performances of these procedures can be quite different for small samples in several different configurations of parameter values. The summary-measures approach performed at least as well as the full-data mixed models approach.

For a one-way mixed Gaussian ANOVA model, we prove local asymptotic normality and local asymptotic minimaxity of the maximum likelihood estimate (MLE) and of certain iterative approximations to it. A geometric rate of convergence in probability of these iterative estimates to the MLE is proved. Asymptotically optimal designs for large samples are studied.

Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow normal distributions, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using the multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, to achieve a flexible trade-off between estimation robustness and efficiency. Simulation studies and an application to longitudinal lung growth data showcase the efficacy of the proposed approach.

There is an emerging need to advance linear mixed model technology to include variable selection methods that can simultaneously choose and estimate important effects from a potentially large number of covariates. However, the complex nature of variable selection has made it difficult for it to be incorporated into mixed models. In this paper we extend the well known class of penalties and show that they can be integrated succinctly into a linear mixed model setting. Under mild conditions, the estimator obtained from this mixed model penalised likelihood is shown to be consistent and asymptotically normally distributed. A simulation study reveals that the extended family of penalties achieves varying degrees of estimator shrinkage depending on the value of one of its parameters. The simulation study also shows there is a link between the number of false positives detected and the number of true coefficients when using the same penalty. This new mixed model variable selection (MMVS) technology was applied to a complex wheat quality data set to determine significant quantitative trait loci (QTL).

Modelling of the relationship between concentration (PK) and response (PD) plays an important role in drug development. The modelling becomes complicated when the drug concentration and response measurements are not taken simultaneously and/or hysteresis occurs between the response and the concentration. A model‐based approach fits a joint pharmacokinetic (PK) and concentration–response (PK/PD) model, including an effect compartment if necessary, to concentration and response data. However, this approach relies on the PK data being well described by a common PK model. We propose an algorithm for a semi‐parametric approach to fitting nonlinear mixed PK/PD models including an effect compartment using linear interpolation and extrapolation for concentration data. This approach is independent of the PK model, and the algorithm can easily be implemented using SAS PROC NLMIXED. Practical issues in programming and computing are also discussed. The properties of this approach are examined using simulations. This approach is used to analyse data from a study of the PK/PD relationship between insulin and glucose levels. Copyright © 2005 John Wiley & Sons, Ltd.

In this article we investigate the relationship between the EM algorithm and the Gibbs sampler. We show that the approximate rate of convergence of the Gibbs sampler by Gaussian approximation is equal to that of the corresponding EM-type algorithm. This helps in implementing either of the algorithms, as improvement strategies for one algorithm can be directly transported to the other. In particular, by running the EM algorithm we know approximately how many iterations are needed for convergence of the Gibbs sampler. We also obtain a result that, under certain conditions, the EM algorithm used for finding the maximum likelihood estimates can be slower to converge than the corresponding Gibbs sampler for Bayesian inference. We illustrate our results in a number of realistic examples, all based on generalized linear mixed models.

This paper considers the problem of estimating a nonlinear statistical model subject to stochastic linear constraints among unknown parameters. These constraints represent prior information which originates from a previous estimation of the same model using an alternative database. One feature of this specification allows for the design matrix of stochastic linear restrictions to be estimated. The mixed regression technique and the maximum likelihood approach are used to derive the estimator for both the model coefficients and the unknown elements of this design matrix. The proposed estimator, whose asymptotic properties are studied, contains as a special case the conventional mixed regression estimator based on a fixed design matrix. A new test of compatibility between prior and sample information is also introduced. The suggested estimator is tested empirically with both simulated and actual marketing data.

In this article, we present a strategy for producing low-dimensional projections that maximally separate the classes in Gaussian Mixture Model classification. The most revealing linear manifolds are those along which the classes are maximally separable. Here we consider a particular probability product kernel as a measure of similarity or affinity between the class-conditional distributions. It takes an appealing closed analytical form in the case of Gaussian mixture components. The performance of the proposed strategy has been evaluated on real data.
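The abstract above does not say which probability product kernel is used. Purely as an illustration of what a closed analytical form looks like, the sketch below assumes the exponent 1/2 (the Bhattacharyya kernel, the integral of sqrt(p(x) q(x))) and univariate Gaussian components, and checks the closed form against a numerical integral. The function names and parameter values are hypothetical.

```python
import math


def bhattacharyya_kernel(mu1, sigma1, mu2, sigma2):
    """Closed form of the integral of sqrt(p * q) for two univariate Gaussians."""
    var_sum = sigma1**2 + sigma2**2
    scale = math.sqrt(2.0 * sigma1 * sigma2 / var_sum)
    return scale * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def numeric_kernel(mu1, sigma1, mu2, sigma2, lo=-30.0, hi=30.0, steps=60000):
    """Trapezoid-rule value of the same integral, used as a sanity check."""
    width = (hi - lo) / steps

    def f(x):
        return math.sqrt(normal_pdf(x, mu1, sigma1) * normal_pdf(x, mu2, sigma2))

    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


closed = bhattacharyya_kernel(0.0, 1.0, 1.5, 2.0)
numeric = numeric_kernel(0.0, 1.0, 1.5, 2.0)
```

The kernel equals 1 for identical components and decays toward 0 as the means move apart, so a projection that separates the classes well is one along which this affinity is small.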

Abstract. Partially linear models are extensions of linear models to include a non-parametric function of some covariate. They have been found to be useful in both cross-sectional and longitudinal studies. This paper provides a convenient means to extend Cook's local influence analysis to the penalized Gaussian likelihood estimator that uses a smoothing spline as a solution to its non-parametric component. Insight is also provided into the interplay of the influence or leverage measures between the linear and the non-parametric components in the model. The diagnostics are applied to a mouthwash data set and a longitudinal hormone study with informative results.

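Building on a comparative analysis of generalized linear models and classical linear models, this paper focuses on generalized linear mixed models and their estimation methods, together with their specification and application for satisfaction survey data. An empirical analysis uses satisfaction survey data collected by a survey agency between January 2011 and March 2012 from customers who had purchased wealth management products from banks in a certain region. The study shows that, relative to classical linear regression models and generalized linear models, generalized linear mixed models are an effective method for analyzing satisfaction survey data.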
