Similar Literature (20 results)
1.
Latent variable models are widely used for the joint modeling of mixed data, including nominal, ordinal, count and continuous variables. In this paper, we consider a latent variable model for jointly modeling the relationships between mixed binary, count and continuous variables together with some observed covariates. We assume that, given a latent variable, the mixed variables of interest are independent, with the count and continuous variables following Poisson and normal distributions, respectively. As such data may be drawn from different subpopulations, unobserved heterogeneity has to be taken into account; a mixture distribution for the latent variable is used to capture this heterogeneity. A generalized EM algorithm, which runs Newton–Raphson steps inside the EM iterations, is used to compute the maximum likelihood estimates of the parameters, and their standard errors are computed using the supplemented EM algorithm. An analysis of the primary biliary cirrhosis data is presented as an application of the proposed model.
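The E-step/M-step alternation underlying this kind of generalized EM can be conveyed with the simplest mixture latent structure, a two-component Gaussian mixture with closed-form M-step updates. This is a generic sketch with made-up data, not the paper's model, whose M-step requires Newton–Raphson steps:

```python
import numpy as np

def mixture_em(x, n_iter=200):
    """Minimal EM for a two-component Gaussian mixture over a latent class.

    Toy stand-in for a generalized EM: here the M-step is closed-form,
    whereas the paper's model needs Newton-Raphson inside the M-step.
    """
    pi = 0.5
    mu = np.array([x.min(), x.max()])          # spread-out initial means
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each x_i
        d0 = np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
        d1 = np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
        r = pi * d1 / ((1 - pi) * d0 + pi * d1)
        # M-step: weighted maximum likelihood updates
        pi = r.mean()
        mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                       np.sum(r * x) / np.sum(r)])
        sigma = np.array([
            np.sqrt(np.sum((1 - r) * (x - mu[0]) ** 2) / np.sum(1 - r)),
            np.sqrt(np.sum(r * (x - mu[1]) ** 2) / np.sum(r)),
        ])
    return pi, mu, sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
pi, mu, sigma = mixture_em(x)
```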

2.
We propose a latent variable model for informative missingness in longitudinal studies that extends the latent dropout class model. In our model, the value of the latent variable is affected by the missingness pattern, and the latent variable is also used as a covariate in modeling the longitudinal response; it thus links the longitudinal response and the missingness process. Unlike the latent dropout class model, our latent variable is continuous rather than categorical, and we assume it follows a normal distribution. The EM algorithm is used to estimate the parameters of interest, with Gauss–Hermite quadrature used to approximate the integral over the latent variable. Standard errors of the parameter estimates can be obtained from the bootstrap or from the inverse of the Fisher information matrix of the final marginal likelihood. The model is compared with a mixed model and a complete-case analysis on data from the Weight Gain Prevention among Women (WGPW) clinical trial, and generalized Pearson residuals are used to assess its fit.
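Integrating a normal latent variable out of a likelihood with Gauss–Hermite quadrature works as sketched below; the logistic integrand is an illustrative example (a likelihood contribution with no closed form), not the paper's model:

```python
import numpy as np

# Gauss-Hermite nodes/weights approximate integrals of f(t) * exp(-t^2);
# the substitution z = sqrt(2) * t turns this into an expectation under
# a standard normal latent variable Z:
#   E[g(Z)] ~ (1/sqrt(pi)) * sum_i w_i * g(sqrt(2) * t_i)
nodes, weights = np.polynomial.hermite.hermgauss(20)

def expect_under_normal(g):
    """Approximate E[g(Z)] for Z ~ N(0, 1) with 20-point Gauss-Hermite."""
    return np.sum(weights * g(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

# Example: a logistic-normal integral, the kind of marginal likelihood
# term that appears when a normal latent variable enters a logit model.
val = expect_under_normal(lambda z: 1.0 / (1.0 + np.exp(-(0.5 + z))))
```

Twenty nodes are already exact for polynomial integrands up to degree 39, which is why low-order moments come out to machine precision.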

3.
In this article, a general approach to latent variable models based on an underlying generalized linear model (GLM) with a factor-analytic observation process is introduced. We call these models Generalized Linear Factor Models (GLFM). The observations arise from a general framework involving observed and latent variables that are assumed to follow exponential-family distributions. More specifically, we concentrate on situations where the observed variables are both discretely measured (e.g., binomial, Poisson) and continuously distributed (e.g., gamma). The common latent factors are assumed to be independent with a standard multivariate normal distribution. Practical details of fitting such models with a new local expectation-maximization (EM) algorithm, which can be regarded as a generalized EM-type algorithm, are also discussed. In conjunction with an approximate version of the Fisher score algorithm (FSA), we show how to compute maximum likelihood estimates of the model parameters and to draw inferences about the unobservable path of the common factors. The methodology is illustrated by an extensive Monte Carlo simulation study, and the results show promising performance.
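For a plain GLM (no latent factors) the Fisher score algorithm takes a simple closed form; the sketch below shows it for Poisson regression with a log link, where Fisher scoring coincides with Newton–Raphson. The simulated data and coefficients are illustrative assumptions, and this is the classical FSA, not the paper's approximate version for GLFMs:

```python
import numpy as np

def poisson_fisher_scoring(X, y, n_iter=25):
    """Fisher scoring for a Poisson log-linear GLM.

    With the canonical log link the update is
        beta <- beta + (X' W X)^{-1} X'(y - mu),  W = diag(mu),
    which is also the Newton-Raphson step.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        XtWX = X.T @ (X * mu[:, None])          # X' diag(mu) X
        beta = beta + np.linalg.solve(XtWX, X.T @ (y - mu))
    return beta

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = poisson_fisher_scoring(X, y)
```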

4.
The lasso is a popular technique for simultaneous estimation and variable selection in many research areas. When the regression coefficients have independent Laplace priors, their marginal posterior mode is equivalent to the estimate given by the non-Bayesian lasso. Because of the flexibility of its statistical inferences, the Bayesian approach has attracted a growing body of research in recent years. Current approaches either perform a fully Bayesian analysis using a Markov chain Monte Carlo (MCMC) algorithm or use Monte Carlo expectation maximization (MCEM) with an MCMC algorithm in each E-step. However, MCMC-based Bayesian methods carry a heavy computational burden and converge slowly. Tan et al. [An efficient MCEM algorithm for fitting generalized linear mixed models for correlated binary data. J Stat Comput Simul. 2007;77:929–943] proposed a non-iterative sampling approach, the inverse Bayes formula (IBF) sampler, for computing posteriors of a hierarchical model within the MCEM structure. Motivated by their paper, we develop an IBF sampler within the MCEM structure to obtain the marginal posterior mode of the regression coefficients for the Bayesian lasso, adjusting the importance sampling weights when the full conditional distribution is not explicit. Simulation experiments show that our EM-based method greatly reduces computation time and behaves comparably to other Bayesian lasso methods in both prediction accuracy and variable selection accuracy, and even better when the sample size is relatively large.
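For reference, the non-Bayesian lasso mentioned above can be computed by coordinate descent with soft-thresholding. This is a generic sketch on simulated data, not the proposed IBF-within-MCEM sampler:

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for the lasso objective
        (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    Under independent Laplace priors, the solution equals the marginal
    posterior mode of the coefficients (as the abstract notes).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_ss[j]
    return b

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
beta_true = np.array([3.0, 0.0, 1.5, 0.0])
y = X @ beta_true + 0.5 * rng.normal(size=200)
b = lasso_cd(X, y, lam=0.1)
```

The soft-thresholding step is what zeroes out weak coefficients, giving the simultaneous selection-and-estimation behaviour the abstract describes.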

5.
In many areas of medical research, especially in studies that involve paired organs, a bivariate ordered categorical response must be analyzed. Using a bivariate continuous distribution as the latent variable is an attractive strategy for analyzing such data sets. In this context, the bivariate standard normal distribution, which leads to the bivariate cumulative probit regression model, is the most common choice. In this paper, we introduce another latent variable regression model for bivariate ordered categorical responses. This model may be an appropriate alternative to the bivariate cumulative probit regression model when postulating a symmetric form for the marginal or joint distribution of the response data does not appear to be a valid assumption. We also develop the numerical procedure needed to obtain the maximum likelihood estimates of the model parameters. To illustrate the proposed model, we analyze data from an epidemiologic study identifying some of the most important risk indicators of periodontal disease among students aged 15-19 years in Tehran, Iran.
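In the univariate case, the cumulative probit structure reduces to differences of normal CDFs between consecutive cutpoints: a latent N(eta, 1) variable determines the observed category. A sketch with illustrative cutpoints and linear predictor values (not the paper's bivariate model):

```python
from statistics import NormalDist

def cumulative_probit_probs(eta, cuts):
    """Category probabilities in a univariate cumulative probit model.

    A latent variable Z ~ N(eta, 1) yields category k when Z falls
    between cutpoints c_{k-1} and c_k, so
        P(Y = k) = Phi(c_k - eta) - Phi(c_{k-1} - eta).
    """
    Phi = NormalDist().cdf
    bounds = [float("-inf")] + list(cuts) + [float("inf")]
    return [Phi(b - eta) - Phi(a - eta) for a, b in zip(bounds, bounds[1:])]

# Illustrative: three categories with cutpoints at -1 and 1
p0 = cumulative_probit_probs(0.0, [-1.0, 1.0])
p1 = cumulative_probit_probs(1.0, [-1.0, 1.0])   # larger eta shifts mass upward
```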

6.
A data-driven approach for modeling volatility dynamics and co-movements in financial markets is introduced. Special emphasis is given to multivariate conditionally heteroscedastic factor models in which the volatilities of the latent factors depend on their past values, and the parameters are driven by regime switching in a latent state variable. We propose an innovative indirect estimation method based on the generalized EM algorithm principle combined with a structured variational approach that can handle models with large cross-sectional dimensions. Extensive Monte Carlo simulations and preliminary experiments with financial data show promising results.

7.
This paper develops Bayesian inference for extreme value models with a flexible time-dependent latent structure. The generalized extreme value distribution is utilized to incorporate state variables that follow an autoregressive moving average (ARMA) process with Gumbel-distributed innovations, and the time-dependent extreme value distribution is combined with heavy-tailed error terms. An efficient Markov chain Monte Carlo algorithm is proposed using a state-space representation with a finite mixture of normal distributions to approximate the Gumbel distribution. The methodology is illustrated with simulated data and two different sets of real data: monthly minima of daily returns of a stock price index and monthly maxima of hourly electricity demand are fitted to the proposed model and used for model comparison. Estimation results show the usefulness of the proposed model and methodology, and provide evidence that the latent autoregressive process and heavy-tailed errors play an important role in describing the monthly series of minimum stock returns and maximum electricity demand.
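Gumbel-distributed innovations like those driving the ARMA state process above can be simulated by inverse-CDF sampling; a quick sketch (the sample size is arbitrary):

```python
import numpy as np

# Standard Gumbel CDF: F(x) = exp(-exp(-x)); inverting gives
#   X = -log(-log(U)),  U ~ Uniform(0, 1).
rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
gumbel = -np.log(-np.log(u))
# Known moments: mean = Euler-Mascheroni constant ~ 0.5772,
# variance = pi^2 / 6 ~ 1.6449.
```

The skewed, heavy right tail of these draws is what the finite normal mixture in the sampler has to approximate.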

8.
Approximate Bayesian computation (ABC) methods permit approximate inference for intractable likelihoods when it is possible to simulate from the model. However, they perform poorly for high-dimensional data and in practice must usually be used in conjunction with dimension reduction methods, resulting in a loss of accuracy which is hard to quantify or control. We propose a new ABC method for high-dimensional data based on rare event methods which we refer to as RE-ABC. This uses a latent variable representation of the model. For a given parameter value, we estimate the probability of the rare event that the latent variables correspond to data roughly consistent with the observations. This is performed using sequential Monte Carlo and slice sampling to systematically search the space of latent variables. In contrast, standard ABC can be viewed as using a more naive Monte Carlo estimate. We use our rare event probability estimator as a likelihood estimate within the pseudo-marginal Metropolis–Hastings algorithm for parameter inference. We provide asymptotics showing that RE-ABC has a lower computational cost for high-dimensional data than standard ABC methods. We also illustrate our approach empirically, on a Gaussian distribution and an application in infectious disease modelling.
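For contrast with RE-ABC, the naive rejection form of standard ABC can be sketched in a few lines. The toy normal-mean model, vague prior, and tolerance are all illustrative assumptions:

```python
import numpy as np

def abc_rejection(y_obs, n_draws=20_000, eps=0.1, seed=0):
    """Naive rejection ABC for the mean of a N(theta, 1) model.

    Draw theta from the prior, simulate a data set, and keep theta when
    a summary statistic (here the sample mean) lands within eps of the
    observed one. RE-ABC replaces this hit-or-miss acceptance step with
    a rare-event (SMC + slice sampling) estimate of its probability.
    """
    rng = np.random.default_rng(seed)
    s_obs = y_obs.mean()
    theta = rng.normal(0.0, 5.0, n_draws)             # vague N(0, 25) prior
    sims = rng.normal(theta[:, None], 1.0, (n_draws, y_obs.size))
    keep = np.abs(sims.mean(axis=1) - s_obs) < eps
    return theta[keep]

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, 50)
post = abc_rejection(y)
```

The tiny fraction of accepted draws here is exactly the inefficiency that motivates treating acceptance as a rare event.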

9.
Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary and/or categorical data, but due to its assumed local independence structure there may not be a correspondence between the estimated latent classes and groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the model, which provides an efficient model-fitting strategy. The mixture of latent trait analyzers model is demonstrated on data from the National Long Term Care Survey (NLTCS) and on voting records from the U.S. Congress. The model yields intuitive clustering results and gives a much better fit than either latent class analysis or latent trait analysis alone.
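The local independence assumption can be made concrete with a toy latent class EM for binary items: conditional on class membership, items are independent Bernoulli. This is a generic sketch on simulated data, not the mixture of latent trait analyzers model (which adds the continuous latent trait):

```python
import numpy as np

def latent_class_em(Y, K=2, n_iter=100, seed=0):
    """EM for latent class analysis of binary items.

    Local independence: given class k, item j is Bernoulli(theta[k, j]),
    independently of the other items.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    pi = np.full(K, 1.0 / K)                  # class weights
    theta = rng.uniform(0.3, 0.7, (K, p))     # item probabilities per class
    for _ in range(n_iter):
        # E-step: log p(class k | y_i), up to normalisation
        logp = (np.log(pi) + Y @ np.log(theta).T
                + (1 - Y) @ np.log(1 - theta).T)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted proportions
        pi = r.mean(axis=0)
        theta = np.clip((r.T @ Y) / r.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, theta

rng = np.random.default_rng(2)
z = rng.uniform(size=1000) < 0.5              # true class memberships
probs = np.where(z[:, None], 0.9, 0.1)        # well-separated item probabilities
Y = (rng.uniform(size=(1000, 6)) < probs).astype(float)
pi, theta = latent_class_em(Y, K=2)
```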

10.
We propose a generic on-line (also sometimes called adaptive or recursive) version of the expectation–maximization (EM) algorithm applicable to latent variable models of independent observations. Compared with the algorithm of Titterington, this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete-data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback–Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e. that of the maximum likelihood estimator. In addition, the approach proposed is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model.
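A minimal sketch of the online EM idea, for the simplest possible case: estimating the mixing weight of a two-component Gaussian mixture with known component means and unit variances. The per-observation E-step statistic is folded into a running average with decaying step sizes, and the M-step is immediate. All settings are illustrative, and this is a drastic simplification of the general algorithm:

```python
import numpy as np

def online_em_weight(x, means=(-2.0, 3.0), alpha=0.6):
    """Online EM for the mixing weight of a two-component Gaussian mixture.

    The sufficient statistic (posterior responsibility of component 1) is
    averaged with stochastic-approximation step sizes gamma_t = t^{-alpha};
    the M-step pi = s follows immediately for each new observation.
    """
    m0, m1 = means
    s = 0.5                                      # running sufficient statistic
    for t, xt in enumerate(x, start=1):
        d0 = (1 - s) * np.exp(-0.5 * (xt - m0) ** 2)
        d1 = s * np.exp(-0.5 * (xt - m1) ** 2)
        r = d1 / (d0 + d1)                       # E-step for one observation
        s = s + t ** (-alpha) * (r - s)          # stochastic-approximation update
    return s

rng = np.random.default_rng(5)
n = 20_000
comp = rng.uniform(size=n) < 0.7                 # true weight 0.7 on mean 3
x = np.where(comp, rng.normal(3.0, 1.0, n), rng.normal(-2.0, 1.0, n))
s_hat = online_em_weight(x)
```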

11.
Econometric Reviews, 2012, 31(1): 27-53
Transformed diffusions (TDs) have become increasingly popular in financial modeling for their flexibility and tractability. While existing TD models are predominantly one-factor models, empirical evidence often favors models with multiple factors. We propose a novel distribution-driven nonlinear multifactor TD model with latent components. Our model is a transformation of an underlying multivariate Ornstein–Uhlenbeck (MVOU) process, where the transformation function is endogenously specified by a flexible parametric stationary distribution of the observed variable. Computationally efficient exact likelihood inference can be implemented for our model using a modified Kalman filter algorithm, and the transformed affine structure also allows us to price derivatives in semi-closed form. We compare the proposed multifactor model with existing TD models for modeling VIX and pricing VIX futures. Our results show that the proposed model outperforms all existing TD models both in sample and out of sample, consistently across all categories and scenarios of our comparison.

12.
I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, with surrogate measurements of the latent variables on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In these situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. Three circumstances where analytic calculation of the optimal design is possible are considered: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed but the outcome is binary. In each of these cases, it is often possible to considerably improve the cost efficiency of the design by appropriate selection of the sampling fractions. More complex situations arise when the data are spatially distributed: the spatial correlation can be exploited to improve exposure assignment for unmeasured locations using available measurements at neighboring locations, and some approaches to informative selection of the measurement sample using location and/or exposure predictor data are considered.
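The stratified-design flavour of these optimal-sampling-fraction calculations can be illustrated with the classical cost-constrained Neyman allocation, a textbook special case rather than any of the paper's latent variable designs (the strata values below are made up):

```python
from math import sqrt

def optimal_allocation(strata, budget):
    """Cost-optimal stratified second-stage sampling (Neyman allocation).

    For strata (N_h, S_h, c_h) = (size, outcome SD, cost per measurement),
    the variance-minimizing allocation under the budget constraint
    sum_h c_h n_h = budget is n_h proportional to N_h * S_h / sqrt(c_h).
    """
    k = budget / sum(N * S * sqrt(c) for N, S, c in strata)
    return [k * N * S / sqrt(c) for N, S, c in strata]

# Two strata: a cheap, high-variance stratum and an expensive, low-variance one
strata = [(100, 2.0, 1.0), (100, 1.0, 4.0)]
n = optimal_allocation(strata, budget=100.0)
```

Note how the expensive stratum is sampled less, even before accounting for its smaller variance; this is the kind of cost-efficiency gain the abstract refers to.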

13.
Correct detection of areas with excess pollution relies first on accurate predictions of pollutant concentrations, a task usually complicated by skewed histograms and the presence of censored data. The unified skew-Gaussian (SUG) random field proposed by Zareifard and Jafari Khaledi [19] offers a more flexible class of sampling spatial models to account for skewness. In this paper, we adopt a Bayesian framework to perform prediction for the SUG model in the presence of censored data. Owing to the presence of many latent variables with strongly dependent components in the model, we encounter convergence issues when using Markov chain Monte Carlo algorithms. To overcome this obstacle, we use a computationally efficient inverse Bayes formulas sampling procedure to obtain approximately independent samples from the posterior distribution of the latent variables; these samples are then used to update the parameters in a Gibbs sampling scheme. This hybrid algorithm provides effective samples, resulting in computational advantages and precise predictions. The proposed approach is illustrated with a simulation study and applied to a spatial data set containing right-censored observations.

14.
The marginal likelihood can be notoriously difficult to compute, particularly in high-dimensional problems. Chib and Jeliazkov employed the local reversibility of the Metropolis–Hastings algorithm to construct an estimator for models where full conditional densities are not available analytically. The estimator is free of distributional assumptions and is directly linked to the simulation algorithm. However, it generally requires a sequence of reduced Markov chain Monte Carlo runs, which makes the method computationally demanding, especially when the parameter space is large. In this article, we study the implementation of this estimator on latent variable models in which the responses are conditionally independent given the latent variables (conditional or local independence). This property is exploited in the construction of a multi-block Metropolis-within-Gibbs algorithm that allows the estimator to be computed in a single run, regardless of the dimensionality of the parameter space. The counterpart one-block algorithm is also considered, and the difference between the two approaches is pointed out. The paper closes with illustrations of the estimator on simulated and real-life data sets.

15.
The expectation maximization (EM) algorithm is a widely used approach for estimating the parameters of multivariate multinomial mixtures in a latent class model. However, its computing efficiency is unsatisfactory. This study proposes a fuzzy clustering algorithm (FCA) based on both the maximum penalized likelihood (MPL) for the latent class model and the modified penalty fuzzy c-means (PFCM) for normal mixtures. Numerical examples confirm that the FCA-MPL algorithm is more efficient (requiring fewer iterations) and more computationally effective (measured by the approximate relative ratio of accurate classification) than the EM algorithm.

16.
Albert and Chib introduced a fully Bayesian method for analyzing data arising from the generalized linear model, using a Gibbs sampling algorithm facilitated by latent variables. Cowles later proposed an alternative algorithm to accelerate the convergence of the Albert-Chib algorithm; its novelty lies in using a Hastings step to generate the latent variables and bin boundary parameters jointly, instead of individually from their respective full conditionals. In the same spirit, we reparameterize the cumulative-link generalized linear model to accelerate the convergence of Cowles' algorithm even further. One important advantage of our method is that for the three-bin problem it does not require the Hastings algorithm. For problems with more than three bins, where the Hastings algorithm is required, we provide a proposal density based on the Dirichlet distribution, which is more natural than the truncated normal density used in the competing algorithm. Using diagnostic procedures recommended in the literature for Markov chain Monte Carlo algorithms (both single and multiple runs), we show that our algorithm is substantially better than the competing one: it converges faster and has smaller autocorrelations between iterates. Using the probit link function, extensive results are obtained for the three-bin and five-bin multinomial ordinal data problems.
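A minimal sketch of the original Albert-Chib data-augmentation sampler, for the simplest case of binary probit regression with a flat prior on the coefficients. This is the baseline that the accelerated algorithms above improve on; the simulated data and run lengths are illustrative assumptions:

```python
import numpy as np
from statistics import NormalDist

def probit_gibbs(X, y, n_iter=1500, burn=500, seed=0):
    """Albert-Chib Gibbs sampler for binary probit regression.

    Data augmentation: z_i ~ N(x_i' beta, 1) truncated to z_i > 0 when
    y_i = 1 and z_i < 0 when y_i = 0; then, with a flat prior,
    beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}).
    """
    nd = NormalDist()
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    L = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    z = np.zeros(n)
    draws = []
    for it in range(n_iter):
        mu = X @ beta
        u = rng.uniform(size=n)
        for i in range(n):
            lo = nd.cdf(-mu[i])                     # P(z_i < 0 | mu_i)
            v = lo + u[i] * (1 - lo) if y[i] == 1 else u[i] * lo
            v = min(max(v, 1e-12), 1 - 1e-12)       # guard inverse-CDF endpoints
            z[i] = mu[i] + nd.inv_cdf(v)            # truncated normal draw
        beta = XtX_inv @ (X.T @ z) + L @ rng.normal(size=p)
        if it >= burn:
            draws.append(beta.copy())
    return np.array(draws)

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)
draws = probit_gibbs(X, y)
post_mean = draws.mean(axis=0)
```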

17.
Probabilistic Principal Component Analysis
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
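The EM iteration for probabilistic PCA, as given in Tipping and Bishop's formulation, can be sketched directly; the toy three-dimensional data set below is an illustrative assumption:

```python
import numpy as np

def ppca_em(X, q=1, n_iter=200, seed=0):
    """EM for probabilistic PCA: x = W z + mu + eps with z ~ N(0, I_q)
    and eps ~ N(0, sigma2 * I).

    Closed-form EM updates (Tipping & Bishop):
        M      = W'W + sigma2 I
        W_new  = S W (sigma2 I + M^{-1} W' S W)^{-1}
        sigma2 = (1/d) tr(S - S W M^{-1} W_new')
    where S is the sample covariance matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    W = rng.normal(size=(d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        SW = S @ W
        W_new = SW @ np.linalg.inv(sigma2 * np.eye(q) + M_inv @ W.T @ SW)
        sigma2 = np.trace(S - SW @ M_inv @ W_new.T) / d
        W = W_new
    return W, sigma2

rng = np.random.default_rng(6)
z = rng.normal(size=(500, 1))
w_true = np.array([[3.0], [1.0], [0.5]])
X = z @ w_true.T + 0.3 * rng.normal(size=(500, 3))   # noise SD 0.3
W, sigma2 = ppca_em(X, q=1)
```

At convergence the columns of W span the principal subspace, so the fitted direction should line up with the top eigenvector of the sample covariance.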

18.
In biomedical and public health research, both repeated measures of biomarkers Y and times T to key clinical events are often collected for a subject. The scientific question is how the distribution of the responses [T, Y | X] changes with covariates X. [T | X] may be the focus of the estimation, with Y used as a surrogate for T. Alternatively, T may be the time to drop-out in a study in which [Y | X] is the target for estimation. The focus of a study might also be on the effects of covariates X on both T and Y, or on some underlying latent variable thought to be manifested in the observable outcomes. In this paper, we present a general model for the joint analysis of [T, Y | X] and apply it to estimate [T | X] and other related functionals by using the relevant information in both T and Y. We adopt a latent variable formulation like that of Fawcett and Thomas and use it to estimate several quantities of clinical relevance for determining the efficacy of a treatment in a clinical trial setting. A Markov chain Monte Carlo algorithm is used to estimate the model's parameters. We illustrate the methodology with an analysis of data from a clinical trial comparing risperidone with a placebo for the treatment of schizophrenia.

19.
In this article, we utilize a scale mixture of Gaussian random fields as a tool for modeling spatial ordered categorical data with non-Gaussian latent variables. Specifically, we assume the categorical random field is created by truncating a Gaussian log-Gaussian latent variable model that accommodates heavy tails. Since the traditional likelihood approach for this model involves high-dimensional integrations that are computationally intensive, the maximum likelihood estimates are obtained using a stochastic approximation expectation–maximization algorithm, with Markov chain Monte Carlo methods employed to draw from the posterior distribution of the latent variables. A numerical example illustrates the methodology.

20.
It is commonly required to detect change points in sequences of random variables. In the most difficult setting of this problem, change detection must be performed sequentially, with new observations being constantly received over time; further, the parameters of both the pre- and post-change distributions may be unknown. In Hawkins and Zamba (Technometrics 47(2):164–173, 2005), the sequential generalised likelihood ratio test was introduced for detecting changes in this context, under the assumption that the observations follow a Gaussian distribution. However, we show that the asymptotic approximation used in their test statistic leads to it being conservative even when a large number of observations is available. We propose an improved procedure which is more efficient, in the sense of detecting changes faster, in all situations. We also show that similar issues arise in other parametric change detection contexts, which we illustrate by introducing a novel monitoring procedure for sequences of exponentially distributed random variables, an important topic in time-to-failure modelling.
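A minimal sketch of a sequential GLR test for a mean change, simplified to Gaussian data with known unit variance and unknown pre- and post-change means (the setting discussed above treats both mean and variance as unknown). The threshold and simulated sequence are illustrative, not calibrated values:

```python
import numpy as np

def glr_change_detect(x, threshold=15.0):
    """Sequential GLR detection of a mean change in N(mean, 1) data.

    At each time t, maximise twice the log likelihood ratio over the
    unknown change point k:
        2 log LR_t = max_k  k (t - k) / t * (mean(x[:k]) - mean(x[k:t]))^2.
    Returns the first t at which the statistic exceeds the threshold,
    or None if no change is flagged.
    """
    cums = np.cumsum(x)
    for t in range(2, len(x) + 1):
        best = 0.0
        for k in range(1, t):
            m1 = cums[k - 1] / k
            m2 = (cums[t - 1] - cums[k - 1]) / (t - k)
            best = max(best, k * (t - k) / t * (m1 - m2) ** 2)
        if best > threshold:
            return t
    return None

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 60)])  # change at t = 101
det = glr_change_detect(x)
```

Choosing the threshold to control the false alarm rate is exactly where the asymptotic approximations criticised above come into play.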

