20 similar records found; search took 46 ms.
1.
In this article we investigate the relationship between the EM algorithm and the Gibbs sampler. We show that, under a Gaussian approximation, the approximate rate of convergence of the Gibbs sampler equals that of the corresponding EM-type algorithm. This helps in implementing either algorithm, since improvement strategies for one can be transported directly to the other. In particular, by running the EM algorithm we know approximately how many iterations are needed for convergence of the Gibbs sampler. We also show that, under certain conditions, the EM algorithm used for finding maximum likelihood estimates can be slower to converge than the corresponding Gibbs sampler used for Bayesian inference. We illustrate our results in a number of realistic examples, all based on generalized linear mixed models.
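The matched convergence rates can be seen in a toy normal hierarchy. The sketch below is not from the paper; the two-level model, prior, and all constants are assumptions chosen for illustration. It runs EM and the corresponding Gibbs sampler for the same model and compares the EM error-reduction rate with the lag-1 autocorrelation of the Gibbs chain for mu, which should agree (both are 1/2 here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
y = rng.normal(2.0, np.sqrt(2.0), size=n)   # marginal sd sqrt(1 + 1) under the model
ybar = y.mean()

# EM for mu in the toy model y_i | theta_i ~ N(theta_i, 1), theta_i ~ N(mu, 1):
# the E-step shrinks theta_i halfway towards mu, the M-step averages, so
# mu_{t+1} = (ybar + mu_t) / 2 and the error halves each iteration (rate 1/2).
mu, gaps = 0.0, []
for _ in range(10):
    theta_hat = (y + mu) / 2.0        # E-step: E[theta_i | y, mu]
    mu = theta_hat.mean()             # M-step
    gaps.append(abs(mu - ybar))       # ybar is the MLE of mu
em_rate = gaps[5] / gaps[4]           # geometric rate, ~0.5

# Gibbs sampler for the Bayesian version (flat prior on mu): the mu-chain is
# AR(1) with coefficient 1/2, so its lag-1 autocorrelation matches the EM rate.
mu_g, chain = 0.0, []
for _ in range(20000):
    theta = rng.normal((y + mu_g) / 2.0, np.sqrt(0.5))   # theta | mu, y
    mu_g = rng.normal(theta.mean(), np.sqrt(1.0 / n))    # mu | theta
    chain.append(mu_g)
c = np.array(chain[1000:])
lag1 = np.corrcoef(c[:-1], c[1:])[0, 1]                  # ~0.5, matching em_rate
```

In this toy case the EM run tells you directly that the Gibbs autocorrelation is about 1/2, so a few dozen iterations suffice for the sampler, which is the practical point of the paper's result.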
2.
D. Clayton & J. Rasbash, Journal of the Royal Statistical Society, Series A (Statistics in Society), 1999, 162(3): 425-436
Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting a linear mixed model within the steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program.
3.
Model-based clustering for social networks (cited 5 times: 0 self-citations, 5 by others)
Mark S. Handcock, Adrian E. Raftery & Jeremy M. Tantrum, Journal of the Royal Statistical Society, Series A (Statistics in Society), 2007, 170(2): 301-354
Summary. Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity, meaning that two actors that have ties to a third actor are more likely to be tied than actors that do not, homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean 'social space', and the actors' locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters that are present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.
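A minimal simulation of the tie mechanism the model assumes; the cluster centres, spread, and intercept below are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])      # two latent clusters (assumed)
labels = rng.integers(0, 2, size=n)                 # cluster memberships
z = centers[labels] + rng.normal(0.0, 0.5, size=(n, 2))   # latent positions

alpha = 2.0                                         # assumed intercept
d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
p = 1.0 / (1.0 + np.exp(-(alpha - d)))              # logit P(tie) = alpha - d_ij

upper = rng.random((n, n)) < p                      # one Bernoulli draw per dyad
adj = np.triu(upper, 1)
adj = (adj | adj.T).astype(int)                     # undirected, no self-ties

# clustering in the latent space produces clustered ties in the network:
same = labels[:, None] == labels[None, :]
off = ~np.eye(n, dtype=bool)
within = adj[same & off].mean()                     # within-cluster tie density
between = adj[~same].mean()                         # between-cluster tie density
```

Because the tie probability decays with latent distance, within-cluster density comes out well above between-cluster density, which is the clustering and transitivity behaviour the model is designed to capture.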
4.
Trevor C. Bailey & Paul J. Hewson, Journal of the Royal Statistical Society, Series A (Statistics in Society), 2004, 167(3): 501-517
Summary. Traffic safety in the UK is one of the increasing number of areas where central government sets targets based on 'outcome-focused' performance indicators (PIs). Judgments about such PIs are often based solely on rankings of raw indicators, and simple league tables dominate centrally published analyses. There is a considerable statistical literature examining health and education issues which has tended to use the generalized linear mixed model (GLMM) to address variability in the data when drawing inferences about relative performance from headline PIs. This methodology could obviously be applied in contexts such as traffic safety. However, when such models are applied to the fairly crude data sets that are currently available, the interval estimates generated, e.g. in respect of rankings, are often too broad to allow much real differentiation between the traffic safety performance of the units that are being considered. Such results sit uncomfortably with the ethos of 'performance management' and raise the question of whether the inference from such data sets about relative performance can be improved in some way. Motivated by consideration of a set of nine road safety performance indicators measured on English local authorities in the year 2000, the paper considers methods to strengthen the weak inference that is obtained from GLMMs of individual indicators by simultaneous, multivariate modelling of a range of related indicators. The correlation structure between indicators is used to reduce the uncertainty that is associated with rankings of any one of the individual indicators. The results demonstrate that credible intervals can be substantially narrowed by the use of the multivariate GLMM approach and that multivariate modelling of multiple PIs may therefore have considerable potential for introducing more robust and realistic assessments of differential performance in some contexts.
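The variance-reduction mechanism behind the multivariate approach can be illustrated with a bivariate normal: conditioning on a correlated second indicator shrinks the uncertainty about the first by the factor (1 - rho^2). The correlation value below is an assumption for illustration only:

```python
import numpy as np

rho = 0.8                                  # assumed correlation between two indicators
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# conditional variance of indicator 0 given indicator 1 (Schur complement)
var_cond = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 1] ** -1 * Sigma[1, 0]   # 1 - rho^2 = 0.36

# Monte Carlo check of the same quantity
rng = np.random.default_rng(11)
x = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
resid = x[:, 0] - rho * x[:, 1]            # best linear predictor of x0 from x1 is rho*x1
var_mc = resid.var()                       # ~0.36, a 64% variance reduction
```

With rho = 0.8, borrowing strength from the second indicator cuts the variance by 64%, which is the kind of interval narrowing the paper reports for the multivariate GLMM.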
5.
H. Tak, Journal of Statistical Computation and Simulation, 2017, 87(15): 2929-2939
A uniform shrinkage prior (USP) distribution on the unknown variance component of a random-effects model is known to produce good frequency properties. The USP has a parameter that determines the shape of its density function, but whether the USP maintains such good frequency properties regardless of the choice of shape parameter has received little attention. We investigate which choice of shape parameter produces Bayesian interval estimates of random effects that meet their nominal confidence levels better than several existing choices in the literature. Using univariate and multivariate Gaussian hierarchical models, we show that the USP achieves its best frequency properties when its shape parameter makes the USP behave similarly to an improper flat prior distribution on the unknown variance component.
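A sketch of the prior itself, following the standard USP construction for a two-level Gaussian model (the numeric values are assumptions): with y_i ~ N(theta_i, V) and theta_i ~ N(mu, A), a uniform prior on the shrinkage factor B = V / (V + A) induces the density p(A) = V / (V + A)^2 on the variance component, with V playing the role of the shape parameter:

```python
import numpy as np

def usp_density(A, V):
    # density on the variance component A induced by a Uniform(0, 1)
    # prior on the shrinkage factor B = V / (V + A)
    return V / (V + A) ** 2

# the USP is proper: its total mass is 1 (checked by trapezoidal quadrature)
V = 1.0
A_grid = np.linspace(0.0, 1e5, 2_000_001)
f = usp_density(A_grid, V)
mass = float(np.sum((f[1:] + f[:-1]) * np.diff(A_grid)) / 2.0)

# a very large shape parameter makes the density nearly constant over any
# bounded range of A, mimicking the improper flat prior that the paper
# identifies as giving the best frequency properties
flatness = usp_density(10.0, 1e6) / usp_density(0.0, 1e6)   # close to 1
```

The `flatness` ratio shows the mechanism: as the shape parameter grows, the USP looks locally like a flat prior on A while remaining proper.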
6.
Models for geostatistical data introduce spatial dependence in the covariance matrix of location-specific random effects. This is usually defined to be a parametric function of the distances between locations. Bayesian formulations of such models overcome the asymptotic inference and estimation problems involved in maximum likelihood-based approaches and can be fitted using Markov chain Monte Carlo (MCMC) simulation. The MCMC implementation, however, requires repeated inversions of the covariance matrix, which makes the problem computationally intensive, especially for a large number of locations. In the present work, we propose converting the spatial covariance matrix to a sparse matrix and compare a number of numerical algorithms especially suited to the MCMC framework in order to accelerate large matrix inversion. The algorithms are assessed empirically on simulated datasets of different size and sparsity. We conclude that the band solver, applied after ordering the distance matrix, substantially reduces the computational time needed to invert covariance matrices.
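A rough sketch of the band-solver idea (the 1-D locations, exponential covariance, and spherical taper are all illustrative assumptions, and SciPy is assumed to be available): ordering the locations makes the sparsified covariance banded, so a banded system can replace a dense solve:

```python
import numpy as np
from scipy.linalg import solve_banded

rng = np.random.default_rng(2)
n = 200
s = np.sort(rng.uniform(0.0, 200.0, size=n))         # ordered 1-D locations
D = np.abs(s[:, None] - s[None, :])                  # distance matrix
Sigma = np.exp(-D / 5.0)                             # exponential covariance

# taper with a compactly supported (spherical) correlation: the Schur product
# stays positive definite while entries beyond `theta` become exactly zero
theta = 10.0                                         # assumed taper range
u = np.minimum(D / theta, 1.0)
S = Sigma * (1.0 - 1.5 * u + 0.5 * u**3)
S[D >= theta] = 0.0                                  # guard against rounding

# after ordering, the sparse matrix is banded; find its bandwidth k
i_nz, j_nz = np.nonzero(S)
k = int(np.max(np.abs(i_nz - j_nz)))

# pack into LAPACK banded storage: ab[k + i - j, j] = S[i, j]
ab = np.zeros((2 * k + 1, n))
for i in range(n):
    for j in range(max(0, i - k), min(n, i + k + 1)):
        ab[k + i - j, j] = S[i, j]

b = rng.normal(size=n)
x_band = solve_banded((k, k), ab, b)                 # O(n k^2) band solve
x_dense = np.linalg.solve(S, b)                      # O(n^3) dense solve, same answer
```

The band solve costs O(n k^2) against O(n^3) for the dense factorization, which is where the speed-up reported in the abstract comes from when k is much smaller than n.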
7.
A model based on the skew Gaussian distribution is presented to handle skewed spatial data. It extends the results of popular Gaussian process models. Markov chain Monte Carlo techniques are used to generate samples from the posterior distributions of the parameters. Finally, this model is applied in the spatial prediction of weekly rainfall. Cross-validation shows that the predictive performance of our model compares favorably with several kriging variants.
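A marginal-level sketch of the skew-Gaussian ingredient via the standard stochastic representation of a skew-normal variable (the skewness parameter is an assumption, and the spatial dependence structure of the paper is omitted):

```python
import numpy as np

# stochastic representation: Z = delta*|U0| + sqrt(1 - delta^2)*U1,
# with U0, U1 iid N(0, 1), gives a skew-normal variate
rng = np.random.default_rng(6)
delta = 0.9                                   # assumed skewness parameter
u0 = rng.normal(size=100_000)
u1 = rng.normal(size=100_000)
z = delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1

mean_theory = delta * np.sqrt(2.0 / np.pi)    # E[Z] for this skew-normal
```

The half-normal component `|U0|` is what pulls the distribution away from Gaussian symmetry; with delta = 0 the representation collapses back to an ordinary Gaussian, recovering the symmetric special case the model extends.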
8.
Mahmoud Torabi, Communications in Statistics - Simulation and Computation, 2015, 44(7): 1692-1701
Spatial modeling is widely used in environmental sciences, biology, and epidemiology. Generalized linear mixed models employed to account for spatial variation in point-referenced data are called spatial generalized linear mixed models (SGLMMs). Frequentist analysis of this type of data is computationally difficult. On the other hand, the advent of Markov chain Monte Carlo algorithms has made Bayesian analysis of SGLMMs computationally convenient. The recent introduction of data cloning, which yields maximum likelihood estimates, has made frequentist analysis of mixed models equally convenient. Data cloning has recently been employed to estimate model parameters in SGLMMs; however, the prediction of spatial random effects and kriging are also very important. In this article, we propose a frequentist approach based on data cloning to predict spatial random effects (with prediction intervals) and to perform kriging. We illustrate this approach using a real dataset and a simulation study.
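The data-cloning recipe can be checked in a conjugate toy model where the posterior is available in closed form, so no MCMC is needed (the model and all constants are assumptions): stacking K copies of the data makes the posterior mean converge to the MLE and K times the posterior variance converge to the MLE's sampling variance:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0                                # known observation variance (assumed)
y = rng.normal(1.5, np.sqrt(sigma2), size=30)
n = len(y)

def cloned_posterior(y, K, prior_var=100.0):
    # conjugate normal posterior for mu after stacking K copies of the data
    yk = np.tile(y, K)
    prec = 1.0 / prior_var + len(yk) / sigma2
    mean = (yk.sum() / sigma2) / prec
    return mean, 1.0 / prec

m, v = cloned_posterior(y, K=200)
mle, mle_var = y.mean(), sigma2 / n          # exact frequentist answers to compare

# data-cloning recipe: posterior mean -> MLE, K * posterior variance -> Var(MLE)
```

The same recipe applied with MCMC instead of a closed-form posterior is what makes frequentist SGLMM analysis computationally convenient in the paper.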
9.
Brian S. Caffo, Wolfgang Jank & Galin L. Jones, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 2005, 67(2): 235-251
Summary. The expectation–maximization (EM) algorithm is a popular tool for maximizing likelihood functions in the presence of missing data. Unfortunately, EM often requires the evaluation of analytically intractable and high dimensional integrals. The Monte Carlo EM (MCEM) algorithm is the natural extension of EM that employs Monte Carlo methods to estimate the relevant integrals. Typically, a very large Monte Carlo sample size is required to estimate these integrals within an acceptable tolerance when the algorithm is near convergence. Even if this sample size were known at the onset of implementation of MCEM, its use throughout all iterations is wasteful, especially when accurate starting values are not available. We propose a data-driven strategy for controlling Monte Carlo resources in MCEM. The algorithm proposed improves on similar existing methods by recovering EM's ascent (i.e. likelihood increasing) property with high probability, being more robust to the effect of user-defined inputs and handling classical Monte Carlo and Markov chain Monte Carlo methods within a common framework. Because of the first of these properties we refer to the algorithm as 'ascent-based MCEM'. We apply ascent-based MCEM to a variety of examples, including one where it is used to accelerate the convergence of deterministic EM dramatically.
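A simplified MCEM sketch for a one-way random-effects model, with a deterministic ramp-up of the Monte Carlo sample size standing in for the paper's data-driven rule (the model and all constants are assumptions): early iterations use cheap, noisy E-steps, and the Monte Carlo effort grows as the algorithm approaches convergence:

```python
import numpy as np

rng = np.random.default_rng(7)
I, J, sb2_true = 200, 5, 2.0                 # groups, replicates, true variance
b = rng.normal(0.0, np.sqrt(sb2_true), I)
y = b[:, None] + rng.normal(size=(I, J))     # y_ij = b_i + e_ij, e ~ N(0, 1)
ybar = y.mean(axis=1)

def e_moments(sb2):
    # closed-form conditional moments of b_i | y for the Gaussian model
    v = 1.0 / (1.0 / sb2 + J)                # Var(b_i | y)
    m = v * J * ybar                         # E(b_i | y)
    return m, v

# exact EM for the variance component: sb2 <- mean of E[b_i^2 | y]
sb2 = 1.0
for _ in range(200):
    m, v = e_moments(sb2)
    sb2 = np.mean(m**2) + v
sb2_em = sb2

# MCEM: replace the analytic E-step by Monte Carlo draws, ramping up the
# Monte Carlo sample size M as the iterates settle down
sb2, M = 1.0, 10
for _ in range(60):
    m, v = e_moments(sb2)
    draws = rng.normal(m, np.sqrt(v), size=(M, I))   # draws from b_i | y
    sb2 = np.mean(draws**2)                          # Monte Carlo M-step
    M = min(int(M * 1.3) + 1, 5000)                  # grow MC effort over time
```

By the final iterations the Monte Carlo E-step is accurate enough that the MCEM iterate sits on top of the exact EM solution, while the early iterations cost almost nothing, which is the resource-allocation idea the paper formalizes with its ascent-based rule.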
10.
Mark J. Palmer & Grant B. Douglas, Journal of the Royal Statistical Society, Series C (Applied Statistics), 2008, 57(3): 313-327
Summary. An important problem in the management of water supplies is identifying the sources of sediment. The paper develops a Bayesian approach, utilizing an end member model, to estimate the proportion of various sources of sediments in samples taken from a dam. This approach not only allows for the incorporation of prior knowledge about the geochemical compositions of the sources (or end members) but also allows for correlation between spatially contiguous samples and the prediction of the sediment's composition at unsampled locations. Sediments that were sampled from the North Pine Dam in south-east Queensland, Australia, are analysed to illustrate the approach.
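Stripped of the Bayesian machinery, prior information, and spatial correlation, the end-member idea is a constrained inverse problem: a sample's geochemistry is a nonnegative mixture of source signatures. A hypothetical sketch with invented tracer values (SciPy is assumed to be available):

```python
import numpy as np
from scipy.optimize import nnls

# columns = assumed end-member (source) signatures over 4 tracers; the
# numbers are invented for illustration, not real geochemistry
E = np.array([[10.0, 2.0, 5.0],
              [ 1.0, 8.0, 3.0],
              [ 4.0, 4.0, 9.0],
              [ 7.0, 1.0, 2.0]])
p_true = np.array([0.5, 0.3, 0.2])     # true source proportions (sum to 1)
obs = E @ p_true                        # noiseless sample composition

# recover the proportions by nonnegative least squares
p_hat, resid = nnls(E, obs)
```

The Bayesian version in the paper replaces this point estimate with a full posterior over the proportions, which is what lets it carry prior knowledge of the signatures and spatial correlation between samples.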
11.
Ulrich Menzefricke, Communications in Statistics - Simulation and Computation, 2013, 42(4): 1089-1108
We formulate a hierarchical version of the Gaussian process model. In particular, we assume there are data on several units randomly drawn from the same population. For each unit, several responses are available that arise from a Gaussian process model. The parameters characterizing the Gaussian process model for the units are modeled as arising from normal or gamma distributions. Results from two simulations are given that compare the performance of the hierarchical and non-hierarchical models.
12.
Scandinavian Journal of Statistics, 2018, 45(2): 382-404
Conditional simulation of max-stable processes allows for the analysis of spatial extremes taking into account additional information provided by the conditions. Instead of conditioning on observations at given sites, as is usually done, we consider a single condition given by a more general functional of the process, as may occur in the context of climate models. As the problem turns out to be analytically intractable, we make use of Markov chain Monte Carlo methods to sample from the conditional distribution. Simulation studies indicate fast convergence of the Markov chains involved. In an application to precipitation data, the utility of the procedure as a tool for downscaling climate data is demonstrated.
13.
George S. Fishman, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 1999, 61(3): 623-641
Convergence rates, statistical efficiency and sampling costs are studied for the original and extended Swendsen-Wang methods of generating a sample path {S_j, j ≥ 1} with equilibrium distribution π, with r distinct elements, on a finite state space X of size N_1. Given S_{j-1}, each method uses auxiliary random variables to identify the subset of X from which S_j is to be randomly sampled. Let π_min and π_max denote respectively the smallest and largest elements of π, and let N_r denote the number of elements of π with value π_max. For a single auxiliary variable, uniform sampling from the subset and (N_1 - N_r)π_min + N_r π_max ≈ 1, our results show rapid convergence and high statistical efficiency for large π_min/π_max or N_r/N_1, and slow convergence and poor statistical efficiency for small π_min/π_max and N_r/N_1. Other examples provide additional insight. For extended Swendsen-Wang methods with non-uniform subset sampling, the analysis identifies the properties of a decomposition of π(x) that favour fast convergence and high statistical efficiency. In the absence of exploitable special structure, subset sampling can be costly regardless of which of these methods is employed.
14.
Claudia Furlan, Statistical Methods and Applications, 2008, 17(3): 335-350
Prediction of possible cliff erosion at some future date is fundamental to coastal planning and shoreline management, for example to avoid development in vulnerable areas. Historically, deterministic methods were used to predict cliff recession rates. More recently, recession predictions have been expressed in probabilistic terms; to date, however, only simplistic models have been developed. We consider cliff erosion along the Holderness Coast. Since 1951, a monitoring programme has operated at 118 stations along the coast, providing a source of information that is invaluable and often unavailable elsewhere. We build hierarchical random-effects models, taking account of the known dynamics of the process and including the missing information.
15.
Sujit K. Sahu & Kanti V. Mardia, Journal of the Royal Statistical Society, Series C (Applied Statistics), 2005, 54(1): 223-244
Summary. Short-term forecasts of air pollution levels in big cities are now reported in newspapers and other media outlets. Studies indicate that even short-term exposure to high levels of an air pollutant called atmospheric particulate matter can lead to long-term health effects. Data are typically observed at fixed monitoring stations throughout a study region of interest at different time points. Statistical spatiotemporal models are appropriate for modelling these data. We consider short-term forecasting of these spatiotemporal processes by using a Bayesian kriged Kalman filtering model. The spatial prediction surface of the model is built by using the well-known method of kriging for optimum spatial prediction and the temporal effects are analysed by using the models underlying the Kalman filtering method. The full Bayesian model is implemented by using Markov chain Monte Carlo techniques which enable us to obtain the optimal Bayesian forecasts in time and space. A new cross-validation method based on the Mahalanobis distance between the forecasts and observed data is also developed to assess the forecasting performance of the model implemented.
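The Mahalanobis-distance idea for assessing forecasts can be sketched in a toy setting (the covariance and forecasts below are assumptions, not the paper's model): squared distances between forecasts and observations, scaled by the forecast error covariance, behave like chi-squared draws when the forecasts are well calibrated:

```python
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])                 # assumed forecast error covariance
obs = rng.multivariate_normal([0.0, 0.0], Sigma, size=2000)
forecast = np.zeros(2)                          # toy forecast: the mean field

P = np.linalg.inv(Sigma)
e = obs - forecast
d2 = np.einsum('ij,jk,ik->i', e, P, e)          # squared Mahalanobis distances
mean_d2 = d2.mean()                             # ~2 (the dimension) if calibrated
```

A mean squared distance well above the dimension would flag overconfident forecasts, and one well below it overdispersed ones, which is the diagnostic use the cross-validation method puts it to.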
16.
Reid D. Landes, Communications in Statistics - Simulation and Computation, 2013, 42(7): 1351-1364
17.
Bayesian inference for generalized additive mixed models based on Markov random field priors (cited 9 times: 0 self-citations, 9 by others)
Ludwig Fahrmeir & Stefan Lang, Journal of the Royal Statistical Society, Series C (Applied Statistics), 2001, 50(2): 201-220
Most regression problems in practice require flexible semiparametric forms of the predictor for modelling the dependence of responses on covariates. Moreover, it is often necessary to add random effects accounting for overdispersion caused by unobserved heterogeneity or for correlation in longitudinal or spatial data. We present a unified approach for Bayesian inference via Markov chain Monte Carlo simulation in generalized additive and semiparametric mixed models. Different types of covariates, such as the usual covariates with fixed effects, metrical covariates with non-linear effects, unstructured random effects, trend and seasonal components in longitudinal data and spatial covariates, are all treated within the same general framework by assigning appropriate Markov random field priors with different forms and degrees of smoothness. We applied the approach in several case-studies and consulting cases, showing that the methods are also computationally feasible in problems with many covariates and large data sets. In this paper, we choose two typical applications.
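One standard Markov random field prior in this framework, a second-order random walk for a smooth nonlinear effect, can be sketched directly (the dimension below is an arbitrary choice): the prior precision matrix built from second differences penalizes deviations from a straight line and is intrinsic, i.e. rank-deficient by 2:

```python
import numpy as np

n = 10
D2 = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
K = D2.T @ D2                           # second-order random-walk prior precision

rank = np.linalg.matrix_rank(K)         # n - 2: constants and linear trends are free
null_ok = np.allclose(K @ np.ones(n), 0) and np.allclose(K @ np.arange(n), 0)
```

The null space (constant and linear functions) carries no penalty, so the degree of smoothing is controlled entirely by the variance attached to this precision, which is how the paper assigns "different forms and degrees of smoothness" to different covariate types.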
18.
Markov chain Monte Carlo (MCMC) is an important computational technique for generating samples from non-standard probability distributions. A major challenge in the design of practical MCMC samplers is to achieve efficient convergence and mixing properties. One way to accelerate convergence and mixing is to adapt the proposal distribution in light of previously sampled points, thus increasing the probability of acceptance. In this paper, we propose two new adaptive MCMC algorithms based on the Independent Metropolis–Hastings algorithm. In the first, we adjust the proposal to minimize an estimate of the cross-entropy between the target and proposal distributions, using the experience of pre-runs. This approach provides a general technique for deriving natural adaptive formulae. The second approach uses multiple parallel chains, and involves updating chains individually, then updating a proposal density by fitting a Bayesian model to the population. An important feature of this approach is that adapting the proposal does not change the limiting distributions of the chains. Consequently, the adaptive phase of the sampler can be continued indefinitely. We include results of numerical experiments indicating that the new algorithms compete well with traditional Metropolis–Hastings algorithms. We also demonstrate the method for a realistic problem arising in Comparative Genomics.
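A sketch of the first algorithm's idea for a Gaussian proposal family, where minimizing cross-entropy to the target reduces to moment-matching the pre-run draws (the target, chain lengths, and step size below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

def logp(x):
    return -0.5 * ((x - 3.0) / 2.0) ** 2       # toy target: N(3, 4), unnormalized

# Stage 1 (pre-run): plain random-walk Metropolis to get rough draws
x, pre = 0.0, []
for _ in range(5000):
    prop = x + rng.normal(0.0, 1.0)
    if np.log(rng.random()) < logp(prop) - logp(x):
        x = prop
    pre.append(x)
pre = np.array(pre[1000:])                      # discard burn-in

# Stage 2: for a Gaussian family, minimizing the cross-entropy estimate
# over the pre-run amounts to matching its mean and standard deviation
m, sd = pre.mean(), pre.std()

def logq(x):
    return -0.5 * ((x - m) / sd) ** 2           # adapted proposal, unnormalized

# Stage 3: independent Metropolis-Hastings with the adapted proposal
x, acc, draws = m, 0, []
for _ in range(5000):
    prop = rng.normal(m, sd)
    if np.log(rng.random()) < (logp(prop) - logq(prop)) - (logp(x) - logq(x)):
        x, acc = prop, acc + 1
    draws.append(x)
rate = acc / 5000                               # near 1 when q matches the target
```

Because the adapted proposal nearly coincides with the target, the independent sampler accepts almost every move, which is the acceleration the adaptation is after.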
19.
Binary probability maps using a hidden conditional autoregressive Gaussian process with an application to Finnish common toad data (cited 3 times: 0 self-citations, 3 by others)
I. S. Weir & A. N. Pettitt, Journal of the Royal Statistical Society, Series C (Applied Statistics), 2000, 49(4): 473-484
The Finnish common toad data of Heikkinen and Hogmander are reanalysed using an alternative fully Bayesian model that does not require a pseudolikelihood approximation and an alternative prior distribution for the true presence or absence status of toads in each 10 km×10 km square. Markov chain Monte Carlo methods are used to obtain posterior probability estimates of the square-specific presences of the common toad and these are presented as a map. The results are different from those of Heikkinen and Hogmander and we offer an explanation in terms of the prior used for square-specific presence of the toads. We suggest that our approach is more faithful to the data and avoids unnecessary confounding of effects. We demonstrate how to extend our model efficiently with square-specific covariates and illustrate this by introducing deterministic spatial changes.
20.
Jun Yan, Mary Kathryn Cowles, Shaowen Wang & Marc P. Armstrong, Statistics and Computing, 2007, 17(4): 323-335
When MCMC methods for Bayesian spatiotemporal modeling are applied to large geostatistical problems, challenges arise as a consequence of memory requirements, computing costs, and convergence monitoring. This article describes the parallelization of a reparametrized and marginalized posterior sampling (RAMPS) algorithm, which is carefully designed to generate posterior samples efficiently. The algorithm is implemented using the Parallel Linear Algebra Package (PLAPACK). The scalability of the algorithm is investigated via simulation experiments implemented using a cluster with 25 processors. The usefulness of the method is illustrated with an application to sulfur dioxide concentration data from the Air Quality System database of the U.S. Environmental Protection Agency.