Similar Documents
20 similar documents found (search time: 31 ms)
1.
The combined model accounts for different forms of extra-variability and has traditionally been applied in the likelihood framework, or in the Bayesian setting via Markov chain Monte Carlo. In this article, integrated nested Laplace approximation is investigated as an alternative estimation method for the combined model for count data, and compared with the former estimation techniques. Longitudinal, spatial, and multi-hierarchical data scenarios are investigated in three case studies as well as a simulation study. As a conclusion, integrated nested Laplace approximation provides fast and precise estimation, while avoiding convergence problems often seen when using Markov chain Monte Carlo.

2.
This article describes a method for computing approximate statistics for large data sets, when exact computations may not be feasible. Such situations arise in applications such as climatology, data mining, and information retrieval (search engines). The key to our approach is a modular approximation to the cumulative distribution function (cdf) of the data. Approximate percentiles (as well as many other statistics) can be computed from this approximate cdf. This enables the reduction of a potentially overwhelming computational exercise into smaller, manageable modules. We illustrate the properties of this algorithm using a simulated data set. We also examine the approximation characteristics of the approximate percentiles, using a von Mises functional type approach. In particular, it is shown that the maximum error between the approximate cdf and the actual cdf of the data is never more than 1% (or any other preset level). We also show that under assumptions of underlying smoothness of the cdf, the approximation error is much lower in an expected sense. Finally, we derive bounds for the approximation error of the percentiles themselves. Simulation experiments show that these bounds can be quite tight in certain circumstances.
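As an illustration of the modular idea, the sketch below (not the authors' algorithm; the chunk size, the 0.5% grid spacing and the Gamma data are illustrative choices) summarizes each chunk of a large data stream by a fixed grid of empirical quantiles and pools the summaries into an approximate cdf from which percentiles are read off.

```python
import numpy as np

def chunk_summary(chunk, grid=np.linspace(0.0, 1.0, 201)):
    # reduce one chunk to its empirical quantiles on a fixed probability grid (0.5% spacing)
    return np.quantile(chunk, grid)

def pooled_cdf(summaries, sizes):
    # pool the per-chunk quantile grids, weighting each grid point by its chunk size
    pts = np.concatenate(summaries)
    wts = np.concatenate([np.full(len(s), n / len(s)) for s, n in zip(summaries, sizes)])
    order = np.argsort(pts)
    return pts[order], np.cumsum(wts[order]) / wts.sum()

def approx_percentile(xs, cdf, p):
    return np.interp(p, cdf, xs)

rng = np.random.default_rng(0)
chunks = [rng.gamma(2.0, 1.0, size=100_000) for _ in range(50)]   # stands in for a huge data set
xs, cdf = pooled_cdf([chunk_summary(c) for c in chunks], [c.size for c in chunks])
print("approximate median:", approx_percentile(xs, cdf, 0.5))
print("exact median      :", np.median(np.concatenate(chunks)))
```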

3.
Tests for unit roots in panel data have become very popular. Two attractive features of panel data unit root tests are the increased power compared to time-series tests, and the often well-behaved limiting distributions of the tests. In this paper we apply Monte Carlo simulations to investigate how well the normal approximation works for a heterogeneous panel data unit root test when there are only a few cross sections in the sample. We find that the normal approximation, which should be valid for large numbers of cross-sectional units, works well, at conventional significance levels, even when the number of cross sections is as small as two. This finding is valuable for the applied researcher since critical values will be easy to obtain and p-values will be readily available.
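A small Monte Carlo experiment in this spirit can be sketched as follows (a generic illustration only: published panel tests standardize the averaged statistic with tabulated moments, whereas here the moments of the individual Dickey-Fuller t statistic are themselves simulated, and the sample sizes are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)

def df_tstat(y):
    # Dickey-Fuller t statistic (intercept, no trend): regress dy_t on y_{t-1}
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

def tbar(N, T):
    # cross-sectional average of individual t statistics under the unit-root null
    return np.mean([df_tstat(np.cumsum(rng.standard_normal(T))) for _ in range(N)])

N, T = 2, 100                                      # only two cross sections
ind = np.array([df_tstat(np.cumsum(rng.standard_normal(T))) for _ in range(5000)])
stats = np.array([tbar(N, T) for _ in range(2000)])
z = np.sqrt(N) * (stats - ind.mean()) / ind.std()  # standardized average statistic
print("empirical size at the nominal 5% level:", np.mean(z < -1.645))
```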

4.
Case–control studies allow efficient estimation of the associations of covariates with a binary response in settings where the probability of a positive response is small. It is well known that covariate–response associations can be consistently estimated using a logistic model by acting as if the case–control (retrospective) data were prospective, and that this result does not hold for other binary regression models. However, in practice an investigator may be interested in fitting a non–logistic link binary regression model and this paper examines the magnitude of the bias resulting from ignoring the case–control sample design with such models. The paper presents an approximation to the magnitude of this bias in terms of the sampling rates of cases and controls, as well as simulation results that show that the bias can be substantial.
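The logistic-link result is easy to verify in a toy simulation (a sketch assuming statsmodels is available; the coefficients and sampling rates are illustrative): a prospective logistic fit to case–control data recovers the slope, while the intercept absorbs the log ratio of the case and control sampling rates.

```python
import numpy as np
import statsmodels.api as sm  # assumed available for the logistic fit

rng = np.random.default_rng(2)
beta0, beta1 = -5.0, 1.0                       # rare outcome in the population
x = rng.standard_normal(500_000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x))))

# retrospective sampling: keep every case, but only a 1% subsample of controls
keep = (y == 1) | (rng.random(y.size) < 0.01)
fit = sm.Logit(y[keep], sm.add_constant(x[keep])).fit(disp=0)

print("slope estimate (population value 1.0):", fit.params[1])
print("intercept shift (log(1/0.01) = 4.61): ", fit.params[0] - beta0)
```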

5.
For models with random effects or missing data, the likelihood function is sometimes intractable analytically but amenable to Monte Carlo approximation. To get a good approximation, the parameter value that drives the simulations should be sufficiently close to the maximum likelihood estimate (MLE) which unfortunately is unknown. Introducing a working prior distribution, we express the likelihood function as a posterior expectation and approximate it using posterior simulations. If the sample size is large, the sample information is likely to outweigh the prior specification and the posterior simulations will be concentrated around the MLE automatically, leading to good approximation of the likelihood near the MLE. For smaller samples, we propose to use the current posterior as the next prior distribution to make the posterior simulations closer to the MLE and hence improve the likelihood approximation. By using the technique of data duplication, we can simulate from the sharpened posterior distribution without actually updating the prior distribution. The suggested method works well in several test cases. A more complex example involving censored spatial data is also discussed.

6.
Abstract. For certain classes of hierarchical models, it is easy to derive an expression for the joint moment‐generating function (MGF) of data, whereas the joint probability density has an intractable form which typically involves an integral. The most important example is the class of linear models with non‐Gaussian latent variables. Parameters in the model can be estimated by approximate maximum likelihood, using a saddlepoint‐type approximation to invert the MGF. We focus on modelling heavy‐tailed latent variables, and suggest a family of mixture distributions that behaves well under the saddlepoint approximation (SPA). It is shown that the well‐known normalization issue renders the ordinary SPA useless in the present context. As a solution we extend the non‐Gaussian leading term SPA to a multivariate setting, and introduce a general rule for choosing the leading term density. The approach is applied to mixed‐effects regression, time‐series models and stochastic networks and it is shown that the modified SPA is very accurate.
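The modified multivariate SPA developed in the paper is not reproduced here; the sketch below only recalls the standard univariate saddlepoint density approximation from a known cumulant-generating function, using a Gamma distribution (whose saddlepoint equation has a closed-form root) as a check.

```python
import numpy as np
from scipy import stats

alpha = 3.0                                      # Gamma(alpha, 1) example
K   = lambda s: -alpha * np.log(1.0 - s)         # cumulant-generating function
Kpp = lambda s: alpha / (1.0 - s) ** 2

def spa_density(x):
    s_hat = 1.0 - alpha / x                      # solves K'(s) = x exactly in this example
    return np.exp(K(s_hat) - s_hat * x) / np.sqrt(2.0 * np.pi * Kpp(s_hat))

x = np.linspace(0.5, 10.0, 5)
print("saddlepoint:", spa_density(x))
print("exact      :", stats.gamma.pdf(x, alpha))
```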

7.
M-estimation is a widely used method for robust statistical inference, and varying coefficient models have been widely applied in many scientific areas. In this paper, we consider M-estimation and model identification of bivariate varying coefficient models for longitudinal data. We make use of bivariate tensor-product B-splines as an approximation of the function and consider M-type regression splines by minimizing the objective convex function. Mean and median regressions are included in this class. Moreover, with a double smoothly clipped absolute deviation (SCAD) penalization, we study the problem of simultaneous structure identification and estimation. Under appropriate conditions, we show that the proposed procedure possesses the oracle property, in the sense that it is as efficient as the estimator obtained when the true model is known prior to the analysis. Simulation studies are carried out to demonstrate the performance of the proposed methods with finite samples. The proposed methodology is illustrated with an analysis of a real data example.
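For reference, a minimal sketch of the SCAD penalty used for the structure identification step is given below (the standard definition with the conventional a = 3.7; not code from the paper).

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    # smoothly clipped absolute deviation penalty, evaluated elementwise
    b = np.abs(beta)
    linear    = lam * b                                              # |beta| <= lam
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))    # lam < |beta| <= a*lam
    constant  = (a + 1) * lam**2 / 2                                 # |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))

print(scad_penalty(np.array([0.0, 0.5, 2.0, 10.0]), lam=1.0))
```

Unlike the lasso, the SCAD penalty is constant beyond a·λ, so large coefficients remain essentially unpenalized, which is what makes the oracle property attainable.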

8.
Recent developments in engineering techniques for spatial data collection such as geographic information systems have resulted in an increasing need for methods to analyze large spatial datasets. These sorts of datasets can be found in various fields of the natural and social sciences. However, model fitting and spatial prediction using these large spatial datasets are impractically time-consuming, because of the necessary matrix inversions. Various methods have been developed to deal with this problem, including a reduced rank approach and a sparse matrix approximation. In this article, we propose a modification to an existing reduced rank approach to capture both the large- and small-scale spatial variations effectively. We have used simulated examples and an empirical data analysis to demonstrate that our proposed approach consistently performs well when compared with other methods. In particular, the performance of our new method does not depend on the dependence properties of the spatial covariance functions.

9.
The Log-Gaussian Cox process is a commonly used model for the analysis of spatial point pattern data. Fitting this model is difficult because of its doubly stochastic property, that is, it is a hierarchical combination of a Poisson process at the first level and a Gaussian process at the second level. Various methods have been proposed to estimate such a process, including traditional likelihood-based approaches as well as Bayesian methods. We focus here on Bayesian methods and several approaches that have been considered for model fitting within this framework, including Hamiltonian Monte Carlo, the integrated nested Laplace approximation, and variational Bayes. We consider these approaches and make comparisons with respect to statistical and computational efficiency. These comparisons are made through several simulation studies as well as through two applications, the first examining ecological data and the second involving neuroimaging data.
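The doubly stochastic structure is easiest to see in a simulation sketch (a discretized illustration on a coarse grid, not any of the fitting methods compared in the paper; the covariance parameters are arbitrary): a Gaussian field is drawn first, and cell counts are then conditionally independent Poissons with log-intensity equal to that field.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30                                           # n x n grid of cells on the unit square
g = np.linspace(0.0, 1.0, n)
xy = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

# level two: latent Gaussian field with an exponential covariance function
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
cov = np.exp(-d / 0.2) + 1e-8 * np.eye(n * n)
field = rng.multivariate_normal(np.full(n * n, 5.0), cov)

# level one: Poisson counts per cell, log-intensity = field, cell area = 1/n^2
counts = rng.poisson(np.exp(field) / n**2).reshape(n, n)
print("total number of points:", counts.sum())
```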

10.
Hidden Markov random field models provide an appealing representation of images and other spatial problems. The drawback is that inference is not straightforward for these models as the normalisation constant for the likelihood is generally intractable except for very small observation sets. Variational methods are an emerging tool for Bayesian inference and they have already been successfully applied in other contexts. Focusing on the particular case of a hidden Potts model with Gaussian noise, we show how variational Bayesian methods can be applied to hidden Markov random field inference. To tackle the obstacle of the intractable normalising constant for the likelihood, we explore alternative estimation approaches for incorporation into the variational Bayes algorithm. We consider a pseudo-likelihood approach as well as the more recent reduced dependence approximation of the normalisation constant. To illustrate the effectiveness of these approaches we present empirical results from the analysis of simulated datasets. We also analyse a real dataset and compare results with those of previous analyses as well as those obtained from the recently developed auxiliary variable MCMC method and the recursive MCMC method. Our results show that the variational Bayesian analyses can be carried out much faster than the MCMC analyses and produce good estimates of model parameters. We also found that the reduced dependence approximation of the normalisation constant outperformed the pseudo-likelihood approximation in our analysis of real and synthetic datasets.
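As a reference point for the normalising-constant problem, the sketch below evaluates the Potts pseudo-likelihood, i.e. the product of full conditionals over sites, which requires no normalising constant (a generic illustration on a small lattice; the labels here are random rather than coming from a fitted hidden Potts model, and in practice this quantity would be profiled over beta inside the variational algorithm).

```python
import numpy as np

def potts_pseudo_loglik(z, beta, q):
    # log pseudo-likelihood: sum over sites of log P(z_i | labels of the 4 neighbours)
    rows, cols = z.shape
    ll = 0.0
    for i in range(rows):
        for j in range(cols):
            nbrs = []
            if i > 0:        nbrs.append(z[i - 1, j])
            if i < rows - 1: nbrs.append(z[i + 1, j])
            if j > 0:        nbrs.append(z[i, j - 1])
            if j < cols - 1: nbrs.append(z[i, j + 1])
            counts = np.bincount(nbrs, minlength=q)          # neighbours in each state
            logp = beta * counts - np.logaddexp.reduce(beta * counts)
            ll += logp[z[i, j]]
    return ll

rng = np.random.default_rng(4)
z = rng.integers(0, 3, size=(20, 20))
for beta in (0.1, 0.5, 1.0):
    print("beta =", beta, " log pseudo-likelihood =", potts_pseudo_loglik(z, beta, q=3))
```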

11.
Many empirical studies are planned with the prior knowledge that some of the data may be missed. This knowledge is seldom explicitly incorporated into the experiment design process for lack of a candid methodology. This paper proposes an index related to the expected determinant of the information matrix as a criterion for planning block designs. Due to the intractable nature of the expected determinantal criterion, an analytic expression is presented only for a simple 2×2 layout. A first order Taylor series approximation function is suggested for larger layouts. Ranges over which this approximation is adequate are shown via Monte Carlo simulations. The robustness of information in the block design relative to the completely randomized design with missing data is discussed.
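A crude Monte Carlo stand-in for the expected determinantal criterion can be sketched as follows (illustrative only: the paper gives an analytic expression for the 2×2 layout and a Taylor approximation for larger ones; the dummy-variable coding and missingness rates here are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(5)

def rcbd_model_matrix(b, t):
    # intercept + block dummies + treatment dummies for a randomized complete block design
    rows = []
    for i in range(b):
        for j in range(t):
            x = np.zeros(1 + (b - 1) + (t - 1))
            x[0] = 1.0
            if i > 0: x[i] = 1.0
            if j > 0: x[(b - 1) + j] = 1.0
            rows.append(x)
    return np.array(rows)

def expected_det(X, p_missing, reps=5000):
    # Monte Carlo estimate of E[det(X'X)] when each plot is independently missing
    dets = []
    for _ in range(reps):
        Xm = X[rng.random(X.shape[0]) > p_missing]
        dets.append(np.linalg.det(Xm.T @ Xm))
    return float(np.mean(dets))

X = rcbd_model_matrix(b=2, t=2)          # the simple 2x2 layout discussed in the paper
for p in (0.0, 0.1, 0.3):
    print("missing prob", p, " E[det(X'X)] ~", expected_det(X, p))
```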

12.
Joint models for longitudinal and time-to-event data have been applied in many different fields of statistics and clinical studies. However, the main difficulty these models face is computational. The requirement for numerical integration becomes severe when the dimension of the random effects increases. In this paper, a modified two-stage approach is proposed to estimate the parameters in joint models. In particular, in the first stage, linear mixed-effects models and best linear unbiased predictors are applied to estimate parameters in the longitudinal submodel. In the second stage, an approximation of the full joint log-likelihood is proposed using the estimated values of these parameters from the longitudinal submodel. Survival parameters are estimated by maximizing this approximation of the full joint log-likelihood. Simulation studies show that the approach performs well, especially when the dimension of the random effects increases. Finally, we implement this approach on AIDS data.

13.
Abstract. The second‐order random walk (RW2) model is commonly used for smoothing data and for modelling response functions. It is computationally efficient due to the Markov properties of the joint (intrinsic) Gaussian density. For evenly spaced locations the RW2 model is well established, whereas for irregularly spaced locations there is no well established construction in the literature. By considering the RW2 model as the solution of a stochastic differential equation (SDE), a discretely observed integrated Wiener process, it is possible to derive the density preserving the Markov properties by augmenting the state‐space with the velocities. Here, we derive a computationally more efficient RW2 model for irregular locations using a Galerkin approximation to the solution of the SDE without the need of augmenting the state‐space. Numerical comparison with the exact solution demonstrates that the error in the Galerkin approximation is small and negligible in applications.
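For context, the well-established evenly spaced RW2 construction mentioned above amounts to a precision matrix built from second differences; the sketch below uses it to smooth noisy observations (an illustrative smoothing example only; the paper's Galerkin construction for irregular locations is not reproduced, and the precision parameters are arbitrary).

```python
import numpy as np

n, tau, kappa = 100, 50.0, 1.0                   # precisions of the RW2 prior and of the noise
D = np.zeros((n - 2, n))
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]             # second-difference operator
Q = tau * D.T @ D                                # intrinsic RW2 precision (rank n - 2)

rng = np.random.default_rng(6)
truth = np.sin(np.linspace(0.0, 3.0 * np.pi, n))
y = truth + rng.normal(0.0, 1.0 / np.sqrt(kappa), n)

# posterior mean of the latent field given Gaussian observations y
smooth = np.linalg.solve(Q + kappa * np.eye(n), kappa * y)
print("rmse of raw data    :", np.sqrt(np.mean((y - truth) ** 2)))
print("rmse of RW2 smoother:", np.sqrt(np.mean((smooth - truth) ** 2)))
```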

14.
There is an increasing number of goodness-of-fit tests whose test statistics measure deviations between the empirical characteristic function and an estimated characteristic function of the distribution in the null hypothesis. With the aim of overcoming certain computational difficulties with the calculation of some of these test statistics, a transformation of the data is considered. To apply such a transformation, the data are assumed to be continuous with arbitrary dimension, but we also provide a modification for discrete random vectors. Practical considerations leading to analytic formulas for the test statistics are studied, as well as theoretical properties such as the asymptotic null distribution, validity of the corresponding bootstrap approximation, and consistency of the test against fixed alternatives. Five applications are provided in order to illustrate the theory. These applications also include numerical comparison with other existing techniques for testing goodness-of-fit.

15.
The data cloning method is a new computational tool for computing maximum likelihood estimates in complex statistical models such as mixed models. This method is synthesized with integrated nested Laplace approximation to compute maximum likelihood estimates efficiently via a fast implementation in generalized linear mixed models. Asymptotic behavior of the hybrid data cloning method is discussed. The performance of the proposed method is illustrated through a simulation study and real examples. It is shown that the proposed method performs well, in agreement with the theory. Supplemental materials for this article are available online.
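The data-cloning principle itself can be seen in a conjugate toy example (a sketch only; the paper's hybrid with integrated nested Laplace approximation for generalized linear mixed models is not reproduced): cloning the data K times drives the posterior mean to the maximum likelihood estimate, and K times the posterior variance approximates the asymptotic variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
y = rng.poisson(3.0, size=25)                    # toy data from a Poisson model
n, S = y.size, y.sum()
mle, mle_var = S / n, (S / n) / n                # closed form, used here only as a check

a, b = 0.01, 0.01                                # vague Gamma prior on the Poisson mean
for K in (1, 5, 20, 100):
    # cloning the data K times multiplies the sufficient statistics by K
    post = stats.gamma(a + K * S, scale=1.0 / (b + K * n))
    draws = post.rvs(50_000, random_state=rng)
    print(f"K={K:3d}  posterior mean={draws.mean():.4f}  K*var={K * draws.var():.5f}"
          f"  (MLE={mle:.4f}, var={mle_var:.5f})")
```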

16.
The choice of weights in estimating equations for multivariate survival data is considered. Specifically, we consider families of weight functions which are constant on fixed time intervals, including the special case of time-constant weights. For a fixed set of time intervals, the optimal weights are identified as the solution to a system of linear equations. The optimal weights are computed for several scenarios. It is found that for the scenarios examined, the gains in efficiency using the optimal weights are quite small relative to simpler approaches except under extreme dependence, and that a simple estimator of an exchangeable approximation to the weights also performs well.

17.
The latent class model or multivariate multinomial mixture is a powerful approach for clustering categorical data. It uses a conditional independence assumption given the latent class to which a statistical unit belongs. In this paper, we exploit the fact that a fully Bayesian analysis with Jeffreys non-informative prior distributions involves no technical difficulty, and propose an exact expression of the integrated complete-data likelihood, which is known to be a meaningful model selection criterion from a clustering perspective. Similarly, a Monte Carlo approximation of the integrated observed-data likelihood can be obtained in two steps: an exact integration over the parameters is followed by an approximation of the sum over all possible partitions through an importance sampling strategy. The exact and approximate criteria are then compared experimentally with their standard asymptotic BIC approximations for choosing the number of mixture components. Numerical experiments on simulated data and a biological example highlight that the asymptotic criteria are usually dramatically more conservative than the non-asymptotic criteria presented here, not only for moderate sample sizes as expected but also for quite large sample sizes. This research highlights that standard asymptotic criteria could often fail to detect interesting structure present in the data.

18.
Summary. A drawback of a new method for integrating abundance and mark–recapture–recovery data is the need to combine likelihoods describing the different data sets. Often these likelihoods will be formed by using specialist computer programs, which is an obstacle to the joint analysis. This difficulty is easily circumvented by the use of a multivariate normal approximation. We show that it is only necessary to make the approximation for the parameters of interest in the joint analysis. The approximation is evaluated on data sets for two bird species and is shown to be efficient and accurate.
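The mechanics of such an approximation are simple to sketch (illustrative numbers; in practice the estimates and covariance matrices for the shared parameters of interest come from the separate specialist fits): each component likelihood is replaced by a multivariate normal in the parameters, and the combined estimate is the precision-weighted average.

```python
import numpy as np

# estimates and covariances of the shared parameters from two separate analyses
theta1 = np.array([0.80, 0.15]); S1 = np.array([[0.010, 0.002], [0.002, 0.004]])
theta2 = np.array([0.74, 0.20]); S2 = np.array([[0.020, 0.001], [0.001, 0.003]])
P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)

def joint_loglik(theta):
    # sum of the two multivariate normal approximations to the component log-likelihoods
    d1, d2 = theta - theta1, theta - theta2
    return -0.5 * (d1 @ P1 @ d1 + d2 @ P2 @ d2)

# the maximiser has a closed form: the precision-weighted average of the two estimates
combined = np.linalg.solve(P1 + P2, P1 @ theta1 + P2 @ theta2)
print("combined estimate  :", combined)
print("combined covariance:\n", np.linalg.inv(P1 + P2))
print("log-likelihood at the combined estimate:", joint_loglik(combined))
```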

19.
By adding a second parameter, Conway and Maxwell created a new distribution for situations where data deviate from the standard Poisson distribution. This new distribution contains a normalization constant expressed as an infinite sum with no known closed-form expression. Shmueli et al. produced an approximation for this sum but proved that it was valid only for integer values of the second parameter, although they conjectured that it was also valid for non-integers. Here we prove their conjecture to be true and discuss for what range of parameters the approximation can be accurately applied.
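The objects involved can be sketched as follows (the closed-form expression below is the asymptotic approximation commonly attributed to Shmueli et al., written down here as an assumption of this illustration; the brute-force truncation serves as the reference, and the parameter values are arbitrary).

```python
import numpy as np
from scipy.special import gammaln

def Z_truncated(lam, nu, jmax=500):
    # brute-force truncation of Z(lam, nu) = sum_j lam^j / (j!)^nu
    j = np.arange(jmax)
    return np.exp(j * np.log(lam) - nu * gammaln(j + 1)).sum()

def Z_approx(lam, nu):
    # closed-form asymptotic approximation (exact when nu = 1, where Z = exp(lam))
    return np.exp(nu * lam ** (1.0 / nu)) / (
        lam ** ((nu - 1.0) / (2.0 * nu)) * (2.0 * np.pi) ** ((nu - 1.0) / 2.0) * np.sqrt(nu)
    )

for lam, nu in [(4.0, 1.0), (4.0, 1.5), (10.0, 0.7), (10.0, 2.5)]:
    print(f"lam={lam}, nu={nu}: truncated={Z_truncated(lam, nu):.4f}, approx={Z_approx(lam, nu):.4f}")
```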

20.
In first-level analyses of functional magnetic resonance imaging data, adjustments for temporal correlation such as a Satterthwaite approximation or a prewhitening method are usually implemented in the univariate model to keep the nominal test level. In doing so, the temporal correlation structure of the data is estimated, assuming an autoregressive process of order one. We show that this is applicable in multivariate approaches as well, more precisely in the so-called stabilized multivariate test statistics. Furthermore, we propose a block-wise permutation method including a random shift that renders an approximation of the temporal correlation structure unnecessary but still approximately keeps the nominal test level in spite of the dependence of sample elements. Although the intentions are different, a comparison of the multivariate methods with the multiple (univariate) ones shows that the global approach may achieve advantages if applied to suitable regions of interest. This is illustrated using an example from fMRI studies.
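A generic single-series sketch of block-wise permutation with a random shift is given below (illustrative only: the regressor, block length and AR-like noise are made up, and the paper applies the idea to multivariate fMRI test statistics rather than a simple correlation).

```python
import numpy as np

def block_permute(x, block_len, rng):
    # block-wise permutation with a random circular shift, so block boundaries vary
    x = np.asarray(x)
    x = np.roll(x, rng.integers(x.size))
    n_blocks = x.size // block_len
    head = x[:n_blocks * block_len].reshape(n_blocks, block_len)
    return np.concatenate([head[rng.permutation(n_blocks)].ravel(), x[n_blocks * block_len:]])

rng = np.random.default_rng(8)
t = np.arange(200)
stimulus = (t // 20) % 2                                                # on/off block design regressor
noise = np.convolve(rng.standard_normal(220), np.ones(5) / 5.0)[:200]   # temporally correlated noise
signal = 0.5 * stimulus + noise

obs = np.corrcoef(signal, stimulus)[0, 1]
null = [np.corrcoef(block_permute(signal, 20, rng), stimulus)[0, 1] for _ in range(2000)]
p_value = np.mean(np.abs(null) >= abs(obs))
print("observed correlation:", round(obs, 3), " permutation p-value:", p_value)
```

Because whole blocks are moved, the short-range temporal correlation inside each block is preserved under the null, which is what makes an explicit estimate of the correlation structure unnecessary.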
