期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel inference for massive distributed spatial data using low-rank models

Matthias Katzfuss Dorit Hammerling 《Statistics and Computing》2017,27(2):363-375

Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference that does not require processing all data at a single central computing node. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis. The communication cost of our distributed algorithms does not depend on the number of data points. After extending our results to the spatio-temporal case, we illustrate our methodology by carrying out distributed spatio-temporal particle filtering inference on total precipitable water measured by three different satellite sensor systems. 相似文献

2.

Sequential imputation for models with latent variables assuming latent ignorability

Lauren J. Beesley Jeremy M. G. Taylor Roderick J. A. Little 《Australian & New Zealand Journal of Statistics》2019,61(2):213-233

Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model. 相似文献

3.

Statistical inference in dynamic panel data models

Tze Leung Lai Dylan S. Small Jia Liu 《Journal of statistical planning and inference》2008

Anderson and his collaborators have made seminal contributions to inference with instrumental variables and to dynamic panel data models. We review these contributions and the extensive economic and statistical literature that these contributions spawned. We describe our recent work in these two areas, presenting new approaches to (a) making valid inferences in the presence of weak instruments and (b) instrument and model selection for dynamic panel data models. Both approaches use empirical likelihood and resampling. For inference in the presence of weak instruments, our approach uses model averaging to achieve asymptotic efficiency with strong instruments but maintain valid inferences with weak instruments. For instrument and model selection, our approach aims at choosing valid instruments that are strong enough to be useful. 相似文献

4.

On analysis of binary response data in longitudinal factorial studies

Chunpeng Fan 《Journal of Statistical Computation and Simulation》2017,87(1):100-122

Binary data are commonly used as responses to assess the effects of independent variables in longitudinal factorial studies. Such effects can be assessed in terms of the rate difference (RD), the odds ratio (OR), or the rate ratio (RR). Traditionally, the logistic regression seems always a recommended method with statistical comparisons made in terms of the OR. Statistical inference in terms of the RD and RR can then be derived using the delta method. However, this approach is hard to realize when repeated measures occur. To obtain statistical inference in longitudinal factorial studies, the current article shows that the mixed-effects model for repeated measures, the logistic regression for repeated measures, the log-transformed regression for repeated measures, and the rank-based methods are all valid methods that lead to inference in terms of the RD, OR, and RR, respectively. Asymptotic linear relationships between the estimators of the regression coefficients of these models are derived when the weight (working covariance) matrix is an identity matrix. Conditions for the Wald-type tests to be asymptotically equivalent in these models are provided and powers were compared using simulation studies. A phase III clinical trial is used to illustrate the investigated methods with corresponding SAS® code supplied. 相似文献

5.

From distance sampling to spatial capture–recapture

David L. Borchers Tiago A. Marques 《AStA Advances in Statistical Analysis》2017,101(4):475-494

Distance sampling and capture–recapture are the two most widely used wildlife abundance estimation methods. capture–recapture methods have only recently incorporated models for spatial distribution and there is an increasing tendency for distance sampling methods to incorporated spatial models rather than to rely on partly design-based spatial inference. In this overview we show how spatial models are central to modern distance sampling and that spatial capture–recapture models arise as an extension of distance sampling methods. Depending on the type of data recorded, they can be viewed as particular kinds of hierarchical binary regression, Poisson regression, survival or time-to-event models, with individuals’ locations as latent variables and a spatial model as the latent variable distribution. Incorporation of spatial models in these two methods provides new opportunities for drawing explicitly spatial inferences. Areas of likely future development include more sophisticated spatial and spatio-temporal modelling of individuals’ locations and movements, new methods for integrating spatial capture–recapture and other kinds of ecological survey data, and methods for dealing with the recapture uncertainty that often arise when “capture” consists of detection by a remote device like a camera trap or microphone. 相似文献

6.

A rare event approach to high-dimensional approximate Bayesian computation

Dennis Prangle Richard G. Everitt Theodore Kypraios 《Statistics and Computing》2018,28(4):819-834

Approximate Bayesian computation (ABC) methods permit approximate inference for intractable likelihoods when it is possible to simulate from the model. However, they perform poorly for high-dimensional data and in practice must usually be used in conjunction with dimension reduction methods, resulting in a loss of accuracy which is hard to quantify or control. We propose a new ABC method for high-dimensional data based on rare event methods which we refer to as RE-ABC. This uses a latent variable representation of the model. For a given parameter value, we estimate the probability of the rare event that the latent variables correspond to data roughly consistent with the observations. This is performed using sequential Monte Carlo and slice sampling to systematically search the space of latent variables. In contrast, standard ABC can be viewed as using a more naive Monte Carlo estimate. We use our rare event probability estimator as a likelihood estimate within the pseudo-marginal Metropolis–Hastings algorithm for parameter inference. We provide asymptotics showing that RE-ABC has a lower computational cost for high-dimensional data than standard ABC methods. We also illustrate our approach empirically, on a Gaussian distribution and an application in infectious disease modelling. 相似文献

7.

Augmentation schemes for particle MCMC

Paul?Fearnhead Email author Loukia?Meligkotsidou 《Statistics and Computing》2016,26(6):1293-1306

Particle MCMC involves using a particle filter within an MCMC algorithm. For inference of a model which involves an unobserved stochastic process, the standard implementation uses the particle filter to propose new values for the stochastic process, and MCMC moves to propose new values for the parameters. We show how particle MCMC can be generalised beyond this. Our key idea is to introduce new latent variables. We then use the MCMC moves to update the latent variables, and the particle filter to propose new values for the parameters and stochastic process given the latent variables. A generic way of defining these latent variables is to model them as pseudo-observations of the parameters or of the stochastic process. By choosing the amount of information these latent variables have about the parameters and the stochastic process we can often improve the mixing of the particle MCMC algorithm by trading off the Monte Carlo error of the particle filter and the mixing of the MCMC moves. We show that using pseudo-observations within particle MCMC can improve its efficiency in certain scenarios: dealing with initialisation problems of the particle filter; speeding up the mixing of particle Gibbs when there is strong dependence between the parameters and the stochastic process; and enabling further MCMC steps to be used within the particle filter. 相似文献

8.

Trend analysis with response incompatible formats and measurement error

J. Kowalski X. M. Tu 《Journal of applied statistics》2003,30(7):751-770

The increasing popularity of longitudinal studies, along with the rapid advances in science and technology, has created a potential incompatibility between data formats, which leads to an inference problem when applying conventional statistical methods. This inference problem is further compounded by measurement error, since incompatible data format often arise in the context of measuring latent constructs. Without a systematic study of the impact of scale differences, ad-hoc approaches generally lead to inconsistent estimates and thus, invalid statistical inferences. In this paper, we examine the asymptotic properties and identify conditions that guarantee consistent estimation within the context of a trend analysis with response incompatible formats and measurement error. For model estimation, we introduce two competing methods that use a generalized estimating equation approach to provide inferences for the parameters of interest, and highlight the relative strengths of each method. The approach is illustrated by data obtained from a multi-centre AIDS cohort study (MACS), where a trend analysis of an immunologic marker of HIV infection is of interest. 相似文献

9.

A flexible particle Markov chain Monte Carlo method

Eduardo F. Mendes Christopher K. Carter David Gunawan Robert Kohn 《Statistics and Computing》2020,30(4):783-798

Particle Markov Chain Monte Carlo methods are used to carry out inference in nonlinear and non-Gaussian state space models, where the posterior density of the states is approximated using particles. Current approaches usually perform Bayesian inference using either a particle marginal Metropolis–Hastings (PMMH) algorithm or a particle Gibbs (PG) sampler. This paper shows how the two ways of generating variables mentioned above can be combined in a flexible manner to give sampling schemes that converge to a desired target distribution. The advantage of our approach is that the sampling scheme can be tailored to obtain good results for different applications. For example, when some parameters and the states are highly correlated, such parameters can be generated using PMMH, while all other parameters are generated using PG because it is easier to obtain good proposals for the parameters within the PG framework. We derive some convergence properties of our sampling scheme and also investigate its performance empirically by applying it to univariate and multivariate stochastic volatility models and comparing it to other PMCMC methods proposed in the literature. 相似文献

10.

OLS and IV estimation of regression models including endogenous interaction terms

Maurice J. G. Bun Teresa D. Harrison 《Econometric Reviews》2019,38(7):814-827

We analyze a class of linear regression models including interactions of endogenous regressors and exogenous covariates. We show how to generate instrumental variables using the nonlinear functional form of the structural equation when traditional excluded instruments are unknown. We propose to use these instruments with identification robust IV inference. We furthermore show that, whenever functional form identification is not valid, the ordinary least squares (OLS) estimator of the coefficient of the interaction term is consistent and standard OLS inference applies. Using our alternative empirical methods we confirm recent empirical findings on the nonlinear causal relation between financial development and economic growth. 相似文献

11.

Beyond support in two-stage variable selection

Jean-Michel Bécu Yves Grandvalet Christophe Ambroise Cyril Dalmasso 《Statistics and Computing》2017,27(1):169-179

Numerous variable selection methods rely on a two-stage procedure, where a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables and the second stage operates on this set of candidate variables, to improve estimation accuracy or to assess the uncertainty associated to the selection of variables. We advocate that more information can be conveyed from the first stage to the second one: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied at the second stage. We give the example of an inference procedure that highly benefits from the proposed transfer of information. The procedure is precisely analyzed in a simple setting, and our large-scale experiments empirically demonstrate that actual benefits can be expected in much more general situations, with sensitivity gains ranging from 50 to 100 % compared to state-of-the-art. 相似文献

12.

Sequential Bayesian inference in hidden Markov stochastic kinetic models with application to detection and response to seasonal epidemics 总被引：1，自引：0，他引：1

Junjing Lin Michael Ludkovski 《Statistics and Computing》2014,24(6):1047-1062

We study sequential Bayesian inference in stochastic kinetic models with latent factors. Assuming continuous observation of all the reactions, our focus is on joint inference of the unknown reaction rates and the dynamic latent states, modeled as a hidden Markov factor. Using insights from nonlinear filtering of continuous-time jump Markov processes we develop a novel sequential Monte Carlo algorithm for this purpose. Our approach applies the ideas of particle learning to minimize particle degeneracy and exploit the analytical jump Markov structure. A motivating application of our methods is modeling of seasonal infectious disease outbreaks represented through a compartmental epidemic model. We demonstrate inference in such models with several numerical illustrations and also discuss predictive analysis of epidemic countermeasures using sequential Bayes estimates. 相似文献

13.

CAUSAL GRAPHICAL MODELS IN SYSTEMS GENETICS: A UNIFIED FRAMEWORK FOR JOINT INFERENCE OF CAUSAL NETWORK AND GENETIC ARCHITECTURE FOR CORRELATED PHENOTYPES

Neto EC Keller MP Attie AD Yandell BS 《The annals of applied statistics》2010,4(1):320-339

Causal inference approaches in systems genetics exploit quantitative trait loci (QTL) genotypes to infer causal relationships among phenotypes. The genetic architecture of each phenotype may be complex, and poorly estimated genetic architectures may compromise the inference of causal relationships among phenotypes. Existing methods assume QTLs are known or inferred without regard to the phenotype network structure. In this paper we develop a QTL-driven phenotype network method (QTLnet) to jointly infer a causal phenotype network and associated genetic architecture for sets of correlated phenotypes. Randomization of alleles during meiosis and the unidirectional influence of genotype on phenotype allow the inference of QTLs causal to phenotypes. Causal relationships among phenotypes can be inferred using these QTL nodes, enabling us to distinguish among phenotype networks that would otherwise be distribution equivalent. We jointly model phenotypes and QTLs using homogeneous conditional Gaussian regression models, and we derive a graphical criterion for distribution equivalence. We validate the QTLnet approach in a simulation study. Finally, we illustrate with simulated data and a real example how QTLnet can be used to infer both direct and indirect effects of QTLs and phenotypes that co-map to a genomic region. 相似文献

14.

Upper probabilities based only on the likelihood function

P. Walley & S. Moral 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(4):831-847

In the problem of parametric statistical inference with a finite parameter space, we propose some simple rules for defining posterior upper and lower probabilities directly from the observed likelihood function, without using any prior information. The rules satisfy the likelihood principle and a basic consistency principle ('avoiding sure loss'), they produce vacuous inferences when the likelihood function is constant, and they have other symmetry, monotonicity and continuity properties. One of the rules also satisfies fundamental frequentist principles. The rules can be used to eliminate nuisance parameters, and to interpret the likelihood function and to use it in making decisions. To compare the rules, they are applied to the problem of sampling from a finite population. Our results indicate that there are objective statistical methods which can reconcile three general approaches to statistical inference: likelihood inference, coherent inference and frequentist inference. 相似文献

15.

Estimating multiple-membership logit models with mixed effects: indirect inference versus data cloning

Anna Gottard Giorgio Calzolari 《Journal of Statistical Computation and Simulation》2017,87(12):2334-2348

Multiple-membership logit models with random effects are models for clustered binary data, where each statistical unit can belong to more than one group. The likelihood function of these models is analytically intractable. We propose two different approaches for parameter estimation: indirect inference and data cloning (DC). The former is a non-likelihood-based method which uses an auxiliary model to select reasonable estimates. We propose an auxiliary model with the same dimension of parameter space as the target model, which is particularly convenient to reach good estimates very fast. The latter method computes maximum likelihood estimates through the posterior distribution of an adequate Bayesian model, fitted to cloned data. We implement a DC algorithm specifically for multiple-membership models. A Monte Carlo experiment compares the two methods on simulated data. For further comparison, we also report Bayesian posterior mean and Integrated Nested Laplace Approximation hybrid DC estimates. Simulations show a negligible loss of efficiency for the indirect inference estimator, compensated by a relevant computational gain. The approaches are then illustrated with two real examples on matched paired data. 相似文献

16.

Linear regression with compositional explanatory variables 总被引：1，自引：0，他引：1

K. Hron P. Filzmoser K. Thompson 《Journal of applied statistics》2012,39(5):1115-1128

Compositional explanatory variables should not be directly used in a linear regression model because any inference statistic can become misleading. While various approaches for this problem were proposed, here an approach based on the isometric logratio (ilr) transformation is used. It turns out that the resulting model is easy to handle, and that parameter estimation can be done in like in usual linear regression. Moreover, it is possible to use the ilr variables for inference statistics in order to obtain an appropriate interpretation of the model. 相似文献

17.

Nonparametric and semiparametric estimation of the three way receiver operating characteristic surface

Jialiang Li Xiao-Hua Zhou 《Journal of statistical planning and inference》2009,139(12):4133-4142

In many situations the diagnostic decision is not limited to a binary choice. Binary statistical tools such as receiver operating characteristic (ROC) curve and area under the ROC curve (AUC) need to be expanded to address three-category classification problem. Previous authors have suggest various ways to model the extension of AUC but not the ROC surface. Only simple parametric approaches are proposed for modeling the ROC measure under the assumption that test results all follow normal distributions. We study the estimation methods of three-dimensional ROC surfaces with nonparametric and semiparametric estimators. Asymptotical results are provided as a basis for statistical inference. Simulation studies are performed to assess the validity of our proposed methods in finite samples. We consider an Alzheimer's disease example from a clinical study in the US as an illustration. The nonparametric and semiparametric modelling approaches for the three way ROC analysis can be readily generalized to diagnostic problems with more than three classes. 相似文献

18.

SPACE–TIME MODELLING OF SYDNEY HARBOUR WINDS

Edward Cripps David Nott William T.M. Dunsmuir Christopher Wikle 《Australian & New Zealand Journal of Statistics》2005,47(1):3-17

This paper develops a space‐time statistical model for local forecasting of surface‐level wind fields in a coastal region with complex topography. The statistical model makes use of output from deterministic numerical weather prediction models which are able to produce forecasts of surface wind fields on a spatial grid. When predicting surface winds at observing stations , errors can arise due to sub‐grid scale processes not adequately captured by the numerical weather prediction model , and the statistical model attempts to correct for these influences. In particular , it uses information from observing stations within the study region as well as topographic information to account for local bias. Bayesian methods for inference are used in the model , with computations carried out using Markov chain Monte Carlo algorithms. Empirical performance of the model is described , illustrating that a structured Bayesian approach to complicated space‐time models of the type considered in this paper can be readily implemented and can lead to improvements in forecasting over traditional methods. 相似文献

19.

Bayesian nonparametric statistics: A new toolkit for discovery in cancer research

下载免费PDF全文

Peter F. Thall Peter Mueller Yanxun Xu Michele Guindani 《Pharmaceutical statistics》2017,16(6):414-423

Many commonly used statistical methods for data analysis or clinical trial design rely on incorrect assumptions or assume an over‐simplified framework that ignores important information. Such statistical practices may lead to incorrect conclusions about treatment effects or clinical trial designs that are impractical or that do not accurately reflect the investigator's goals. Bayesian nonparametric (BNP) models and methods are a very flexible new class of statistical tools that can overcome such limitations. This is because BNP models can accurately approximate any distribution or function and can accommodate a broad range of statistical problems, including density estimation, regression, survival analysis, graphical modeling, neural networks, classification, clustering, population models, forecasting and prediction, spatiotemporal models, and causal inference. This paper describes 3 illustrative applications of BNP methods, including a randomized clinical trial to compare treatments for intraoperative air leaks after pulmonary resection, estimating survival time with different multi‐stage chemotherapy regimes for acute leukemia, and evaluating joint effects of targeted treatment and an intermediate biological outcome on progression‐free survival time in prostate cancer. 相似文献

20.

A defence of subjective fiducial inference

Russell J. Bowater 《AStA Advances in Statistical Analysis》2017,101(2):177-197

This paper defends the fiducial argument. In particular, an interpretation of the fiducial argument is defended in which fiducial probability is treated as being subjective and the role taken by pivots in a more standard interpretation is taken by what are called primary random variables, which in fact form a special class of pivots. The resulting methodology, which is referred to as subjective fiducial inference, is outlined in the first part of the paper. This is followed by a defence of this methodology arranged in a series of criticisms and responses. These criticisms reflect objections that are often raised against standard fiducial inference and incorporate more specific concerns that are likely to exist with respect to subjective fiducial inference. It is hoped that the responses to these criticisms clarify the contribution that a system of fiducial reasoning can make to statistical inference. 相似文献