Similar Literature
20 similar documents retrieved (search time: 15 ms)
1.
2.
Pettitt, A. N., Weir, I. S., & Hart, A. G. Statistics and Computing (2002) 12(4): 353–367
A Gaussian conditional autoregressive (CAR) formulation is presented that permits the modelling of the spatial dependence and the dependence between multivariate random variables at irregularly spaced sites, thereby capturing some of the modelling advantages of the geostatistical approach. The model benefits not only from the explicit availability of the full conditionals but also from the computational simplicity of the precision matrix determinant calculation, which uses a closed-form expression involving the eigenvalues of a precision matrix submatrix. The introduction of covariates into the model adds little computational complexity to the analysis, so the method extends straightforwardly to regression models. Because of its computational simplicity, the model is well suited to applications involving the fully Bayesian analysis of large data sets of multivariate measurements with a spatial ordering. An extension to spatio-temporal data is also considered. Here, we demonstrate use of the model in the analysis of bivariate binary data, where the observed data are modelled as the sign of the hidden CAR process. A case study involving over 450 irregularly spaced sites and the presence or absence of each of two species of rain forest trees at each site is presented; Markov chain Monte Carlo (MCMC) methods are implemented to obtain posterior distributions of all unknowns. The MCMC method works well with both simulated data and the tree biodiversity data set.
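The determinant shortcut described above can be made concrete. A minimal numpy sketch, assuming the common proper-CAR parametrisation Q = tau * (D - rho * W) with D the diagonal matrix of neighbour counts (the paper's exact formulation may differ): one symmetric eigendecomposition of D^{-1/2} W D^{-1/2} makes log|Q| available in closed form for any (tau, rho), which is what keeps repeated MCMC evaluations cheap.

```python
import numpy as np

def car_logdet_setup(W):
    """Precompute eigenvalues so that log|Q| is cheap for any (tau, rho),
    where Q = tau * (D - rho * W) and D = diag(neighbour counts).
    Since det(D - rho*W) = det(D) * prod_i (1 - rho * lambda_i), with
    lambda_i the eigenvalues of D^{-1/2} W D^{-1/2}, a single symmetric
    eigendecomposition suffices."""
    d = W.sum(axis=1)
    lam = np.linalg.eigvalsh(W / np.sqrt(np.outer(d, d)))
    return d, lam

def car_logdet(tau, rho, d, lam):
    return len(d) * np.log(tau) + np.log(d).sum() + np.log1p(-rho * lam).sum()

W = np.array([[0, 1, 0, 0],      # adjacency of the path graph 1-2-3-4
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d, lam = car_logdet_setup(W)
tau, rho = 2.0, 0.5
direct = np.linalg.slogdet(tau * (np.diag(d) - rho * W))[1]
fast = car_logdet(tau, rho, d, lam)
print(abs(direct - fast) < 1e-8)
```

After the one-off setup, evaluating `car_logdet` costs O(n) per MCMC iteration rather than the O(n^3) of a fresh determinant.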

3.
In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps, resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and study their temporal evolution along the time frame. Space-time dependence is induced by introducing spatio-temporally varying mixing weights, which allocate observations at nearby locations and consecutive time points similar cluster membership probabilities. As a result, a clustering varying over time and space is accomplished. Conditional on cluster membership, a state-space model is deployed to describe the temporal evolution of the sites belonging to each group. Full posterior inference is provided under a Bayesian framework through Markov chain Monte Carlo algorithms. A strategy for selecting a suitable number of clusters, based upon the posterior temporal patterns of the clusters, is also offered. We evaluate our approach through simulation experiments, and we illustrate it using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength of information across space and time.
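The idea of spatio-temporally varying mixing weights can be illustrated with a toy construction (not the paper's prior): give each cluster a log-weight that decays with distance from a hypothetical spatial centre and reference time, and pass the log-weights through a softmax, so that nearby sites at consecutive times receive similar membership probabilities.

```python
import numpy as np

def spacetime_mixing_weights(coords, times, centers, decay_s=1.0, decay_t=1.0):
    """Toy spatio-temporally varying mixing weights: cluster k's logit
    decays with squared distance from a (hypothetical) centre (x, y, t);
    a softmax turns logits into membership probabilities. Illustrative
    construction in the spirit of the paper, not its exact model."""
    logits = np.stack([
        -decay_s * ((coords - c[:2]) ** 2).sum(axis=1)
        - decay_t * (times - c[2]) ** 2
        for c in centers
    ], axis=1)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

coords = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
times = np.array([0.0, 0.0, 0.0])
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0]])  # (x, y, t) per cluster
w = spacetime_mixing_weights(coords, times, centers)
# the two nearby sites get nearly identical membership probabilities
print(np.abs(w[0] - w[1]).max() < 0.05)
```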

4.
Bayesian hierarchical spatio-temporal models are becoming increasingly important due to the growing availability of space-time data in various domains. In this paper we develop a user-friendly R package, spTDyn, for spatio-temporal modelling. It can be used to fit models with spatially varying and temporally dynamic coefficients. The former are used for modelling the spatially varying impact of explanatory variables on the response, which can arise from spatial misalignment: the covariates may only vary over time, or may be measured over a grid and hence not match the locations of the point-level response data. The latter are used to examine the temporally varying impact of explanatory variables in space-time data due, for example, to seasonality or other time-varying effects. The spTDyn package uses Markov chain Monte Carlo sampling written in C, which makes computations highly efficient, and the interface is written in R, making these sophisticated modelling techniques easily accessible to statistical analysts. The models and software, and their advantages, are illustrated using temperature and ozone space-time data.
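spTDyn itself is an R package; a minimal numpy simulation can nonetheless illustrate the "temporally dynamic coefficient" idea it fits. Here a regression coefficient follows a random walk over time, shared by all sites at a given time point (a hypothetical data-generating sketch, not the package's model or API): with many sites per time point, even a naive per-time least-squares fit tracks the wandering coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, T = 200, 30
# temporally dynamic coefficient: a random walk shared by all sites at time t
beta = 1.0 + np.cumsum(rng.normal(0.0, 0.3, T))
x = rng.normal(0.0, 1.0, (n_sites, T))
y = x * beta + rng.normal(0.0, 0.5, (n_sites, T))

# with many sites per time point, per-time least squares recovers beta_t
beta_hat = (x * y).sum(axis=0) / (x ** 2).sum(axis=0)
print(np.max(np.abs(beta_hat - beta)) < 0.2)
```

A full Bayesian treatment, as in spTDyn, would additionally smooth `beta_hat` across time via the random-walk prior.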

5.
This article focuses on the location, time, and spatio-temporal components associated with suitably aggregated data to improve prediction of individual asset values. Such effects are introduced in the context of hierarchical models, which we find more natural than attempting to model covariance structure. Indeed, our cross-sectional database, a sample of 7,936 transactions for 49 subdivisions over a 10-year period in Baton Rouge, Louisiana, precludes covariance modeling. A wide range of models arises, each fitted using sampling-based methods because likelihood-based fitting may not be possible. Choosing among an array of nonnested models is carried out using a posterior predictive criterion. In addition, one year of data is held out for model validation. A thorough analysis of the data incorporating all of the aforementioned issues is presented.
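One widely used posterior predictive criterion for comparing nonnested models is that of Gelfand and Ghosh (1998), D = G + P, which trades off goodness of fit against predictive variability; whether it is the exact criterion used in this article is an assumption of this sketch.

```python
import numpy as np

def gelfand_ghosh(y_obs, y_rep):
    """Posterior predictive criterion D = G + P (Gelfand & Ghosh style).
    y_rep: draws from the posterior predictive, shape (n_draws, n_obs).
    G penalises lack of fit; P penalises predictive variance."""
    mu = y_rep.mean(axis=0)
    G = ((y_obs - mu) ** 2).sum()        # goodness of fit
    P = y_rep.var(axis=0).sum()          # predictive variability penalty
    return G + P, G, P

rng = np.random.default_rng(0)
y = rng.normal(0, 1, size=50)
# a "good" model replicates around the truth; a "bad" one is biased
good = rng.normal(0, 1, size=(1000, 50))
bad = rng.normal(3, 1, size=(1000, 50))
D_good, _, _ = gelfand_ghosh(y, good)
D_bad, _, _ = gelfand_ghosh(y, bad)
print(D_good < D_bad)   # the criterion prefers the better model
```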

6.
In studies of affective disorder, individuals are often observed to experience recurrent symptomatic exacerbations warranting hospitalization. Interest may lie in modeling the occurrence of such exacerbations over time and identifying associated risk factors. In some patients, recurrent exacerbations are temporally clustered following disease onset, but cease to occur after a period of time. We develop a dynamic Mover–Stayer model in which a canonical binary variable associated with each event indicates whether the underlying disease has resolved. An individual whose disease process has not resolved will experience events following a standard point process model governed by a latent intensity. When the disease process resolves, the complete data intensity becomes zero and no further event will occur. An expectation–maximization algorithm is described for parametric and semiparametric model fitting based on a discrete time dynamic Mover–Stayer model and a latent intensity-based model of the underlying point process.
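The data-generating mechanism can be sketched as a simulation (illustrative parametrisation, not the authors' exact model): events arrive from a homogeneous point process, and after each event the disease may resolve with some probability, after which the intensity is zero and no further events occur. This produces the temporal clustering of events early in follow-up that the abstract describes.

```python
import random

def simulate_mover_stayer(rate=1.0, resolve_prob=0.3, horizon=10.0, rng=random):
    """Simulate recurrent events where, after each event, the underlying
    disease 'resolves' with probability resolve_prob; once resolved, the
    intensity drops to zero. A toy dynamic Mover-Stayer process."""
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)        # waiting time to the next event
        if t > horizon:
            break
        events.append(t)
        if rng.random() < resolve_prob:   # latent resolution indicator
            break                         # no further events can occur
    return events

random.seed(1)
n_events = [len(simulate_mover_stayer()) for _ in range(2000)]
avg = sum(n_events) / len(n_events)
# far fewer events on average than the ~10 expected without resolution
print(2.0 < avg < 5.0)
```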

7.
Recurrent events involve occurrences of the same type of event repeatedly over time and are commonly encountered in longitudinal studies; examples include seizures in epilepsy studies or the occurrence of cancer tumors. In such studies, interest lies in the number of events that occur over a fixed period of time. One considerable challenge in analyzing such data arises when a large proportion of patients discontinue before the end of the study, for example because of adverse events, leading to partially observed data. In this situation, data are often modeled using a negative binomial distribution with time-in-study as offset. Such an analysis assumes that data are missing at random (MAR). As we cannot test the adequacy of MAR, sensitivity analyses that assess the robustness of conclusions across a range of different assumptions need to be performed. Sophisticated sensitivity analyses are frequently performed for continuous data, but much less so for recurrent event or count data. We present a flexible approach for performing clinically interpretable sensitivity analyses for recurrent event data. Our approach fits into the framework of reference-based imputations, where information from reference arms can be borrowed to impute post-discontinuation data. Different assumptions can be made about the future behavior of dropouts, depending on the reason for dropout and the treatment received. The imputation model allows for time-varying baseline intensities. We assess the performance in a simulation study and provide an illustration with a clinical trial in patients who suffer from bladder cancer. Copyright © 2015 John Wiley & Sons, Ltd.  
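The simplest reference-based assumption, often called "jump to reference", can be sketched as follows (illustrative only: the paper's imputation model also allows time-varying intensities and dropout-reason-specific rules, and `ref_rate` is a hypothetical constant reference-arm event rate here): events after discontinuation are imputed from the reference-arm rate over the unobserved exposure.

```python
import numpy as np

def impute_jump_to_reference(observed_count, exposure, ref_rate, total_time, rng):
    """'Jump to reference' imputation for a recurrent-event dropout:
    draw post-discontinuation counts from a Poisson with the
    reference-arm rate over the remaining study time."""
    remaining = total_time - exposure
    imputed = rng.poisson(ref_rate * remaining)   # events after dropout
    return observed_count + imputed

rng = np.random.default_rng(42)
# a patient observed 2 events in 0.5 years of a 1-year study; reference rate 4/yr
completed = [impute_jump_to_reference(2, 0.5, 4.0, 1.0, rng) for _ in range(5000)]
# average completed count is observed + rate * remaining = 2 + 4 * 0.5 = 4
print(abs(np.mean(completed) - 4.0) < 0.2)
```

Repeating the imputation many times and analysing each completed data set gives the multiple-imputation inference the sensitivity analysis is built on.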

8.
Liang and Zeger (1986) proposed an extension of generalized linear models to the analysis of longitudinal data. Their formulation requires a common dispersion parameter across observation times, an assumption that is not expected to hold in most situations. Park (1993) proposed a simple extension of Liang and Zeger's formulation that allows a different dispersion parameter for each time point. The proposed model is easy to apply without heavy computation and is useful for handling cases in which over-dispersion varies over time. In this paper, we focus on evaluating the effect of the additional dispersion parameters on the estimators of the model parameters. Through a Monte Carlo simulation study, the efficiency of Park's method is compared with that of Liang and Zeger's method.
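A time-varying dispersion parameter can be estimated by a simple Pearson-type moment estimator at each time point. The sketch below is illustrative (not Park's exact estimating equations): under Var(y_it) = phi_t * mu_it, averaging (y_it - mu_it)^2 / mu_it over subjects recovers phi_t.

```python
import numpy as np

def dispersion_by_time(y, mu):
    """Pearson-type moment estimate of a separate dispersion parameter
    phi_t at each time point, under Var(y_it) = phi_t * mu_it.
    (An illustrative sketch, not Park's exact estimating equations.)"""
    return ((y - mu) ** 2 / mu).mean(axis=0)

rng = np.random.default_rng(1)
n, mu = 4000, 5.0
phi_true = np.array([1.5, 2.0, 3.0])       # over-dispersion grows over time
k = mu / (phi_true - 1)                    # NB size parameter giving Var = phi * mu
y = rng.negative_binomial(k, k / (k + mu), size=(n, 3))
est = dispersion_by_time(y, np.full((n, 3), mu))
print(np.all(np.abs(est - phi_true) < 0.5))
```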

9.
The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer. Divide and Recombine (D&R) is becoming a popular approach for big data analysis, where results are combined over subanalyses performed in separate data subsets. In this article, we consider situations where unit record data cannot be made available by data custodians due to privacy concerns, and explore the concept of statistical sufficiency and summary statistics for model fitting. The resulting approach represents a type of D&R strategy, which we refer to as summary statistics D&R, as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without having to combine unit record data. By exploiting the natural hierarchy of data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modelling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics allows for potential analysis at a finer geographical level, which we illustrate with a multilevel analysis of the Australian unemployment data. Supplementary materials for this article are available online.
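The sufficiency idea is easiest to see in the conjugate Poisson-Gamma case, a toy instance of summary statistics D&R (the paper's model is an extended Gamma-Poisson hierarchy, so this is a simplification): each custodian reports only its sufficient statistics (total count, total exposure), and combining them yields the exact posterior with no unit records leaving any database.

```python
def poisson_gamma_posterior(prior_a, prior_b, summaries):
    """Combine per-database sufficient statistics (sum of counts, total
    exposure) into the exact Gamma posterior for a common Poisson rate.
    No unit record data are pooled: only two numbers per custodian."""
    a, b = float(prior_a), float(prior_b)
    for total_count, total_exposure in summaries:
        a += total_count       # conjugate update: shape gains the counts
        b += total_exposure    # rate gains the exposure
    return a, b                # posterior is Gamma(a, b), mean a / b

# three custodians each report only (sum of counts, total exposure)
summaries = [(42, 10.0), (17, 5.0), (80, 20.0)]
a, b = poisson_gamma_posterior(1.0, 1.0, summaries)
print(a, b, round(a / b, 3))   # identical to fitting the pooled data
```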

10.
In many experiments, several measurements on the same variable are taken over time, a geographic region, or some other index set. It is often of interest to know if there has been a change over the index set in the parameters of the distribution of the variable. Frequently, the data consist of a sequence of correlated random variables, and there may also be several experimental units under observation, each providing a sequence of data. A problem in ascertaining the boundaries between the layers in geological sedimentary beds is used to introduce the model and then to illustrate the proposed methodology. It is assumed that, conditional on the change point, the data from each sequence arise from an autoregressive process that undergoes a change in one or more of its parameters. Unconditionally, the model then becomes a mixture of nonstationary autoregressive processes. Maximum-likelihood methods are used, and results of simulations to evaluate the performance of these estimators under practical conditions are given.
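For a single sequence, the core computation can be sketched by profiling conditional least squares over the changepoint: fit a separate AR(1) coefficient on each side of each candidate split and pick the split minimising the total squared error. This is a simplified sketch of the likelihood approach; the paper handles several sequences and changes in more than one parameter.

```python
import numpy as np

def ar1_changepoint(y, min_seg=5):
    """Profile conditional least squares over a single changepoint in an
    AR(1) coefficient: choose the split minimising total squared error."""
    def sse(x):
        phi = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
        r = x[1:] - phi * x[:-1]
        return np.dot(r, r)

    n = len(y)
    return min(range(min_seg, n - min_seg),
               key=lambda tau: sse(y[:tau]) + sse(y[tau:]))

rng = np.random.default_rng(3)
y = np.zeros(300)
for t in range(1, 300):  # AR(1): phi = 0.9 before t = 150, -0.5 after
    y[t] = (0.9 if t < 150 else -0.5) * y[t - 1] + rng.normal()
tau_hat = ar1_changepoint(y)
print(abs(tau_hat - 150) < 30)
```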

11.
We discuss the analysis of data from single-nucleotide polymorphism arrays comparing tumour and normal tissues. The data consist of sequences of indicators for loss of heterozygosity (LOH) and involve three nested levels of repetition: chromosomes for a given patient, regions within chromosomes and single-nucleotide polymorphisms nested within regions. We propose to analyse these data by using a semiparametric model for multilevel repeated binary data. At the top level of the hierarchy we assume a sampling model for the observed binary LOH sequences that arises from a partial exchangeability argument. This implies a mixture of Markov chains model, with the mixture defined with respect to the Markov transition probabilities. We assume a non-parametric prior for the random-mixing measure. The resulting model takes the form of a semiparametric random-effects model in which the matrix of transition probabilities constitutes the random effects. The model includes appropriate dependence assumptions for the two remaining levels of the hierarchy, i.e. for regions within chromosomes and for chromosomes within patients. We use the model to identify regions of increased LOH in a data set from a study of treatment-related leukaemia in children with an initial cancer diagnosis. The model successfully identifies the desired regions and performs well compared with other available alternatives.

12.
Data in many experiments arise as curves, so it is natural to take a curve as the basic unit of analysis, in the spirit of functional data analysis (FDA). Functional curves are encountered when units are observed over time. Although the whole functional curve itself is not observed, a sufficiently large number of evaluations, as is common with modern recording equipment, is assumed to be available. In this article, we consider statistical inference for the mean functions in the two-sample problem for functional data: assuming that the curves are observed without noise, we test whether the two groups of curves share the same mean function. L2-norm-based and bootstrap-based test statistics are proposed. The proposed methodology is shown to be flexible. A simulation study and real-data examples are used to illustrate our techniques.
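The L2-norm statistic has a simple discretised form: with curves evaluated on a common grid, T is the (approximate) integral of the squared difference of sample mean functions. The sketch below calibrates T by permutation rather than the article's bootstrap, purely for brevity; the statistic itself is the same shape.

```python
import numpy as np

def l2_two_sample_test(X, Y, n_perm=999, seed=0):
    """Two-sample test for equality of mean functions based on the
    L2-norm statistic T = integral of (mean_X - mean_Y)^2, calibrated
    here by permutation (a stand-in for the bootstrap calibration)."""
    rng = np.random.default_rng(seed)

    def stat(A, B):
        diff = A.mean(axis=0) - B.mean(axis=0)
        return (diff ** 2).mean()        # Riemann approximation on [0, 1]

    T0 = stat(X, Y)
    Z = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        exceed += stat(Z[idx[:n]], Z[idx[n:]]) >= T0
    return T0, (exceed + 1) / (n_perm + 1)

grid = np.linspace(0, 1, 50)
rng = np.random.default_rng(7)
X = np.sin(2 * np.pi * grid) + rng.normal(0, 0.3, (30, 50))
Y = np.sin(2 * np.pi * grid) + 0.5 + rng.normal(0, 0.3, (30, 50))  # shifted mean
T0, pval = l2_two_sample_test(X, Y)
print(pval < 0.05)   # the mean shift is detected
```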

13.
The 2 × 2 tables used to present the data in an experiment for comparing two proportions, based on observations from two independent binomial distributions, may appear simple but are not. The debate about the best method to use is unending, and has divided statisticians into practically irreconcilable groups. In this article, all the available non-asymptotic tests are reviewed (except the Bayesian methodology). The authors state which is optimal for each group, refer to the tables and programs that exist for them, and contrast the arguments used by supporters of each option. They also sort the tangle of solutions into "families", based on the methodology used and/or prior assumptions, and point out the most frequent methodological mistakes committed when comparing the different families.
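One member of the conditional family of non-asymptotic tests discussed in such reviews is Fisher's exact test, which conditions on both margins and evaluates a hypergeometric tail probability. A self-contained sketch using only the standard library:

```python
from math import comb

def fisher_exact_pvalue(a, b, c, d):
    """One-sided Fisher exact test for a 2x2 table [[a, b], [c, d]]:
    P(X >= a) under the hypergeometric null with both margins fixed."""
    n1, n2, m1 = a + b, c + d, a + c     # row totals and first column total
    denom = comb(n1 + n2, m1)
    p = 0.0
    for x in range(a, min(n1, m1) + 1):  # tables at least as extreme as a
        if 0 <= m1 - x <= n2:
            p += comb(n1, x) * comb(n2, m1 - x) / denom
    return p

# 7/10 successes vs 2/10: is the first proportion larger than chance allows?
print(round(fisher_exact_pvalue(7, 3, 2, 8), 4))   # → 0.0349
```

The "unconditional" families (e.g. Barnard-type tests) instead maximise or profile over the nuisance proportion rather than conditioning on the margins.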

14.
This work presents advanced computational aspects of a new method for changepoint detection on spatio-temporal point process data. We summarize the methodology, based on building a Bayesian hierarchical model for the data and declaring prior conjectures on the number and positions of the changepoints, and show how to take decisions regarding the acceptance of potential changepoints. The focus of this work is on choosing an approach that detects the correct changepoint and delivers smooth, reliable estimates in a feasible computational time; we propose Bayesian P-splines as a suitable tool for managing spatial variation, from both a computational and a model-fitting performance perspective. The main computational challenges are outlined, and a solution involving parallel computing in R is proposed and tested in a simulation study. An application is also presented on a data set of seismic events in Italy over the last 20 years.
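Stripped of the spatial dimension, the changepoint computation reduces to profiling a point-process likelihood over candidate changepoint positions. A 1-D toy version (not the paper's full spatio-temporal model): a temporal Poisson process with rate r1 before tau and r2 after, with both rates profiled out at their maximum likelihood values.

```python
import numpy as np

def poisson_rate_changepoint(event_times, T, grid):
    """Profile log-likelihood for a single changepoint in the rate of a
    temporal Poisson process on [0, T]: rate r1 on [0, tau), r2 on
    [tau, T], with r1, r2 replaced by their segment MLEs."""
    t = np.asarray(event_times)
    best_tau, best_ll = None, -np.inf
    for tau in grid:
        n1 = int((t < tau).sum())
        n2 = len(t) - n1
        r1, r2 = n1 / tau, n2 / (T - tau)
        ll = ((n1 * np.log(r1) if n1 else 0.0) - r1 * tau
              + (n2 * np.log(r2) if n2 else 0.0) - r2 * (T - tau))
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau

rng = np.random.default_rng(2)
# rate 1 on [0, 50), rate 5 on [50, 100]
events = np.concatenate([rng.uniform(0, 50, rng.poisson(50)),
                         rng.uniform(50, 100, rng.poisson(250))])
tau_hat = poisson_rate_changepoint(events, 100.0, np.linspace(5, 95, 181))
print(abs(tau_hat - 50.0) < 5)
```

The grid search over `tau` is embarrassingly parallel, which is exactly the structure the paper exploits with parallel computing in R.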

15.
Kernel smoothing methods are used to extend the Poisson log-linear approach for estimating the size of a population from multiple lists to an open population, where the multiple lists are recorded at each time point. The data are marginal, as only the lists at each time point are available; the transitions of individuals between lists at different time points are not observable. Our analysis is motivated by and applied to data on the number of drug addicts in the Hong Kong Special Administrative Region.
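The simplest special case of the multiple-list machinery being extended here is the closed-population, two-list setting, where Chapman's nearly unbiased estimator applies (assuming independence between lists; the paper's open-population, smoothed setting is much more general):

```python
def chapman_estimate(n1, n2, m):
    """Chapman's nearly unbiased two-list population size estimate,
    N_hat = (n1 + 1)(n2 + 1)/(m + 1) - 1, for a closed population with
    n1 on list A, n2 on list B, and m on both (lists independent)."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# 200 on list A, 150 on list B, 30 on both
print(round(chapman_estimate(200, 150, 30)))   # → 978
```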

16.
Longitudinal data with non-response occur in studies where the same subject is followed over time but data for each subject may not be available at every time point. When the response is categorical and the response at time t depends on the responses at previous time points, it may be appropriate to model the response using a Markov model. We generalize a second-order Markov model to include a non-ignorable non-response mechanism. Simulation is used to study the properties of the estimators. Large sample sizes are necessary to ensure that the algorithm converges and that the asymptotic properties of the estimators can be used.
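The complete-data building block of such a model is the second-order Markov MLE, obtained by normalising counts of consecutive triples. The sketch below covers only this complete-data piece; the paper's contribution is layering a non-ignorable non-response mechanism on top of it.

```python
import numpy as np

def second_order_mle(paths, s=2):
    """Complete-data MLE of second-order Markov transition probabilities
    P(y_t = k | y_{t-2} = i, y_{t-1} = j): normalised triple counts."""
    C = np.zeros((s, s, s))
    for p in paths:
        for i, j, k in zip(p, p[1:], p[2:]):
            C[i, j, k] += 1
    return C / C.sum(axis=2, keepdims=True)

paths = [[0, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]]
P = second_order_mle(paths)
print(np.allclose(P.sum(axis=2), 1.0))   # each observed (i, j) row is a distribution
```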

17.
Clustered (longitudinal) count data arise in many biostatistical practices in which a number of repeated count responses are observed on a number of individuals. The repeated observations may also represent counts over time from a number of individuals. One important problem that arises in practice is to test homogeneity within clusters (individuals) and between clusters (individuals). As data within clusters are repeated responses, the counts may be correlated and/or over-dispersed. For over-dispersed count data with unknown over-dispersion parameter, we derive two score tests by assuming a random intercept model within the framework of (i) the negative binomial mixed effects model and (ii) the double extended quasi-likelihood mixed effects model (Lee and Nelder, 2001). These two statistics are much simpler than the statistic derived by Jacqmin-Gadda and Commenges (1995) under the framework of the over-dispersed generalized linear model. The first statistic incorporates the over-dispersion more directly into the model and is therefore expected to do well when the model assumptions are satisfied, while the other is expected to be robust. Simulations show the superior level properties of the statistics derived under the negative binomial and double extended quasi-likelihood model assumptions. A data set is analyzed and a discussion is given.

18.
In the health and social sciences, researchers often encounter categorical data whose complexities come from a nested hierarchy and/or cross-classification in the sampling structure. A common feature of these studies is a non-standard data structure with repeated measurements which may have some degree of clustering. In this paper, methodology is presented for the joint estimation of quantities of interest in the context of a stratified two-stage sample with bivariate dichotomous data. These quantities are the mean value π of an observed dichotomous response for a certain condition or time point, and a set of correlation coefficients: for intra-cluster association at each condition or time period, and for inter-condition correlation within and among clusters. The methodology uses the cluster means and pairwise joint probability parameters from each cluster; together these provide appropriate information across clusters for estimating the correlation coefficients.

19.
We consider data that are longitudinal, arising from n individuals over m time periods. Each individual moves according to the same homogeneous Markov chain, with s states. If the individual sample paths are observed, so that ‘micro-data’ are available, the transition probability matrix is estimated by maximum likelihood straightforwardly from the transition counts. If only the overall numbers in the various states at each time point are observed, we have ‘macro-data’, and the likelihood function is difficult to compute. In that case a variety of methods has been proposed in the literature. In this paper we propose methods based on generating functions and investigate their performance.
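The micro-data baseline mentioned above is indeed straightforward: the MLE of the transition matrix is just the row-normalised matrix of observed transition counts. A minimal sketch (the macro-data case the paper addresses has no such closed form):

```python
import numpy as np

def transition_mle(paths, s):
    """Maximum likelihood estimate of an s-state transition matrix from
    fully observed sample paths ('micro-data'): row-normalised counts."""
    C = np.zeros((s, s))
    for path in paths:
        for i, j in zip(path[:-1], path[1:]):
            C[i, j] += 1
    return C / C.sum(axis=1, keepdims=True)

paths = [[0, 1, 1, 0, 1], [1, 1, 0, 0, 1], [0, 0, 1, 1, 1]]
P_hat = transition_mle(paths, 2)
print(P_hat.round(3))
```

With macro-data, only the column sums of the state counts at each time are seen, so these per-individual transition counts are unavailable and the likelihood involves a sum over all compatible transition tables.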

20.
Given pollution measurements from a network of monitoring sites in the area of a city over an extended period of time, an important problem is to identify the spatial and temporal structure of the data. In this paper we focus on the identification and estimation of a nonparametric statistical model to analyse SO2 concentrations in the city of Padua, where data are collected by fixed stations and by mobile stations that move among new locations without any fixed pattern. A consequence of using mobile stations is that, for each location, there are times at which no data were collected. Assuming temporal stationarity and spatial isotropy for the residuals of an additive model for the logarithm of the SO2 concentration, we estimate the semivariogram using a kernel-type estimator. Attempts are made to relax the assumption of spatial isotropy. Bootstrap confidence bands are obtained for the spatial component of the additive model, a deterministic function that defines the spatial structure. Finally, an example is given of designing an optimal network for the mobile monitoring stations at a fixed future time, given all the information available.
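A kernel-type semivariogram estimator of the kind described weights every pair of sites by how close its separation distance is to the target lag. An illustrative numpy sketch with a Gaussian kernel on synthetic data (the paper applies this to residuals of an additive model for log SO2, which is not reproduced here):

```python
import numpy as np

def kernel_semivariogram(coords, z, lags, bandwidth):
    """Kernel-type estimator of the isotropic semivariogram:
    gamma(h) = sum_ij K((d_ij - h)/b) (z_i - z_j)^2 / (2 sum_ij K),
    with a Gaussian kernel K over all distinct site pairs."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    sq = (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)          # each pair counted once
    d, sq = d[iu], sq[iu]
    out = []
    for h in lags:
        w = np.exp(-0.5 * ((d - h) / bandwidth) ** 2)
        out.append((w * sq).sum() / (2 * w.sum()))
    return np.array(out)

rng = np.random.default_rng(5)
coords = rng.uniform(0, 10, size=(200, 2))
# toy spatial field: smooth trend plus noise, so gamma rises with distance
z = np.sin(coords[:, 0] / 3) + coords[:, 1] / 10 + rng.normal(0, 0.1, 200)
gam = kernel_semivariogram(coords, z, lags=[0.5, 3.0], bandwidth=0.5)
print(gam[0] < gam[1])   # more variation between distant sites
```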

