1.

Clustering time-course microarray data using functional Bayesian infinite mixture model

Claudia Angelini Marianna Pensky 《Journal of applied statistics》2012,39(1):129-149

This paper presents a new Bayesian clustering approach, based on an infinite mixture model and designed specifically for time-course microarray data. The problem is to group together genes which have “similar” expression profiles, given the set of noisy measurements of their expression levels over a specific time interval. In order to capture temporal variations of each curve, a non-parametric regression approach is used. Each expression profile is expanded over a set of basis functions and the sets of coefficients of each curve are subsequently modeled through a Bayesian infinite mixture of Gaussian distributions. The task of finding clusters of genes with similar expression profiles is thereby reduced to the problem of grouping together genes whose coefficients are sampled from the same distribution in the mixture. A Dirichlet process prior is naturally employed in such models, since it allows one to deal automatically with the uncertainty about the number of clusters. The posterior inference is carried out by a split and merge MCMC sampling scheme which integrates out parameters of the component distributions and updates only the latent vector of the cluster membership. The final configuration is obtained via the maximum a posteriori estimator. The performance of the method is studied using synthetic and real microarray data and is compared with the performance of competing techniques.
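As a rough illustration of why a Dirichlet process prior leaves the number of clusters open, the sketch below draws a random partition from the Chinese restaurant process, the partition distribution that a Dirichlet process induces. It is not the split and merge sampler described in the abstract; the function name `sample_crp_partition`, the concentration value `alpha=1.0`, and the sample size are assumptions made only for this example.

```python
import random


def sample_crp_partition(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha is the concentration parameter (hypothetical value in this sketch):
    larger alpha makes opening a new cluster more likely.
    """
    labels = []
    counts = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n_items):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


rng = random.Random(0)
labels = sample_crp_partition(50, alpha=1.0, rng=rng)
n_clusters = len(set(labels))
print(n_clusters)  # the number of clusters is random, not fixed in advance
```

Running the function with different seeds gives different numbers of clusters, which is the sense in which the prior handles uncertainty about the cluster count.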

2.

Modelling data observed irregularly over space and regularly in time

Chrysoula Dimitriou-Fakalou 《Statistical Methodology》2009,6(2):120-132

When data have been collected regularly over time and irregularly over space, it is difficult to impose an explicit auto-regressive structure over space as it is over time. We study a phenomenon at a number of fixed locations. At each location the process forms an auto-regressive time series. The second-order dependence over space is reflected by the covariance matrix of the noise process, which is ‘white’ in time but not over space. We consider the asymptotic properties of our inference methods when only the number of recordings in time tends to infinity.

3.

Modelling spatio-temporal air pollution data from a mobile monitoring station

Simone Del Sarto Maria Giovanna Ranalli David Cappelletti Beatrice Moroni Stefano Crocchianti Silvia Castellini 《Journal of Statistical Computation and Simulation》2016,86(13):2546-2559

Environmental data are typically indexed in space and time. This work deals with modelling spatio-temporal air quality data when multiple measurements are available for each space-time point. Typically this situation arises when different measurements referring to several response variables are observed at each space-time point, for example, different pollutants or size resolved data on particulate matter. Nonetheless, such data also arise when using a mobile monitoring station moving along a path for a certain period of time. In this case, each spatio-temporal point has a number of measurements referring to the response variable observed several times over different locations in a close neighbourhood of the space-time point. We deal with this type of data within a hierarchical Bayesian framework, in which observed measurements are modelled in the first stage of the hierarchy, while the unobserved spatio-temporal process is considered in the following stages. The final model is very flexible: it includes autoregressive terms in time and different structures for the variance-covariance matrix of the errors, and it can manage covariates available at different space-time resolutions. This approach is motivated by the availability of data on urban pollution dynamics: fast measures of gases and size resolved particulate matter have been collected using an Optical Particle Counter located on a cabin of a public conveyance that moves on a monorail on a line transect of a town. Urban microclimate information is also available and included in the model. Simulation studies are conducted to evaluate the performance of the proposed model over existing alternatives that do not model data over the first stage of the hierarchy.

4.

Sieve Estimation of Time-Varying Panel Data Models With Latent Structures

Liangjun Su Xia Wang Sainan Jin 《Journal of Business & Economic Statistics》2013,31(2):334-349

We propose a heterogeneous time-varying panel data model with a latent group structure that allows the coefficients to vary over both individuals and time. We assume that the coefficients change smoothly over time and form different unobserved groups. When treated as smooth functions of time, the individual functional coefficients are heterogeneous across groups but homogeneous within a group. We propose a penalized-sieve-estimation-based classifier-Lasso (C-Lasso) procedure to identify the individuals’ membership and to estimate the group-specific functional coefficients in a single step. The classification exhibits the desirable property of uniform consistency. The C-Lasso estimators and their post-Lasso versions achieve the oracle property so that the group-specific functional coefficients can be estimated as well as if the individuals’ membership were known. Several extensions are discussed. Simulations demonstrate excellent finite sample performance of the approach in both classification and estimation. We apply our method to study the heterogeneous trending behavior of GDP per capita across 91 countries for the period 1960–2012 and find four latent groups.

5.

Spatio-temporal modelling using B-spline for disease mapping: analysis of childhood cancer trends

Mahmoud Torabi 《Journal of applied statistics》2011,38(9):1769-1781

To examine childhood cancer diagnoses in the province of Alberta, Canada during 1983–2004, we construct a generalized additive mixed model for the analysis of geographic and temporal variability of cancer ratios. In this model, spatially correlated random effects and temporal components are adopted. The interaction between space and time is also accommodated. Spatio-temporal models that use conditional autoregressive smoothing across the spatial dimension and B-spline over the temporal dimension are considered. We study the patterns of incidence ratios over time and identify areas with consistently high ratio estimates as areas for potential further investigation. We apply the method of penalized quasi-likelihood to estimate the model parameters. We illustrate this approach using a yearly data set of childhood cancer diagnoses in the province of Alberta, Canada during 1983–2004.

6.

Hierarchical spatially varying coefficient and temporal dynamic process models using spTDyn

《Journal of Statistical Computation and Simulation》2012,82(4):820-840

Bayesian hierarchical spatio-temporal models are becoming increasingly important due to the increasing availability of space-time data in various domains. In this paper we develop a user-friendly R package, spTDyn, for spatio-temporal modelling. It can be used to fit models with spatially varying and temporally dynamic coefficients. The former are used for modelling the spatially varying impact of explanatory variables on the response caused by spatial misalignment. This issue can arise when the covariates only vary over time, or when they are measured over a grid and hence do not match the locations of the response point-level data. The latter are used to examine the temporally varying impact of explanatory variables in space-time data due, for example, to seasonality or other time-varying effects. The spTDyn package uses Markov chain Monte Carlo sampling written in C, which makes computations highly efficient, and the interface is written in R, making these sophisticated modelling techniques easily accessible to statistical analysts. The models and software, and their advantages, are illustrated using temperature and ozone space-time data.

7.

Block clustering with collapsed latent block models (total citations: 1; self-citations: 0; citations by others: 1)

Jason Wyse Nial Friel 《Statistics and Computing》2012,22(2):415-428

We introduce a Bayesian extension of the latent block model for model-based block clustering of data matrices. Our approach considers a block model where block parameters may be integrated out. The result is a posterior defined over the number of clusters in rows and columns and cluster memberships. The number of row and column clusters need not be known in advance as these are sampled along with cluster memberships using Markov chain Monte Carlo. This differs from existing work on latent block models, where the number of clusters is assumed known or is chosen using some information criteria. We analyze both simulated and real data to validate the technique.

8.

A data-driven selection of the number of clusters in the Dirichlet allocation model via Bayesian mixture modelling

E. F. Saraiva C. A. B. Pereira A. K. Suzuki 《Journal of Statistical Computation and Simulation》2019,89(15):2848-2870

In this paper, we consider a Bayesian mixture model that allows us to integrate out the weights of the mixture in order to obtain a procedure in which the number of clusters is an unknown quantity. To determine clusters and estimate parameters of interest, we develop an MCMC algorithm which we call the sequential data-driven allocation sampler. In this algorithm, a single observation has a nonzero probability of creating a new cluster, and a set of observations may create a new cluster through the split-merge movements. The split-merge movements are developed using a sequential allocation procedure based on allocation probabilities that are calculated according to the Kullback–Leibler divergence between the posterior distribution using the observations previously allocated and the posterior distribution including a ‘new’ observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.

9.

Non parametric space-time modeling of SO2 in presence of many missing data

Bruno Scarpa 《Statistical Methods and Applications》2005,14(1):67-82

Given pollution measurements from a network of monitoring sites in the area of a city and over an extended period of time, an important problem is to identify the spatial and temporal structure of the data. In this paper we focus on the identification and estimation of a statistical non parametric model to analyse SO₂ in the city of Padua, where data are collected by some fixed stations and some mobile stations moving without any specific rule to different new locations. The impact of the use of mobile stations is that for each location there are times when data were not collected. Assuming temporal stationarity and spatial isotropy for the residuals of an additive model for the logarithm of SO₂ concentration, we estimate the semivariogram using a kernel-type estimator. Attempts are made to avoid the assumption of spatial isotropy. Bootstrap confidence bands are obtained for the spatial component of the additive model, which is a deterministic function that defines the spatial structure. Finally, an example is proposed to design an optimal network for the mobile monitoring stations at a fixed future time, given all the information available.

10.

Scalable spatio‐temporal Bayesian analysis of high‐dimensional electroencephalography data

Shariq Mohammed Dipak K. Dey 《Revue canadienne de statistique》2021,49(1):107-128

We present a scalable Bayesian modelling approach for identifying brain regions that respond to a certain stimulus and use them to classify subjects. More specifically, we deal with multi‐subject electroencephalography (EEG) data with a binary response distinguishing between alcoholic and control groups. The covariates are matrix‐variate with measurements taken from each subject at different locations across multiple time points. EEG data have a complex structure with both spatial and temporal attributes. We use a divide‐and‐conquer strategy and build separate local models, that is, one model at each time point. We employ Bayesian variable selection approaches using a structured continuous spike‐and‐slab prior to identify the locations that respond to a certain stimulus. We incorporate the spatio‐temporal structure through a Kronecker product of the spatial and temporal correlation matrices. We develop a highly scalable estimation algorithm, using likelihood approximation, to deal with the large number of parameters in the model. Variable selection is done via clustering of the locations based on their duration of activation. We use scoring rules to evaluate the prediction performance. Simulation studies demonstrate the efficiency of our scalable algorithm in terms of estimation and fast computation. We present results using our scalable approach on a case study of multi‐subject EEG data.

11.

Bayesian scanning of spatial disease rates with integrated nested Laplace approximation (INLA)

Massimo Bilancia Giacomo Demarinis 《Statistical Methods and Applications》2014,23(1):71-94

Among the many tools suited to detect local clusters in group-level data, Kulldorff–Nagarwalla’s spatial scan statistic gained wide popularity (Kulldorff and Nagarwalla in Stat Med 14(8):799–810, 1995). The underlying assumptions needed for making statistical inference feasible are quite strong, as counts in spatial units are assumed to be independent Poisson distributed random variables. Unfortunately, outcomes in spatial units are often not independent of each other, and risk estimates of areas that are close to each other will tend to be positively correlated as they share a number of spatially varying characteristics. We therefore introduce a Bayesian model-based algorithm for cluster detection in the presence of spatially autocorrelated relative risks. Our approach has been made possible by the recent development of new numerical methods based on integrated nested Laplace approximation, by which we can directly compute very accurate approximations of posterior marginals within short computational time (Rue et al. in JRSS B 71(2):319–392, 2009). Simulated data and a case study show that the performance of our method is at least comparable to that of Kulldorff–Nagarwalla’s statistic.
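For orientation, the following sketch computes the Poisson form of the log-likelihood ratio that is commonly used with the spatial scan statistic, under the independence assumption the abstract criticizes. It is not the Bayesian INLA-based algorithm of the paper. The function name `poisson_scan_llr`, the toy counts, and the choice to treat each single area as a candidate zone are assumptions for illustration; a real scan evaluates many circular windows and assesses significance by Monte Carlo replication.

```python
import math


def poisson_scan_llr(c, e, total):
    """Log-likelihood ratio for one candidate zone under a Poisson model.

    c: observed count inside the zone
    e: expected count inside the zone under the null (expected counts are
       assumed to be scaled so that they sum to `total`)
    total: total observed count over the whole study region
    Only zones with an excess of cases (c > e) score above zero.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # When every case falls inside the zone, the outside term is 0 * log(0) = 0.
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside


# Toy data (hypothetical): four areas, each used as its own candidate zone.
counts = [5, 20, 8, 7]
expected = [10.0, 10.0, 10.0, 10.0]
total = sum(counts)

zone_llrs = [poisson_scan_llr(c, e, total) for c, e in zip(counts, expected)]
best_zone = max(range(len(zone_llrs)), key=zone_llrs.__getitem__)
print(best_zone, round(zone_llrs[best_zone], 3))
```

The zone with the largest ratio is the most likely cluster; here that is the second area, the only one whose observed count exceeds its expectation.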

12.

Cluster detection and clustering with random start forward searches

Anthony C. Atkinson Marco Riani Andrea Cerioli 《Journal of applied statistics》2018,45(5):777-798

The forward search is a method of robust data analysis in which outlier-free subsets of the data of increasing size are used in model fitting; the data are then ordered by closeness to the model. Here the forward search, with many random starts, is used to cluster multivariate data. These random starts lead to the diagnostic identification of tentative clusters. Application of the forward search to the proposed individual clusters leads to the establishment of cluster membership through the identification of non-cluster members as outlying. The method requires no prior information on the number of clusters and does not seek to classify all observations. These properties are illustrated by the analysis of 200 six-dimensional observations on Swiss banknotes. The importance of linked plots and brushing in elucidating data structures is illustrated. We also provide an automatic method for determining cluster centres and compare the behaviour of our method with model-based clustering. In a simulated example with eight clusters our method provides more stable and accurate solutions than model-based clustering. We consider the computational requirements of both procedures.

13.

A multivariate finite mixture latent trajectory model with application to dementia studies

Dongbing Lai Huiping Xu Daniel Koller Tatiana Foroud 《Journal of applied statistics》2016,43(14):2503-2523

Dementia patients exhibit considerable heterogeneity in individual trajectories of cognitive decline, with some patients showing rapid decline following diagnosis while others exhibit slower decline or remain stable for several years. Dementia studies often collect longitudinal measures of multiple neuropsychological tests aimed at measuring patients’ decline across a number of cognitive domains. We propose a multivariate finite mixture latent trajectory model to identify distinct longitudinal patterns of cognitive decline simultaneously in multiple cognitive domains, each of which is measured by multiple neuropsychological tests. The EM algorithm is used for parameter estimation, and posterior probabilities are used to predict latent class membership. We present results of a simulation study demonstrating adequate performance of our proposed approach and apply our model to the Uniform Data Set from the National Alzheimer's Coordinating Center to identify cognitive decline patterns among dementia patients.
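The step of predicting latent class membership from posterior probabilities can be sketched on a deliberately simplified case: a one-dimensional, two-component Gaussian mixture with known parameters. The paper's model is multivariate and trajectory-based, so this is only an analogy; the function names, the parameter values, and the interpretation of `x` as a yearly change in a test score are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def class_posteriors(x, weights, means, sds):
    """Posterior probability of each latent class given one observation x (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical classes: class 0 = rapid decline (mean change -2), class 1 = stable (mean 0).
post = class_posteriors(-1.8, weights=[0.5, 0.5], means=[-2.0, 0.0], sds=[1.0, 1.0])
predicted_class = max(range(len(post)), key=post.__getitem__)
print(predicted_class, [round(p, 3) for p in post])
```

In an EM fit these posterior probabilities are the quantities computed in the E-step; assigning each subject to the class with the largest posterior gives the predicted membership.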

14.

A mixture of experts latent position cluster model for social network data

Isobel Claire Gormley Thomas Brendan Murphy 《Statistical Methodology》2010,7(3):385-405

Social network data represent the interactions between a group of social actors. Interactions between colleagues and friendship networks are typical examples of such data. The latent space model for social network data locates each actor in a network in a latent (social) space and models the probability of an interaction between two actors as a function of their locations. The latent position cluster model extends the latent space model to deal with network data in which clusters of actors exist: actor locations are drawn from a finite mixture model, each component of which represents a cluster of actors. A mixture of experts model builds on the structure of a mixture model by taking account of both observations and associated covariates when modeling a heterogeneous population. Herein, a mixture of experts extension of the latent position cluster model is developed. The mixture of experts framework allows covariates to enter the latent position cluster model in a number of ways, yielding different model interpretations. Estimates of the model parameters are derived in a Bayesian framework using a Markov Chain Monte Carlo algorithm. The algorithm is generally computationally expensive; surrogate proposal distributions which shadow the target distributions are derived, reducing the computational burden. The methodology is demonstrated through an illustrative example detailing relationships between a group of lawyers in the USA.

15.

Segmental dynamic factor analysis for time series of curves

Allou Samé Gérard Govaert 《Statistics and Computing》2017,27(6):1617-1637

A new approach is introduced in this article for describing and visualizing time series of curves, where each curve is subject to changes in regime. For this purpose, the curves are represented by a regression model including a latent segmentation, and their temporal evolution is modeled through a Gaussian random walk over low-dimensional factors of the regression coefficients. The resulting model is a particular state-space model involving discrete and continuous latent variables, whose parameters are estimated across a sequence of curves through a dedicated variational Expectation-Maximization algorithm. The experimental study conducted on simulated data and real time series of curves has shown encouraging results in terms of visualization of their temporal evolution and forecasting.

16.

Space‐time cluster identification in point processes

Renato Assunção Andréa Tavares Thais Correa Martin Kulldorff 《Revue canadienne de statistique》2007,35(1):9-25

The authors propose a new type of scan statistic to test for the presence of space‐time clusters in point process data, when the goal is to identify and evaluate the statistical significance of localized clusters. Their method is based only on point patterns for cases; it does not require any specific knowledge of the underlying population. The authors propose to scan the three‐dimensional space with a score test statistic under the null hypothesis that the underlying point process is an inhomogeneous Poisson point process with space and time separable intensity. The alternative is that there are one or more localized space‐time clusters. Their method has been implemented in a computationally efficient way so that it can be applied routinely. They illustrate their method with space‐time crime data from Belo Horizonte, a Brazilian city, in addition to presenting a Monte Carlo study to analyze the power of their new test.

17.

Detecting clusters with increased mean using scan windows with variable radius

Chen-ju Lin Yi-chun Shu 《Journal of applied statistics》2015,42(11):2420-2431

Applying spatiotemporal scan statistics is an effective method to detect the clustering of mean shifts in many application fields. Although several exponentially weighted moving average (EWMA) based scan statistics have been proposed, the existing methods generally require a fixed scan window size or apply the weighting technique across the temporal axis only. However, the size of shift coverage is often unavailable in practical problems. Using a mismatched scan radius may give a misleading estimate of the size of cluster coverage in space or delay the time to detection. This research proposes an stEWMA method that applies the weighting technique across both temporal and spatial axes with variable scan radius. The simulation analysis shows that the stEWMA method can have a significantly shorter time to detection than the likelihood ratio-based scan statistic using variable scan radius, especially when cluster coverage size is small. The application to detecting the increase of male thyroid cancer in the state of New Mexico also shows the effectiveness of the proposed method.
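The temporal weighting that EWMA-based methods build on can be sketched with the standard recursion z_t = λ·x_t + (1 − λ)·z_{t−1}. This covers the time axis only and is not the stEWMA statistic, which also weights across space with a variable radius; the smoothing constant `lam=0.5`, the alarm threshold `2.5`, and the toy series are assumptions chosen for the example.

```python
def ewma(values, lam, start):
    """Exponentially weighted moving average: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    z = start
    smoothed = []
    for x in values:
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed


# Toy series (hypothetical): the mean shifts from 0 to 4 at time index 3.
series = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
smoothed = ewma(series, lam=0.5, start=0.0)

threshold = 2.5  # illustrative alarm limit, not a calibrated control limit
first_alarm = next((t for t, z in enumerate(smoothed) if z > threshold), None)
print(smoothed)      # [0.0, 0.0, 0.0, 2.0, 3.0, 3.5]
print(first_alarm)   # 4
```

The alarm is raised one step after the shift begins, which illustrates the time-to-detection trade-off that the choice of weighting controls.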

18.

Robust estimation of a dynamic spatio-temporal model with structural change

Stephen Jun V. Villejo Joseph Ryan G. Lansangan 《Journal of Statistical Computation and Simulation》2017,87(3):505-518

We postulate a dynamic spatio-temporal model with constant covariate effect but with varying spatial effect over time and varying temporal effect across locations. To mitigate the effect of temporary structural change, the model can be estimated using the backfitting algorithm embedded with the forward search algorithm and bootstrap. A simulation study is designed to evaluate structural optimality of the model with the estimation procedure. The fitted model exhibits superior predictive ability relative to the linear model. The proposed algorithm also consistently produces lower relative bias and standard errors for the spatial parameter estimates. While additional neighbourhoods do not necessarily improve the predictive ability of the model, they trim down relative bias in the parameter estimates, especially for the spatial parameter. The location of the temporary structural change, along with the degree of structural change, contributes to lower relative bias of the parameter estimates and to better predictive ability of the model. The estimation procedure is able to produce parameter estimates that are robust to the occurrence of temporary structural change.

19.

Multivariate regression analysis of panel data with binary outcomes applied to unemployment data

Claudia Czado 《Statistical Papers》2000,41(3):281-304

In panel studies, binary outcome measures together with time stationary and time varying explanatory variables are collected over time on the same individual. Therefore, a regression analysis for this type of data must allow for the correlation among the outcomes of an individual. The multivariate probit model of Ashford and Sowden (1970) was the first regression model for multivariate binary responses. However, a likelihood analysis of the multivariate probit model with general correlation structure for higher dimensions is intractable due to the maximization over high dimensional integrals, thus severely restricting its applicability so far. Czado (1996) developed a Markov Chain Monte Carlo (MCMC) algorithm to overcome this difficulty. In this paper we present an application of this algorithm to unemployment data from the Panel Study of Income Dynamics involving 11 waves of the panel study. In addition we adapt Bayesian model checking techniques based on the posterior predictive distribution (see for example Gelman et al. (1996)) for the multivariate probit model. These help to identify mean and correlation specifications which fit the data well. C. Czado was supported by research grant OGP0089858 of the Natural Sciences and Engineering Research Council of Canada.

20.

MAD-STEC: a method for multiple automatic detection of space-time emerging clusters

Bráulio M. Veloso Thais R. Correa Marcos O. Prates Gabriel F. Oliveira Andréa I. Tavares 《Statistics and Computing》2017,27(4):1099-1110

Crime or disease surveillance commonly relies on space-time clustering methods to identify emerging patterns. The goal is to detect spatio-temporal clusters as soon as possible after their occurrence and to control the rate of false alarms. With this in mind, a spatio-temporal multiple cluster detection method was developed as an extension of a previous proposal based on a spatial version of the Shiryaev–Roberts statistic. Besides the capability of multiple cluster detection, the method has fewer input parameters than the previous proposal, making its use more intuitive to practitioners. To evaluate the new methodology, a simulation study is performed in several scenarios, and it highlights several advantages of the proposed method. Finally, we present a case study on a crime data set from Belo Horizonte, Brazil.