Similar Documents
Found 20 similar documents (search time: 937 ms)
1.
Detecting local spatial clusters for count data is an important task in spatial epidemiology. Two broad approaches, moving-window and disease-mapping methods, have been suggested in the literature for finding clusters. However, the existing methods rely on somewhat arbitrarily chosen tuning parameters, and the local clustering results are sensitive to these choices. In this paper, we propose a penalized likelihood method that overcomes the limitations of existing local spatial clustering approaches for count data. We start with a Poisson regression model that accommodates any type of covariates, and formulate the clustering problem as a penalized likelihood estimation problem for finding change points of the intercepts in two-dimensional space. The cost of developing a new algorithm is minimized by modifying an existing least absolute shrinkage and selection operator (LASSO) algorithm. The computational details of the modifications are given, and the proposed method is illustrated with Seoul tuberculosis data.
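To make the estimation problem concrete, the sketch below writes out a fused-lasso-penalized Poisson objective of the kind the abstract describes: region-specific intercepts are penalized through absolute differences over spatially adjacent pairs, so that neighbouring regions with equal fitted intercepts merge into clusters. The function name, the edge-list representation of adjacency, and the offset term are illustrative assumptions, not the paper's exact formulation; the paper minimizes such an objective with a modified LASSO algorithm, while this code only evaluates it.

```python
import numpy as np

def penalized_poisson_negloglik(beta, gamma, y, X, offset, edges, lam):
    """Fused-lasso-penalized Poisson negative log-likelihood (illustrative).

    beta   : (n_regions,) region-specific intercepts
    gamma  : (p,) covariate coefficients shared across regions
    y      : (n_regions,) observed counts
    X      : (n_regions, p) covariates
    offset : (n_regions,) log expected counts (e.g. log population)
    edges  : list of (i, j) index pairs for spatially adjacent regions
    lam    : penalty weight controlling how aggressively neighbouring
             intercepts are fused into common clusters
    """
    eta = beta + X @ gamma + offset            # linear predictor
    negll = np.sum(np.exp(eta) - y * eta)      # Poisson nll up to a constant
    fuse = sum(abs(beta[i] - beta[j]) for i, j in edges)
    return negll + lam * fuse
```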

2.
A Bayesian multi-category kernel classification method is proposed. The algorithm performs classification on the projections of the data onto the principal axes of the feature space. The advantage of this approach is that the regression coefficients are identifiable and sparse, leading to large computational savings and improved classification performance. The degree of sparsity is regulated in a novel framework based on Bayesian decision theory. The Gibbs sampler is implemented to find the posterior distributions of the parameters, so predictive probability distributions can be obtained for new data points, giving a more complete picture of the classification. The algorithm is aimed at high-dimensional data sets where the dimension of the measurements exceeds the number of observations. The applications considered in this paper are microarray, image processing and near-infrared spectroscopy data.

3.
Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting a linear mixed model within the steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can these models be fitted rapidly to large data sets. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program.

4.
The retrieval of wind vectors from satellite scatterometer observations is a non-linear inverse problem. A common approach to solving inverse problems is to adopt a Bayesian framework and to infer the posterior distribution of the parameters of interest given the observations by using a likelihood model relating the observations to the parameters, and a prior distribution over the parameters. We show how Gaussian process priors can be used efficiently with a variety of likelihood models, using local forward (observation) models and direct inverse models for the scatterometer. We present an enhanced Markov chain Monte Carlo method to sample from the resulting multimodal posterior distribution. We go on to show how the computational complexity of the inference can be controlled by using a sparse, sequential Bayes algorithm for estimation with Gaussian processes. This helps to overcome the most serious barrier to the use of probabilistic, Gaussian process methods in remote sensing inverse problems, which is the prohibitively large size of the data sets. We contrast the sampling results with the approximations that are found by using the sparse, sequential Bayes algorithm.

5.
We consider a linear regression model in which the covariates have group structure. The group LASSO has been proposed for group variable selection. Many nonconvex penalties, such as the smoothly clipped absolute deviation (SCAD) and the minimax concave penalty (MCP), have been extended to group variable selection problems. The group coordinate descent (GCD) algorithm is widely used for fitting these models. However, GCD algorithms are hard to apply to nonconvex group penalties owing to their computational complexity, unless the design matrix is orthogonal. In this paper, we propose an efficient optimization algorithm for nonconvex group penalties that combines the concave-convex procedure with the group LASSO algorithm. We also extend the proposed algorithm to generalized linear models. We evaluate the numerical efficiency of the proposed algorithm against existing GCD algorithms on simulated and real data sets.
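A hedged sketch of the key step: the concave-convex procedure majorizes a nonconvex group penalty such as SCAD by a linear term at the current iterate, so each iteration reduces to a weighted group LASSO problem. The function names and the local linear approximation shown are one illustrative reading of this idea, not necessarily the authors' exact algorithm.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty at |t| (Fan & Li, 2001)."""
    t = np.abs(t)
    return np.where(t <= lam, lam,
                    np.maximum(a * lam - t, 0.0) / (a - 1.0))

def ccp_group_weights(beta, groups, lam, a=3.7):
    """One concave-convex step: each group's nonconvex penalty is
    majorized by a linear term, so the subproblem becomes a weighted
    group LASSO with the weights returned here. `groups` maps group
    names to index arrays; `beta` is the current coefficient vector."""
    return {g: float(scad_deriv(np.linalg.norm(beta[idx]), lam, a))
            for g, idx in groups.items()}

groups = {"g1": np.array([0, 1, 2]), "g2": np.array([3, 4])}
beta = np.array([0.5, -0.2, 0.1, 2.0, -1.5])
print(ccp_group_weights(beta, groups, lam=0.5))
# large groups get weight 0: already-big coefficients are left unpenalized
```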

6.
Pettitt, A. N., Weir, I. S., and Hart, A. G. Statistics and Computing, 2002, 12(4):353-367
A Gaussian conditional autoregressive (CAR) formulation is presented that permits the modelling of spatial dependence and the dependence between multivariate random variables at irregularly spaced sites, thus capturing some of the modelling advantages of the geostatistical approach. The model benefits not only from the explicit availability of the full conditionals but also from the computational simplicity of the precision matrix determinant calculation, which uses a closed-form expression involving the eigenvalues of a precision matrix submatrix. The introduction of covariates into the model adds little computational complexity to the analysis, so the method can be straightforwardly extended to regression models. Because of its computational simplicity, the model is well suited to fully Bayesian analyses of large data sets involving multivariate measurements with a spatial ordering. An extension to spatio-temporal data is also considered. Here, we demonstrate use of the model in the analysis of bivariate binary data where the observed data are modelled as the sign of a hidden CAR process. A case study involving over 450 irregularly spaced sites and the presence or absence of each of two species of rain forest trees at each site is presented; Markov chain Monte Carlo (MCMC) methods are implemented to obtain posterior distributions of all unknowns. The MCMC method works well with both simulated data and the tree biodiversity data set.
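The determinant shortcut can be illustrated with a common CAR parameterization Q = τ(I − φC): the eigenvalues of the dependence matrix C are computed once, after which log det Q is available in O(n) time for any (τ, φ). This parameterization is an assumption made for illustration; the paper's exact precision structure may differ.

```python
import numpy as np

def car_logdet_factory(C):
    """Precompute eigenvalues of the symmetric dependence matrix C so
    that log det of Q = tau * (I - phi * C) is cheap inside an MCMC loop."""
    eigs = np.linalg.eigvalsh(C)          # one-off O(n^3) cost
    n = C.shape[0]
    def logdet(tau, phi):
        # valid when all 1 - phi * eigs > 0 (Q positive definite)
        return n * np.log(tau) + np.sum(np.log1p(-phi * eigs))
    return logdet

logdet = car_logdet_factory(np.array([[0.0, 0.5], [0.5, 0.0]]))
print(logdet(tau=2.0, phi=0.3))
```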

7.
In this paper, we consider a Bayesian mixture model that allows us to integrate out the weights of the mixture, yielding a procedure in which the number of clusters is an unknown quantity. To determine clusters and estimate the parameters of interest, we develop an MCMC algorithm termed the sequential data-driven allocation sampler. In this algorithm, a single observation has a non-null probability of creating a new cluster, and a set of observations may create a new cluster through split-merge moves. The split-merge moves are developed using a sequential allocation procedure based on allocation probabilities calculated according to the Kullback–Leibler divergence between the posterior distribution using the observations previously allocated and the posterior distribution including a 'new' observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.

8.
Mixture models are flexible tools in density estimation and classification problems. Bayesian estimation of such models typically relies on sampling from the posterior distribution using Markov chain Monte Carlo. Label switching arises because the posterior is invariant to permutations of the component parameters. Methods for dealing with label switching have been studied fairly extensively in the literature, with the most popular approaches being those based on loss functions. However, many of these algorithms turn out to be too slow in practice, and can be infeasible as the size and/or dimension of the data grow. We propose a new, computationally efficient algorithm based on a loss function interpretation, and show that it scales well to large data sets. We then review earlier solutions that also scale well to large data sets, and compare their performance on simulated and real data sets. We conclude with a discussion and recommendations covering all the methods studied.

9.
In spatial statistics, models are often constructed under common but possibly restrictive assumptions on the underlying spatial process, including Gaussianity as well as stationarity and isotropy. However, these assumptions are frequently violated in applied problems. To handle skewness and non-homogeneity (i.e., non-stationarity and anisotropy) simultaneously, we develop a fixed rank kriging model that uses a skew-normal distribution for its non-spatial latent variables. Our approach to spatial modeling is easy to implement and also provides great flexibility in adjusting to skewed and large data sets with heterogeneous correlation structures. We adopt a Bayesian framework for our analysis and describe a simple MCMC algorithm for sampling from the posterior distribution of the model parameters and performing spatial prediction. Through a simulation study, we demonstrate that the proposed model can detect departures from normality and, for illustration, we analyze a synthetic data set of CO₂ measurements. Finally, to deal with multivariate spatial data showing some degree of skewness, a multivariate extension of the model is also provided.

10.
We propose a two-stage algorithm for computing maximum likelihood estimates for a class of spatial models. The algorithm combines Markov chain Monte Carlo methods, such as the Metropolis–Hastings–Green algorithm and the Gibbs sampler, with stochastic approximation methods, such as the off-line average and adaptive search direction. A new criterion is built into the algorithm so that stopping is automatic once the desired precision has been set. Simulation studies and applications to some real data sets have been conducted with three spatial models. We compared the proposed algorithm with a direct application of the classical Robbins–Monro algorithm using Wiebe's wheat data and found that our procedure is at least 15 times faster.
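As a point of reference for the stochastic approximation component, here is a minimal Robbins–Monro recursion with off-line (Polyak–Ruppert) averaging; the noisy score function is a stand-in for the MCMC-based gradient estimates used in the actual two-stage algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_score(theta):
    # Stand-in for an MCMC estimate of the score; true root at theta = 1.5
    return (1.5 - theta) + rng.normal(scale=0.5)

theta, avg, burn = 0.0, 0.0, 200
for k in range(1, 2001):
    theta += (1.0 / k) * noisy_score(theta)   # Robbins-Monro step, a_k = 1/k
    if k > burn:                              # off-line (Polyak-Ruppert) average
        avg += (theta - avg) / (k - burn)
print(avg)   # close to 1.5
```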

11.
The computational demand required to perform inference using Markov chain Monte Carlo methods often obstructs a Bayesian analysis. This may be a result of large data sets, complex dependence structures, or expensive computer models. In these instances, the posterior distribution is replaced by a computationally tractable approximation, and inference is based on this working model. However, the error introduced by this practice is not well studied. In this paper, we propose a methodology that allows one to examine the impact on statistical inference by quantifying the discrepancy between the intractable and working posterior distributions. This work provides a structure for analysing model approximations with regard to the reliability of inference and computational efficiency. We illustrate our approach through a spatial analysis of yearly total precipitation anomalies, where covariance tapering approximations are used to alleviate the computational demand associated with inverting a large, dense covariance matrix.
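A minimal sketch of covariance tapering as used in the illustration: the dense covariance is multiplied elementwise by a compactly supported Wendland correlation, which zeroes out long-range entries and makes the working covariance sparse. The exponential covariance and the taper range below are arbitrary choices for demonstration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix

def wendland1_taper(d, range_):
    """Wendland-1 taper: a valid correlation that is exactly zero beyond range_."""
    r = np.clip(d / range_, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def tapered_covariance(coords, cov_fn, taper_range):
    d = cdist(coords, coords)
    K = cov_fn(d) * wendland1_taper(d, taper_range)  # elementwise (Schur) product
    return csr_matrix(K)                             # mostly zeros -> sparse solves

coords = np.random.default_rng(1).uniform(size=(500, 2))
K_tap = tapered_covariance(coords, lambda d: np.exp(-d / 0.3), taper_range=0.1)
print(K_tap.nnz / 500**2)   # fraction of covariance entries retained
```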

12.
We propose a Bayesian approach for estimating hazard functions under the constraint of a monotone hazard ratio. We construct a model for the monotone hazard ratio using Cox's proportional hazards model with a monotone time-dependent coefficient. To reduce computational complexity, we use a signed gamma process prior for the time-dependent coefficient and a Bayesian bootstrap prior for the baseline hazard function. We develop an efficient MCMC algorithm and illustrate the proposed method on simulated and real data sets.

13.
We propose a method for the analysis of a spatial point pattern, which is assumed to arise as a set of observations from a spatial nonhomogeneous Poisson process. The spatial point pattern is observed in a bounded region, which, for most applications, is taken to be a rectangle in the space where the process is defined. The method is based on modeling a density function, defined on this bounded region, that is directly related to the intensity function of the Poisson process. We develop a flexible nonparametric mixture model for this density using a bivariate Beta distribution for the mixture kernel and a Dirichlet process prior for the mixing distribution. Using posterior simulation methods, we obtain full inference for the intensity function and any other functional of the process that might be of interest. We discuss applications to problems where inference for clustering in the spatial point pattern is of interest. Moreover, we consider applications of the methodology to extreme value analysis problems. We illustrate the modeling approach with three previously published data sets. Two of the data sets are from forestry and consist of locations of trees. The third data set consists of extremes from the Dow Jones index over a period of 1303 days.

14.
We consider estimation of the unknown parameters of the Chen distribution [Chen Z. A new two-parameter lifetime distribution with bathtub shape or increasing failure rate function. Statist Probab Lett. 2000;49:155–161] with bathtub shape using progressively censored samples. We obtain maximum likelihood estimates by making use of an expectation–maximization (EM) algorithm. Different Bayes estimates are derived under squared error and balanced squared error loss functions. The associated posterior distribution appears in an intractable form, so we use an approximation method to compute these estimates. A Metropolis–Hastings (MH) algorithm is also proposed, and further approximate Bayes estimates are obtained from it. Asymptotic confidence intervals are constructed using the observed Fisher information matrix, and bootstrap intervals are proposed as well. Samples generated from the MH algorithm are further used in the construction of HPD intervals. We also obtain prediction intervals and estimates for future observations in one- and two-sample situations. A numerical study compares the performance of the proposed methods using simulations, and real data sets are analysed for illustration.
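For orientation, the Chen (2000) distribution has cdf F(t) = 1 − exp{λ(1 − e^(t^β))}. The sketch below fits (λ, β) by direct maximum likelihood on a complete (uncensored) sample; handling progressive censoring, as the paper does via EM, is more involved.

```python
import numpy as np
from scipy.optimize import minimize

def chen_negloglik(params, t):
    """Negative log-likelihood of the Chen distribution for a complete sample."""
    lam, beta = params
    if lam <= 0 or beta <= 0:
        return np.inf
    tb = t ** beta
    return -np.sum(np.log(lam) + np.log(beta) + (beta - 1) * np.log(t)
                   + tb + lam * (1.0 - np.exp(tb)))

# Simulate by inverting the cdf: t = [log(1 - log(1-u)/lam)]^(1/beta)
rng = np.random.default_rng(2)
u = rng.uniform(size=200)
t = np.log1p(-np.log1p(-u) / 0.8) ** (1 / 1.5)     # true lam=0.8, beta=1.5
res = minimize(chen_negloglik, x0=[1.0, 1.0], args=(t,), method="Nelder-Mead")
print(res.x)   # MLEs of (lambda, beta)
```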

15.
Ordinary differential equations are arguably the most popular and useful mathematical tool for describing physical and biological processes in the real world. Often, these processes are observed with errors, in which case the most natural way to model such data is via regression, with the mean function defined by an ordinary differential equation believed to describe the underlying process. These regression-based dynamical models are called differential equation models. Parameter inference from differential equation models poses computational challenges, mainly because analytic solutions to most differential equations are not available. In this paper, we propose an approximation method for obtaining the posterior distribution of the parameters in differential equation models. The approximation is done in two steps. In the first step, the solution of a differential equation is approximated by a general one-step method, a class of numerical methods for ordinary differential equations that includes the Euler and Runge–Kutta procedures; in the second step, nuisance parameters are marginalized using a Laplace approximation. The proposed Laplace-approximated posterior gives a computationally fast alternative to a full Bayesian computational scheme (such as Markov chain Monte Carlo) and produces more accurate and stable estimators than the popular smoothing methods (called collocation methods) based on frequentist procedures. For theoretical support of the proposed method, we prove that the Laplace-approximated posterior converges to the actual posterior under certain conditions, and we analyze the relation between the order of the numerical error and that of its Laplace approximation. The proposed method is tested on simulated data sets and compared with other existing methods.
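As an example of the general one-step class mentioned in the first approximation step, here is the classical fourth-order Runge–Kutta method applied to a toy logistic-growth mean function; the paper's method would embed such a solver inside the posterior evaluation.

```python
import numpy as np

def rk4(f, y0, t):
    """Classical 4th-order Runge-Kutta: one instance of the general
    one-step methods used to approximate an ODE solution on grid t."""
    y = np.empty((len(t), np.size(y0)))
    y[0] = y0
    for k in range(len(t) - 1):
        h = t[k + 1] - t[k]
        k1 = f(t[k], y[k])
        k2 = f(t[k] + h / 2, y[k] + h / 2 * k1)
        k3 = f(t[k] + h / 2, y[k] + h / 2 * k2)
        k4 = f(t[k] + h, y[k] + h * k3)
        y[k + 1] = y[k] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# Logistic growth dy/dt = theta * y * (1 - y), a toy mean function
t = np.linspace(0.0, 10.0, 101)
path = rk4(lambda s, y: 2.0 * y * (1.0 - y), np.array([0.1]), t)
print(path[-1])   # approaches the carrying capacity 1.0
```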

16.
Dissemination of information derived from large contingency tables formed from confidential data is a major responsibility of statistical agencies. In this paper we present solutions to several computational and algorithmic problems that arise in the dissemination of cross-tabulations (marginal sub-tables) from a single underlying table. These include data structures that exploit sparsity to support efficient computation of marginals and algorithms such as iterative proportional fitting, as well as a generalized form of the shuttle algorithm that computes sharp bounds on (small, confidentiality-threatening) cells in the full table from arbitrary sets of released marginals. We give examples illustrating the techniques.
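Iterative proportional fitting, one of the algorithms mentioned, can be sketched for a two-way table as alternating rescaling of rows and columns until the margins match the released totals (the sparse data structures developed in the paper are omitted here).

```python
import numpy as np

def ipf_2way(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: rescale a seed table until its
    margins match the released row and column totals."""
    T = seed.astype(float).copy()
    for _ in range(max_iter):
        T *= (row_targets / T.sum(axis=1))[:, None]   # match row margins
        T *= (col_targets / T.sum(axis=0))[None, :]   # match column margins
        if np.abs(T.sum(axis=1) - row_targets).max() < tol:
            return T
    return T

T = ipf_2way(np.ones((2, 3)), np.array([30.0, 70.0]), np.array([20.0, 30.0, 50.0]))
print(T.round(2))   # margins now equal the released totals
```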

17.
The EM algorithm is a popular method for parameter estimation in situations where the data can be viewed as incomplete. As each E-step visits every data point on a given iteration, the EM algorithm requires considerable computation time when applied to large data sets. Two versions, the incremental EM (IEM) algorithm and a sparse version of the EM algorithm, were proposed by Neal, R.M. and Hinton, G.E. in Jordan, M.I. (Ed.), Learning in Graphical Models, Kluwer, Dordrecht, 1998, pp. 355–368, to reduce the computational cost of applying the EM algorithm. With the IEM algorithm, the available n observations are divided into B (B ≤ n) blocks and the E-step is implemented for only one block of observations at a time before the next M-step is performed. With the sparse version of the EM algorithm for fitting mixture models, only those posterior probabilities of component membership that are above a specified threshold are updated; the remaining component-posterior probabilities are held fixed. In this paper, simulations are performed to assess the relative performance of the IEM algorithm with various numbers of blocks and the standard EM algorithm. In particular, we propose a simple rule for choosing the number of blocks in the IEM algorithm. For the IEM algorithm in the extreme case of one observation per block, we provide efficient updating formulas that avoid the direct calculation of the inverses and determinants of the component-covariance matrices. Moreover, a sparse version of the IEM algorithm (SPIEM) is formulated by combining the sparse E-step of the EM algorithm with the partial E-step of the IEM algorithm. This SPIEM algorithm further reduces the computation time of the IEM algorithm.
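A compact sketch of the IEM idea for a one-dimensional Gaussian mixture, assuming the usual sufficient-statistics bookkeeping: each partial E-step refreshes the statistics of one block only, and the M-step reuses the cached statistics of all other blocks. Function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def iem_gmm_1d(x, B=10, n_sweeps=20, K=2, seed=0):
    """Incremental EM for a 1-D K-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n = len(x)
    w, mu = np.full(K, 1 / K), rng.choice(x, K, replace=False)
    sd = np.full(K, x.std())
    blocks = np.array_split(np.arange(n), B)
    S = np.zeros((B, K, 3))          # per-block counts, sums, sums of squares

    def block_stats(idx):
        r = w * norm.pdf(x[idx, None], mu, sd)    # responsibilities for block
        r /= r.sum(axis=1, keepdims=True)
        return np.stack([r.sum(0), r.T @ x[idx], r.T @ x[idx] ** 2], axis=1)

    for b, idx in enumerate(blocks):              # one full initial E-step pass
        S[b] = block_stats(idx)
    for _ in range(n_sweeps):
        for b, idx in enumerate(blocks):
            S[b] = block_stats(idx)               # partial E-step: block b only
            N, sx, sxx = S.sum(axis=0).T          # M-step from cached totals
            w, mu = N / n, sx / N
            sd = np.sqrt(np.maximum(sxx / N - mu ** 2, 1e-12))
    return w, mu, sd

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 300)])
print(iem_gmm_1d(x))
```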

18.
SubBag is a technique that combines bagging and the random subspace method to generate ensemble classifiers with good generalization capability. In practice, a hyperparameter K of SubBag, the number of randomly selected features used to create each base classifier, must be specified beforehand. In this article, we propose to employ the out-of-bag instances to determine the optimal value of K in SubBag. Experiments conducted on several UCI real-world data sets show that the proposed method makes SubBag achieve optimal performance in nearly all the cases considered. Meanwhile, it requires fewer computational resources than a cross-validation procedure.
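In scikit-learn terms, SubBag corresponds to a BaggingClassifier with max_features = K and bootstrap = True, and the out-of-bag selection of K can be sketched as a simple scan over candidate values. This assumes scikit-learn ≥ 1.2 (for the `estimator` keyword) and illustrates the idea rather than reproducing the authors' exact procedure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# SubBag = bagging + random feature subspaces; pick K by out-of-bag error
oob_err = {}
for K in range(2, X.shape[1] + 1, 4):
    clf = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=100, max_features=K,
                            bootstrap=True, oob_score=True,
                            random_state=0).fit(X, y)
    oob_err[K] = 1.0 - clf.oob_score_      # OOB estimate of generalization error

best_K = min(oob_err, key=oob_err.get)
print(best_K, oob_err[best_K])
```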

19.
Effective component relabeling in Bayesian analyses of mixture models is critical to the routine use of mixtures in classification when the analysis is based on Markov chain Monte Carlo methods. The classification-based relabeling approach presented here is computationally attractive and statistically effective, and scales well with sample size and number of mixture components, enabling routine analyses of increasingly large data sets. Building on the best of existing methods, practical relabeling aims to match the data-to-component classification indicators in MCMC iterates with those of a defined reference mixture distribution. The method performs as well as or better than existing methods in small-dimensional problems, while being practically superior in problems with larger data sets because the approach is scalable. We describe examples and computational benchmarks, and provide supporting code with an efficient computational implementation of the algorithm that will be of use to others in practical applications of mixture models.
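One standard way to implement the matching step is to treat it as an assignment problem: count label disagreements between an MCMC iterate's classification and the reference classification, then solve for the best label permutation with the Hungarian algorithm. This sketch is in the spirit of the approach described, not the paper's exact code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel_to_reference(z_iter, z_ref, K):
    """Find the permutation of component labels in one MCMC iterate that
    best agrees with a reference classification, via the Hungarian
    algorithm on a label-disagreement cost matrix."""
    cost = np.zeros((K, K))
    for k in range(K):
        for j in range(K):
            # cost of mapping iterate-label k to reference-label j
            cost[k, j] = np.sum((z_iter == k) & (z_ref != j))
    _, perm = linear_sum_assignment(cost)
    return perm[z_iter]            # relabelled classification indicators

z_ref  = np.array([0, 0, 1, 1, 2, 2])
z_iter = np.array([2, 2, 0, 0, 1, 1])   # same clustering, permuted labels
print(relabel_to_reference(z_iter, z_ref, K=3))   # -> [0 0 1 1 2 2]
```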

20.
Many geophysical regression problems require the analysis of large (more than 10^4 values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator, which downweights data corresponding to extreme regression residuals, with removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this, the exact distribution of the hat matrix diagonal elements p_ii for complex multivariate Gaussian predictor data is shown to be β(p_ii; m, N − m), where N is the number of data and m is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study that exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of examining both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.
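The leverage-screening step can be sketched as follows: compute the hat-matrix diagonals via a QR decomposition and flag observations exceeding a high quantile of the beta reference law. The abstract's exact β(p_ii; m, N − m) result is for complex Gaussian predictors; for the real-valued predictors simulated here, the analogous reference law Beta(m/2, (N − m)/2) is assumed.

```python
import numpy as np
from scipy.stats import beta

def flag_leverage(X, alpha=0.999):
    """Hat-matrix diagonals h_ii computed via a thin QR decomposition;
    observations whose h_ii exceeds a high quantile of the beta reference
    law are flagged as leverage points. Beta(m/2, (N - m)/2) is assumed
    for real Gaussian predictors; the abstract's beta(p_ii; m, N - m)
    result is the complex-predictor analogue."""
    N, m = X.shape
    Q, _ = np.linalg.qr(X)                  # thin QR: hat matrix H = Q Q'
    h = np.sum(Q ** 2, axis=1)              # hat diagonals h_ii
    cutoff = beta.ppf(alpha, m / 2, (N - m) / 2)
    return h, h > cutoff

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
X[0] = 25.0                                 # plant one gross leverage point
h, flags = flag_leverage(X)
print(round(h[0], 3), int(flags.sum()))
```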
