Similar Documents
20 similar documents found.
1.
This paper considers fitting generalized linear models to binary data in non-standard settings such as case–control samples, studies with misclassified responses and misspecified models. We develop simple methods for fitting models to case–control data and show that a closure property holds for generalized linear models in the non-standard settings: if the responses follow a generalized linear model in the population of interest, then so does the observed response in the non-standard setting, but with a modified link function. These results imply that data in the non-standard settings can be analysed using classical generalized linear model methods such as the iteratively reweighted least squares algorithm. Example data illustrate the results.
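The iteratively reweighted least squares (IRLS) algorithm mentioned above can be sketched for the ordinary logistic link. This is a generic illustration, not the paper's case–control adjustment; the function name and simulated data are ours.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Fit a logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)
        w = mu * (1.0 - mu)                 # GLM working weights
        z = eta + (y - mu) / w              # working response
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
    return beta

# Simulated example with true coefficients (0.5, -1.0), for illustration only.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(0.5 - X[:, 1])))).astype(float)
beta_hat = irls_logistic(X, y)
```

A modified link function, as in the closure property above, would only change the `mu` and `w` lines; the weighted least squares update itself is unchanged.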

2.
To model extreme spatial events, a general approach is to use the generalized extreme value (GEV) distribution with spatially varying parameters, as in spatial GEV models and latent variable models. In the literature, this approach is mostly used to capture spatial dependence for a single type of event, which limits its application to air pollutant data because different pollutants may interact chemically. A recent advance in spatial extremes modelling for multiple variables is the multivariate max-stable process. Like its univariate counterpart, the multivariate version assumes standard margins such as unit Fréchet, so additional modelling is required for applications such as spatial prediction. In this paper, we extend marginal methods such as spatial GEV models and latent variable models to a multivariate setting based on copulas, so that both the spatial dependence and the dependence among multiple pollutants can be handled. We apply the proposed model to weekly maxima of nitrogen dioxide, sulphur dioxide, respirable suspended particles, fine suspended particles, and ozone collected in the Pearl River Delta in China.

3.
In functional magnetic resonance imaging, spatial activation patterns are commonly estimated using a non-parametric smoothing approach. Significant peaks or clusters in the smoothed image are subsequently identified by testing the null hypothesis of lack of activation in every volume element of the scans. A weakness of this approach is the lack of a model for the activation pattern; this makes it difficult to determine the variance of estimates, to test specific neuroscientific hypotheses or to incorporate prior information about the brain area under study in the analysis. These issues may be addressed by formulating explicit spatial models for the activation and using simulation methods for inference. We present one such approach, based on a marked point process prior. Informally, one may think of the points as centres of activation, and the marks as parameters describing the shape and area of the surrounding cluster. We present an MCMC algorithm for making inference in the model and compare the approach with a traditional non-parametric method, using both simulated and visual stimulation data. Finally, we discuss extensions of the model and the inferential framework to account for non-stationary responses and spatio-temporal correlation.

4.
Summary. Motivated by the autologistic model for the analysis of spatial binary data on the two-dimensional lattice, we develop efficient computational methods for calculating the normalizing constant for models for discrete data defined on the cylinder and lattice. Because the normalizing constant is generally unknown analytically, statisticians have developed various ad hoc methods to overcome this difficulty. Our aim is to provide computationally and statistically efficient methods for calculating the normalizing constant so that efficient likelihood-based statistical methods are then available for inference. We extend the so-called transition method to find a feasible computational method of obtaining the normalizing constant for the cylinder boundary condition. To extend the result to the free-boundary condition on the lattice we use an efficient path sampling Markov chain Monte Carlo scheme. The methods are generally applicable to association patterns other than spatial, such as clustered binary data, and to variables taking three or more values described by, for example, Potts models.
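The role of a transition (transfer-matrix) method can be illustrated on the simplest non-trivial case, a one-dimensional Ising chain with free boundaries, where the normalizing constant computed by the transfer matrix can be checked against brute-force enumeration. This toy model and the function names are ours, not the paper's cylinder or lattice algorithms.

```python
import itertools
import numpy as np

def z_bruteforce(n, J):
    """Normalizing constant of a free-boundary Ising chain by enumeration."""
    Z = 0.0
    for s in itertools.product([-1, 1], repeat=n):
        Z += np.exp(J * sum(s[i] * s[i + 1] for i in range(n - 1)))
    return Z

def z_transfer(n, J):
    """Same constant via the 2x2 transfer matrix: Z = 1' T^(n-1) 1."""
    T = np.array([[np.exp(J), np.exp(-J)],
                  [np.exp(-J), np.exp(J)]])
    v = np.ones(2)
    for _ in range(n - 1):
        v = T @ v
    return v.sum()
```

The transfer-matrix pass costs O(n) instead of O(2^n); the same idea, with a larger transition matrix indexed by column configurations, underlies the cylinder computation described in the abstract.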

5.
Large spatial datasets are typically modelled through a small set of knot locations, which are often specified by the investigator according to arbitrary criteria. Existing methods for estimating knot locations either assume their number is known a priori or are computationally intensive. We develop a computationally efficient method for estimating both the locations and the number of knots for spatial mixed effects models. Our proposed algorithm, Threshold Knot Selection (TKS), estimates knot locations by identifying clusters of large residuals and placing a knot at the centroid of each cluster. We conduct a simulation study comparing TKS with several alternative methods of estimating knot locations. Our case study uses particulate matter concentration data collected during the response and clean-up effort following the 2010 Deepwater Horizon oil spill in the Gulf of Mexico.
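The residual-thresholding idea behind TKS can be sketched in one dimension: flag residuals above a quantile threshold, group contiguous flagged locations into clusters, and place a knot at each cluster centroid. This is a minimal sketch of the stated idea, assuming a 1-D grid of locations; the function name and threshold choice are ours.

```python
import numpy as np

def threshold_knots(locs, resid, q=0.90):
    """Place a knot at the centroid of each run of large residuals."""
    thr = np.quantile(np.abs(resid), q)
    big = np.abs(resid) > thr
    knots, cluster = [], []
    for x, flag in zip(locs, big):
        if flag:
            cluster.append(x)
        elif cluster:
            knots.append(np.mean(cluster))
            cluster = []
    if cluster:
        knots.append(np.mean(cluster))
    return np.array(knots)
```

In two dimensions the run-grouping step would be replaced by a spatial clustering of the flagged sites, but the threshold-then-centroid logic is the same.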

6.
Spatial data and nonparametric methods arise frequently in many fields, and it is common practice to analyse such data with semi-parametric spatial autoregressive (SPSAR) models. We propose estimation of SPSAR models based on maximum likelihood estimation (MLE) and kernel estimation. The spatial autoregressive coefficient ρ is estimated by optimizing the concentrated log-likelihood function with respect to ρ. Furthermore, under appropriate conditions, we derive the limiting distributions of our estimators for both the parametric and nonparametric components of the model.
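Optimizing a concentrated log-likelihood over ρ can be sketched for the fully parametric Gaussian SAR model y = ρWy + Xβ + ε, where β and σ² are profiled out analytically and ρ is found by a grid search. This is a generic illustration, not the semi-parametric estimator of the paper; the weights matrix and data below are simulated.

```python
import numpy as np

def sar_rho_hat(y, X, W, grid=None):
    """Concentrated (profile) log-likelihood search for rho in
    y = rho*W*y + X*beta + e, e ~ N(0, sigma^2 I)."""
    n = len(y)
    grid = np.linspace(-0.95, 0.95, 191) if grid is None else grid
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker matrix
    best, rho_hat = -np.inf, 0.0
    for rho in grid:
        A = np.eye(n) - rho * W
        sign, logdet = np.linalg.slogdet(A)
        if sign <= 0:
            continue
        e = M @ (A @ y)                                  # residuals of Ay on X
        ll = logdet - 0.5 * n * np.log(e @ e / n)        # concentrated log-lik
        if ll > best:
            best, rho_hat = ll, rho
    return rho_hat
```

The log-determinant term log|I − ρW| is what distinguishes this from ordinary least squares on the transformed response.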

7.
We demonstrate the use of auxiliary (or latent) variables for sampling non-standard densities which arise in the context of the Bayesian analysis of non-conjugate and hierarchical models by using a Gibbs sampler. Their strategic use can result in a Gibbs sampler having easily sampled full conditionals. We propose such a procedure to simplify or speed up the Markov chain Monte Carlo algorithm. The strength of this approach lies in its generality and its ease of implementation. The aim of the paper, therefore, is to provide an alternative sampling algorithm to rejection-based methods and other sampling approaches such as the Metropolis–Hastings algorithm.
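The simplest instance of the auxiliary-variable idea is the slice sampler: adding u | x ~ Unif(0, f(x)) makes both full conditionals uniform, giving a Gibbs sampler with trivially sampled conditionals. The sketch below targets a standard normal, where the slice {x : f(x) > u} is an explicit interval; this is a generic illustration, not one of the paper's hierarchical examples.

```python
import numpy as np

def slice_sampler_normal(n_draws, seed=1):
    """Auxiliary-variable (slice) Gibbs sampler for N(0,1).
    Given x: draw u | x ~ Unif(0, f(x)) with f(x) = exp(-x^2/2).
    Given u: draw x uniformly on the slice {x : f(x) > u}."""
    rng = np.random.default_rng(seed)
    x, out = 0.0, np.empty(n_draws)
    for t in range(n_draws):
        u = rng.uniform(0.0, np.exp(-0.5 * x * x))
        half = np.sqrt(-2.0 * np.log(u))       # slice is the interval (-half, half)
        x = rng.uniform(-half, half)
        out[t] = x
    return out
```

For densities without a closed-form slice, the same construction applies with a stepping-out or shrinkage procedure to locate the interval.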

8.
Spatio-temporal processes are often high-dimensional, exhibiting complicated variability across space and time. Traditional state-space model approaches to such processes in the presence of uncertain data have been shown to be useful. However, estimation of state-space models in this context is often problematic since parameter vectors and matrices are of high dimension and can have complicated dependence structures. We propose a spatio-temporal dynamic model formulation with parameter matrices restricted based on prior scientific knowledge and/or common spatial models. Estimation is carried out via the expectation–maximization (EM) algorithm or general EM algorithm. Several parameterization strategies are proposed and analytical or computational closed form EM update equations are derived for each. We apply the methodology to a model based on an advection–diffusion partial differential equation in a simulation study and also to a dimension-reduced model for a Palmer Drought Severity Index (PDSI) data set.

9.
Mixture models are flexible tools in density estimation and classification problems. Bayesian estimation of such models typically relies on sampling from the posterior distribution using Markov chain Monte Carlo. Label switching arises because the posterior is invariant to permutations of the component parameters. Methods for dealing with label switching have been studied fairly extensively in the literature, with the most popular approaches being those based on loss functions. However, many of these algorithms turn out to be too slow in practice, and can be infeasible as the size and/or dimension of the data grow. We propose a new, computationally efficient algorithm based on a loss function interpretation, and show that it scales well to large data sets. We then review earlier solutions that also scale to large data sets, and compare their performance on simulated and real data sets. We conclude with discussion and recommendations covering all the methods studied.
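A minimal loss-based relabelling, in the spirit of the approaches reviewed above, aligns each MCMC draw of component parameters to a reference draw using the permutation that minimizes squared loss. This sketch (function name ours, not the authors' algorithm) also shows why such schemes slow down: the permutation search is k! per draw.

```python
import itertools
import numpy as np

def relabel(draws):
    """Undo label switching: align each draw (a vector of k component
    parameters) to the first draw via the squared-loss-minimizing permutation."""
    ref = draws[0]
    k = len(ref)
    out = np.empty_like(draws)
    for t, d in enumerate(draws):
        perm = min(itertools.permutations(range(k)),
                   key=lambda p: np.sum((d[list(p)] - ref) ** 2))
        out[t] = d[list(perm)]
    return out
```

For small k this is fine; the k! search (or an assignment-problem solver replacing it) applied to every posterior draw is exactly the cost that motivates the faster algorithm proposed in the abstract.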

10.
Cluster analysis is one of the most widely used methods in statistical analysis, identifying homogeneous subgroups within a heterogeneous population. Because many applications involve mixed continuous and discrete data, ordinary clustering methods such as hierarchical methods, k-means, and model-based methods have been extended to handle mixed data. In the available model-based clustering methods, however, the number of parameters grows with the number of continuous variables, and identifying and fitting an appropriate model may become difficult. In this paper, to reduce the number of parameters in model-based clustering of mixed continuous (normal) and nominal data, a set of parsimonious models is introduced. These models use the general location model approach to specify the distribution of the mixed variables and apply a factor-analyzer structure to the covariance matrices. The ECM algorithm is used to estimate the model parameters. The clustering performance of the proposed models is illustrated with simulation studies and analyses of two real data sets.

11.
Much recent methodological progress in the analysis of infectious disease data has been due to Markov chain Monte Carlo (MCMC) methodology. In this paper, it is illustrated that rejection sampling can also be applied to a family of inference problems in the context of epidemic models, avoiding the issues of convergence associated with MCMC methods. Specifically, we consider models for epidemic data arising from a population divided into households. The models allow individuals to be potentially infected both from outside and from within the household. We develop methodology for selection between competing models via the computation of Bayes factors. We also demonstrate how an initial sample can be used to adjust the algorithm and improve efficiency. The data are assumed to consist of the final numbers ultimately infected within a sample of households in some community. The methods are applied to data taken from outbreaks of influenza.
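Rejection sampling from a posterior avoids MCMC convergence diagnostics entirely: draw from the prior and accept with probability proportional to the likelihood. The sketch below uses a Binomial likelihood with a uniform prior as a stand-in for the household final-size likelihood, so the result can be checked against the exact Beta posterior; the setup and function name are ours.

```python
import numpy as np

def rejection_posterior(y, n, n_draws, seed=2):
    """Draw from p | y ~ Binomial(n, p), p ~ Unif(0,1), by accepting a prior
    draw with probability L(p) / max_p L(p).  Every accepted draw is an exact
    posterior sample -- no burn-in or convergence checking needed."""
    rng = np.random.default_rng(seed)
    p_hat = y / n                                      # likelihood maximizer
    log_lmax = y * np.log(p_hat) + (n - y) * np.log(1.0 - p_hat)
    out = []
    while len(out) < n_draws:
        p = rng.uniform(1e-12, 1.0 - 1e-12)
        log_l = y * np.log(p) + (n - y) * np.log(1.0 - p)
        if np.log(rng.uniform()) < log_l - log_lmax:
            out.append(p)
    return np.array(out)
```

The acceptance rate drops as the data become informative, which is why the abstract's device of using an initial sample to tune the algorithm matters in practice.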

12.
Nearest Neighbor Adjusted Best Linear Unbiased Prediction
Statistical inference for linear models has classically focused on either estimation or hypothesis testing of linear combinations of fixed effects or of variance components for random effects. A third form of inference, prediction of linear combinations of fixed and random effects, has important advantages over conventional estimators in many applications. None of these approaches will result in accurate inference if the data contain strong, unaccounted-for local gradients, such as spatial trends in field-plot data. Nearest neighbor methods to adjust for such trends have been widely discussed in recent literature. So far, however, these methods have been developed exclusively for classical estimation and hypothesis testing. In this article a method of obtaining nearest neighbor adjusted (NNA) predictors, along the lines of "best linear unbiased prediction," or BLUP, is developed. A simulation study comparing "NNABLUP" to conventional NNA methods and to non-NNA alternatives suggests considerable potential for improved efficiency.

13.
Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation-maximisation (EM) algorithm and the parameter expanded EM (PX-EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton-Raphson type scheme such as the average information (AI) algorithm. The EM and PX-EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX-EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX-EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX-EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.

14.
In this paper we propose a new robust technique for the analysis of spatial data through simultaneous autoregressive (SAR) models, which extends the Forward Search approach of Cerioli and Riani (1999) and Atkinson and Riani (2000). Our algorithm starts from a subset of outlier-free observations and then selects additional observations according to their degree of agreement with the postulated model. A number of useful diagnostics which are monitored along the search help to identify masked spatial outliers and high leverage sites. In contrast to other robust techniques, our method is particularly suited for the analysis of complex multidimensional systems since each step is performed through statistically and computationally efficient procedures, such as maximum likelihood. The main contribution of this paper is the development of joint robust estimation of both trend and autocorrelation parameters in spatial linear models. For this purpose we suggest a novel definition of the elemental sets of the Forward Search, which relies on blocks of contiguous spatial locations.
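The forward-search mechanics can be sketched for ordinary regression: start from a small subset well fitted by a preliminary fit, then repeatedly refit on the current subset and admit the excluded observation with the smallest absolute residual, so that outliers enter last. This is a deliberately simplified sketch (the full method re-selects the subset and monitors diagnostics at every step, and the paper's version uses spatial blocks); names and data are ours.

```python
import numpy as np

def forward_search_order(X, y, m0=None):
    """Simplified forward search for a linear model: returns the order in
    which observations enter the fitting subset.  Gross outliers enter last."""
    n = len(y)
    m0 = X.shape[1] + 2 if m0 is None else m0
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    subset = list(np.argsort(np.abs(y - X @ beta0))[:m0])   # initial clean core
    order = list(subset)
    while len(subset) < n:
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        r = np.abs(y - X @ beta)
        nxt = min((i for i in range(n) if i not in subset), key=lambda i: r[i])
        subset.append(nxt)
        order.append(nxt)
    return order
```

Monitoring quantities such as the maximum residual in the subset along this order is what reveals masked outliers: they produce a sharp jump when the first contaminated observation is forced in.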

15.
Important progress has been made with model averaging methods over the past decades. For spatial data, however, model averaging has seen little application. This article studies model averaging methods for the spatial geostatistical linear model. A spatial Mallows criterion is developed to choose the weights for the model averaging estimator, and the resulting estimator achieves asymptotic optimality in terms of L2 loss. Simulation experiments show that the proposed estimator is superior to the model averaging estimator based on the Mallows criterion developed for ordinary linear models [Hansen, 2007] and to the model selection estimator based on the corrected Akaike information criterion developed for geostatistical linear models [Hoeting et al., 2006]. The Canadian Journal of Statistics 47: 336–351; 2019 © 2019 Statistical Society of Canada
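The shape of a Mallows-type weight choice can be sketched for two nested ordinary linear models: minimize squared fitting error plus a penalty linear in the weighted model dimensions. This is a sketch in the spirit of the ordinary-linear-model criterion [Hansen, 2007] that the paper's spatial criterion improves upon, not the spatial criterion itself; names and data are ours.

```python
import numpy as np

def mallows_weight(y, X_small, X_big, grid=None):
    """Weight w on the small model (1-w on the big one) minimizing the
    Mallows-type criterion ||y - yhat(w)||^2 + 2*sigma2*(w*k_s + (1-w)*k_b)."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    n = len(y)
    fit = lambda X: X @ np.linalg.solve(X.T @ X, X.T @ y)
    y_s, y_b = fit(X_small), fit(X_big)
    k_s, k_b = X_small.shape[1], X_big.shape[1]
    sigma2 = np.sum((y - y_b) ** 2) / (n - k_b)   # estimated from biggest model
    def crit(w):
        yw = w * y_s + (1.0 - w) * y_b
        return np.sum((y - yw) ** 2) + 2.0 * sigma2 * (w * k_s + (1.0 - w) * k_b)
    return min(grid, key=crit)
```

With more than two candidate models the same criterion is minimized over the weight simplex by quadratic programming; the spatial version replaces the independent-error penalty with one adapted to the geostatistical covariance.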

16.
Model choice is one of the most crucial aspects of any statistical data analysis. It is well known that most models are only approximations to the true data-generating process, and among such approximations the goal is to select the 'best' one. Researchers typically consider a finite number of plausible models, and the resulting statistical inference depends on the chosen model; hence model comparison is required to identify the 'best' model among the candidates. This article considers the problem of model selection for spatial data. Model selection for spatial models has been addressed in the literature using traditional information criteria, even though such criteria were developed under the assumption of independent observations. We evaluate the performance of several popular model selection criteria via Monte Carlo simulation experiments using small to moderate samples. In particular, we compare the ability of the Akaike information criterion (AIC), the Bayesian information criterion, and the corrected AIC to select the true model under several scenarios, using spatial covariance models ranging from stationary isotropic to nonstationary.
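The three criteria compared above have simple closed forms for a Gaussian model fit by maximum likelihood. The sketch below computes them for an OLS-fitted Gaussian linear model (the independent-observations case the criteria were derived for); the function name and simulated data are ours.

```python
import numpy as np

def gaussian_ics(y, X):
    """AIC, BIC and corrected AIC (AICc) for an OLS-fitted Gaussian model."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = p + 1                                    # regression coefficients + sigma^2
    loglik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    aic = 2.0 * k - 2.0 * loglik
    bic = k * np.log(n) - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)
    return aic, bic, aicc
```

For spatial models the same formulas apply with the OLS log-likelihood replaced by the maximized likelihood under the fitted spatial covariance, which is exactly where the independence assumption behind the penalties becomes questionable.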

17.
We consider analysis of complex stochastic models based upon partial information. MCMC and reversible jump MCMC are often the methods of choice for such problems, but in some situations they can be difficult to implement and can suffer from poor mixing and the difficulty of diagnosing convergence. Here we review three alternatives to MCMC methods: importance sampling, the forward-backward algorithm, and sequential Monte Carlo (SMC). We discuss how to design good proposal densities for importance sampling, show some of the range of models for which the forward-backward algorithm can be applied, and show how resampling ideas from SMC can be used to improve the efficiency of the other two methods. We demonstrate these methods on a range of examples, including estimating the transition density of a diffusion and of a discrete-state continuous-time Markov chain; inferring structure in population genetics; and segmenting genetic divergence data.
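Self-normalized importance sampling with a wider-than-target proposal, followed by SMC-style multinomial resampling to obtain equally weighted particles, can be sketched as follows. The target, proposal and quantity estimated are ours, chosen so the answer (E[X²] = 1 under N(0,1)) is known.

```python
import numpy as np

def is_with_resampling(n, seed=3):
    """Self-normalized importance sampling for E[X^2] under N(0,1) using a
    wider N(0, 2^2) proposal, then SMC-style multinomial resampling."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 2.0, size=n)
    # log target density minus log proposal density, up to an additive constant
    logw = -0.5 * x**2 + 0.5 * (x / 2.0) ** 2
    w = np.exp(logw - logw.max())       # subtract max for numerical stability
    w /= w.sum()
    estimate = np.sum(w * x**2)
    resampled = rng.choice(x, size=n, p=w)   # equal-weight particle set
    return estimate, resampled
```

Resampling discards negligible-weight particles, which is what lets the weighted sample be reused as the starting point of a further sampling stage, as described in the abstract.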

18.
In this paper, we consider a new mixture of varying coefficient models, in which each mixture component follows a varying coefficient model and the mixing proportions and dispersion parameters are also allowed to be unknown smooth functions. We systematically study the identifiability, estimation and inference for the new mixture model. The proposed new mixture model is rather general, encompassing many mixture models as its special cases such as mixtures of linear regression models, mixtures of generalized linear models, mixtures of partially linear models and mixtures of generalized additive models, some of which are new mixture models by themselves and have not been investigated before. The new mixture of varying coefficient model is shown to be identifiable under mild conditions. We develop a local likelihood procedure and a modified expectation–maximization algorithm for the estimation of the unknown non-parametric functions. Asymptotic normality is established for the proposed estimator. A generalized likelihood ratio test is further developed for testing whether some of the unknown functions are constants. We derive the asymptotic distribution of the proposed generalized likelihood ratio test statistics and prove that the Wilks phenomenon holds. The proposed methodology is illustrated by Monte Carlo simulations and an analysis of a CO2-GDP data set.

19.
Summary. Varying-coefficient linear models arise from multivariate nonparametric regression, non-linear time series modelling and forecasting, functional data analysis, longitudinal data analysis and others. It has been a common practice to assume that the varying coefficients are functions of a given variable, which is often called an index. To enlarge the modelling capacity substantially, this paper explores a class of varying-coefficient linear models in which the index is unknown and is estimated as a linear combination of regressors and/or other variables. We search for the index such that the derived varying-coefficient model provides the least squares approximation to the underlying unknown multidimensional regression function. The search is implemented through a newly proposed hybrid backfitting algorithm. The core of the algorithm is the alternating iteration between estimating the index through a one-step scheme and estimating coefficient functions through one-dimensional local linear smoothing. The locally significant variables are selected in terms of a combined use of the t-statistic and the Akaike information criterion. We further extend the algorithm for models with two indices. Simulation shows that the methodology proposed has appreciable flexibility to model complex multivariate non-linear structure and is practically feasible with average modern computers. The methods are further illustrated through the Canadian mink–muskrat data in 1925–1994 and the pound–dollar exchange rates in 1974–1983.

20.
Spatial modeling is widely used in environmental sciences, biology, and epidemiology. Generalized linear mixed models that account for the spatial variation of point-referenced data are called spatial generalized linear mixed models (SGLMMs). Frequentist analysis of this type of model is computationally difficult, whereas the advent of Markov chain Monte Carlo algorithms has made Bayesian analysis of SGLMMs computationally convenient. The recent introduction of data cloning, which leads to the maximum likelihood estimate, has made frequentist analysis of mixed models equally convenient. Data cloning has been employed to estimate model parameters in SGLMMs; however, prediction of spatial random effects and kriging are also very important. In this article, we propose a frequentist approach based on data cloning to predict spatial random effects (with prediction intervals) and to perform kriging. We illustrate the approach using a real dataset and a simulation study.
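The data-cloning mechanism can be seen in closed form in the conjugate normal-mean model: replicating the sample K times makes the posterior concentrate at the maximum likelihood estimate, so the prior's influence vanishes as K grows. This toy model is ours, an illustration of the principle rather than the paper's SGLMM machinery.

```python
import numpy as np

def cloned_posterior_mean(xbar, n, sigma2, mu0, tau2, K):
    """Posterior mean of a N(mu, sigma2) mean under a N(mu0, tau2) prior when
    the sample (size n, mean xbar) is cloned K times.  As K -> infinity the
    prior term is swamped and the posterior mean tends to the MLE xbar."""
    precision = K * n / sigma2 + 1.0 / tau2
    return (K * n * xbar / sigma2 + mu0 / tau2) / precision
```

A companion fact (which we state without derivation) is that the cloned posterior variance shrinks like 1/K, and K times it estimates the frequentist asymptotic variance, which is how data cloning delivers standard errors as well as point estimates.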
