Similar literature
20 similar articles found (search time: 15 ms)
1.
In this study, an evaluation of Bayesian hierarchical models is made based on simulation scenarios to compare single-stage and multi-stage Bayesian estimations. Simulated datasets of lung cancer disease counts for men aged 65 and older across 44 wards in the London Health Authority were analysed using a range of spatially structured random effect components. The goals of this study are to determine which of these single-stage models perform best given a certain simulating model, how estimation methods (single- vs. multi-stage) compare in yielding posterior estimates of fixed effects in the presence of spatially structured random effects, and finally which of two spatial prior models – the Leroux or ICAR model, perform best in a multi-stage context under different assumptions concerning spatial correlation. Among the fitted single-stage models without covariates, we found that when there is low amount of variability in the distribution of disease counts, the BYM model is relatively robust to misspecification in terms of DIC, while the Leroux model is the least robust to misspecification. When these models were fit to data generated from models with covariates, we found that when there was one set of covariates – either spatially correlated or non-spatially correlated, changing the values of the fixed coefficients affected the ability of either the Leroux or ICAR model to fit the data well in terms of DIC. When there were multiple sets of spatially correlated covariates in the simulating model, however, we could not distinguish the goodness of fit to the data between these single-stage models. We found that the multi-stage modelling process via the Leroux and ICAR models generally reduced the variance of the posterior estimated fixed effects for data generated from models with covariates and a UH term compared to analogous single-stage models. Finally, we found the multi-stage Leroux model compares favourably to the multi-stage ICAR model in terms of DIC. 
We conclude that the multi-stage Leroux model should be seriously considered in applications of Bayesian disease mapping when an investigator wishes to fit a model with both fixed effects and spatially structured random effects to Poisson count data.
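
As an illustration of the two spatial priors compared above, the sketch below builds the precision matrix of the spatial random effects up to a scale factor. It assumes the commonly used parameterisation Q = rho * (D - W) + (1 - rho) * I, where W is a binary adjacency matrix and D holds each area's neighbour count; the ICAR prior is then the case rho = 1. The four-area map and the function name are hypothetical and are not taken from the study.

```python
def leroux_precision(adjacency, rho):
    """Leroux precision matrix (up to a scale factor).

    Assumed parameterisation: Q = rho * (D - W) + (1 - rho) * I,
    with W a binary adjacency matrix and D its diagonal matrix of
    neighbour counts. rho = 1 gives ICAR; rho = 0 gives independence.
    """
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        n_neighbours = sum(adjacency[i])
        for j in range(n):
            if i == j:
                q[i][j] = rho * n_neighbours + (1.0 - rho)
            else:
                q[i][j] = -rho * adjacency[i][j]
    return q


# Hypothetical map: four areas arranged in a row.
CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

icar = leroux_precision(CHAIN, 1.0)         # rows sum to zero (improper prior)
independent = leroux_precision(CHAIN, 0.0)  # identity matrix
partial = leroux_precision(CHAIN, 0.5)      # mix of the two
```

Under this parameterisation each ICAR row sums to zero, so that prior is improper, whereas any rho below 1 gives a proper prior.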

2.
It is common to fit generalized linear models with binomial and Poisson responses to data that show greater variability than the model assumes. This phenomenon, known as overdispersion, can distort inference by declaring significant the parameters of variables that have no real effect on the dependent variable. This paper explains some methods to detect overdispersion and presents and evaluates three well-known methodologies that have proved useful in correcting the problem: random mean models, quasi-likelihood methods and a double exponential family. In addition, it proposes some new Bayesian model extensions that also correct for overdispersion. Finally, using the information provided by the National Demographic and Health Survey 2005, the departmental factors that influence the mortality of children under 5 years and female postnatal period screening are determined. Based on the results, extensions that generalize some of the aforementioned models are also proposed, and their use is motivated by the data set under study. The results indicate that the proposed overdispersion models provide a better statistical fit to the data.
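
One common way to detect overdispersion in a Poisson model is the Pearson dispersion statistic, the Pearson chi-square divided by the residual degrees of freedom. The sketch below is a minimal illustration for an intercept-only model, not the paper's own procedure; the counts are invented and the function name is hypothetical.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    For a Poisson model the variance equals the mean, so values well
    above 1 suggest overdispersion.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2 / (len(counts) - n_params)


# Invented counts; the intercept-only Poisson fit is the sample mean.
counts = [0, 0, 1, 0, 12, 0, 2, 0, 9, 0]
mean = sum(counts) / len(counts)
phi = pearson_dispersion(counts, [mean] * len(counts), n_params=1)
```

In a quasi-Poisson analysis the model-based standard errors would be multiplied by the square root of this statistic.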

3.
Multivariate longitudinal or clustered data are commonly encountered in clinical trials and toxicological studies. Typically, there is no single standard endpoint to assess the toxicity or efficacy of the compound of interest, but co‐primary endpoints are available to assess the toxic effects or the working of the compound. Modeling the responses jointly is thus appealing to draw overall inferences using all responses and to capture the association among the responses. Non‐Gaussian outcomes are often modeled univariately using exponential family models. To accommodate both the overdispersion and the hierarchical structure in the data, Molenberghs et al. (A family of generalized linear models for repeated measures with normal and conjugate random effects. Statistical Science 2010; 25:325–347) proposed using two separate sets of random effects. This paper considers a model for multivariate, hierarchically clustered and overdispersed non‐Gaussian data. A gamma random effect is used for the overdispersion and normal random effects for the clustering in the data. The two outcomes are jointly analyzed by assuming that the normal random effects for both endpoints are correlated. The association structure between the responses is analytically derived. The fit of the joint model to data from a so‐called comet assay is compared with the univariate analysis of the two outcomes. Copyright © 2012 John Wiley & Sons, Ltd.

4.
Advances in computation mean that it is now possible to fit a wide range of complex models to data, but there remains the problem of selecting a model on which to base reported inferences. Following an early suggestion of Box & Tiao, it seems reasonable to seek 'inference robustness' in reported models, so that alternative assumptions that are reasonably well supported would not lead to substantially different conclusions. We propose a four-stage modelling strategy in which we iteratively assess and elaborate an initial model, measure the support for each of the resulting family of models, assess the influence of adopting alternative models on the conclusions of primary interest, and identify whether an approximate model can be reported. The influence-support plot is then introduced as a tool to aid model comparison. The strategy is semi-formal, in that it could be embedded in a decision-theoretic framework but requires substantive input for any specific application. The one restriction of the strategy is that the quantity of interest, or 'focus', must retain its interpretation across all candidate models. It is, therefore, applicable to analyses whose goal is prediction, or where a set of common model parameters is of interest and candidate models make alternative distributional assumptions. The ideas are illustrated by two examples. Technical issues include the calibration of the Kullback-Leibler divergence between marginal distributions, and the use of alternative measures of support for the range of models fitted.

5.
This paper presents a methodology for model fitting and inference in the context of Bayesian models of the type f(Y | X,θ)f(X|θ)f(θ), where Y is the (set of) observed data, θ is a set of model parameters and X is an unobserved (latent) stationary stochastic process induced by the first order transition model f(X (t+1)|X (t),θ), where X (t) denotes the state of the process at time (or generation) t. The crucial feature of the above type of model is that, given θ, the transition model f(X (t+1)|X (t),θ) is known but the distribution of the stochastic process in equilibrium, that is f(X|θ), is, except in very special cases, intractable, hence unknown. A further point to note is that the data Y has been assumed to be observed when the underlying process is in equilibrium. In other words, the data is not collected dynamically over time. We refer to such specification as a latent equilibrium process (LEP) model. It is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model. As a hierarchical specification, it is natural to fit the LEP within a Bayesian framework. Fitting such models is usually done via Markov chain Monte Carlo (MCMC). However, we demonstrate that, in the case of LEP models, implementation of MCMC is far from straightforward. The main contribution of this paper is to provide a methodology to implement MCMC for LEP models. We demonstrate our approach in population genetics problems with both simulated and real data sets. 
The resulting model fitting is computationally intensive, so we also discuss parallel implementation of the procedure in special cases.

6.
In this paper, we study the properties of a special class of frailty models when the frailty is common to several failure times. The models are closely linked to Archimedean copula models. We establish a useful formula for cumulative baseline hazard functions and develop a new estimator for cumulative baseline hazard functions in bivariate frailty regression models. Based on our proposed estimator, we present a graphical model checking procedure. We fit a leukemia data set using our model and conclude with a discussion.

7.
The problem of modelling football data has become increasingly popular in the last few years and many different models have been proposed with the aim of estimating the characteristics that bring a team to lose or win a game, or to predict the score of a particular match. We propose a Bayesian hierarchical model to fulfil both these aims and test its predictive strength based on data about the Italian Serie A 1991–1992 championship. To overcome the issue of overshrinkage produced by the Bayesian hierarchical model, we specify a more complex mixture model that results in a better fit to the observed data. We test its performance using an example of the Italian Serie A 2007–2008 championship.

8.
In this paper, we propose to use a special class of bivariate frailty models to study dependent censored data. The proposed models are closely linked to Archimedean copula models. We give sufficient conditions for the identifiability of this type of competing risks models. The proposed conditions are derived based on a property shared by Archimedean copula models and satisfied by several well‐known bivariate frailty models. Compared with the models studied by Heckman and Honoré and Abbring and van den Berg, our models are more restrictive but can be identified with a discrete (even finite) covariate. Under our identifiability conditions, the expectation–maximization (EM) algorithm provides consistent estimates of the unknown parameters. Simulation studies have shown that our estimation procedure works quite well. We fit a dependent censored leukaemia data set using the Clayton copula model and conclude with a discussion. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics
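
The link between a shared frailty and an Archimedean copula can be made concrete with the Clayton copula, which arises from a shared gamma frailty. The sketch below is illustrative only: it evaluates the Clayton copula, its Kendall's tau, and draws dependent pairs through a gamma frailty (the Marshall-Olkin construction). The function names and the parameter value are assumptions and do not come from the paper.

```python
import random


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def kendall_tau(theta):
    """Kendall's tau implied by the Clayton copula."""
    return theta / (theta + 2.0)


def sample_clayton(theta, rng):
    """Draw one dependent pair using a shared gamma frailty.

    The frailty is Gamma(shape=1/theta, scale=1); given the frailty,
    the two components are independent.
    """
    frailty = rng.gammavariate(1.0 / theta, 1.0)
    e1 = rng.expovariate(1.0)
    e2 = rng.expovariate(1.0)
    u = (1.0 + e1 / frailty) ** (-1.0 / theta)
    v = (1.0 + e2 / frailty) ** (-1.0 / theta)
    return u, v


rng = random.Random(2014)  # arbitrary seed for reproducibility
pairs = [sample_clayton(2.0, rng) for _ in range(500)]
```

Larger values of theta correspond to a more variable frailty and hence stronger dependence between the two failure times.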

9.
Statistical modelling of sports data has become more and more popular in recent years and different types of models have been proposed to achieve a variety of objectives: from identifying the key characteristics which lead a team to win or lose to predicting the outcome of a game or the team rankings in national leagues. Although not as popular as football or basketball, volleyball is a team sport with both national and international level competitions in almost every country. However, there is almost no study investigating the prediction of volleyball game outcomes and team rankings in national leagues. We propose a Bayesian hierarchical model for the prediction of the rankings of volleyball national teams, which also allows the results of each match in the league to be estimated. We consider two alternative model specifications of different complexity which are validated using data from the women's volleyball Italian Serie A1 2017–2018 season.

10.
A bivariate generalized linear model is developed as a mixture distribution with one component of the mixture being discrete with probability mass only at the origin. The use of the proposed model is illustrated by analyzing local area meteorological measurements with constant correlation structure that incorporates predictor variables. A Monte Carlo study is performed to evaluate the inferential efficiency of model parameters for two types of true models. These results suggest that the estimates of regression parameters are consistent and the efficiency of the inference increases for the proposed model for ρ≥0.50, especially in larger samples. As an illustration of a bivariate generalized linear model, we analyze precipitation monitoring data from adjacent local stations in Tokyo and Yokohama.

11.
Inflated data are prevalent in many situations and a variety of inflated models with extensions have been derived to fit data with excessive counts of some particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models including Poisson regression and zero-inflated Poisson (ZIP) regression were fitted to dual-inflated data and the performance of the IC was compared. The effects of sample size and the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample size.
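
To make the comparison of information criteria concrete, the sketch below computes AIC, BIC and CAIC for an intercept-only Poisson model and an intercept-only ZIP model. It is a simplified illustration under stated assumptions: there are no covariates, the ZKIP model is not implemented, the ZIP fit uses a basic EM iteration, and the data are invented. CAIC is taken as k(ln n + 1) - 2 log L.

```python
import math


def poisson_loglik(counts, lam):
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)


def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson with mixing weight pi."""
    total = 0.0
    for y in counts:
        pois = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
        total += math.log(pi * (y == 0) + (1.0 - pi) * pois)
    return total


def fit_zip(counts, n_iter=200):
    """Basic EM for an intercept-only ZIP model (illustrative only)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    pi, lam = 0.5, sum(counts) / n
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        p_struct = pi / (pi + (1.0 - pi) * math.exp(-lam))
        z_sum = p_struct * n_zero
        # M-step: update the mixing weight and the Poisson mean.
        pi = z_sum / n
        lam = sum(counts) / (n - z_sum)
    return pi, lam


def information_criteria(loglik, k, n):
    return {
        "AIC": 2 * k - 2 * loglik,
        "BIC": k * math.log(n) - 2 * loglik,
        "CAIC": k * (math.log(n) + 1) - 2 * loglik,
    }


# Invented data: 60 zeros plus 40 moderate counts.
data = [0] * 60 + [3, 4, 5, 2, 6, 4, 3, 5, 4, 3] * 4
n_obs = len(data)

lam_hat = sum(data) / n_obs
poisson_ic = information_criteria(poisson_loglik(data, lam_hat), k=1, n=n_obs)

pi_hat, zip_lam_hat = fit_zip(data)
zip_ic = information_criteria(zip_loglik(data, pi_hat, zip_lam_hat), k=2, n=n_obs)
```

Because the BIC and CAIC penalties grow with ln n while the AIC penalty does not, the three criteria can rank the same pair of models differently, which is the behaviour the study investigates.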

12.
A class of bivariate continuous-discrete distributions is proposed to fit Poisson dynamic models in a single unified framework via bivariate mixture transition distributions (BMTDs). Potential advantages of this class over the current models include its ability to capture stretches, bursts and nonlinear patterns characteristic of Internet network traffic, high-frequency financial data and many others. It models the inter-arrival times and the number of arrivals (marks) in a single unified model which benefits from the dependence structure of the data. The continuous marginal distributions of this class include as special cases the exponential, gamma, Weibull and Rayleigh distributions (for the inter-arrival times), whereas the discrete marginal distributions are geometric and negative binomial. The conditional distributions are Poisson and Erlang. Maximum-likelihood estimation is discussed and parameter estimates are obtained using an expectation–maximization algorithm, while the standard errors are estimated using the missing information principle. It is shown via real data examples that the proposed BMTD models appear to capture data features better than other competing models.

13.
The development of models and methods for cure rate estimation has recently burgeoned into an important subfield of survival analysis. Much of the literature focuses on the standard mixture model. Recently, process-based models have been suggested. We focus on several models based on first passage times for Wiener processes. Whitmore and others have studied these models in a variety of contexts. Lee and Whitmore (Stat Sci 21(4):501–513, 2006) give a comprehensive review of a variety of first hitting time models and briefly discuss their potential as cure rate models. In this paper, we study the Wiener process with negative drift as a possible cure rate model but the resulting defective inverse Gaussian model is found to provide a poor fit in some cases. Several possible modifications are then suggested, which improve the defective inverse Gaussian. These modifications include: the inverse Gaussian cure rate mixture model; a mixture of two inverse Gaussian models; incorporation of heterogeneity in the drift parameter; and the addition of a second absorbing barrier to the Wiener process, representing an immunity threshold. This class of process-based models is a useful alternative to the standard model and provides an improved fit compared to the standard model when applied to many of the datasets that we have studied. Implementation of this class of models is facilitated using expectation-maximization (EM) algorithms and variants thereof, including the gradient EM algorithm. Parameter estimates for each of these EM algorithms are given and the proposed models are applied to both real and simulated data, where they perform well.
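
The reason a Wiener process with negative drift yields a cure fraction can be sketched numerically. The code below assumes one particular parameterisation, which may differ from the paper's: the process starts at 0 with drift mu and volatility sigma, and the event occurs when it first reaches a barrier a > 0. With mu < 0 the first-passage distribution is defective, with total mass exp(2 * a * mu / sigma^2), and the remaining mass is the cured proportion. Function and argument names are hypothetical.

```python
import math


def _std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def first_passage_cdf(t, barrier, drift, sigma=1.0):
    """P(T <= t) for the first time a Wiener process started at 0 reaches `barrier`.

    This is the inverse Gaussian CDF; with negative drift it is
    defective, i.e. it tends to a limit below 1 as t grows.
    """
    s = sigma * math.sqrt(t)
    return (_std_normal_cdf((drift * t - barrier) / s)
            + math.exp(2.0 * drift * barrier / sigma ** 2)
            * _std_normal_cdf((-barrier - drift * t) / s))


def cure_fraction(barrier, drift, sigma=1.0):
    """Probability that the barrier is never reached."""
    if drift >= 0:
        return 0.0
    return 1.0 - math.exp(2.0 * drift * barrier / sigma ** 2)


cured = cure_fraction(barrier=1.0, drift=-0.5)             # 1 - exp(-1)
long_run = first_passage_cdf(1e6, barrier=1.0, drift=-0.5)  # approaches exp(-1)
```

In this sketch the cure fraction is tied to the same parameters that shape the event times, which may be one reason the abstract reports a poor fit in some cases and considers more flexible modifications.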

14.
The purpose of this paper is threefold. First, we obtain the asymptotic properties of the modified model selection criteria proposed by Hurvich et al. (1990. Improved estimators of Kullback-Leibler information for autoregressive model selection in small samples. Biometrika 77, 709–719) for autoregressive models. Second, we highlight the better performance of these modified criteria. Third, we extend the modification introduced by these authors to model selection criteria commonly used in the class of self-exciting threshold autoregressive (SETAR) time series models. We show the improvements of the modified criteria in their finite sample performance. In particular, for small and medium sample sizes the frequency of selecting the true model improves for the consistent criteria and the root mean square error (RMSE) of prediction improves for the efficient criteria. These results are illustrated via simulation with SETAR models in which we assume that the threshold and the parameters are unknown.

15.
We propose models to analyze animal growth data with the aim of estimating and predicting quantities of biological and economic interest such as the maturing rate and asymptotic weight. We also study the effect of relevant environmental factors on the growth process. The models considered in this paper are based on an extension and specialization of the dynamic hierarchical model (Gamerman & Migon, 1993) to a non–linear growth curve setting, where some of the growth curve parameters are considered exchangeable among the units. Inference for these models uses approximate conjugate analysis based on Taylor series expansions and linear Bayes procedures.

16.
We consider settings where it is of interest to fit and assess regression submodels that arise as various explanatory variables are excluded from a larger regression model. The larger model is referred to as the full model; the submodels are the reduced models. We show that a computationally efficient approximation to the regression estimates under any reduced model can be obtained from a simple weighted least squares (WLS) approach based on the estimated regression parameters and covariance matrix from the full model. This WLS approach can be considered an extension to unbiased estimating equations of a first-order Taylor series approach proposed by Lawless and Singhal. Using data from the 2010 Nationwide Inpatient Sample (NIS), a 20% weighted, stratified, cluster sample of approximately 8 million hospital stays from approximately 1000 hospitals, we illustrate the WLS approach when fitting interval censored regression models to estimate the effect of type of surgery (robotic versus nonrobotic surgery) on hospital length-of-stay while adjusting for three sets of covariates: patient-level characteristics, hospital characteristics, and zip-code level characteristics. Ordinarily, standard fitting of the reduced models to the NIS data takes approximately 10 hours; using the proposed WLS approach, the reduced models take seconds to fit.

17.
The paper proposes a Bayesian quantile regression method for hierarchical linear models. Existing approaches to hierarchical linear quantile regression are scarce, and most are not Bayesian, although the Bayesian perspective is important for hierarchical models. In this paper, based on Bayesian theory and Markov chain Monte Carlo methods, we introduce asymmetric Laplace distributed errors, simulate the joint posterior distributions of population parameters and across-unit parameters, and then derive their posterior quantile inferences. We run a simulation with the proposed method to examine the effects of units and quantile levels on the parameters; the method is also applied to study the relationship between Chinese rural residents' family annual income and their cultivated areas. Both the simulation and the real data analysis indicate that the method is effective and accurate.
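
The role of the asymmetric Laplace distribution (ALD) in quantile regression can be sketched briefly: maximising an ALD likelihood in the location parameter is equivalent to minimising the quantile check loss. The code below is a minimal, non-hierarchical illustration with a location-only model and invented data; the function names are hypothetical and no MCMC is involved.

```python
import math


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log density of the asymmetric Laplace distribution."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def best_location(data, tau, sigma=1.0):
    """Data value that maximises the ALD log-likelihood in mu."""
    return max(data, key=lambda m: sum(ald_logpdf(y, m, sigma, tau) for y in data))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # invented data with one outlier
median_fit = best_location(sample, tau=0.5)  # sample median
upper_fit = best_location(sample, tau=0.9)   # upper quantile
```

In a Bayesian treatment the same likelihood would be combined with priors and explored by MCMC, as the abstract describes.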

18.
Even though integer-valued time series are common in practice, methods for their analysis have been developed only in the recent past. Several models for stationary processes with discrete marginal distributions have been proposed in the literature. Such processes assume that the parameters of the model remain constant throughout the time period. However, this need not be true in practice. In this paper, we introduce non-stationary integer-valued autoregressive (INAR) models with structural breaks to model situations where the parameters of the INAR process do not remain constant over time. Such models are useful for modelling count data time series with structural breaks. Bayesian and Markov Chain Monte Carlo (MCMC) procedures for the estimation of the parameters and break points of such models are discussed. We illustrate the model and estimation procedure with the help of a simulation study. The proposed model is applied to two real biometrical data sets.
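
A structural break in an INAR process can be illustrated by simulation. The sketch below assumes the standard INAR(1) form with binomial thinning and Poisson innovations, X_t = alpha ∘ X_{t-1} + e_t, and a single known break point at which alpha and the innovation mean change. The abstract does not specify the exact model, so the form, the parameter values and the function names are all assumptions.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_inar1(n, before, after, break_point, x0=0, seed=0):
    """Simulate an INAR(1) series whose (alpha, lam) pair changes at break_point.

    `before` and `after` are (alpha, lam) tuples: the thinning
    probability and the Poisson innovation mean in each regime.
    """
    rng = random.Random(seed)
    series, x = [], x0
    for t in range(n):
        alpha, lam = before if t < break_point else after
        # Binomial thinning: each of the x current units survives with prob. alpha.
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)
        x = survivors + poisson_draw(lam, rng)
        series.append(x)
    return series


# Hypothetical regimes: stationary mean lam / (1 - alpha) rises from 2 to 10.
counts = simulate_inar1(200, before=(0.5, 1.0), after=(0.5, 5.0), break_point=100)
```

Within each regime the stationary mean of this process is lam / (1 - alpha), so a change in either parameter shifts the level of the series.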

19.
For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and with empirical examples from archeology and the psychology of choice.

20.
The family of weighted Poisson distributions offers great flexibility in modeling discrete data due to its potential to capture over/under-dispersion by an appropriate selection of the weight function. In this paper, we introduce a flexible weighted Poisson distribution and further study its properties by using it in the context of cure rate modeling under a competing cause scenario. A special case of the new distribution is the COM-Poisson distribution which in turn encompasses the Bernoulli, Poisson, and geometric distributions; hence, many of the well-studied cure rate models may be seen as special cases of the proposed model. We focus on the estimation, through the maximum likelihood method, of the cured proportion and the properties of the failure time of the susceptible (non-cured) individuals; a profile likelihood approach is also adopted for estimating the parameters of the weighted Poisson distribution. A Monte Carlo simulation study demonstrates the accuracy of the proposed inferential method. Finally, as an illustration, we fit the proposed model to a cutaneous melanoma data set.
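
The nesting of the Poisson, geometric and Bernoulli distributions inside the COM-Poisson family can be checked numerically. The sketch below covers only the COM-Poisson special case, with probability mass proportional to lam^y / (y!)^nu, and does not implement the paper's more general weighted Poisson distribution. The normalising constant is approximated by a truncated sum, which is an assumption of this illustration.

```python
import math


def com_poisson_pmf(y, lam, nu, terms=200):
    """COM-Poisson pmf, P(Y = y) proportional to lam**y / (y!)**nu.

    The normalising constant is approximated with the first `terms`
    terms of the series, computed on the log scale for stability.
    """
    log_weights = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    top = max(log_weights)
    log_z = top + math.log(sum(math.exp(w - top) for w in log_weights))
    log_w_y = y * math.log(lam) - nu * math.lgamma(y + 1)
    return math.exp(log_w_y - log_z)


poisson_case = com_poisson_pmf(3, lam=2.0, nu=1.0)      # Poisson(2) at y = 3
geometric_case = com_poisson_pmf(0, lam=0.5, nu=0.0)    # geometric, P(0) = 1 - lam
bernoulli_case = com_poisson_pmf(0, lam=2.0, nu=50.0)   # close to 1 / (1 + lam)
```

Here nu = 1 recovers the Poisson distribution, nu = 0 with lam < 1 gives the geometric distribution, and large nu approaches a Bernoulli distribution with success probability lam / (1 + lam).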


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), Beijing ICP License No. 09084417