Similar literature
 20 similar articles found (search time: 140 ms)
1.
Overcoming biases and misconceptions in ecological studies   Total citations: 2; self-citations: 1; citations by others: 1
The aggregate data study design provides an alternative group level analysis to ecological studies in the estimation of individual level health risks. An aggregate model is derived by aggregating a plausible individual level relative rate model within groups, such that population-based disease rates are modelled as functions of individual level covariate data. We apply an aggregate data method to a series of fictitious examples from a review paper by Greenland and Robins which illustrated the problems that can arise when using the results of ecological studies to make inferences about individual health risks. We use simulated data based on their examples to demonstrate that the aggregate data approach can address many of the sources of bias that are inherent in typical ecological analyses, even though the limited between-region covariate variation in these examples reduces the efficiency of the aggregate study. The aggregate method has the potential to estimate exposure effects of interest in the presence of non-linearity, confounding at individual and group levels, effect modification, classical measurement error in the exposure and non-differential misclassification in the confounder.

2.
Benzene is classified as a group 1 human carcinogen by the International Agency for Research on Cancer, and it is now accepted that occupational exposure is associated with an increased risk of various leukaemias. However, occupational exposure accounts for less than 1% of all benzene exposures, the major sources being cigarette smoking and vehicle exhaust emissions. Whether such low level exposures to environmental benzene are also associated with the risk of leukaemia is currently not known. In this study, we investigate the relationship between benzene emissions arising from outdoor sources (predominantly road traffic and petrol stations) and the incidence of childhood leukaemia in Greater London. An ecological design was used because of the rarity of the disease, the difficulty of obtaining individual level measurements of benzene exposure and the availability of data. However, some methodological difficulties were encountered, including problems of case registration errors, the choice of geographical areas for analysis, exposure measurement errors and ecological bias. We use a Bayesian hierarchical modelling framework to address these issues, and we investigate the sensitivity of our inference to various modelling assumptions.

3.
The focus of geographical studies in epidemiology has recently moved towards looking for effects of exposures based on data taken at local levels of aggregation (i.e. small areas). This paper investigates how regression coefficients measuring covariate effects at the point level are modified under aggregation. Changing the level of aggregation can lead to completely different conclusions about exposure–effect relationships, a phenomenon often referred to as ecological bias. With partial knowledge of the within-area distribution of the exposure variable, the notion of maximum entropy can be used to approximate that part of the distribution that is unknown. From the approximation, an expression for the ecological bias is obtained; simulations and an example show that the maximum-entropy approximation is often better than other commonly used approximations.

4.
Over the past decades, various principles for causal effect estimation have been proposed, all differing in terms of how they adjust for measured confounders: either via traditional regression adjustment, by adjusting for the expected exposure given those confounders (e.g., the propensity score), or by inversely weighting each subject's data by the likelihood of the observed exposure, given those confounders. When the exposure is measured with error, this raises the question of whether these estimation strategies are affected differently and whether one of them is to be preferred for that reason. In this article, we investigate this by comparing inverse probability of treatment weighted (IPTW) estimators and doubly robust estimators for the exposure effect in linear marginal structural mean models (MSM) with G-estimators, propensity score (PS) adjusted estimators and ordinary least squares (OLS) estimators for the exposure effect in linear regression models. We find analytically that these estimators are equally affected when exposure misclassification is independent of the confounders, but not otherwise. Simulation studies reveal similar results for time-varying exposures and when the model of interest includes a logistic link.

5.
Markov processes offer a useful basis for modeling the progression of organisms through successive stages of their life cycle. When organisms are examined intermittently in developmental studies, likelihoods can be constructed based on the resulting panel data in terms of transition probability functions. In some settings, however, organisms cannot be tracked individually because distinct individuals are difficult to identify, and in such cases aggregate counts of the number of organisms in different stages of development are recorded at successive time points. We consider the setting in which such aggregate counts are available for each of a number of tanks in a developmental study. We develop methods which accommodate clustering of the transition rates within tanks using a marginal modeling approach followed by robust variance estimation, and through use of a random effects model. Composite likelihood is proposed as a basis of inference in both settings. An extension which incorporates mortality is also discussed. The proposed methods are shown to perform well in empirical studies and are applied in an illustrative example on the growth of the Arabidopsis thaliana plant.

6.
Summary.  The difference, if any, between men's and women's voting patterns is of particular interest to historians of gender and politics. For elections that were held before the introduction of opinion surveying in the 1940s, little data are available with which to estimate such differences. We apply six methods for ecological inference to estimate men's and women's voting rates in New Zealand (NZ), 1893–1919. NZ is an interesting case-study, since it was the first self-governing country where women could vote. Furthermore, NZ officials recorded the voting rates of men and women at elections, making it possible to compare estimates produced by methods for ecological inference with known true values, thus testing the efficacy of different methods for ecological inference for this data set. We find that the most popular methods for ecological inference, namely Goodman's ecological regression and King's parametric method, give poor estimates, as does the much debated neighbourhood method. However, King's non-parametric method, Chambers and Steel's semiparametric method and the Steel, Beh and Chambers homogeneous approach all give good estimates that are close to the known values, with the homogeneous approach performing best overall. The success of these methods in this example suggests that ecological inference may be a viable option when investigating gender and voting. Moreover, researchers using ecological inference in other fields may do well to consider a range of statistical methods. This work is a significant NZ contribution to historical politics and the first quantitative contribution in the area of NZ gender and politics.
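
As a rough illustration of Goodman's ecological regression mentioned in this abstract, the Python sketch below regresses area-level turnout on the share of women among electors. The electorate shares and turnout rates are made-up values, the function name `goodman_regression` is hypothetical, and the sketch relies on the constancy assumption behind Goodman's method (the same group-specific rates in every area). It is not the analysis carried out in the paper.

```python
def goodman_regression(share_women, turnout):
    """Ordinary least squares of area turnout on the share of women.

    Under Goodman's constancy assumption,
        turnout_i = rate_women * x_i + rate_men * (1 - x_i),
    so the intercept estimates rate_men and intercept + slope
    estimates rate_women.
    """
    n = len(share_women)
    mean_x = sum(share_women) / n
    mean_t = sum(turnout) / n
    sxx = sum((x - mean_x) ** 2 for x in share_women)
    sxt = sum((x - mean_x) * (t - mean_t)
              for x, t in zip(share_women, turnout))
    slope = sxt / sxx
    intercept = mean_t - slope * mean_x
    return intercept, intercept + slope


# Hypothetical electorates: share of women and noise-free overall turnout.
share_women = [0.40, 0.45, 0.50, 0.55, 0.60]
true_men, true_women = 0.70, 0.80
turnout = [true_women * x + true_men * (1 - x) for x in share_women]

est_men, est_women = goodman_regression(share_women, turnout)
print(round(est_men, 6), round(est_women, 6))
```

With noise-free data generated under the constancy assumption, the regression recovers the two rates. When rates vary across areas in a way that is related to the share of women, the same regression can be badly biased, which is consistent with the poor performance the abstract reports.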

7.
M-quantile models with application to poverty mapping   Total citations: 1; self-citations: 0; citations by others: 1
Over the last decade there has been growing demand for estimates of population characteristics at small area level. Unfortunately, cost constraints in the design of sample surveys lead to small sample sizes within these areas and as a result direct estimation, using only the survey data, is inappropriate since it yields estimates with unacceptable levels of precision. Small area models are designed to tackle the small sample size problem. The most popular class of models for small area estimation is random effects models that include random area effects to account for between area variations. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier robust inference. An alternative approach to small area estimation that is based on the use of M-quantile models was recently proposed by Chambers and Tzavidis (Biometrika 93(2):255–268, 2006) and Tzavidis and Chambers (Robust prediction of small area means and distributions. Working paper, 2007). Unlike traditional random effects models, M-quantile models do not depend on strong distributional assumptions and automatically provide outlier robust inference. In this paper we illustrate for the first time how M-quantile models can be practically employed for deriving small area estimates of poverty and inequality. The methodology we propose improves the traditional poverty mapping methods in the following ways: (a) it enables the estimation of the distribution function of the study variable within the small area of interest both under an M-quantile and a random effects model, (b) it provides analytical, instead of empirical, estimation of the mean squared error of the M-quantile small area mean estimates and (c) it employs an outlier-robust estimation method.
The methodology is applied to data from the 2002 Living Standards Measurement Survey (LSMS) in Albania to produce (a) district level estimates of the incidence of poverty in Albania, (b) district level inequality measures and (c) the distribution function of household per-capita consumption expenditure in each district. Small area estimates of poverty and inequality show that the poorest Albanian districts are in the mountainous regions (north and north east) with the wealthiest districts, which are also linked with high levels of inequality, in the coastal (south west) and southern part of the country. We discuss the practical advantages of our methodology and note the consistency of our results with results from previous studies. We further demonstrate the usefulness of the M-quantile estimation framework through design-based simulations based on two realistic survey data sets containing small area information and show that the M-quantile approach may be preferable when the aim is to estimate the small area distribution function.

8.
Deprivation, ill-health and the ecological fallacy   Total citations: 3; self-citations: 2; citations by others: 1
The use of ecological studies in explaining the relationship between deprivation and ill-health is widespread in many health applications. However, inferences drawn from these studies about individuals are susceptible to serious bias known as the ecological fallacy. Our paper demonstrates the ecological fallacy effect in this context but also shows how it can be considerably reduced by taking into account different population structures at the aggregate level. Two regression analyses of limiting long-term illness are performed, one at the individual level and one at the electoral ward level, using the 1991 UK census sample of anonymized records and the small area statistics. The analyses compare several measures of deprivation including the standard Carstairs index, with the separate variables which make up the indices, to determine their effectiveness in explaining rates of illness. Two of the deprivation scores are constructed using latent variable modelling techniques which enable a score to be generated at the individual level as well as at the ward level. It is shown that, given the right choice of socioeconomic variables and taking into account the age structure of the population, it should be possible to construct a single aggregate deprivation index that will explain most of the variation in rates of illness across the study region.

9.
In ecological studies, individual inference is made based on results from ecological models. Interpretation of the results requires caution, since findings from an ecological analysis at the group level may not hold at the individual level within the groups, leading to the ecological fallacy. Using an ecological regression example for analyzing voting behaviors, we highlight that the explicit use of individual-level models is crucial in understanding the results of ecological studies. In particular, we clarify three relevant statistical issues for each individual-level model: assessment of the uncertainty of parameter estimates obtained from a wrong model, the use of shrinkage estimation methods for simultaneous estimation of many parameters, and the necessity of sensitivity analysis rather than adhering to one seemingly most compelling assumption.

10.
Innovation diffusion represents a central topic both for researchers and for managers and policy makers. Traditionally, it has been examined using the successful Bass models (BM, GBM), based on an aggregate differential approach, which assures flexibility and reliable forecasts. More recently, the rising interest in adoption at the individual level has suggested the use of agent based models, like Cellular Automata models (CA), that are generally implemented through computer simulations. In this paper we present a link between a particular kind of CA and a separable non autonomous Riccati equation, whose general structure includes the Bass models as a special case. Through this link we propose an alternative to direct computer simulations, based on real data, and a new aggregate model, which simultaneously considers birth and death processes within the diffusion. The main results, concerning the closed form solution, the identification and the statistical analysis of our new model, may be of both theoretical and empirical interest. In particular, we examine two applied case studies, illustrating some of the forecasting improvements obtained.
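
For context, the standard Bass model has a well-known closed-form solution, and its differential equation dF/dt = (p + qF)(1 - F) is a Riccati equation with constant coefficients. The Python sketch below evaluates that textbook closed form and checks it numerically against the equation. It is not the new aggregate model proposed in the paper, the function names are hypothetical, and the parameter values p = 0.03 and q = 0.38 are illustrative assumptions.

```python
import math


def bass_cumulative(t, p, q):
    """Cumulative adoption fraction F(t) of the standard Bass model.

    p is the coefficient of innovation, q the coefficient of imitation.
    """
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_rate(f, p, q):
    """Right-hand side of the Bass equation dF/dt = (p + q*F) * (1 - F)."""
    return (p + q * f) * (1.0 - f)


p, q = 0.03, 0.38   # illustrative values, not taken from the paper
t, h = 3.0, 1e-5    # evaluation time and finite-difference step

# Central finite difference of the closed form versus the model equation.
numeric_slope = (bass_cumulative(t + h, p, q)
                 - bass_cumulative(t - h, p, q)) / (2 * h)
model_slope = bass_rate(bass_cumulative(t, p, q), p, q)
print(abs(numeric_slope - model_slope) < 1e-6)
```

Expanding the right-hand side gives dF/dt = p + (q - p)F - qF^2, which makes the quadratic (Riccati) structure explicit.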

11.
High-throughput profiling is now common in biomedical research. In this paper we consider the layout of an etiology study composed of a failure time response and gene expression measurements. In current practice, a widely adopted approach is to select genes according to a preliminary marginal screening and a follow-up penalized regression for model building. Confounders, including for example clinical risk factors and environmental exposures, usually exist and need to be properly accounted for. We propose covariate-adjusted screening and variable selection procedures under the accelerated failure time model. While penalizing the high-dimensional coefficients to achieve parsimonious model forms, our procedure also properly adjusts for the low-dimensional confounder effects to achieve more accurate estimation of regression coefficients. We establish the asymptotic properties of our proposed methods and carry out simulation studies to assess the finite sample performance. Our methods are illustrated with a real gene expression data analysis where proper adjustment for confounders produces more meaningful results.

12.
Summary.  We illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides two main benefits. First, by including the individual level subsample data, the biases that are associated with linear ecological inference can be eliminated. Second, available ecological data can be used to design optimal subsampling schemes that maximize information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree on wages, showing that small, optimally chosen subsamples can be combined with ecological data to generate precise estimates relative to a simple random subsample.

13.
Epidemiology studies increasingly examine multiple exposures in relation to disease by selecting the exposures of interest in a thematic manner. For example, sun exposure, sunburn, and sun protection behavior could be themes for an investigation of sun-related exposures. Several studies now use pre-defined linear combinations of the exposures pertaining to the themes to estimate the effects of the individual exposures. Such analyses may improve the precision of the exposure effects, but they can lead to inflated bias and type I errors when the linear combinations are inaccurate. We investigate preliminary test estimators and empirical Bayes type shrinkage estimators as alternative approaches when it is desirable to exploit the thematic choice of exposures, but the accuracy of the pre-defined linear combinations is unknown. We show that the two types of estimator are intimately related under certain assumptions. The shrinkage estimator derived under the assumption of an exchangeable prior distribution gives precise estimates and is robust to misspecifications of the user-defined linear combinations. The precision gains and robustness of the shrinkage estimation approach are illustrated using data from the SONIC study, where the exposures are the individual questionnaire items and the outcome is (log) total back nevus count.

14.
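Zhou Wei et al., Statistical Research (《统计研究》), 2015, 32(7): 81-86
Remote sensing imagery is a form of big data. Estimating crop sown area from remote sensing typically relies on regression estimators or calibration estimators, both of which usually require combining ground sample data with remote sensing classification information. For most regression estimators, however, crop area estimates for a provincial population meet precision requirements only at the provincial level and cannot be disaggregated to smaller areas such as counties and townships. Using 2011 ground survey sample data from Heilongjiang Province combined with remote sensing classification results, this paper constructs a unit-level small area model in the form of a multivariate regression with multiple response variables, with the small area effects specified as fixed. Under this regression estimation approach, the sown area of the main crops can be estimated for each county, and the county estimates sum to the provincial total implied by the regression model. An evaluation of the precision (coefficient of variation, C.V.) of the county-level small area estimates for maize, rice and soybean in Heilongjiang shows that, on average, they meet county-level precision requirements. The results indicate that small area estimation is well suited to the multilevel problem of estimating crop planting area for a province as a whole and for its individual counties.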

15.
Intraclass correlation coefficients (ICC) are employed in a wide range of behavioral, biomedical, psychosocial, and health care related research for assessing reliability of continuous outcomes. The linear mixed-effects model (LMM) is the most popular approach for inference about the ICC. However, since LMM is a normal distribution-based model and non-normal data are the norm rather than the exception in most studies, its application to real study data always raises the question of inference validity. In this paper, we propose a distribution-free alternative to provide robust inference based on the functional response models. We illustrate the performance of the new approach using both real and simulated data.
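
For orientation, the classical one-way ANOVA estimator of the ICC can be sketched in Python as follows. This is the textbook moment estimator, not the distribution-free method proposed in the paper. The function name `icc_oneway` and the ratings are hypothetical, and a balanced design (the same number of ratings per subject) is assumed.

```python
def icc_oneway(ratings):
    """One-way random effects ICC from a balanced table of ratings.

    ratings: list of rows, one row per subject, each row holding the
    k repeated measurements for that subject.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # measurements per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]

    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum((y - m) ** 2
              for row, m in zip(ratings, subject_means)
              for y in row) / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical ratings: perfect agreement, then complete disagreement.
perfect = icc_oneway([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
opposed = icc_oneway([[1.0, 3.0], [3.0, 1.0]])
print(perfect, opposed)
```

The estimator uses only mean squares, but the usual confidence intervals and tests built on it assume normality, which is the limitation the abstract raises.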

16.
For multivariate survival data, we study the generalized method of moments (GMM) approach to estimation and inference based on the marginal additive hazards model. We propose an efficient iterative algorithm using closed-form solutions, which dramatically reduces the computational burden. Asymptotic normality of the proposed estimators is established, and the corresponding variance–covariance matrix can be consistently estimated. Inference procedures are derived based on the asymptotic chi-squared distribution of the GMM objective function. Simulation studies are conducted to empirically examine the finite sample performance of the proposed method, and a real data example from a dental study is used for illustration.

17.
A common population characteristic of interest in animal ecology studies pertains to the selection of resources. That is, given the resources available to animals, what do they ultimately choose to use? A variety of statistical approaches have been employed to examine this question and each has advantages and disadvantages with respect to the form of available data and the properties of estimators given model assumptions. A wealth of high resolution telemetry data are now being collected to study animal population movement and space use and these data present both challenges and opportunities for statistical inference. We summarize traditional methods for resource selection and then describe several extensions to deal with measurement uncertainty and an explicit movement process that exists in studies involving high-resolution telemetry data. Our approach uses a correlated random walk movement model to obtain temporally varying use and availability distributions that are employed in a weighted distribution context to estimate selection coefficients. The temporally varying coefficients are then weighted by their contribution to selection and combined to provide inference at the population level. The result is an intuitive and accessible statistical procedure that uses readily available software and is computationally feasible for large datasets. These methods are demonstrated using data collected as part of a large-scale mountain lion monitoring study in Colorado, USA.

18.
The log-Gaussian Cox process is a commonly used model for the analysis of spatial point pattern data. Fitting this model is difficult because of its doubly stochastic property, that is, it is a hierarchical combination of a Poisson process at the first level and a Gaussian process at the second level. Various methods have been proposed to estimate such a process, including traditional likelihood-based approaches as well as Bayesian methods. We focus here on Bayesian methods and several approaches that have been considered for model fitting within this framework, including Hamiltonian Monte Carlo, the integrated nested Laplace approximation, and variational Bayes. We consider these approaches and make comparisons with respect to statistical and computational efficiency. These comparisons are made through several simulation studies as well as through two applications, the first examining ecological data and the second involving neuroimaging data.

19.
This paper considers the analysis of time to event data in the presence of collinearity between covariates. In linear and logistic regression models, the ridge regression estimator has been applied as an alternative to the maximum likelihood estimator in the presence of collinearity. The advantage of the ridge regression estimator over the usual maximum likelihood estimator is that the former often has a smaller total mean square error and is thus more precise. In this paper, we generalize this approach for addressing collinearity to the Cox proportional hazards model. Simulation studies were conducted to evaluate the performance of the ridge regression estimator. Our approach was motivated by an occupational radiation study conducted at Oak Ridge National Laboratory to evaluate health risks associated with occupational radiation exposure in which the exposure tends to be correlated with possible confounders such as years of exposure and attained age. We applied the proposed methods to this study to evaluate the association of radiation exposure with all-cause mortality.

20.
Ecological studies, which are based on characteristics of groups of individuals, are common in various disciplines including epidemiology. It is of great interest for epidemiologists to study the geographical variation of a disease by accounting for the positive spatial dependence between neighbouring areas. However, the choice of scale of the spatial correlation requires much attention. In view of a lack of studies in this area, this study aims to investigate the impact of differing definitions of geographical scales using a multilevel model. We propose a new approach, grid-based partitioning, and compare it with the popular census region approach. Unexplained geographical variation is accounted for via area-specific unstructured random effects and spatially structured random effects specified as an intrinsic conditional autoregressive process. Using grid-based modelling of random effects in contrast to the census region approach, we illustrate conditions where improvements are observed in the estimation of the linear predictor, random effects, parameters, and the identification of the distribution of residual risk and the aggregate risk in a study region. The study finds that grid-based modelling is a valuable approach for spatially sparse data, while the statistical local area-based and grid-based approaches perform equally well for spatially dense data.
