
1.

Overcoming biases and misconceptions in ecological studies (total citations: 2; self-citations: 1; citations by others: 1)

Katherine A. Guthrie & Lianne Sheppard 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(1):141-154

The aggregate data study design provides an alternative group level analysis to ecological studies in the estimation of individual level health risks. An aggregate model is derived by aggregating a plausible individual level relative rate model within groups, such that population-based disease rates are modelled as functions of individual level covariate data. We apply an aggregate data method to a series of fictitious examples from a review paper by Greenland and Robins which illustrated the problems that can arise when using the results of ecological studies to make inference about individual health risks. We use simulated data based on their examples to demonstrate that the aggregate data approach can address many of the sources of bias that are inherent in typical ecological analyses, even though the limited between-region covariate variation in these examples reduces the efficiency of the aggregate study. The aggregate method has the potential to estimate exposure effects of interest in the presence of non-linearity, confounding at individual and group levels, effect modification, classical measurement error in the exposure and non-differential misclassification in the confounder.
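The distinction drawn in this abstract between an aggregate model and a typical ecological model can be illustrated with a minimal sketch. It assumes a log-linear individual relative rate model, rate = baseline × exp(beta × exposure); the baseline rate, the coefficient and the exposure values are hypothetical and are not taken from the paper.

```python
import math

def aggregate_rate(exposures, baseline, beta):
    """Group rate obtained by averaging the individual-level rates."""
    rates = [baseline * math.exp(beta * x) for x in exposures]
    return sum(rates) / len(rates)

def naive_ecological_rate(exposures, baseline, beta):
    """Group rate obtained by plugging the group mean exposure into the model."""
    mean_exposure = sum(exposures) / len(exposures)
    return baseline * math.exp(beta * mean_exposure)

# Hypothetical group: three unexposed individuals and one highly exposed.
group = [0.0, 0.0, 0.0, 4.0]
baseline, beta = 0.01, 0.5

agg = aggregate_rate(group, baseline, beta)
eco = naive_ecological_rate(group, baseline, beta)
print(f"aggregated individual model: {agg:.5f}")
print(f"mean exposure plugged in:    {eco:.5f}")
```

Because the exponential is convex, the aggregated rate is never smaller than the plug-in rate; the gap is one simple form of the non-linearity bias mentioned above.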

2.

A parallel analysis of individual and ecological data on residential radon and lung cancer in south-west England

Sarah Darby, Harz Deo, Richard Doll & Elise Whitley 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(1):205-207

Parallel individual and ecological analyses of data on residential radon have been performed using information on cases of lung cancer and population controls from a recent study in south-west England. For the individual analysis the overall results indicated that the relative risk of lung cancer at 100 Bq m⁻³ compared with at 0 Bq m⁻³ was 1.12 (95% confidence interval (0.99, 1.27)) after adjusting for age, sex, smoking, county of residence and social class. In the ecological analysis substantial bias in the estimated effect of radon was present for one of the two counties involved unless an additional variable, urban–rural status, was included in the model, although this variable was not an important confounder in the individual level analysis. Most of the methods that have been recommended for overcoming the limitations of ecological studies would not in practice have proved useful in identifying this variable as an appreciable source of bias.

3.

Deprivation, ill-health and the ecological fallacy (total citations: 3; self-citations: 2; citations by others: 1)

Gillian Lancaster & Mick Green 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2002,165(2):263-278

The use of ecological studies in explaining the relationship between deprivation and ill-health is widespread in many health applications. However, inferences drawn from these studies about individuals are susceptible to serious bias known as the ecological fallacy. Our paper demonstrates the ecological fallacy effect in this context but also shows how it can be considerably reduced by taking into account different population structures at the aggregate level. Two regression analyses of limiting long-term illness are performed, one at the individual level and one at the electoral ward level, using the 1991 UK census sample of anonymized records and the small area statistics. The analyses compare several measures of deprivation including the standard Carstairs index, with the separate variables which make up the indices, to determine their effectiveness in explaining rates of illness. Two of the deprivation scores are constructed using latent variable modelling techniques which enable a score to be generated at the individual level as well as at the ward level. It is shown that, given the right choice of socioeconomic variables and taking into account the age structure of the population, it should be possible to construct a single aggregate deprivation index that will explain most of the variation in rates of illness across the study region.

4.

Bayes methods in the ecological fallacy context: estimation of individual correlation from aggregate data

Robert B. Bendel, Bradley P. Carlin 《Communications in Statistics - Theory and Methods》2013,42(7):2595-2623

The ecological fallacy is related to Simpson's paradox (1951) where relationships among group means may be counterintuitive and substantially different from relationships within groups, where the groups are usually geographic entities such as census tracts. We consider the problem of estimating the correlation between two jointly normal random variables where only ecological data (group means) are available. Two empirical Bayes estimators and one fully Bayesian estimator are derived and compared with the usual ecological estimator, which is simply the Pearson correlation coefficient of the group sample means. We simulate the bias and mean squared error performance of these estimators, and also give an example employing a dataset where the individual level data are available for model checking. The results indicate superiority of the empirical Bayes estimators in a variety of practical situations where, though we lack individual level data, other relevant prior information is available.
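The "usual ecological estimator" named in this abstract, the Pearson correlation of the group means, can overstate the individual-level correlation by a wide margin. The sketch below shows this with a small simulation. The data-generating model (a shared group effect plus independent individual noise), the group sizes and the seed are illustrative assumptions; it does not implement the empirical Bayes or fully Bayesian estimators from the paper.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)           # fixed seed so the run is reproducible
n_groups, n_per_group = 200, 50  # hypothetical design

xs, ys, x_means, y_means = [], [], [], []
for _ in range(n_groups):
    shift = rng.gauss(0.0, 1.0)  # group effect shared by both variables
    gx = [shift + rng.gauss(0.0, 2.0) for _ in range(n_per_group)]
    gy = [shift + rng.gauss(0.0, 2.0) for _ in range(n_per_group)]
    xs.extend(gx)
    ys.extend(gy)
    x_means.append(sum(gx) / n_per_group)
    y_means.append(sum(gy) / n_per_group)

individual_r = pearson(xs, ys)            # uses all individual records
ecological_r = pearson(x_means, y_means)  # uses group means only
print(f"individual-level r: {individual_r:.2f}")
print(f"ecological r:       {ecological_r:.2f}")
```

Under these assumptions the population individual-level correlation is 1/(1+4) = 0.2, while averaging within groups removes most of the individual noise and pushes the correlation of the means towards 1.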

5.

Linkage bias in estimating the association between childhood exposures and propensity to become a mother: an example of simple sensitivity analyses

D. Nitsch, B. L. DeStavola, S. M. B. Morton, D. A. Leon 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(3):493-505

Summary. Record linkage is a powerful tool to obtain individual follow-up information that is held in routinely collected databases. However, this method is potentially limited not only by the quality of the original data but also by the temporal and geographic coverage of the routine data. Migration in particular is a factor that might introduce systematic bias even in analyses of data covering relatively large geographical areas. We describe a linkage application where emigration bias might be an issue and use the sensitivity analysis approach that has been described by Molenberghs and co-workers and Kenward and co-workers to assess the extent of this bias.

6.

Models for potentially biased evidence in meta-analysis using empirically based priors

N. J. Welton, A. E. Ades, J. B. Carlin, D. G. Altman, J. A. C. Sterne 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(1):119-136

Summary. We present models for the combined analysis of evidence from randomized controlled trials categorized as being at either low or high risk of bias due to a flaw in their conduct. We formulate a bias model that incorporates between-study and between-meta-analysis heterogeneity in bias, and uncertainty in overall mean bias. We obtain algebraic expressions for the posterior distribution of the bias-adjusted treatment effect, which provide limiting values for the information that can be obtained from studies at high risk of bias. The parameters of the bias model can be estimated from collections of previously published meta-analyses. We explore alternative models for such data, and alternative methods for introducing prior information on the bias parameters into a new meta-analysis. Results from an illustrative example show that the bias-adjusted treatment effect estimates are sensitive to the way in which the meta-epidemiological data are modelled, but that using point estimates for bias parameters provides an adequate approximation to using a full joint prior distribution. A sensitivity analysis shows that the gain in precision from including studies at high risk of bias is likely to be low, however numerous or large their size, and that little is gained by incorporating such studies, unless the information from studies at low risk of bias is limited. We discuss approaches that might increase the value of including studies at high risk of bias, and the acceptability of the methods in the evaluation of health care interventions.

7.

Hazard ratio inference in stratified clinical trials with time‐to‐event endpoints and limited sample size

Rengyi Xu, Devan V. Mehrotra, Pamela A. Shaw 《Pharmaceutical statistics》2019,18(3):366-376

The stratified Cox model is commonly used for stratified clinical trials with time‐to‐event endpoints. The estimated log hazard ratio is approximately a weighted average of corresponding stratum‐specific Cox model estimates using inverse‐variance weights; the latter are optimal only under the (often implausible) assumption of a constant hazard ratio across strata. Focusing on trials with limited sample sizes (50‐200 subjects per treatment), we propose an alternative approach in which stratum‐specific estimates are obtained using a refined generalized logrank (RGLR) approach and then combined using either sample size or minimum risk weights for overall inference. Our proposal extends the work of Mehrotra et al. to incorporate the RGLR statistic, which outperforms the Cox model in the setting of proportional hazards and small samples. This work also entails development of a remarkably accurate plug‐in formula for the variance of RGLR‐based estimated log hazard ratios. We demonstrate using simulations that our proposed two‐step RGLR analysis delivers notably better results through smaller estimation bias and mean squared error and larger power than the stratified Cox model analysis when there is a treatment‐by‐stratum interaction, with similar performance when there is no interaction. Additionally, our method controls the type I error rate while the stratified Cox model does not in small samples. We illustrate our method using data from a clinical trial comparing two treatments for colon cancer.
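The combining step described in this abstract, a weighted average of stratum-specific log hazard ratios, can be sketched as follows. The stratum estimates, variances and sample sizes are hypothetical numbers chosen for illustration; the sketch covers only inverse-variance and sample-size weighting and does not implement the RGLR statistic or the minimum risk weights.

```python
def weighted_log_hr(estimates, weights):
    """Weighted average of stratum-specific log hazard ratio estimates."""
    total = sum(weights)
    return sum(w * b for w, b in zip(weights, estimates)) / total

# Hypothetical stratum-level results (not from the paper).
log_hr = [-0.40, -0.10, 0.05]     # stratum-specific log hazard ratios
variances = [0.04, 0.10, 0.25]    # their estimated variances
sizes = [60, 40, 20]              # subjects per stratum

inverse_variance = weighted_log_hr(log_hr, [1.0 / v for v in variances])
sample_size = weighted_log_hr(log_hr, sizes)

print(f"inverse-variance weights: {inverse_variance:.4f}")
print(f"sample-size weights:      {sample_size:.4f}")
```

When the stratum effects differ, as they do in these made-up numbers, the two weighting schemes give different overall estimates, which is why the choice of weights matters under a treatment-by-stratum interaction.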

8.

Hierarchical related regression for combining aggregate and individual data in studies of socio-economic disease risk factors (total citations: 2; self-citations: 0; citations by others: 2)

Christopher Jackson, Nicky Best, Sylvia Richardson 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(1):159-178

Summary. To obtain information about the contribution of individual and area level factors to population health, it is desirable to use both data collected on areas, such as censuses, and on individuals, e.g. survey and cohort data. Recently developed models allow us to carry out simultaneous regressions on related data at the individual and aggregate levels. These can reduce 'ecological bias' that is caused by confounding, model misspecification or lack of information and increase power compared with analysing the data sets singly. We use these methods in an application investigating individual and area level sociodemographic predictors of the risk of hospital admissions for heart and circulatory disease in London. We discuss the practical issues that are encountered in this kind of data synthesis and demonstrate that this modelling framework is sufficiently flexible to incorporate a wide range of sources of data and to answer substantive questions. Our analysis shows that the variations that are observed are mainly attributable to individual level factors rather than the contextual effect of deprivation.

9.

Estimating the negative binomial dispersion parameter with highly stratified surveys

N.G. Cadigan, Jared Tobin 《Journal of statistical planning and inference》2010

We investigate several estimators of the negative binomial (NB) dispersion parameter for highly stratified count data for which the statistical model has a separate mean parameter for each stratum. If the number of samples per stratum is small then the model is highly parameterized and the maximum likelihood estimator (MLE) of the NB dispersion parameter can be biased and inefficient. Some of the estimators we investigate include adjustments for the number of mean parameters to reduce bias. We extend other estimators that were developed for the iid case, to reduce bias when there are many mean parameters. We demonstrate using simulations that an adjusted double extended quasi-likelihood estimator we proposed gives much improved estimates compared to the MLE. Adjusted extended quasi-likelihood and adjusted maximum likelihood estimators also give much-improved results. We illustrate the various estimators with stratified random bottom trawl survey data for cod (Gadus morhua) off the south coast of Newfoundland, Canada.

10.

Increased Fisher’s information for parameters of association in count regression via extreme ranks

Daniel F. Linder, Jingjing Yin, Haresh Rochani, Hani Samawi, Sanjay Sethi 《Communications in Statistics - Theory and Methods》2018,47(5):1181-1203

The article details a sampling scheme which can lead to a reduction in sample size and cost in clinical and epidemiological studies of association between a count outcome and risk factor. We show that inference in two common generalized linear models for count data, Poisson and negative binomial regression, is improved by using a ranked auxiliary covariate, which guides the sampling procedure. This type of sampling has typically been used to improve inference on a population mean. The novelty of the current work is its extension to log-linear models and derivations showing that the sampling technique results in an increase in information as compared to simple random sampling. Specifically, we show that under the proposed sampling strategy the maximum likelihood estimate of the risk factor’s coefficient is improved through an increase in the Fisher’s information. A simulation study is performed to compare the mean squared error, bias, variance, and power of the sampling routine with simple random sampling under various data-generating scenarios. We also illustrate the merits of the sampling scheme on a real data set from a clinical setting of males with chronic obstructive pulmonary disease. Empirical results from the simulation study and data analysis coincide with the theoretical derivations, suggesting that a significant reduction in sample size, and hence study cost, can be realized while achieving the same precision as a simple random sample.

11.

Ecological regression analysis of environmental benzene exposure and childhood leukaemia: sensitivity to data inaccuracies, geographical scale and ecological bias (total citations: 3; self-citations: 2; citations by others: 1)

Nicky Best, Samantha Cockings, James Bennett, Jon Wakefield & Paul Elliott 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(1):155-174

Benzene is classified as a group 1 human carcinogen by the International Agency for Research on Cancer, and it is now accepted that occupational exposure is associated with an increased risk of various leukaemias. However, occupational exposure accounts for less than 1% of all benzene exposures, the major sources being cigarette smoking and vehicle exhaust emissions. Whether such low level exposures to environmental benzene are also associated with the risk of leukaemia is currently not known. In this study, we investigate the relationship between benzene emissions arising from outdoor sources (predominantly road traffic and petrol stations) and the incidence of childhood leukaemia in Greater London. An ecological design was used because of the rarity of the disease, the difficulty of obtaining individual level measurements of benzene exposure and the availability of data. However, some methodological difficulties were encountered, including problems of case registration errors, the choice of geographical areas for analysis, exposure measurement errors and ecological bias. We use a Bayesian hierarchical modelling framework to address these issues, and we investigate the sensitivity of our inference to various modelling assumptions.

12.

A Generalized Chain Binomial Model with Aggregated Data

Ying Xu, Paul S. F. Yip, Richard M. Huggins 《Communications in Statistics - Theory and Methods》2013,42(18):3325-3338

In large cohort studies it can be impractical to report individual data, so that only summary or aggregated data are available. Using aggregated data from Bernoulli trials is expected to result in overdispersion so that a quasi-binomial approach would seem feasible. We show that when applied to aggregated data arising from cohorts of individuals according to a chain binomial model, the quasi-binomial model results in biased estimates. We propose an alternate calibration estimator and demonstrate its improved performance by simulations. The calibration method is then applied to model the probability of leaving a personal emergency link service in Hong Kong.

13.

A study of the adoption and diffusion of electronic medical record systems based on survival analysis

Chen Yu, Lu Yang, Zhang Zhizi, Mao Shanshan 《Statistics & Information Forum》2017,(6):121-127

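Electronic medical record (EMR) systems are at the core of China's health care informatization, so understanding the mechanisms of their adoption and diffusion is of theoretical and practical importance for advancing health informatization and for achieving meaningful use. Using survey data on informatization at 222 hospitals in Yunnan Province, we apply a semiparametric survival analysis method, the Cox regression model, to examine the factors that influence hospitals' adoption of EMR systems and the mechanism of diffusion. The results show that teaching status, hospital size and hospital age are all favourable factors, that is, all three promote the adoption of EMR systems; the time × size interaction is an unfavourable factor, that is, it has a negative effect on adoption; geographical location is an unfavourable factor in both the stratified and the reduced-sample model analyses; and hospital grade has no clear effect.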

14.

Iterative Bias Correction of the Cross‐Validation Criterion

HIROKAZU YANAGIHARA, HIRONORI FUJISAWA 《Scandinavian Journal of Statistics》2012,39(1):116-130

Abstract. The cross‐validation (CV) criterion is known to be a second‐order unbiased estimator of the risk function measuring the discrepancy between the candidate model and the true model, as well as the generalized information criterion (GIC) and the extended information criterion (EIC). In the present article, we show that the 2kth‐order unbiased estimator can be obtained using a linear combination from the leave‐one‐out CV criterion to the leave‐k‐out CV criterion. The proposed scheme is unique in that a bias smaller than that of a jackknife method can be obtained without any analytic calculation, that is, it is not necessary to obtain the explicit form of several terms in an asymptotic expansion of the bias. Furthermore, the proposed criterion can be regarded as a finite correction of a bias‐corrected CV criterion by using scalar coefficients in a bias‐corrected EIC obtained by the bootstrap iteration.

15.

Alleviating linear ecological bias and optimal design with subsample data

Adam N. Glynn, Jon Wakefield, Mark S. Handcock, Thomas S. Richardson 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(1):179-202

Summary. We illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides two main benefits. First, by including the individual level subsample data, the biases that are associated with linear ecological inference can be eliminated. Second, available ecological data can be used to design optimal subsampling schemes that maximize information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree on wages, showing that small, optimally chosen subsamples can be combined with ecological data to generate precise estimates relative to a simple random subsample.

16.

A multilevel Bayesian model for contextual effect of material deprivation

Annibale Biggeri, Emanuela Dreassi, Marco Marchi 《Statistical Methods and Applications》2004,13(1):89-103

The relationship between socioeconomic factors and health has been studied in many circumstances. Whether the association takes place at individual level only, or also at population level (contextual effect) is still unclear. We present a multilevel hierarchical Bayesian model to investigate the joint contribution of individual and population-based socioeconomic factors to mortality, using data from the census cohort of the general population of the city of Florence, Italy (Tuscany Longitudinal Study, 1991-1995). Evidence supporting a contextual effect of deprivation on mortality at the very fine level of aggregation is found. Inappropriate modelling of individual and aggregate variables could strongly bias effect estimates. Received: 10 January 2002; revised: 23 June 2003. The research on the Tuscany Longitudinal Study (Studio Longitudinale Toscano, SLTo) was supported by the Regione Toscana Servizio Statistica.

17.

On Local Polynomial Modelling of the Additive Risk Model

Wanrong Liu, Yongcheng Qi 《Communications in Statistics - Theory and Methods》2013,42(11):1958-1981

The additive risk model provides an alternative modelling technique for failure time data to the proportional hazards model. In this article, we consider the additive risk model with a nonparametric risk effect. We study estimation of the risk function and its derivatives with a parametric and an unspecified baseline hazard function respectively. The resulting estimators are the local likelihood and the local score estimators. We establish the asymptotic normality of the estimators and show that both methods have the same formula for asymptotic bias but different formulas for variance. It is found that, in some special cases, the local score estimator is of the same efficiency as the local likelihood estimator though it does not use the information about the baseline hazard function. Another advantage of the local score estimator is that it has a closed form and is easy to implement. Some simulation studies are conducted to evaluate and compare the performance of the two estimators. A numerical example is used for illustration.

18.

Inference about regression parameters using highly stratified survey count data with over-dispersion and repeated measurements

S. Wang, H. P. Benoît 《Journal of applied statistics》2017,44(6):1013-1030

We study methods to estimate regression and variance parameters for over-dispersed and correlated count data from highly stratified surveys. Our application involves counts of fish catches from stratified research surveys and we propose a novel model in fisheries science to address changes in survey protocols. A challenge with this model is the large number of nuisance parameters which leads to computational issues and biased statistical inferences. We use a computationally efficient profile generalized estimating equation method and compare it to marginal maximum likelihood (MLE) and restricted MLE (REML) methods. We use REML to address bias and inaccurate confidence intervals because of many nuisance parameters. The marginal MLE and REML approaches involve intractable integrals and we used a new R package that is designed for estimating complex nonlinear models that may include random effects. We conclude from simulation analyses that the REML method provides more reliable statistical inferences among the three methods we investigated.

19.

A mixed effects model for analyzing area under the curve of longitudinally measured biomarkers with missing data

Luoxi Shi, Dorothy K. Hatsukami, Joseph S. Koopmeiners, Chap T. Le, Neal L. Benowitz, Eric C. Donny, Xianghua Luo 《Pharmaceutical statistics》2021,20(6):1249-1264

A simple approach for analyzing longitudinally measured biomarkers is to calculate summary measures such as the area under the curve (AUC) for each individual and then compare the mean AUC between treatment groups using methods such as the t test. This two-step approach is difficult to implement when there are missing data since the AUC cannot be directly calculated for individuals with missing measurements. Simple methods for dealing with missing data include the complete case analysis and imputation. A recent study showed that the estimated mean AUC difference between treatment groups based on the linear mixed model (LMM), rather than on individually calculated AUCs by simple imputation, has negligible bias under random missing assumptions and only small bias when missing is not at random. However, this model assumes the outcome to be normally distributed, which is often violated in biomarker data. In this paper, we propose to use an LMM on log-transformed biomarkers, based on which statistical inference for the ratio, rather than difference, of AUC between treatment groups is provided. The proposed method can not only handle the potential baseline imbalance in a randomized trial but also circumvent the estimation of the nuisance variance parameters in the log-normal model. The proposed model is applied to a recently completed large randomized trial studying the effect of nicotine reduction on biomarker exposure of smokers.
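The first step of the simple two-step approach in this abstract, computing an AUC for each individual, can be sketched with the trapezoidal rule. The visit times and biomarker values are hypothetical, and using the trapezoidal rule is an assumption, since the abstract does not say how the AUC is computed. The sketch also shows why the step breaks down when a measurement is missing; it does not implement the proposed mixed model.

```python
def trapezoid_auc(times, values):
    """Area under the curve by the trapezoidal rule for one individual."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if any(v is None for v in values):
        # The summary measure is undefined when a visit is missing.
        raise ValueError("AUC cannot be computed with missing measurements")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (values[i] + values[i - 1]) / 2.0
    return area

# Hypothetical visit times (weeks) and biomarker values for two subjects.
weeks = [0, 2, 4, 8]
complete_subject = [10.0, 8.0, 6.0, 4.0]
subject_with_gap = [10.0, None, 6.0, 4.0]

auc = trapezoid_auc(weeks, complete_subject)
print(f"AUC for the complete subject: {auc}")

try:
    trapezoid_auc(weeks, subject_with_gap)
    gap_failed = False
except ValueError:
    gap_failed = True
print(f"missing visit blocks the AUC: {gap_failed}")
```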

20.

Discovering hidden statistical issues through individual-level models in ecological studies

Soeun Kim 《Journal of applied statistics》2019,46(14):2540-2552

ABSTRACT

In ecological studies, inference about individuals is based on results from ecological models. Interpreting these results requires caution, since relationships found at the group level may not hold at the individual level within the groups, leading to the ecological fallacy. Using an ecological regression example for analyzing voting behaviors, we highlight that the explicit use of individual-level models is crucial in understanding the results of ecological studies. In particular, we clarify three relevant statistical issues for individual-level models: assessment of the uncertainty of parameter estimates obtained from a wrong model, the use of shrinkage estimation for simultaneous estimation of many parameters, and the necessity of sensitivity analysis rather than adhering to one seemingly most compelling assumption.