Found 20 similar articles (search time: 15 ms)
1.
Obtaining cancer risk factor prevalence estimates in small areas: combining data from two surveys (total citations: 2, self-citations: 1, citations by others: 2)
Michael R. Elliott William W. Davis 《Journal of the Royal Statistical Society. Series C, Applied statistics》2005,54(3):595-609
Summary. Cancer surveillance research requires accurate estimates of risk factors at the small area level. These risk factors are often obtained from surveys such as the National Health Interview Survey (NHIS) or the Behavioral Risk Factors Surveillance System (BRFSS). The NHIS is a nationally representative, face-to-face survey with a high response rate; however, it cannot produce state or substate estimates of risk factor prevalence because the sample sizes are too small and small area identifiers are unavailable to the public. The BRFSS is a state level telephone survey that excludes non-telephone households and has a lower response rate, but it does provide reasonable sample sizes in all states and many counties and has publicly available small area identifiers (counties). We propose a novel extension of dual-frame estimation using propensity scores that allows the complementary strengths of each survey to compensate for the weakness of the other. We apply this method to obtain 1999–2000 county level estimates of adult male smoking prevalence and mammogram usage rates among females who were 40 years old and older. We consider evidence that these NHIS-adjusted estimates reduce the effects of selection bias and non-telephone coverage in the BRFSS. Data from the Current Population Survey Tobacco Use Supplement are also used to evaluate the performance of this approach. A hybrid estimator that selects one of the two estimators on the basis of the mean-square error is also considered.
2.
3.
In many socio-economic surveys the objective is to estimate the total or proportion of persons with a particular attribute. Multi-stage area samples are drawn from geographic strata, and the population within areal units is used as an auxiliary variable in ratio estimation. For large administrative areas, the auxiliary variable totals are available as population projections based on the last census. For small areas, however, population changes are significantly affected by non-demographic factors, and hence sufficiently reliable projections are not available. In such situations the efficiency of design-based estimators for small areas can be improved by a ratio adjustment based on the auxiliary variable total for a large area. An inequality on the efficiency of the ratio-adjusted estimator is established, and its bias and variance are investigated.
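The ratio adjustment described in this abstract can be illustrated with a minimal sketch. The abstract does not give the estimator's formula, so the form used here is an assumption: the small-area design-based total is multiplied by the ratio of the known large-area auxiliary total (e.g. a census-based population projection) to its design-based estimate from the same sample. The function names `expansion_total` and `ratio_adjusted_total` are hypothetical.

```python
def expansion_total(values, weights):
    """Design-weighted (expansion) estimate of a total: sum of w_i * y_i."""
    return sum(w * y for y, w in zip(values, weights))


def ratio_adjusted_total(y_hat_small, x_hat_large, x_known_large):
    """Ratio-adjust a small-area total using a large-area auxiliary total.

    Assumed form (not taken from the paper):
        y_adj = y_hat_small * x_known_large / x_hat_large
    where x_known_large is the known auxiliary total for the large area
    and x_hat_large is its design-based estimate from the sample.
    """
    if x_hat_large <= 0:
        raise ValueError("estimated auxiliary total must be positive")
    return y_hat_small * x_known_large / x_hat_large


# Illustrative numbers only: the sample under-estimates the large-area
# population (48,000 vs. a projection of 50,000), so the small-area
# total is scaled up by the same factor.
adjusted = ratio_adjusted_total(1200.0, 48000.0, 50000.0)
```

The adjustment borrows strength from the large area: when the sample under- or over-represents the population there, the small-area estimate is corrected in the same direction.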
4.
5.
Hypertension is a highly prevalent cardiovascular disease and a considerable cost factor for many national health systems. Despite its prevalence, regional disease distributions are often unknown and must be estimated from survey data. However, health surveys frequently lack regional observations because of limited resources. The resulting prevalence estimates suffer from unacceptably large sampling variances and are not reliable. Small area estimation solves this problem by linking auxiliary data from multiple regions in suitable regression models. Typically, either unit- or area-level observations are considered for this purpose, but for hypertension both levels should be used. Hypertension has characteristic comorbidities and is strongly related to lifestyle features, which are unit-level information. It is also correlated with socioeconomic indicators that are usually measured at the area level. Combining the levels is challenging, however, because it requires multi-level model parameter estimation from small samples. We use a multi-level small area model with level-specific penalization to overcome this issue. Model parameter estimation is performed via stochastic coordinate gradient descent. A jackknife estimator of the mean squared error is presented. The methodology is applied to combine health survey data and administrative records to estimate regional hypertension prevalence in Germany.
6.
Randhir Singh 《Journal of statistical planning and inference》1985,11(2):163-170
In longitudinal surveys where a number of observations have to be made on the same sampling unit at specified time intervals, it is not uncommon that observations for some of the time stages for some of the sampled units are found missing. In the present investigation an estimation procedure for estimating the population total based on such incomplete data from multiple observations is suggested which makes use of all the available information and is seen to be more efficient than the one based on only completely observed units. Estimators are also proposed for two other situations; firstly when data is collected only for a sample of time stages and secondly when data is observed for only one time stage per sampled unit.
7.
This paper considers the application of a Stein-type estimation procedure for the coefficients in a linear regression model when data are available from a replicated experiment. Two families of estimators, each characterized by a single scalar, are proposed and their large-sample asymptotic properties are derived. These properties are used to compare the performance of the two estimators with that of the conventional estimator, and conditions for the superiority of one estimator over the other are deduced.
8.
Data from sample surveys conducted between 1978 and 1981 are used to examine the fertility of women in second and subsequent marriages in the USSR. The results indicate that women up to age 25 who have been married more than once have higher fertility than women in a first marriage. However, total fertility is higher for women in uninterrupted marriages. The analysis is presented separately for various cohorts and for socioeconomic characteristics such as educational status and rural or urban residence.
9.
Before releasing survey data, statistical agencies usually perturb the original data to keep each survey unit's information confidential. One significant concern in releasing survey microdata is identity disclosure, which occurs when an intruder correctly identifies the records of a survey unit by matching the values of some key (or pseudo-identifying) variables. We examine a recently developed post-randomization method for strict control of identification risks in releasing survey microdata. While that procedure preserves the observed frequencies well, and hence statistical estimates in the case of simple random sampling, we show that in general surveys it may induce considerable bias in commonly used survey-weighted estimators. We propose a modified procedure that better preserves weighted estimates. The procedure is illustrated and empirically assessed with an application to a publicly available US Census Bureau data set.
10.
Maura Mezzetti 《Statistical Methods and Applications》2012,21(1):49-74
A hierarchical Bayesian factor model for multivariate spatially correlated data is proposed. Multiple cancer incidence data in Scotland are jointly analyzed, looking for common components that can detect etiological factors of diseases hidden behind the data. The proposed method searches for factor scores while incorporating dependence among observations due to geographical structure. The great flexibility of the Bayesian approach allows the inclusion of prior opinions about adjacent regions having highly correlated observable and latent variables. The proposed model is an extension of a model proposed by Rowe (2003a) and starts from the introduction of a separable covariance matrix for the observations. A Gibbs sampling algorithm is implemented to sample from the posterior distributions.
11.
Hierarchical related regression for combining aggregate and individual data in studies of socio-economic disease risk factors (total citations: 2, self-citations: 0, citations by others: 2)
Christopher Jackson Nicky Best Sylvia Richardson 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(1):159-178
Summary. To obtain information about the contribution of individual and area level factors to population health, it is desirable to use both data collected on areas, such as censuses, and on individuals, e.g. survey and cohort data. Recently developed models allow us to carry out simultaneous regressions on related data at the individual and aggregate levels. These can reduce 'ecological bias' that is caused by confounding, model misspecification or lack of information and increase power compared with analysing the data sets singly. We use these methods in an application investigating individual and area level sociodemographic predictors of the risk of hospital admissions for heart and circulatory disease in London. We discuss the practical issues that are encountered in this kind of data synthesis and demonstrate that this modelling framework is sufficiently flexible to incorporate a wide range of sources of data and to answer substantive questions. Our analysis shows that the variations that are observed are mainly attributable to individual level factors rather than the contextual effect of deprivation.
12.
We propose a survey weighted quadratic inference function method for the analysis of data collected from longitudinal surveys, as an alternative to the survey weighted generalized estimating equation method. The procedure yields estimators of model parameters, which are shown to be consistent and have a limiting normal distribution. Furthermore, based on the inference function, a pseudolikelihood ratio type statistic for testing a composite hypothesis on model parameters and a statistic for testing the goodness of fit of the assumed model are proposed. We establish their asymptotic distributions as weighted sums of independent chi‐squared random variables and obtain Rao‐Scott corrections to those statistics leading to a chi‐squared distribution, approximately. We examine the performance of the proposed methods in a simulation study.
13.
In this paper, we analyze data from the Italian National Register of Rare Diseases (NRRD), focusing in particular on the geo-temporal distribution of patients affected by neurofibromatosis type 1 (NF1, ICD9CM code 237.71). The aim is to derive a corrected measure of incidence for the period 2007–2009 using a single source, and to provide NF1 prevalence estimates for the period 2001–2006 through the use of capture–recapture methods over two sources. In the first case, a reverse hazard estimator for the delay in diagnosis of NF1 is used to estimate the probability that a generic unit belonging to the population of interest has been registered by the archive of reference. For the second purpose, two-source capture–recapture methods are used to estimate the number of NF1 prevalent units in Italy for the period 2001–2006, matching information provided by the NRRD and the national register of hospital discharge, Scheda di Dimissione Ospedaliera (SDO in the following), archives.
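The abstract does not say which two-source capture–recapture estimator is applied. As a minimal sketch, the classic Lincoln–Petersen estimator and Chapman's bias-reduced variant are shown below, both assuming that the two registers capture cases independently and that records are matched without error. The function names and the counts in the example are hypothetical, not taken from the paper.

```python
def lincoln_petersen(n1, n2, m):
    """Lincoln-Petersen estimate of population size from two sources.

    n1: cases found in source 1
    n2: cases found in source 2
    m:  cases found in both sources (matched records)
    """
    if m <= 0:
        raise ValueError("need at least one matched case")
    if m > min(n1, n2):
        raise ValueError("matches cannot exceed either source count")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, which is defined even when m == 0
    and is less biased in small samples."""
    if m < 0 or m > min(n1, n2):
        raise ValueError("invalid number of matched cases")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Illustrative counts only: 200 cases in one register, 150 in the
# other, 50 appearing in both.
n_lp = lincoln_petersen(200, 150, 50)
n_ch = chapman(200, 150, 50)
```

The intuition is that the share of source-2 cases also seen in source 1 (m / n2) estimates the coverage of source 1, so the total is roughly n1 divided by that coverage.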
14.
15.
Gabriele B. Durrant Fiona Steele 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(2):361-381
Summary. We analyse household unit non-response in six major UK Government surveys by using a multilevel multinomial modelling approach. The models are guided by current conceptual frameworks and theories of survey participation. One key feature of the analysis is the investigation of the extent to which effects of household characteristics are survey specific. The analysis is based on the 2001 UK Census Link Study, which is a unique source of data containing an unusually rich set of auxiliary variables. The study contains the response outcome of six surveys, linked to census data and interviewer observations for both respondents and non-respondents.
16.
17.
When variable selection with stepwise regression and model fitting are conducted on the same data set, competition for inclusion in the model induces a selection bias in coefficient estimators away from zero. In proportional hazards regression with right-censored data, selection bias inflates the absolute values of the estimates of selected parameters, while the omission of other variables may shrink coefficients toward zero. This paper explores the extent of the bias in parameter estimates from stepwise proportional hazards regression and proposes a bootstrap method, similar to those proposed by Miller (Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, 2002) for linear regression, to correct for selection bias. We also use bootstrap methods to estimate the standard error of the adjusted estimators. Simulation results show that substantial biases could be present in uncorrected stepwise estimators and, for binary covariates, could exceed 250% of the true parameter value. The simulations also show that the conditional mean of the proposed bootstrap bias-corrected parameter estimator, given that a variable is selected, is moved closer to the unconditional mean of the standard partial likelihood estimator in the chosen model, and to the population value of the parameter. We also explore the effect of the adjustment on estimates of log relative risk, given the values of the covariates in a selected model. The proposed method is illustrated with data sets in primary biliary cirrhosis and in multiple myeloma from the Eastern Cooperative Oncology Group.
18.
Jonathan J. Forster Emily L. Webb 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(5):551-570
Summary. We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set.
19.
20.
John D. Emerson David C. Hoaglin Frederick Mosteller 《Statistical Methods and Applications》1993,2(3):269-290
Summary. Meta-analyses of sets of clinical trials often combine risk differences from several 2×2 tables according to a random-effects model. The DerSimonian-Laird random-effects procedure, widely used for estimating the population mean risk difference, weights the risk difference from each primary study inversely proportional to an estimate of its variance (the sum of the between-study variance and the conditional within-study variance). Because those weights are not independent of the risk differences, however, the procedure sometimes exhibits bias and unnatural behavior. The present paper proposes a modified weighting scheme that uses the unconditional within-study variance to avoid this source of bias. The modified procedure has variance closer to that available from weighting by ideal weights when such weights are known. We studied the modified procedure in extensive simulation experiments using situations whose parameters resemble those of actual studies in medical research. For comparison we also included two unbiased procedures, the unweighted mean and a sample-size-weighted mean; their relative variability depends on the extent of heterogeneity among the primary studies. An example illustrates the application of the procedures to actual data and the differences among the results.
This research was supported by Grant HS 05936 from the Agency for Health Care Policy and Research to Harvard University.
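To make the weighting described in this abstract concrete, here is a minimal sketch of the standard DerSimonian-Laird procedure for risk differences. It is not the modified scheme the paper proposes: the within-study variances below are the usual conditional ones, computed from each study's observed proportions. The `Trial` class and the function names are hypothetical, and the sketch assumes every within-study variance is positive (no arm with 0% or 100% events).

```python
from dataclasses import dataclass


@dataclass
class Trial:
    events_t: int  # events in the treatment arm
    n_t: int       # size of the treatment arm
    events_c: int  # events in the control arm
    n_c: int       # size of the control arm


def risk_difference(t):
    """Risk difference and its conditional (plug-in) variance."""
    p_t = t.events_t / t.n_t
    p_c = t.events_c / t.n_c
    var = p_t * (1 - p_t) / t.n_t + p_c * (1 - p_c) / t.n_c
    return p_t - p_c, var


def dersimonian_laird(trials):
    """Return (pooled risk difference, standard error, tau^2)."""
    effects, variances = zip(*(risk_difference(t) for t in trials))
    k = len(trials)

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q and the moment estimate of between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights: inverse of (within + between) variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2


# Illustrative data only.
trials = [Trial(10, 100, 20, 100), Trial(30, 100, 20, 100)]
pooled, se, tau2 = dersimonian_laird(trials)
```

Because each weight depends on the same observed proportions that determine the risk difference, the weights and the effects are correlated, which is the source of bias the abstract refers to.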