Similar literature
20 similar articles found.
1.
Two classes of methods properly account for clustering of data: design-based methods and model-based methods. Estimates from both methods have been shown to be approximately equal with large samples. However, both classes are known to produce biased standard error estimates with small samples. This paper compares the bias of standard errors and statistical power of marginal effects for generalized estimating equations (a design-based method) and generalized/linear mixed effects models (model-based methods) with small sample sizes via a simulation study. Provided that the distributional assumptions are met, model-based methods produced the least-biased standard error estimates and greater relative statistical power.

2.
This paper describes an application of small area estimation (SAE) techniques under area-level spatial random effect models when only area (or district or aggregated) level data are available. In particular, the SAE approach is applied to produce district-level model-based estimates of crop yield for paddy in the state of Uttar Pradesh in India using the data on crop-cutting experiments supervised under the Improvement of Crop Statistics scheme and the secondary data from the Population Census. Diagnostic measures are illustrated to examine the model assumptions as well as the reliability and validity of the generated model-based small area estimates. The results show a considerable gain in precision in model-based estimates produced by applying SAE. Furthermore, the model-based estimates obtained by exploiting spatial information are more efficient than those obtained by ignoring this information. However, both of these model-based estimates are more efficient than the direct survey estimate. In many districts, there is no survey data and therefore it is not possible to produce direct survey estimates for these districts. The model-based estimates generated using SAE are still reliable for such districts. These estimates produced by using SAE will provide invaluable information to policy analysts and decision-makers.

3.
This article reviews four area-level linear mixed models that borrow strength by exploiting the possible correlation among neighboring areas and/or past time periods. Its main goal is to study whether there are efficiency gains when a spatial dependence and/or a temporal autocorrelation among random-area effects is included in the models. The Fay–Herriot estimator is used as a benchmark. A design-based simulation study based on real data collected from a longitudinal survey conducted by a statistical office is presented. Our results show that models that explore both spatial and chronological association considerably improve the efficiency of small area estimates.

4.
Model-based estimators are becoming very popular in statistical offices because governments require accurate estimates for small domains that were not planned when the study was designed, as their inclusion would have increased the cost of the study. The sample sizes in these domains are very small or even zero; consequently, traditional direct design-based estimators lead to unacceptably large standard errors. In this regard, model-based estimators that 'borrow information' from related areas by using auxiliary information are appropriate. This paper reviews, under the model-based approach, a BLUP synthetic and an EBLUP estimator. The goal is to obtain estimators of domain totals when there are several domains with very small sample sizes or without sampled units. We also provide detailed expressions of the mean squared error at different levels of aggregation. The results are illustrated with real data from the Basque Country Business Survey.

5.
Despite the popularity and importance of finite mixture models, there is limited work on using them to model data that come from complex survey designs. In this work, we explored the use of finite mixture regression models when the samples were drawn using a complex survey design. In particular, we considered modelling data collected under a stratified sampling design. We developed a new design-based inference in which we integrated sampling weights into the complete-data log-likelihood function. The expectation–maximisation algorithm was developed accordingly. A simulation study was conducted to compare the new methodology with the usual finite mixture of regression models. The comparison was done using the bias-variance components of the mean square error. Additionally, a simulation study was conducted to assess the ability of the Bayesian information criterion to select the optimal number of components under the proposed modelling approach. The methodology was implemented on real data with good results.

6.
Approaches that use the pseudolikelihood to perform multilevel modelling on survey data have been presented in the literature. To avoid biased estimates due to unequal selection probabilities, conditional weights can be introduced at each level. Less-biased estimators can also be obtained in a two-level linear model if the level-1 weights are scaled. In this paper, we studied several level-2 weights that can be introduced into the pseudolikelihood when the sampling design and the hierarchical structure of the multilevel model do not match. Two-level and three-level models were studied. The present work was motivated by a study that aims to estimate the contributions of lead sources to polluting the interior floor dust of the rooms within dwellings. We performed a simulation study using the real data collected from a French survey to achieve our objective. We conclude that it is preferable to use unweighted analyses or, at most, to use conditional level-2 weights in a two-level or a three-level model. We state some warnings and make some recommendations.

7.
The efficient use of surrogate or auxiliary information has been investigated within both model-based and design-based approaches to data analysis, particularly in the context of missing data. Here we consider the use of such data in epidemiological studies of disease incidence in which surrogate measures of disease status are available for all subjects at two time points, but definitive diagnoses are available only in stratified subsamples. We briefly review methods for the analysis of two-phase studies of disease prevalence at a single time point, and we discuss the extension of four of these methods to the analysis of incidence studies. Their performance is compared with special reference to a study of the incidence of senile dementia.

8.
Multilevel modeling is an important tool for analyzing large-scale assessment data. However, standard multilevel modeling will typically give biased results for such complex survey data. This bias can be eliminated by introducing design weights, which must be used carefully as they can affect the results. The aim of this paper is to examine different approaches and to give recommendations concerning the handling of design weights in multilevel models when analyzing large-scale assessments such as TIMSS (Trends in International Mathematics and Science Study). To achieve this goal, we examined real data from two countries and included a simulation study. The analyses in the empirical study showed that using no weights or only level-1 weights could sometimes lead to misleading conclusions. The simulation study showed only small differences between the estimates of the weighted and unweighted models when informative design weights were used. The use of unscaled or non-rescaled weights, however, caused significant differences in some parameter estimates.

9.
We show how register data combined at person level with survey data can be used to conduct a novel type of nonresponse analysis in a panel survey. The availability of register data provides a unique opportunity to directly test the type of the missingness mechanism as well as to estimate the size of bias due to initial nonresponse and attrition. We are also able to study in depth the determinants of initial nonresponse and attrition. We use the Finnish subset of the European Community Household Panel (FI ECHP) data combined with register panel data, with unemployment spells as the outcome variables of interest. Our results show that initial nonresponse and attrition are clearly different processes driven by different background variables. Both the initial nonresponse and attrition mechanisms are nonignorable with respect to the analysis of unemployment spells. Finally, our results suggest that initial nonresponse may play a role at least as important as attrition in causing bias. This result challenges the common view of attrition being the main threat to the value of panel data.

10.
Rival estimators of the distribution function of a finite population, derived from the model-based and design-based approaches to sampling inference, are compared. A design-based estimator due to Rao, Kovar & Mantel (1990) has the desirable property, from a model-based viewpoint, of being model-unbiased under misspecification of the model when the sample meets certain conditions. A modified version of this estimator is suggested; it relies less on design-based ingredients, and in particular avoids second-order inclusion probabilities. It is the preferred estimator when, as is often the case in sampling practice, the model is adopted without thorough consideration of goodness of fit. However, if standard regression procedures of model construction and criticism are employed, then a strictly model-based estimator does better.

11.
We investigate the impacts of complex sampling on point and standard error estimates in latent growth curve modelling of survey data. Methodological issues are illustrated with empirical evidence from the analysis of longitudinal data on life satisfaction trajectories using data from the British Household Panel Survey, a nationally representative survey in Great Britain. A multi-process second-order latent growth curve model with conditional linear growth is used to study variation in the two perceived life satisfaction latent factors considered. The benefits of accounting for the complex survey design are considered, including obtaining unbiased point and standard error estimates, and therefore correctly specified confidence intervals and statistical tests. We conclude that, even for the rather elaborate longitudinal data models that were considered, estimation procedures are affected by variance-inflating impacts of complex sampling.

12.
This paper describes small area estimation (SAE) of proportions under a spatially dependent generalized linear mixed model using aggregated level data. The SAE is also applied to produce reliable district-level estimates and mapping of the incidence of indebtedness in the State of Uttar Pradesh in India using debt and investment survey data collected by the National Sample Survey Office (NSSO) and the secondary data from the Census. The results show a significant improvement in the precision of model-based estimates generated by SAE as compared to direct estimates. The estimates generated by incorporating spatial information are more efficient than those generated by ignoring this information.

13.
In many medical studies, patients are followed longitudinally and interest lies in assessing the relationship between longitudinal measurements and time to an event. Recently, various authors have proposed joint modeling approaches for longitudinal and time-to-event data for a single longitudinal variable. These joint modeling approaches become intractable with even a few longitudinal variables. In this paper we propose a regression calibration approach for jointly modeling multiple longitudinal measurements and discrete time-to-event data. Ideally, a two-stage modeling approach could be applied in which the multiple longitudinal measurements are modeled in the first stage and the longitudinal model is related to the time-to-event data in the second stage. Biased parameter estimation due to informative dropout makes this direct two-stage modeling approach problematic. We propose a regression calibration approach which appropriately accounts for informative dropout. We approximate the conditional distribution of the multiple longitudinal measurements given the event time by modeling all pairwise combinations of the longitudinal measurements using a bivariate linear mixed model which conditions on the event time. Complete data are then simulated based on estimates from these pairwise conditional models, and regression calibration is used to estimate the relationship between longitudinal data and time-to-event data using the complete data. We show that this approach performs well in estimating the relationship between multivariate longitudinal measurements and the time-to-event data and in estimating the parameters of the multiple longitudinal process subject to informative dropout. We illustrate this methodology with simulations and with an analysis of primary biliary cirrhosis (PBC) data.

14.
Absolute risk is the probability that a cause-specific event occurs in a given time interval in the presence of competing events. We present methods to estimate population-based absolute risk from a complex survey cohort that can accommodate multiple exposure-specific competing risks. The hazard function for each event type consists of an individualized relative risk multiplied by a baseline hazard function, which is modeled nonparametrically or parametrically with a piecewise exponential model. An influence method is used to derive a Taylor-linearized variance estimate for the absolute risk estimates. We introduce novel measures of the cause-specific influences that can guide modeling choices for the competing event components of the model. To illustrate our methodology, we build and validate cause-specific absolute risk models for cardiovascular and cancer deaths using data from the National Health and Nutrition Examination Survey. Our applications demonstrate the usefulness of survey-based risk prediction models for predicting health outcomes and quantifying the potential impact of disease prevention programs at the population level.

15.
Summary. Previous research has proposed a design-based analysis procedure for experiments that are embedded in complex sampling designs in which the ultimate sampling units of an ongoing sample survey are randomized over different treatments according to completely randomized designs or randomized block designs. Design-based Wald and t-statistics are applied to test whether sample means that are observed under various survey implementations are significantly different. This approach is generalized to experimental designs in which clusters of sampling units are randomized over the different treatments. Furthermore, test statistics are derived to test differences between ratios of two sample estimates that are observed under alternative survey implementations. The methods are illustrated with a simulation study and real-life applications of experiments that are embedded in the Dutch Labour Force Survey. The functionality of a software package that was developed to conduct these analyses is described.

16.
Summary. In sample surveys of finite populations, subpopulations for which the sample size is too small for estimation of adequate precision are referred to as small domains. Demand for small domain estimates has been growing in recent years among users of survey data. We explore the possibility of enhancing the precision of domain estimators by combining comparable information collected in multiple surveys of the same population. For this, we propose a regression method of estimation that is essentially an extended calibration procedure whereby comparable domain estimates from the various surveys are calibrated to each other. We show through analytic results and an empirical study that this method may greatly improve the precision of domain estimators for the variables that are common to these surveys, as these estimators make effective use of increased sample size for the common survey items. The design-based direct estimators proposed involve only domain-specific data on the variables of interest. This is in contrast with small domain (mostly small area) indirect estimators, based on a single survey, which incorporate through modelling data that are external to the targeted small domains. The approach proposed is also highly effective in handling the closely related problem of estimation for rare population characteristics.

17.
Assessing dose response from flexible-dose clinical trials is problematic. The true dose effect may be obscured and even reversed in observed data because dose is related to both previous and subsequent outcomes. To remove selection bias, we propose marginal structural models with inverse probability of treatment weighting (IPTW) methodology. Potential clinical outcomes are compared across dose groups using a marginal structural model (MSM) based on a weighted pooled repeated measures analysis (generalized estimating equations with robust estimates of standard errors), with dose effect represented by current dose and recent dose history, and weights estimated from the data (via logistic regression) and determined as products of (i) the inverse probability of receiving the dose assignments that were actually received and (ii) the inverse probability of remaining on treatment by this time. In simulations, this method led to almost unbiased estimates of the true dose effect under various scenarios. Results were compared with those obtained by unweighted analyses and by weighted analyses under various model specifications. The simulation showed that the IPTW MSM methodology is highly sensitive to model misspecification even when weights are known. Practitioners applying MSM should be cautious about the challenges of implementing MSM with real clinical data. Clinical trial data are used to illustrate the methodology. Copyright © 2012 John Wiley & Sons, Ltd.
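
The weight construction described in this abstract can be illustrated with a minimal sketch. It assumes the per-visit probabilities are already available (the abstract estimates them by logistic regression, which is not reproduced here) and that the product accumulates across visits up to the current time; the function name `iptw_weights` and its arguments are hypothetical, not taken from the paper.

```python
def iptw_weights(p_dose, p_stay):
    """Cumulative inverse-probability weights for one subject.

    p_dose[k]: assumed probability of receiving the dose actually received at visit k.
    p_stay[k]: assumed probability of remaining on treatment through visit k.
    Returns one weight per visit: the running product of 1 / (p_dose * p_stay).
    """
    weights = []
    running = 1.0
    for pd, ps in zip(p_dose, p_stay):
        # Probabilities of zero would make the weight undefined.
        if not (0.0 < pd <= 1.0 and 0.0 < ps <= 1.0):
            raise ValueError("probabilities must lie in (0, 1]")
        running *= 1.0 / (pd * ps)
        weights.append(running)
    return weights


if __name__ == "__main__":
    # Two visits for a single hypothetical subject.
    print(iptw_weights([0.5, 0.5], [1.0, 0.5]))
```

Weights of this kind would then enter the weighted pooled repeated measures analysis; stabilized weights, which are common in practice, are not shown here.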

18.
The main goal in small area estimation is to use models to ‘borrow strength’ from the ensemble because the direct estimates of small area parameters are generally unreliable. However, model-based estimates from the small areas do not usually match the value of the single estimate for the large area. Benchmarking is done by applying a constraint, internally or externally, to ensure that the ‘total’ of the small areas matches the ‘grand total’. This is particularly useful because it is difficult to check model assumptions owing to the sparseness of the data. We use a Bayesian nested error regression model, which incorporates unit-level covariates and sampling weights, to develop a method to internally benchmark the finite population means of small areas. We use two examples to illustrate our method. We also perform a simulation study to further assess the properties of our method.

19.
We discuss event histories from the point of view of longitudinal data analysis, comparing several possible inferential objectives. We show that the Nelson–Aalen estimate of a cumulative intensity may be derived as a limiting solution to a sequence of generalized estimating equations for intermittently observed longitudinal count data. We outline a potential use for the theory in interval-censored recurrent-event models, and demonstrate its applicability using data from a Toronto arthritis clinic. We also discuss connections with rate models, along with some implications for the longitudinal analyst.
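
For readers unfamiliar with the Nelson–Aalen estimate mentioned in this abstract, the following is a minimal sketch of the textbook estimator for right-censored, single-event data: the cumulative hazard at time t is the sum, over distinct event times up to t, of the number of events divided by the number still at risk. It does not reproduce the paper's derivation via generalized estimating equations, and the function name `nelson_aalen` is hypothetical.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard for right-censored data.

    times:  observed time for each subject (event or censoring time).
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (event_time, cumulative_hazard) pairs.
    """
    # Count observed events at each distinct event time.
    deaths = Counter(t for t, e in zip(times, events) if e)
    cumulative = 0.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        cumulative += deaths[t] / at_risk
        curve.append((t, cumulative))
    return curve


if __name__ == "__main__":
    for t, h in nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(h, 4))
```

The at-risk count is recomputed at each event time for clarity; a sorted single pass would be preferable for large samples.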

20.
A common problem for longitudinal data analyses is that subject follow-up is irregular, often related to the past outcome or other factors associated with the outcome measure that are not included in the regression model. Analyses unadjusted for outcome-dependent follow-up yield biased estimates. We propose a longitudinal data analysis that can provide consistent estimates in regression models that are subject to outcome-dependent follow-up. We focus on semiparametric marginal log-link regression with an arbitrary unspecified baseline function. Based on estimating equations, the proposed class of estimators is root-n consistent and asymptotically normal. We present simulation studies that assess the performance of the estimators under finite samples. We illustrate our approach using data from a health services research study.
