In large epidemiological studies, budgetary or logistical constraints will typically preclude study investigators from measuring all exposures, covariates and outcomes of interest on all study subjects. We develop a flexible theoretical framework that incorporates a number of familiar designs, such as case-control and cohort studies, as well as multistage sampling designs. Our framework also allows for designed missingness and includes the option for outcome-dependent designs. Our formulation is based on maximum likelihood and generalizes well-known results for inference with missing data to the multistage setting. A variety of techniques are applied to streamline the computation of the Hessian matrix for these designs, facilitating the development of an efficient software tool that implements a wide variety of designs.

The efficient use of surrogate or auxiliary information has been investigated within both model-based and design-based approaches to data analysis, particularly in the context of missing data. Here we consider the use of such data in epidemiological studies of disease incidence in which surrogate measures of disease status are available for all subjects at two time points, but definitive diagnoses are available only in stratified subsamples. We briefly review methods for the analysis of two-phase studies of disease prevalence at a single time point, and we discuss the extension of four of these methods to the analysis of incidence studies. Their performance is compared with special reference to a study of the incidence of senile dementia.
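As a point of reference for the single-time-point prevalence methods mentioned above, the simplest two-phase estimator weights each stratum's phase-2 case proportion by that stratum's share of the phase-1 sample. The sketch below assumes surrogate-defined strata with a definitive diagnosis in a random subsample of each stratum; it is not one of the four incidence methods the paper compares, and the `Stratum` class, its field names and the example counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stratum:
    n_phase1: int  # subjects screened by the surrogate and placed in this stratum
    n_phase2: int  # of those, how many received a definitive diagnosis
    n_cases: int   # confirmed cases among the phase-2 subjects


def two_phase_prevalence(strata: Sequence[Stratum]) -> float:
    """Weight each stratum's phase-2 case proportion by the stratum's share
    of the phase-1 sample (equivalently, weight each diagnosed subject by the
    inverse of its stratum's phase-2 sampling fraction)."""
    total = sum(s.n_phase1 for s in strata)
    estimate = 0.0
    for s in strata:
        if s.n_phase2 == 0:
            raise ValueError("each stratum needs at least one diagnosed subject")
        estimate += (s.n_phase1 / total) * (s.n_cases / s.n_phase2)
    return estimate


# Hypothetical counts: 100 screen-positives (50 diagnosed, 20 cases) and
# 900 screen-negatives (90 diagnosed, 9 cases).
print(round(two_phase_prevalence([Stratum(100, 50, 20), Stratum(900, 90, 9)]), 6))  # prints 0.13
```

Ignoring the design and pooling all 140 diagnosed subjects would give 29/140 ≈ 0.207, which overstates prevalence because screen-positives are oversampled at phase 2.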

The missing-response problem is ubiquitous in survey sampling, medical, social science and epidemiological studies. It is well known that non-ignorable missingness, where the probability that a response is missing depends on the value of that response, is the most difficult missing data problem. Unlike the ignorable case, non-ignorable missing data have received little attention in the statistical literature beyond fully parametric model-based approaches. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method, we obtain constrained maximum empirical likelihood estimators of the parameters in the missing probability and of the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the responses are missing non-ignorably or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of data from a real AIDS trial shows that the missingness of CD4 counts at around two years is non-ignorable and that the sample mean based on the observed data alone is biased.
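The building block of the approach above is Owen's empirical likelihood ratio for a mean. The sketch below computes the complete-data statistic −2 log R(μ) by solving for the Lagrange multiplier with bisection; it does not include the parametric missing-probability model or the constraints the paper adds for non-ignorable missingness, and the function name `el_log_ratio` is hypothetical.

```python
import numpy as np


def el_log_ratio(x, mu, iters=200):
    """Return -2 log R(mu), Owen's empirical likelihood ratio statistic for
    the hypothesis E[X] = mu, using complete (non-missing) data x."""
    d = np.asarray(x, dtype=float) - mu
    if d.min() >= 0 or d.max() <= 0:
        return np.inf  # mu lies outside the convex hull of the data
    # The multiplier lam solves sum(d / (1 + lam * d)) = 0 and must keep every
    # implied weight positive, i.e. lie in the open interval (-1/max(d), -1/min(d)).
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        g = np.sum(d / (1.0 + lam * d))  # strictly decreasing in lam on (lo, hi)
        if g > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    # Implied weights are p_i = 1 / (n * (1 + lam * d_i)); the statistic is
    # -2 * sum(log(n * p_i)) = 2 * sum(log(1 + lam * d_i)).
    return float(2.0 * np.sum(np.log1p(lam * d)))
```

Under standard conditions the statistic is asymptotically chi-squared with one degree of freedom, so values above 3.84 reject E[X] = μ at the 5% level.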

Missing data form a ubiquitous problem in scientific research, especially since most statistical analyses require complete data. To evaluate the performance of methods for dealing with missing data, researchers perform simulation studies. An important aspect of these studies is the generation of missing values in a simulated, complete data set: the amputation procedure. We investigated the methodological validity and statistical nature of both current amputation practice and a newly developed and implemented multivariate amputation procedure. We found that current practice may not be appropriate for generating intuitive and reliable missing data problems. The multivariate amputation procedure, on the other hand, generates reliable amputations and allows missing data problems to be regulated properly. The procedure has additional features for generating any missing data scenario precisely as intended. Hence, the multivariate amputation procedure is an efficient way to evaluate missing data methodology accurately.
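To make the idea of amputation concrete, the sketch below removes values from one column of a complete simulated data set under a MAR mechanism, with missingness probabilities driven by a logistic function of another (fully observed) column and rescaled to hit a target missing proportion. It is a univariate illustration only, not the multivariate procedure itself (available, for example, as `ampute()` in the R package mice); the function name `ampute_mar` and its parameters are hypothetical.

```python
import numpy as np


def ampute_mar(data: np.ndarray, target_col: int, driver_col: int,
               prop: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `data` in which about `prop` of the values in
    `target_col` are set to NaN, with missingness more likely when
    `driver_col` is large (MAR, because driver_col stays observed)."""
    out = data.astype(float).copy()  # never modify the complete data in place
    driver = out[:, driver_col]
    z = (driver - driver.mean()) / driver.std()
    # Logistic weights: larger driver values give higher missingness probability.
    w = 1.0 / (1.0 + np.exp(-z))
    # Rescale so the expected missing proportion equals prop (clipped to [0, 1]).
    p = np.clip(w * prop / w.mean(), 0.0, 1.0)
    mask = rng.random(len(out)) < p
    out[mask, target_col] = np.nan
    return out
```

Rescaling the weights, rather than thresholding the driver, keeps the missing proportion controllable while leaving the strength of the MAR dependence tied to the logistic shape.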


Weighted distributions, as an example of informative sampling, work appropriately under the missing at random mechanism, since they neglect missing values and only completely observed subjects are used in the study plan. However, length-biased distributions, a special case of weighted distributions, deliberately exclude subjects with short lengths, which clearly constitutes a missing not at random mechanism. Consequently, applying length-biased distributions jeopardizes the results by producing biased estimates. Hence, an alternative method is needed that improves the results by yielding valid inferences. We propose methods based on weighted distributions and a joint modelling procedure and compare them in analysing longitudinal data. After the three methods are introduced, a set of simulation studies and analyses of two real longitudinal datasets support our claim.
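The bias described above can be seen in the simplest length-biased setting: if the observed density is x·f(x)/μ, the naive sample mean overestimates the population mean μ, while E[1/X] under length-biased sampling equals 1/μ, so the harmonic mean of the observed lengths is consistent for μ. The sketch assumes a Gamma(shape 3, scale 1) population purely for illustration; it shows this classical inverse-length correction for the marginal mean, not the weighted-distribution or joint-modelling methods the paper proposes for longitudinal data.

```python
import numpy as np


def naive_mean(lengths):
    # Ignores the sampling mechanism; biased upward under length-biased sampling.
    return float(np.mean(lengths))


def length_biased_corrected_mean(lengths):
    """Population mean estimated from length-biased observations.

    If the observed density is x f(x) / mu, then E[1/X] under length-biased
    sampling equals 1 / mu, so n / sum(1 / x_i) is consistent for mu."""
    x = np.asarray(lengths, dtype=float)
    return float(len(x) / np.sum(1.0 / x))


# Assumed population: Gamma(shape=3, scale=1), mean 3. Its length-biased
# version is Gamma(shape=4, scale=1), whose mean is 4.
rng = np.random.default_rng(0)
observed = rng.gamma(shape=4.0, scale=1.0, size=100_000)
print(naive_mean(observed), length_biased_corrected_mean(observed))
```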

In this paper, we consider regression analysis for a missing data problem in which the variables of primary interest are unobserved under a general biased sampling scheme, the outcome-dependent sampling (ODS) design. We propose a semiparametric empirical likelihood method for assessing the association between a continuous outcome response and unobservable factors of interest. Simulation results show that the ODS design can produce more efficient estimators than a simple random design of the same sample size. We demonstrate the proposed approach with a data set from an environmental study of genetic effects on human lung function in COPD smokers. The Canadian Journal of Statistics 40: 282–303; 2012 © 2012 Statistical Society of Canada

Missing covariate data combined with censored outcomes pose a challenge in the analysis of clinical data, especially in small-sample settings. Multiple imputation (MI) techniques are popular for imputing missing covariates, after which the data are analyzed with methods that can handle censoring. MI-based techniques for imputing censored outcomes also exist, but they are rarely used in practice. In the present study, we applied a method based on multiple imputation by chained equations to impute missing covariate values and also to impute censored outcomes, using restricted survival time, in small-sample settings. The completed data were then analyzed using linear regression models. Simulation studies and a real example of CHD data show that the present method produced better estimates and lower standard errors on data with missing covariate values and censored outcomes than two alternatives: analyzing the data with censored outcomes but excluding cases with missing covariates, and excluding all cases with missing covariate values or censored outcomes (complete case analysis).

The density function is a fundamental concept in data analysis. Non-parametric methods, including kernel smoothing estimates, are available if the data are completely observed. However, in studies such as diagnostic studies following a two-stage design, the membership of some subjects may be missing. Simply ignoring the subjects with unknown membership is valid only in the MCAR situation. In this paper, we consider kernel smoothing estimates of the density functions, using inverse probability approaches to address the missing values. We illustrate the approaches with simulation studies and real study data in mental health.
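The inverse probability idea above can be sketched as a weighted kernel density estimate: each subject whose membership was verified at stage two, and who belongs to the group of interest, contributes a Gaussian kernel weighted by the inverse of its verification probability. This assumes the verification probabilities are known from the design (and depend only on the observed score); the function name `ipw_group_density` and its arguments are hypothetical, and the paper's exact estimator and bandwidth choice may differ.

```python
import numpy as np


def ipw_group_density(x, verified, member, pi, grid, bandwidth):
    """Gaussian-kernel density of x within the group member == 1.

    x        : test score, observed for every subject
    verified : True where group membership was ascertained (stage two)
    member   : 1/0 membership; ignored where verified is False
    pi       : probability that each subject was selected for verification
    Each verified group member gets weight 1 / pi; unverified subjects are dropped.
    """
    x = np.asarray(x, dtype=float)
    verified = np.asarray(verified, dtype=bool)
    member = np.asarray(member)
    keep = verified & (member == 1)
    w = 1.0 / np.asarray(pi, dtype=float)[keep]
    w /= w.sum()  # normalise so the estimate integrates to 1
    u = (np.asarray(grid, dtype=float)[:, None] - x[keep][None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel @ w / bandwidth
```

Passing `pi` as all ones reproduces the naive estimate that simply ignores unverified subjects, which is biased whenever verification depends on the score.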

Diagnostic tests are used in a wide range of behavioral, medical, psychosocial, and healthcare-related research. Test sensitivity and specificity are the most popular measures of accuracy for diagnostic tests. Available methods for analyzing longitudinal study designs assume fixed gold or reference standards and as such do not apply to studies with dynamically changing reference standards, which are especially popular in psychosocial research. In this article, we develop a novel approach to address missing data and other related issues for modeling sensitivity and specificity within such a time-varying reference standard setting. The approach is illustrated with real as well as simulated data.
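For orientation, the sketch below computes sensitivity and specificity separately at each visit against that visit's reference standard, simply skipping subjects whose reference result is missing. This available-case computation is the naive baseline the paper's approach is designed to improve on, not the authors' method; the type alias `Visit` and both function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One visit: (test result, reference result or None if the reference is missing)
Visit = Sequence[Tuple[int, Optional[int]]]


def accuracy_at_visit(pairs: Visit) -> Tuple[float, float]:
    """Sensitivity and specificity at one visit, using only subjects whose
    reference standard was observed at that visit (available-case)."""
    tp = fn = tn = fp = 0
    for test, ref in pairs:
        if ref is None:
            continue  # reference standard missing at this visit
        if ref == 1:
            tp += int(test == 1)
            fn += int(test == 0)
        else:
            tn += int(test == 0)
            fp += int(test == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def accuracy_by_visit(visits: List[Visit]) -> List[Tuple[float, float]]:
    # The reference standard is re-evaluated at every visit, so a subject's
    # true status may differ between visits.
    return [accuracy_at_visit(v) for v in visits]
```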

Summary. In longitudinal studies, missing data are often an unavoidable problem. Estimators from the linear mixed effects model assume that the data are missing at random; when this assumption is not met, the estimators are biased. In this paper, theoretical results for the asymptotic bias are established under non-ignorable drop-out, drop-in and other missing data patterns. The asymptotic bias is large when the drop-out subjects have only one observation or none, especially for the slope-related parameters of the linear mixed effects model. In the drop-in case, the intercept-related parameter estimators show substantial asymptotic bias when subjects enter late in the study. Eight other missing data patterns are considered, and these produce asymptotic biases of various magnitudes.

Patterns of sexual mixing and the sexual partner network are important determinants of the spread of all sexually transmitted diseases (STDs), including the human immunodeficiency virus. Novel statistical problems arise in the analysis and interpretation of studies aimed at measuring patterns of sexual mixing and sexual partner networks. Samples of mixing patterns and network structures derived from randomly sampling individuals are not themselves random samples of measures of partnerships or networks. In addition, the sensitive nature of questions on sexual activity will result in the introduction of non-response biases, which in estimating network structures are likely to be non-ignorable. Adjusting estimates for these biases by using standard statistical approaches is complicated by the complex interactions between the mechanisms generating bias and the non-independent nature of network data. Using a two-step Monte Carlo simulation approach, we have shown that measures of mixing patterns and the network structure that do not account for missing data and non-random sampling are severely biased. Here, we use this approach to adjust raw estimates in data to incorporate these effects. The results suggest that the risk for transmission of STDs in empirical data is underestimated by ignoring missing data and non-random sampling.

Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because an MNAR mechanism cannot be verified from the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data. The discrepancy between the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap-type method, which relies on heuristic input from the researcher, to test for discrepancy between the parameter estimates. Instead of the bootstrap, the current article obtains confidence intervals for the differences between parameters estimated on the two samples, based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids the convergence problems likely to arise with bootstrap methods. It does not require heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test for missing at random in addition to MCAR.
An application of the developed procedure to a real data set from the first wave of an ongoing longitudinal study on aging is presented. Simulation studies are also performed, using two methods of missing data generation, and they show promise for the proposed sensitivity method. One of these methods of missing data generation is also new and interesting in its own right.

Crossover designs are often used in clinical trials. It is not uncommon for subjects to discontinue before completing all treatment periods in a crossover study. Despite the availability of statistical methodology that uses all available data, and of software for obtaining valid inferences under the assumption of missing at random (MAR), naïve approaches such as complete case (CC) analysis, which is valid only under the strong assumption of missing completely at random, are still widely used in practice. In this article, we obtain the analytical form of the bias of CC estimates of treatment effects for linear mixed models. We use simulation studies to examine the inflation of the Type I error and the loss of efficiency in inferences based on CC under MAR. The invalidity and inefficiency of two other commonly used approaches for defining the analyzed data in the presence of missing data, namely using data from subjects with at least two periods in a three-period crossover and using the available cases for a specific comparison of interest, are also demonstrated through simulation studies.
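Why complete case analysis fails under MAR can be seen in a toy two-period setting (not a crossover linear mixed model): the period-1 response is always observed, and subjects with high period-1 responses are more likely to drop out before period 2. The completers' period-2 mean is then biased, whereas regressing period 2 on period 1 among completers and averaging the predictions over everyone recovers the true mean, because dropout depends only on observed data. The simulation settings, function names and the regression approach (standing in for likelihood-based MAR methods) are assumptions for illustration.

```python
import numpy as np


def simulate_two_periods(n, rho=0.7, seed=0):
    """Period-1 response y1 is always observed; the subject drops out before
    period 2 with a probability that increases with y1 (MAR). True E[y2] = 1."""
    rng = np.random.default_rng(seed)
    y1 = rng.normal(0.0, 1.0, n)
    y2 = 1.0 + rho * y1 + rng.normal(0.0, np.sqrt(1.0 - rho ** 2), n)
    p_dropout = 1.0 / (1.0 + np.exp(-2.0 * y1))
    completed = rng.random(n) >= p_dropout
    return y1, y2, completed


def complete_case_mean(y2, completed):
    # Uses period-2 values of completers only (valid only under MCAR).
    return float(y2[completed].mean())


def mar_regression_mean(y1, y2, completed):
    # Fit E[y2 | y1] on completers, then average predictions over everyone;
    # valid under MAR because dropout depends only on the observed y1.
    slope, intercept = np.polyfit(y1[completed], y2[completed], 1)
    return float(np.mean(intercept + slope * y1))


y1, y2, completed = simulate_two_periods(200_000)
print(complete_case_mean(y2, completed), mar_regression_mean(y1, y2, completed))
```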

Outliers are commonly observed in psychosocial research and generally bias estimates of group differences obtained from popular mean-based models such as the analysis of variance model. Rank-based methods such as the popular Mann–Whitney–Wilcoxon (MWW) rank sum test are more effective at addressing such outliers. However, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal studies with missing data. In this paper, we propose a generalized MWW test for comparing multiple groups with covariates within a longitudinal data setting, by utilizing functional response models. Inference is based on a class of U-statistics-based weighted generalized estimating equations, providing consistent and asymptotically normal estimates under both complete and missing data. The proposed approach is illustrated with both real and simulated study data.
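The classical two-group MWW statistic that the paper generalizes is itself a U-statistic: it averages the kernel I(x < y) + ½·I(x = y) over all between-group pairs, which is why a single extreme value can change it by at most a bounded amount. The sketch below shows only this cross-sectional, covariate-free version; the generalization to multiple groups, covariates and missing longitudinal data through weighted estimating equations is not reproduced, and the function names are hypothetical.

```python
from typing import Sequence


def mww_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney U counting pairs with x_i < y_j, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def prob_x_less_than_y(x: Sequence[float], y: Sequence[float]) -> float:
    """U / (m n): an unbiased U-statistic estimate of P(X < Y) + 0.5 P(X = Y)."""
    return mww_u(x, y) / (len(x) * len(y))
```

Replacing the largest `y` value by an arbitrarily large outlier leaves `prob_x_less_than_y` unchanged if that value already exceeded every `x`, whereas a difference in means would shift without bound.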

In longitudinal surveys, where a number of observations have to be made on the same sampling unit at specified time intervals, it is not uncommon for observations at some time stages to be missing for some of the sampled units. In the present investigation, a procedure for estimating the population total from such incomplete multiple-observation data is suggested; it makes use of all the available information and is seen to be more efficient than an estimator based only on the completely observed units. Estimators are also proposed for two other situations: first, when data are collected only for a sample of time stages, and second, when data are observed for only one time stage per sampled unit.


In general, survival data are time-to-event data, such as time to death, time to appearance of a tumor, or time to recurrence of a disease. Models for survival data have frequently been based on the proportional hazards model proposed by Cox. The Cox model is widely applied in the social, medical, behavioral and public health sciences. In this paper we propose a more efficient sampling method for recruiting subjects for survival analysis: a Moving Extreme Ranked Set Sampling (MERSS) scheme with ranking based on an easy-to-evaluate baseline auxiliary variable known to be associated with survival time. This paper demonstrates that this approach provides a more powerful testing procedure, as well as a more efficient estimate of the hazard ratio, than simple random sampling (SRS). Theoretical derivations and simulation studies are provided. Data from the Iowa 65+ Rural study are used to illustrate the methods developed in this paper.
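The MERSS selection step can be sketched as follows, assuming the common "maximum" formulation: in each cycle, sets of size 1, 2, …, m are drawn, each set is ranked by the cheap auxiliary variable, and only the top-ranked unit is kept for full follow-up. The paper may use the minimum, a different ordering of set sizes, or another variant; the function name `merss_sample` and its parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def merss_sample(population: Sequence[T], m: int, cycles: int,
                 aux: Callable[[T], float], rng: random.Random) -> List[T]:
    """Moving extreme ranked set sample (maximum version).

    In each cycle, for set sizes 1, 2, ..., m, draw a set of that size, rank
    its units by the auxiliary variable aux, and keep only the unit with the
    largest value; only kept units would have survival time measured.
    """
    chosen: List[T] = []
    for _ in range(cycles):
        for set_size in range(1, m + 1):
            # A fresh set is drawn each time, so a unit can recur across sets.
            candidates = rng.sample(population, set_size)
            chosen.append(max(candidates, key=aux))
    return chosen
```

The final sample has m × cycles units, but each of them was chosen from a set of up to m candidates, so the auxiliary information is used without measuring survival time on the discarded units.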

In longitudinal studies, missing responses and mismeasured covariates are common consequences of the data collection process. If these are not handled carefully, inferences from standard statistical approaches may lead to wrong conclusions. To improve estimation in longitudinal data analysis, a doubly robust estimation method is proposed for partially linear models that simultaneously accounts for missing responses and mismeasured covariates. Covariate measurement error is corrected by exploiting the independence between replicate measurement errors, and missing responses are handled by doubly robust estimation under the missing at random mechanism. The asymptotic properties of the proposed estimators are established under regularity conditions, and simulation studies demonstrate the desired properties. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition study.
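The doubly robust principle mentioned above can be illustrated with the basic augmented inverse-probability-weighted (AIPW) estimator of a mean response under MAR: it combines a response-probability model and an outcome model, and remains consistent if either one is correctly specified. The sketch assumes a single always-observed covariate with logistic and linear working models; it omits the replicate-based measurement error correction and the partially linear structure of the paper, and all function names are hypothetical.

```python
import numpy as np


def _sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, r, iters=25):
    """Newton-Raphson fit of P(R = 1 | X) = sigmoid(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = _sigmoid(X @ beta)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (r - p))
    return beta


def aipw_mean(x, y, observed):
    """Augmented inverse-probability-weighted (doubly robust) estimate of E[Y]
    when Y is missing at random given the always-observed covariate x."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(observed, dtype=bool)
    X = np.column_stack([np.ones_like(x), x])
    pi = _sigmoid(X @ fit_logistic(X, r.astype(float)))  # response model
    coef, *_ = np.linalg.lstsq(X[r], np.asarray(y, dtype=float)[r], rcond=None)
    m = X @ coef                                          # outcome model
    y_obs = np.where(r, y, 0.0)  # missing y values never enter the estimate
    return float(np.mean(r * y_obs / pi + (1.0 - r / pi) * m))
```

The first term is the inverse-probability-weighted mean; the second term corrects it using the outcome model, and its expectation is zero when the response model is right, which is the source of the double robustness.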

Missing data and, more generally, imperfections in implementing a study design are an endemic problem in large-scale studies involving human subjects. We present an analysis of an experiment on the interaction between general practitioners and their patients, in which the issue of missing data is addressed by a sensitivity analysis using multiple imputation. Instead of specifying a model for missingness, we explore certain extreme departures from the assumption that data are missing at random and establish the largest such departure that would still fail to overturn the evidence about the studied effect. An important advantage of the approach is that the algorithm intended for the complete data, which fits generalized linear models with random effects, is used without any alteration.

This article proposes a Bayesian approach, which simultaneously obtains Bayesian estimates of unknown parameters and random effects, to analyze nonlinear reproductive dispersion mixed models (NRDMMs) for longitudinal data with nonignorable missing covariates and responses. Logistic regression models are employed for the missing data mechanisms of the missing covariates and responses. A hybrid sampling procedure combining the Gibbs sampler and the Metropolis-Hastings algorithm is presented to draw observations from the conditional distributions. Because the missing data mechanism is not testable, we develop the logarithm of the pseudo-marginal likelihood, the deviance information criterion, the Bayes factor and the pseudo-Bayes factor to compare several competing missing data mechanism models in the NRDMMs considered here, with nonignorable missing covariates and responses. Three simulation studies and a real example taken from the paediatric AIDS clinical trial group ACTG are used to illustrate the proposed methodologies. Empirical results show that the proposed methods are effective in selecting missing data mechanism models.
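The hybrid sampler described above alternates exact Gibbs draws for parameters with standard full conditionals and Metropolis-Hastings updates for those without. The toy sketch below shows that alternation for y_i ~ N(μ, σ²), assuming a Cauchy(0, 10) prior on μ (non-conjugate, so μ gets a random-walk MH step) and an InvGamma(a, b) prior on σ² (conjugate, so σ² gets a Gibbs draw). It is not the NRDMM sampler from the article; the model, priors, tuning values and function name are all hypothetical.

```python
import numpy as np


def metropolis_within_gibbs(y, n_iter=5000, step=0.3, a=2.0, b=2.0, seed=0):
    """Toy hybrid sampler for y_i ~ N(mu, sigma2) with a Cauchy(0, 10) prior on
    mu (no conjugate update, so a random-walk Metropolis-Hastings step is used)
    and an InvGamma(a, b) prior on sigma2 (conjugate, so a Gibbs draw is used)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)

    def log_cond_mu(mu, sigma2):
        # Log full conditional of mu, up to an additive constant.
        return -np.sum((y - mu) ** 2) / (2.0 * sigma2) - np.log1p((mu / 10.0) ** 2)

    mu, sigma2 = 0.0, 1.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Gibbs step: sigma2 | mu, y ~ InvGamma(a + n/2, b + SS/2),
        # drawn as the reciprocal of a Gamma(shape, scale = 1 / rate) variate.
        ss = np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(a + n / 2.0, 1.0 / (b + ss / 2.0))
        # Metropolis-Hastings step for mu with a symmetric normal proposal.
        proposal = mu + step * rng.normal()
        log_accept = log_cond_mu(proposal, sigma2) - log_cond_mu(mu, sigma2)
        if np.log(rng.random()) < log_accept:
            mu = proposal
        draws[t] = mu, sigma2
    return draws
```

Because the proposal is symmetric, the proposal densities cancel in the acceptance ratio, leaving only the difference of log full conditionals.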
