Multiple imputation has emerged as a widely used model-based approach for dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, correct specification of the imputation model can be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with the accessibility of software packages capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes and subsequently round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets.
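The conversion step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact indicator coding, so the sketch assumes a reference-level dummy coding (level 1 is the all-zero vector, and each higher level is a unit vector). The function name `distance_based_round` is hypothetical, and the joint multivariate normal imputation is assumed to have been carried out already.

```python
import numpy as np

def distance_based_round(imputed, n_levels):
    """Map imputed indicator vectors to ordinal levels 1..n_levels.

    imputed: array of shape (n, n_levels - 1) holding the continuous imputed
    values of the dummy indicators for levels 2..n_levels.
    ASSUMPTION: level 1 is the reference level, coded as the all-zero vector.
    """
    imputed = np.asarray(imputed, dtype=float)
    # Candidate indicator patterns: zero vector (level 1), then unit vectors.
    patterns = np.vstack([np.zeros(n_levels - 1), np.eye(n_levels - 1)])
    # Euclidean distance from every imputed row to every pattern: shape (n, n_levels).
    dists = np.linalg.norm(imputed[:, None, :] - patterns[None, :, :], axis=2)
    # The closest pattern gives the ordinal level (1-based).
    return dists.argmin(axis=1) + 1

# Three imputed rows for a 3-level ordinal variable.
levels = distance_based_round([[0.1, -0.2], [0.9, 0.2], [0.3, 0.8]], n_levels=3)
```

Crude rounding would instead round a single imputed value to the nearest category; the distance-based rule uses the whole indicator vector at once.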
By employing all the observed information and the optimal augmentation term, we propose an augmented inverse probability weighted fractional imputation method (AFI) to handle covariates missing at random in quantile regression. We carry out a simulation study comparing AFI with the existing complete case analysis, inverse probability weighting, multiple imputation and fractional imputation methods for quantile regression models with missing covariates, investigating its estimation accuracy and efficiency, computational efficiency and estimation robustness. We also discuss the influence of the number of imputation replicates in AFI. Finally, we apply our methodology to part of the National Health and Nutrition Examination Survey data.
In longitudinal studies, data are collected on the same set of units on more than one occasion. In medical studies it is very common to have mixed Poisson and continuous longitudinal data. In such studies, for various reasons, some intended measurements might not be available, resulting in a missing data setting. When the probability of missingness is related to the missing values, the missingness mechanism is termed nonrandom. The stochastic expectation-maximization (SEM) algorithm and the parametric fractional imputation (PFI) method are developed to handle nonrandom missingness in mixed discrete and continuous longitudinal data, assuming different covariance structures for the continuous outcome. The proposed techniques are evaluated using simulation studies and are also applied to the interstitial cystitis data base (ICDB) data.
There are two generations of Gibbs sampling methods for semiparametric models involving the Dirichlet process. The first generation suffered from a severe drawback: the locations of the clusters, or groups of parameters, could essentially become fixed, moving only rarely. Two strategies that have been proposed to create the second generation of Gibbs samplers are integration and appending a second stage to the Gibbs sampler wherein the cluster locations are moved. We show that these same strategies are easily implemented for the sequential importance sampler, and that the first strategy dramatically improves results. As in the case of Gibbs sampling, these strategies are applicable to a much wider class of models. They are shown to provide more uniform importance sampling weights and lead to additional Rao-Blackwellization of estimators.
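One common way to quantify how uniform a set of importance sampling weights is, not necessarily the measure used in this paper, is the Kish effective sample size, (sum of weights)^2 / (sum of squared weights). The sketch below is illustrative only, and the function name `effective_sample_size` is hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size of (unnormalized) importance weights.

    Equals the number of weights when they are all equal, and approaches 1
    when a single weight dominates.
    """
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

ess_uniform = effective_sample_size([1.0, 1.0, 1.0, 1.0])     # perfectly uniform weights
ess_degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])  # one weight carries everything
```

Under this diagnostic, a sampler that yields more uniform weights has an effective sample size closer to the actual number of draws.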
Suppose that a residential neighborhood may have been contaminated by a nearby abandoned hazardous waste site. The suspected contamination consists of elevated soil concentrations of chemicals that are also found in the absence of site-related contamination. How should a risk manager decide which residential properties to sample and which ones to clean? This paper introduces an adaptive spatial sampling approach which uses initial observations to guide subsequent search. Unlike some recent model-based spatial data analysis methods, it does not require any specific statistical model for the spatial distribution of hazards, but instead constructs an increasingly accurate nonparametric approximation to it as sampling proceeds. Possible cost-effective sampling and cleanup decision rules are described by decision parameters such as the number of randomly selected locations used to initialize the process, the number of highest-concentration locations searched around, the number of samples taken at each location, a stopping rule, and a remediation action threshold. These decision parameters are optimized by simulating the performance of each decision rule. The simulation is performed using the data collected so far to impute multiple probable values of unknown soil concentration distributions during each simulation run. This optimized adaptive spatial sampling technique has been applied to real data using error probabilities for wrongly cleaning or wrongly failing to clean each location (compared to the action that would be taken if perfect information were available) as evaluation criteria. It provides a practical approach for quantifying trade-offs between these different types of errors and expected cost. It also identifies strategies that are undominated with respect to all of these criteria.
In the past, many clinical trials have withdrawn subjects from the study when they prematurely stopped their randomised treatment and have therefore only collected ‘on‐treatment’ data. Thus, analyses addressing a treatment policy estimand have been restricted to imputing missing data under assumptions drawn from these data only. Many confirmatory trials are now continuing to collect data from subjects in a study even after they have prematurely discontinued study treatment as this event is irrelevant for the purposes of a treatment policy estimand. However, despite efforts to keep subjects in a trial, some will still choose to withdraw. Recent publications for sensitivity analyses of recurrent event data have focused on the reference‐based imputation methods commonly applied to continuous outcomes, where imputation for the missing data for one treatment arm is based on the observed outcomes in another arm. However, the existence of data from subjects who have prematurely discontinued treatment but remained in the study has now raised the opportunity to use this ‘off‐treatment’ data to impute the missing data for subjects who withdraw, potentially allowing more plausible assumptions for the missing post‐study‐withdrawal data than reference‐based approaches. In this paper, we introduce a new imputation method for recurrent event data in which the missing post‐study‐withdrawal event rate for a particular subject is assumed to reflect that observed from subjects during the off‐treatment period. The method is illustrated in a trial in chronic obstructive pulmonary disease (COPD) where the primary endpoint was the rate of exacerbations, analysed using a negative binomial model.
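The core idea of borrowing the off-treatment event rate can be sketched in a deliberately simplified form. This is not the authors' method. It assumes a single pooled off-treatment rate and plain Poisson draws, so it ignores the overdispersion, covariates and parameter uncertainty that a negative binomial multiple imputation model would handle. All names and numbers below are hypothetical.

```python
import numpy as np

def impute_post_withdrawal_events(off_events, off_exposure, remaining_time, rng):
    """Impute missing event counts after study withdrawal.

    off_events, off_exposure: events and follow-up time observed during the
    off-treatment period for subjects who stayed in the study.
    remaining_time: unobserved follow-up time for each withdrawn subject.
    ASSUMPTION: one pooled rate and Poisson counts (a simplification).
    """
    off_events = np.asarray(off_events, dtype=float)
    off_exposure = np.asarray(off_exposure, dtype=float)
    remaining_time = np.asarray(remaining_time, dtype=float)
    # Pooled off-treatment event rate: total events per unit of follow-up time.
    rate = off_events.sum() / off_exposure.sum()
    # Draw an event count for each withdrawn subject's missing follow-up.
    imputed = rng.poisson(rate * remaining_time)
    return rate, imputed

rng = np.random.default_rng(0)  # fixed seed for reproducibility
rate, imputed = impute_post_withdrawal_events(
    off_events=[1, 0, 2], off_exposure=[2.0, 1.0, 3.0],
    remaining_time=[0.5, 1.5], rng=rng,
)
```

In a full multiple imputation analysis, this draw would be repeated across imputed data sets, with the rate parameters themselves drawn from their estimated distribution rather than fixed.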
This study investigates the formation of endogamous and exogamous marriages among immigrants and their descendants in the United Kingdom. We apply event history analysis to data from the Understanding Society study and use multiple imputation to determine the type of marriage for individuals with missing information on the origin of their spouse. The analysis shows, first, significant differences among immigrants and their descendants in the likelihood of marrying within and outside their ethnic groups. While immigrants from European countries have relatively high exogamous marriage rates, South Asians exhibit a high likelihood of marrying a partner from their own ethnic group; Caribbean people hold an intermediate position. Second, the descendants of immigrants have lower endogamous and higher exogamous marriage rates than their parents; however, for some ethnic groups, particularly South Asians, the differences across generations are small, suggesting that changes in marriage patterns have been slower than expected.
Missing data in clinical trials are inevitable. We highlight the ICH guidelines and the CPMP points to consider on missing data. Specifically, we outline how missing data issues should be considered when designing, planning and conducting studies in order to minimize their impact. We also go beyond the coverage of these two documents: we provide a more detailed review of the basic concepts of missing data, frequently used terminology and examples of typical missing data mechanisms, and we discuss technical details and literature for several frequently used statistical methods and associated software. Finally, we provide a case study in which the principles outlined in this paper are applied to one clinical program at the protocol design, data analysis plan and other stages of a clinical trial.