Similar literature
 Found 20 similar documents; search took 31 ms
1.
HOW TO IMPUTE INTERACTIONS, SQUARES, AND OTHER TRANSFORMED VARIABLES   Total citations: 1, self-citations: 0, cited by others: 1
Researchers often carry out regression analysis using data that have missing values. Missing values can be filled in using multiple imputation, but imputation is tricky if the regression includes interactions, squares, or other transformations of the regressors. In this paper, we examine different approaches to imputing transformed variables, and we find one simple method that works well across a variety of circumstances. Our recommendation is to transform, then impute: that is, calculate the interactions or squares in the incomplete data and then impute these transformations like any other variable. The transform-then-impute method yields good regression estimates, even though the imputed values are often inconsistent with one another. It is tempting to try to "fix" the inconsistencies in the imputed values, but methods that do so lead to biased regression estimates. Such biased methods include the passive imputation strategy implemented by the popular ice command for Stata.
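The difference between the two orderings described in this abstract can be sketched in a few lines. This is only an illustration under loud assumptions: `mean_impute` is a deliberately crude stand-in for a real multiple-imputation model, and all function names and the toy data are hypothetical, not taken from the paper. The point is the order of operations, and the fact that transform-then-impute can produce an imputed square that does not equal the square of the imputed value.

```python
from statistics import mean


def mean_impute(column):
    """Stand-in imputer (assumption): fill None with the mean of observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def transform_then_impute(x):
    """Square x in the incomplete data, then impute x and x^2 as separate variables."""
    x_sq = [None if v is None else v * v for v in x]
    return mean_impute(x), mean_impute(x_sq)


def impute_then_transform(x):
    """Passive-style ordering: impute x first, then derive x^2 from the imputed x."""
    x_imp = mean_impute(x)
    return x_imp, [v * v for v in x_imp]


x = [1.0, 2.0, 3.0, None]  # toy data, last value missing
tti_x, tti_sq = transform_then_impute(x)
passive_x, passive_sq = impute_then_transform(x)

# The two orderings agree on x but disagree on the filled-in square.
print(tti_x[-1], tti_sq[-1])
print(passive_x[-1], passive_sq[-1])
```

In the transform-then-impute output the filled-in square is the mean of the observed squares, so it is "inconsistent" with the filled-in value of x; the passive ordering forces consistency. The abstract reports that the first ordering gives the better regression estimates, which this toy example does not itself demonstrate.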

2.
Less than optimum strategies for missing values can produce biased estimates, distorted statistical power, and invalid conclusions. After reviewing traditional approaches (listwise, pairwise, and mean substitution), selected alternatives are covered including single imputation, multiple imputation, and full information maximum likelihood estimation. The effects of missing values are illustrated for a linear model, and a series of recommendations is provided. When missing values cannot be avoided, multiple imputation and full information methods offer substantial improvements over traditional approaches. Selected results using SPSS, NORM, Stata (mvis/micombine), and Mplus are included, as are a table of available software and an appendix with examples of programs for Stata and Mplus.

3.
Multiple imputation (MI), a two-stage process whereby missing data are imputed multiple times and the resulting estimates of the parameter(s) of interest are combined across the completed datasets, is becoming increasingly popular for handling missing data. However, MI can result in biased inference if not carried out appropriately or if the underlying assumptions are not justifiable. Despite this, there remains a scarcity of guidelines for carrying out MI. In this paper we provide a tutorial on the main issues involved in employing MI, as well as highlighting some common pitfalls and misconceptions, and areas requiring further development. When contemplating using MI we must first consider whether it is likely to offer gains (reduced bias or increased precision) over alternative methods of analysis. Once it has been decided to use MI, there are a number of decisions that must be made during the imputation process; we discuss the extent to which these decisions can be guided by the current literature. Finally we highlight the importance of checking the fit of the imputation model. This process is illustrated using a case study in which we impute missing outcome data in a five-wave longitudinal study that compared extremely preterm individuals with term-born controls.
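The second stage mentioned in this abstract, combining estimates across the completed datasets, is commonly done with Rubin's rules. The sketch below assumes that is the combining rule in play (the abstract does not name it); the function name and the numbers are hypothetical. The pooled estimate is the mean of the per-dataset estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m).

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from m = 3 completed datasets.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_variance)
```

The between-imputation term is what carries the extra uncertainty due to the missing data; analysing a single completed dataset would omit it and understate the standard error.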

4.
This article offers an applied review of key issues and methods for the analysis of longitudinal panel data in the presence of missing values. The authors consider the unique challenges associated with attrition (survey dropout), incomplete repeated measures, and unknown observations of time. Using simulated data based on 4 waves of the Marital Instability Over the Life Course Study (n = 2,034), they applied a fixed effect regression model and an event‐history analysis with time‐varying covariates. They then compared results for analyses with nonimputed missing data and with imputed data both in long and in wide structures. Imputation produced improved estimates in the event‐history analysis but only modest improvements in the estimates and standard errors of the fixed effects analysis. Factors responsible for differences in the value of imputation are examined, and recommendations for handling missing values in panel data are presented.

5.
6.
Social science datasets usually have missing cases and missing values. All such missing data has the potential to bias future research findings. However, many research reports ignore the issue of missing data, only consider some aspects of it, or do not report how it is handled. This paper rehearses the damage caused by missing data. The paper then briefly considers eight different approaches to handling missing data so as to minimise that damage, their underlying assumptions and the likely costs and benefits. These approaches include complete case analysis, complete variable analysis, single imputation, multiple imputation, maximum likelihood estimation, default replacement values, weighting, and sensitivity analyses. Using only complete cases should be avoided wherever possible. The paper suggests that the more complex, modelling approaches to replacing missing data are based on questionable methodological and philosophical assumptions, and that they may not, in any case, have clear advantages over simpler approaches such as default replacements. It makes sense to report all possible forms of missing data, report everything that is known about the characteristics of cases with missing values, conduct simple sensitivity analyses of the potential impact of missing data on the substantive results, and retain the knowledge of missingness when using any form of replacement value.

7.
Household surveys often contain coarse data, which consist of a mixture of missing values, interval-censored values and point (fully-observed) values, making it difficult to construct a continuous money-metric measure of wellbeing. This paper assesses the sensitivity of poverty and inequality estimates to the multiple imputation of coarse earnings data and reported zero values using the 2001–2006 South African Labour Force Surveys. Estimates of poverty amongst the employed are shown not to be sensitive to multiple imputation of missing and interval-censored data, but are sensitive to the treatment of workers reporting zero earnings. Poverty trends are generally robust to the choice of method, and a significant decline in poverty is evident. Inequality estimates, on the other hand, appear more sensitive to the treatment of zero values and the choice of imputation methods, and, overall, no particular trends in inequality could be discerned.

8.
We propose using latent class analysis as an alternative to log-linear analysis for the multiple imputation of incomplete categorical data. Similar to log-linear models, latent class models can be used to describe complex association structures between the variables used in the imputation model. However, unlike log-linear models, latent class models can be used to build large imputation models containing more than a few categorical variables. To obtain imputations reflecting uncertainty about the unknown model parameters, we use a nonparametric bootstrap procedure as an alternative to the more common full Bayesian approach. The proposed multiple imputation method, which is implemented in Latent GOLD software for latent class analysis, is illustrated with two examples. In a simulated data example, we compare the new method to well-established methods such as maximum likelihood estimation with incomplete data and multiple imputation using a saturated log-linear model. This example shows that the proposed method yields unbiased parameter estimates and standard errors. The second example concerns an application using a typical social sciences data set. It contains 79 variables that are all included in the imputation model. The proposed method is especially useful for such large data sets because standard methods for dealing with missing data in categorical variables break down when the number of variables is so large.

9.
Secondary respondent data are underutilized because researchers avoid using these data in the presence of substantial missing data. The authors reviewed, evaluated, and tested solutions to this problem. Five strategies of dealing with missing partner data were reviewed: (a) complete case analysis, (b) inverse probability weighting, (c) correction with a Heckman selection model, (d) maximum likelihood estimation, and (e) multiple imputation. Two approaches were used to evaluate the performance of these methods. First, the authors used data from the National Survey of Fertility Barriers (n = 1,666) to estimate a model predicting marital quality based on characteristics of women and their husbands. Second, they conducted a simulation testing the 5 methods and compared the results to estimates where the true value was known. They found that the maximum likelihood and multiple imputation methods were advantageous because they allow researchers to utilize all of the available information as well as produce less biased and more efficient estimates.

10.
A randomized trial tested the efficacy of three curriculum versions teaching drug resistance strategies: one modeled on Mexican American culture, another modeled on European American and African American culture, and a multicultural version. Self-report data at baseline and 14 months post-intervention were obtained from 3,402 Mexican heritage students in 35 Arizona middle schools, including 11 control sites. Tests for intervention effects used simultaneous regression models, multiple imputation of missing data, and adjustments for random effects. Compared with controls, students in the Latino version reported less overall substance use and marijuana use, stronger intentions to refuse substances, greater confidence they could do so, and lower estimates of substance-using peers. Students in the multicultural version reported less alcohol, marijuana, and overall substance use. Although program effects were confined to the Latino and multicultural versions, tests of their relative efficacy compared with the non-Latino version found no significant differences. Implications for evidence-based practice and prevention program designs are discussed, including the role of school social workers in culturally grounded prevention.

11.
Although several methods have been developed to allow for the analysis of data in the presence of missing values, no clear guide exists to help family researchers in choosing among the many options and procedures available. We delineate these options and examine the sensitivity of the findings in a regression model estimated in three random samples from the National Survey of Families and Households (n = 250–2,000). These results, combined with findings from simulation studies, are used to guide answers to a set of 10 common questions asked by researchers when selecting a missing data approach. Modern missing data techniques were found to perform better than traditional ones, but differences between the types of modern approaches had minor effects on the estimates and substantive conclusions. Our findings suggest that the researcher has considerable flexibility in selecting among modern options for handling missing data.

12.
The measurement of inequality of opportunity has hitherto not been attempted in a number of countries because of data limitations. This paper proposes two alternative approaches to circumventing the missing data problems in countries where a demographic and health survey (DHS) and an ancillary household expenditure survey are available. One method relies only on the DHS, and constructs a wealth index as a measure of economic advantage. The alternative method imputes consumption from the ancillary survey into the DHS. In both cases, we compute a lower bound estimator of the share of (ex-ante) inequality of opportunity in total inequality. Parametric and non-parametric estimates are calculated for each method, and the parametric approach is shown to yield preferable lower-bound measures. In an application to the sample of ever-married women aged 30–49 in Turkey, inequality of opportunity accounts for at least 26% (31%) of overall inequality in imputed consumption (the wealth index).

13.
Recent developments have made model-based imputation of network data feasible in principle, but the extant literature provides few practical examples of its use. In this paper, we consider 14 schools from the widely used In-School Survey of Add Health (Harris et al., 2009), applying an ERGM-based estimation and simulation approach to impute the network missing data for each school. Add Health's complex study design leads to multiple types of missingness, and we introduce practical techniques for handling each. We also develop a cross-validation-based method, Held-Out Predictive Evaluation (HOPE), for assessing this approach. Our results suggest that ERGM-based imputation of edge variables is a viable approach to the analysis of complex studies such as Add Health, provided that care is used in understanding and accounting for the study design.

14.
I examine evidence on private sector union wage gaps in the United States. The consensus opinion among labor economists of an average union premium of roughly 15 percent is called into question. Two forms of measurement error bias downward standard wage gap estimates. Match bias results from Census earnings imputation procedures that do not include union status as a match criterion. Downward bias is roughly equal to the proportion of workers with imputed earnings, currently about 30 percent. Misclassification of union status causes additional attenuation in union gap measures. This bias has worsened as private sector density has declined, since an increasing proportion of workers designated as union are instead nonunion workers. Corrections for misclassification and match bias lead to estimated union gaps substantially higher than standard estimates, but with less of a downward trend since the mid 1980s. Private sector union gaps corrected for these biases are estimated from the CPS for 1973–2001. The uncorrected estimate for 2001 is .13 log points. Correction for match bias increases the gap to .18 log points; further correction for misclassification bias, based on an assumed 2 percent error rate, increases the gap to .24. Reexamination of the skill-upgrading hypothesis leads to the conclusion that higher union gap estimates are plausible. The conventional wisdom of a 15 percent union wage premium warrants reexamination.
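As a rough check on the match-bias figures quoted in this abstract, one can read "downward bias roughly equal to the proportion of workers with imputed earnings" as proportional attenuation: the observed gap is about (1 − p) times the true gap, where p is the share of imputed earners. That reading, the function name, and the rounded inputs below are assumptions made for illustration; the paper's own correction is presumably more detailed.

```python
def correct_match_bias(observed_gap, share_imputed):
    """Back-of-envelope correction, assuming observed = (1 - share_imputed) * true."""
    if not 0 <= share_imputed < 1:
        raise ValueError("share_imputed must be in [0, 1)")
    return observed_gap / (1 - share_imputed)


# Rounded figures from the abstract: .13 log points observed, about 30 percent imputed.
corrected = correct_match_bias(0.13, 0.30)
print(round(corrected, 3))
```

With these rounded inputs the rule gives a gap a little under 0.19 log points, in the neighbourhood of the .18 reported after correcting for match bias. It does not reproduce the further misclassification correction to .24.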

15.
The first objective of the current study was to examine the relationship between childhood maltreatment, trauma-related symptoms and motivation for treatment in girls in compulsory residential treatment facilities. The second objective was to examine the extent to which various forms of childhood maltreatment, trauma-related symptoms and motivation for treatment predicted (time to) dropout from these facilities. Participants were 154 adolescent girls recruited from three residential treatment settings in The Netherlands. Multiple linear regression analysis revealed that age and ethnicity were associated with motivation for treatment. Furthermore, emotional abuse contributed to motivation for treatment. In addition, internalizing symptoms (e.g., anxiety and depression) significantly predicted level of distress; symptoms of dissociation predicted doubt about treatment. Logistic regression analyses with multiple imputation and competing risk regression analyses revealed no significant predictors for (time to) dropout. The findings suggest that clinicians and therapists should focus on experiences of emotional abuse, traumatic symptoms and treatment motivation in girls in compulsory residential care settings.

16.
17.
When evaluating the effects of public health intervention, larger units, or clusters, of individuals are often the unit of randomization and implementation. Ignoring dependency in the data due to clustering can misrepresent intervention effects. Random-effects models (REMs) may be a useful way to analyze such data. The present study compares results of analyses of data from a nutrition intervention program using four different methods: (a) usual multiple regression analysis using individual subject data, (b) usual multiple regression analysis using the classroom cluster as the unit of analysis, (c) two-level REM model with subjects clustered within classrooms, and (d) two-level REM model with subjects clustered within sites.

18.
In this study, we show how use of the hedonic imputation method complicates the price index problem. In addition to the usual choice between formulas such as Fisher and Törnqvist, the fact that index compilers have some discretion over which prices are imputed implies that it is necessary to choose as well between different varieties of each formula. The functional form of the hedonic model must also be taken into account. We illustrate the importance of these issues in a housing context using house price data for three regions in Sydney over a 3‐yr period. (JEL C43, E31, O47, R31)
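For readers unfamiliar with the two formulas named in this abstract, the sketch below computes the standard Fisher and Törnqvist price indexes from base-period and current-period prices and quantities. The function names and the toy data are hypothetical, and the code takes prices as given: in a hedonic imputation setting some of them would come from a hedonic model rather than from observation, and which ones are imputed is exactly the discretion the abstract refers to.

```python
from math import exp, log, sqrt


def fisher_index(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche price indexes."""
    laspeyres = sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))
    paasche = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))
    return sqrt(laspeyres * paasche)


def tornqvist_index(p0, p1, q0, q1):
    """Geometric mean of price relatives, weighted by average expenditure shares."""
    v0 = [p * q for p, q in zip(p0, q0)]
    v1 = [p * q for p, q in zip(p1, q1)]
    total0, total1 = sum(v0), sum(v1)
    log_index = sum(
        0.5 * (a / total0 + b / total1) * log(pb / pa)
        for a, b, pa, pb in zip(v0, v1, p0, p1)
    )
    return exp(log_index)


# Toy data (assumption): two goods, prices and quantities in two periods.
p0, p1 = [1.0, 2.0], [1.5, 2.2]
q0, q1 = [3.0, 1.0], [2.0, 2.0]
print(fisher_index(p0, p1, q0, q1), tornqvist_index(p0, p1, q0, q1))
```

A quick sanity property of both formulas is that if every price doubles, each index equals 2 regardless of the quantities.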

19.
A reverse regression method of estimating the union-nonunion wage differential is developed using a multiple indicator model. The method provides a test for the multiple indicator model’s validity and suggests that conventional estimation techniques should underestimate the union-nonunion differential. Empirical estimates show that the reverse regression estimates are larger than conventional estimates and that the multiple indicator model cannot be rejected. The author wishes to thank Robert J. Flanagan and H. G. Lewis for their valuable comments on earlier versions of this paper.

20.
Accurate estimates of biomass in urban forests can help improve strategies for enhancing ecosystem services. Landscape heterogeneity, such as land-cover types and their spatial arrangements, greatly affects biomass growth, and it complicates the estimation of biomass. Application of LiDAR data is a typical approach for mapping forest biomass and carbon stocks across heterogeneous landscapes. However, little is known about how urban land uses and pattern impact biomass and estimates derived from LiDAR analysis. In this study, we examined the relationship between LiDAR-derived biomass and dominant land-cover types using field-measured estimates of aboveground forest biomass in an urbanized region of North Carolina, USA. Three objectives drove this research: 1) we examined the local effects of dominant land cover types on urban forest biomass; 2) we identified the spatial scale at which dominant land cover influences biomass estimates; 3) we investigated whether the fine-scale, spatial heterogeneity of the urban landscape contributed to forest biomass. We used multiple linear regression to relate field-measured biomass to LiDAR metrics and land cover densities derived from Landsat and LiDAR data. The biomass model developed from variables derived from LiDAR first returns produced biomass estimates similar to using all LiDAR returns. Although three land-cover types (impervious surface, managed clearings, and farmland) exhibited a negative relationship with biomass, only impervious surface was statistically significant. The biomass model that used impervious surface densities between 100 m and 175 m radial buffers produced the highest adjusted R² with lower RMSE values. Our study suggests that impervious surface impacted forest biomass estimates considerably in urbanizing landscapes with the greatest effect between 100 and 175 m from a forest stand. Managed clearing and farmland types negatively impacted biomass estimation albeit not as strongly as impervious surface. Overall, we found that accounting for impervious surface density and its proximity to forest in biomass models may improve urban forest biomass estimates.
