Similar literature
 20 similar documents found; search took 31 ms
1.
Less than optimum strategies for missing values can produce biased estimates, distorted statistical power, and invalid conclusions. After reviewing traditional approaches (listwise deletion, pairwise deletion, and mean substitution), selected alternatives are covered including single imputation, multiple imputation, and full information maximum likelihood estimation. The effects of missing values are illustrated for a linear model, and a series of recommendations is provided. When missing values cannot be avoided, multiple imputation and full information methods offer substantial improvements over traditional approaches. Selected results using SPSS, NORM, Stata (mvis/micombine), and Mplus are included, as are a table of available software and an appendix with examples of programs for Stata and Mplus.
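
As a rough illustration of two of the traditional approaches named in this abstract, the sketch below applies listwise deletion and mean substitution to toy data. The function names and the numbers are assumptions made for illustration and are not taken from the article; the point is only that mean substitution preserves the mean while shrinking the variance.

```python
from statistics import mean, variance


def listwise_delete(rows):
    """Keep only the rows that have no missing (None) entries."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_substitute(values):
    """Replace each missing (None) value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Toy variable with two missing values (hypothetical data).
x = [2.0, 4.0, None, 8.0, None, 10.0]
observed_x = [v for v in x if v is not None]
filled_x = mean_substitute(x)

# The mean is unchanged, but the filled-in values sit exactly at the
# centre, so the sample variance of the completed variable is smaller.
print(mean(observed_x), mean(filled_x))
print(variance(observed_x), variance(filled_x))

# Listwise deletion discards every case with any missing entry.
print(listwise_delete([(1, 2), (None, 3), (4, None)]))
```

The understated variance is one reason mean substitution can distort standard errors and statistical power in a downstream linear model.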

2.
We propose using latent class analysis as an alternative to log-linear analysis for the multiple imputation of incomplete categorical data. Similar to log-linear models, latent class models can be used to describe complex association structures between the variables used in the imputation model. However, unlike log-linear models, latent class models can be used to build large imputation models containing more than a few categorical variables. To obtain imputations reflecting uncertainty about the unknown model parameters, we use a nonparametric bootstrap procedure as an alternative to the more common full Bayesian approach. The proposed multiple imputation method, which is implemented in Latent GOLD software for latent class analysis, is illustrated with two examples. In a simulated data example, we compare the new method to well-established methods such as maximum likelihood estimation with incomplete data and multiple imputation using a saturated log-linear model. This example shows that the proposed method yields unbiased parameter estimates and standard errors. The second example concerns an application using a typical social sciences data set. It contains 79 variables that are all included in the imputation model. The proposed method is especially useful for such large data sets because standard methods for dealing with missing data in categorical variables break down when the number of variables is so large.

3.
This article offers an applied review of key issues and methods for the analysis of longitudinal panel data in the presence of missing values. The authors consider the unique challenges associated with attrition (survey dropout), incomplete repeated measures, and unknown observations of time. Using simulated data based on 4 waves of the Marital Instability Over the Life Course Study (n = 2,034), they applied a fixed effect regression model and an event-history analysis with time-varying covariates. They then compared results for analyses with nonimputed missing data and with imputed data both in long and in wide structures. Imputation produced improved estimates in the event-history analysis but only modest improvements in the estimates and standard errors of the fixed effects analysis. Factors responsible for differences in the value of imputation are examined, and recommendations for handling missing values in panel data are presented.

4.
Social science datasets usually have missing cases and missing values. All such missing data has the potential to bias future research findings. However, many research reports ignore the issue of missing data, only consider some aspects of it, or do not report how it is handled. This paper rehearses the damage caused by missing data. The paper then briefly considers eight different approaches to handling missing data so as to minimise that damage, their underlying assumptions and the likely costs and benefits. These approaches include complete case analysis, complete variable analysis, single imputation, multiple imputation, maximum likelihood estimation, default replacement values, weighting, and sensitivity analyses. Using only complete cases should be avoided wherever possible. The paper suggests that the more complex, modelling approaches to replacing missing data are based on questionable methodological and philosophical assumptions, and that they may not, in any case, have clear advantages over simpler approaches like default replacements. It makes sense to report all possible forms of missing data, report everything that is known about the characteristics of cases missing values, conduct simple sensitivity analyses of the potential impact of missing data on the substantive results, and retain the knowledge of missingness when using any form of replacement value.

5.
Household surveys often contain coarse data, which consist of a mixture of missing values, interval-censored values and point (fully-observed) values, making it difficult to construct a continuous money-metric measure of wellbeing. This paper assesses the sensitivity of poverty and inequality estimates to the multiple imputation of coarse earnings data and reported zero values using the 2001–2006 South African Labour Force Surveys. Estimates of poverty amongst the employed are shown not to be sensitive to multiple imputation of missing and interval-censored data, but are sensitive to the treatment of workers reporting zero earnings. Poverty trends are generally robust to the choice of method, and a significant decline in poverty is evident. Inequality estimates, on the other hand, appear more sensitive to the treatment of zero values and the choice of imputation methods, and, overall, no particular trends in inequality could be discerned.

6.
7.
Using data from the evaluation of the Fast Track intervention, this article illustrates three methods for handling attrition. Multiple imputation and ignorable maximum likelihood estimation produce estimates that are similar to those based on listwise-deleted data. A panel selection model that allows for selective dropout reveals that highly aggressive boys accumulate in the treatment group over time and produces a larger estimate of treatment effect. In contrast, this model produces a smaller treatment effect for girls. The article's conclusion discusses the strengths and weaknesses of the alternative approaches and outlines ways in which researchers might improve their handling of attrition.

8.
HOW TO IMPUTE INTERACTIONS, SQUARES, AND OTHER TRANSFORMED VARIABLES   Total citations: 1, self-citations: 0, citations by others: 1
Researchers often carry out regression analysis using data that have missing values. Missing values can be filled in using multiple imputation, but imputation is tricky if the regression includes interactions, squares, or other transformations of the regressors. In this paper, we examine different approaches to imputing transformed variables, and we find one simple method that works well across a variety of circumstances. Our recommendation is to transform, then impute, i.e., calculate the interactions or squares in the incomplete data and then impute these transformations like any other variable. The transform-then-impute method yields good regression estimates, even though the imputed values are often inconsistent with one another. It is tempting to try to "fix" the inconsistencies in the imputed values, but methods that do so lead to biased regression estimates. Such biased methods include the passive imputation strategy implemented by the popular ice command for Stata.

9.
When fitting a generalized linear model (such as linear regression, logistic regression, or hierarchical linear modeling), analysts often wonder how to handle missing values of the dependent variable Y. If missing values have been filled in using multiple imputation, the usual advice is to use the imputed Y values in analysis. We show, however, that using imputed Ys can add needless noise to the estimates. Better estimates can usually be obtained using a modified strategy that we call multiple imputation, then deletion (MID). Under MID, all cases are used for imputation but, following imputation, cases with imputed Y values are excluded from the analysis. When there is something wrong with the imputed Y values, MID protects the estimates from the problematic imputations. And when the imputed Y values are acceptable, MID usually offers somewhat more efficient estimates than an ordinary MI strategy.
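
The deletion step of MID can be sketched in a few lines. This is only an illustration under stated assumptions: the imputation itself is taken as already done, and `mid_rows`, the missingness flags, and the toy data are hypothetical names and values, not code from the article.

```python
def mid_rows(completed_rows, y_was_imputed):
    """Deletion step of MID: drop the cases whose Y value was imputed.

    completed_rows -- one completed (already imputed) dataset, as a list of rows
    y_was_imputed  -- one boolean per row, True where Y was originally missing
    """
    if len(completed_rows) != len(y_was_imputed):
        raise ValueError("one flag per row is required")
    return [row for row, imputed in zip(completed_rows, y_was_imputed) if not imputed]


# Toy completed dataset of (x, y) pairs; the y in the second row was imputed.
completed = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
flags = [False, True, False]

analysis_rows = mid_rows(completed, flags)
print(analysis_rows)
```

In a full workflow the same filter would be applied to each of the completed datasets before the analysis model is fitted. Cases with an observed Y but imputed X values stay in the analysis.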

10.
Multiple imputation (MI), a two-stage process whereby missing data are imputed multiple times and the resulting estimates of the parameter(s) of interest are combined across the completed datasets, is becoming increasingly popular for handling missing data. However, MI can result in biased inference if not carried out appropriately or if the underlying assumptions are not justifiable. Despite this, there remains a scarcity of guidelines for carrying out MI. In this paper we provide a tutorial on the main issues involved in employing MI, as well as highlighting some common pitfalls and misconceptions, and areas requiring further development. When contemplating using MI we must first consider whether it is likely to offer gains (reduced bias or increased precision) over alternative methods of analysis. Once it has been decided to use MI, there are a number of decisions that must be made during the imputation process; we discuss the extent to which these decisions can be guided by the current literature. Finally we highlight the importance of checking the fit of the imputation model. This process is illustrated using a case study in which we impute missing outcome data in a five-wave longitudinal study that compared extremely preterm individuals with term-born controls.
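
The second stage of MI, combining estimates across the completed datasets, is usually done with Rubin's rules. The abstract does not give the formulas, so the sketch below is an assumption about the standard pooling step rather than the authors' code; `pool_rubin` and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates of one parameter with Rubin's rules.

    estimates -- the point estimate from each completed dataset
    variances -- the squared standard error from each completed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)


# Toy results from m = 3 completed datasets (hypothetical numbers).
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, pooled_se)
```

The between-imputation term is what carries the extra uncertainty due to the missing data; dropping it would understate the standard error.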

11.
Although several methods have been developed to allow for the analysis of data in the presence of missing values, no clear guide exists to help family researchers in choosing among the many options and procedures available. We delineate these options and examine the sensitivity of the findings in a regression model estimated in three random samples from the National Survey of Families and Households (n = 250–2,000). These results, combined with findings from simulation studies, are used to guide answers to a set of 10 common questions asked by researchers when selecting a missing data approach. Modern missing data techniques were found to perform better than traditional ones, but differences between the types of modern approaches had minor effects on the estimates and substantive conclusions. Our findings suggest that the researcher has considerable flexibility in selecting among modern options for handling missing data.

12.
The authors have developed and tested scale-up methods, based on a simple social network theory, to estimate the size of hard-to-count subpopulations. The authors asked a nationally representative sample of respondents how many people they knew in a list of 32 subpopulations, including 29 subpopulations of known size and 3 of unknown size. Using these responses, the authors produced an effectively unbiased maximum likelihood estimate of the number of people each respondent knows. These estimates were then used to back-estimate the size of the three populations of unknown size. Maximum likelihood values and 95% confidence intervals are found for seroprevalence, 800,000 +/- 43,000; for homeless, 526,000 +/- 35,000; and for women raped in the last 12 months, 194,000 +/- 21,000. The estimate for seroprevalence agrees strikingly with medical estimates, the homeless estimate is well within the published estimates, and the authors' estimate lies in the middle of the published range for rape victims.
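
The two estimation steps described in this abstract can be sketched as follows. The formulas are one common form of the network scale-up estimator and are an assumption here, since the abstract does not state them; the function names and the toy numbers are hypothetical and unrelated to the study's data.

```python
def network_sizes(known_counts, known_sizes, total_population):
    """Estimate how many people each respondent knows.

    known_counts     -- per respondent, the number of people known in each
                        subpopulation of known size
    known_sizes      -- the sizes of those subpopulations
    total_population -- size of the whole population
    """
    total_known = sum(known_sizes)
    return [total_population * sum(row) / total_known for row in known_counts]


def scale_up(unknown_counts, sizes, total_population):
    """Back-estimate the size of one subpopulation of unknown size.

    unknown_counts -- per respondent, the number of people known in it
    sizes          -- per respondent network sizes from network_sizes()
    """
    return total_population * sum(unknown_counts) / sum(sizes)


# Toy example: population of 1,000, two known subpopulations, two respondents.
N = 1000
sizes = network_sizes([[3, 0], [6, 3]], [100, 50], N)
estimate = scale_up([1, 3], sizes, N)
print(sizes, estimate)
```

The design rests on the assumption that the share of a respondent's acquaintances who belong to a subpopulation mirrors that subpopulation's share of the whole population.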

13.
Survey and longitudinal studies in the social and behavioral sciences generally contain missing data. Mean and covariance structure models play an important role in analyzing such data. Two promising methods for dealing with missing data are a direct maximum-likelihood and a two-stage approach based on the unstructured mean and covariance estimates obtained by the EM-algorithm. Typical assumptions under these two methods are ignorable nonresponse and normality of data. However, data sets in social and behavioral sciences are seldom normal, and experience with these procedures indicates that normal theory based methods for nonnormal data very often lead to incorrect model evaluations. By dropping the normal distribution assumption, we develop more accurate procedures for model inference. Based on the theory of generalized estimating equations, a way to obtain consistent standard errors of the two-stage estimates is given. The asymptotic efficiencies of different estimators are compared under various assumptions. We also propose a minimum chi-square approach and show that the estimator obtained by this approach is asymptotically at least as efficient as the two likelihood-based estimators for either normal or nonnormal data. The major contribution of this paper is that for each estimator, we give a test statistic whose asymptotic distribution is chi-square as long as the underlying sampling distribution enjoys finite fourth-order moments. We also give a characterization for each of the two likelihood ratio test statistics when the underlying distribution is nonnormal. Modifications to the likelihood ratio statistics are also given. Our working assumption is that the missing data mechanism is missing completely at random. Examples and Monte Carlo studies indicate that, for commonly encountered nonnormal distributions, the procedures developed in this paper are quite reliable even for samples with missing data that are missing at random.

14.
A randomized trial tested the efficacy of three curriculum versions teaching drug resistance strategies: one modeled on Mexican American culture, another modeled on European American and African American culture, and a multicultural version. Self-report data at baseline and 14 months post-intervention were obtained from 3,402 Mexican heritage students in 35 Arizona middle schools, including 11 control sites. Tests for intervention effects used simultaneous regression models, multiple imputation of missing data, and adjustments for random effects. Compared with controls, students in the Latino version reported less overall substance use and marijuana use, stronger intentions to refuse substances, greater confidence they could do so, and lower estimates of substance-using peers. Students in the multicultural version reported less alcohol, marijuana, and overall substance use. Although program effects were confined to the Latino and multicultural versions, tests of their relative efficacy compared with the non-Latino version found no significant differences. Implications for evidence-based practice and prevention program designs are discussed, including the role of school social workers in culturally grounded prevention.

15.
Much attention has been devoted to the relationship between Hispanic immigration and violent offending at the macro-level, including how it varies across racial and ethnic groups. Unfortunately, little attention has been paid to the conditioning effect of the race/ethnicity of the victim, or how Hispanic immigration is associated with crime by one racial/ethnic group against members of the same or different groups. Using National Incident-Based Reporting System offending estimates and American Community Survey data, we examine the association between Hispanic immigration and black intra- and intergroup (black-on-white and black-on-Hispanic) homicide, robbery, and serious index violence in over 350 U.S. communities. We employ advanced imputation methods to address missing data that have constrained much prior research, as well as utilize crime measures adjusted for the likelihood of random contact between groups. Findings suggest that (1) Hispanic immigration has a positive association with black violence on the whole, but that (2) this association is conditioned by the race/ethnicity of the victim. Our results reinforce the importance of distinguishing across offender–victim dyads in research on the immigration–crime nexus, particularly in light of competing theoretical expectations. Directions for future research and policy are discussed.

16.
The analyses described in this article investigated the association between adolescent fertility expectations and college enrollment (N = 7,838). They also explored the potential impact of fertility expectations and events on college persistence among 4-year (n = 2,605) and 2-year (n = 1,962) college students. The analysis, which used data from the National Longitudinal Survey of Youth 1997 cohort, showed a significant association between expectations for early parenthood and the likelihood of going to a 4-year college or 2-year college for both men and women. In addition, the authors found that pregnancies were associated with an increased risk of college dropout for women; however, if all of the estimated effect of pregnancies on the risk of dropout were causal, they would still not be a major factor contributing to educational attainment because fertile pregnancies among college women are so rare.

17.
As part of a nationwide study of growing socio-spatial inequality, researchers collaborated with a Toronto youth shelter and a theatre company, Project: Humanity, to use drama methods to explore local manifestations of poverty and social polarization. Together, shelter-dwelling youths and researchers challenged understandings of ‘resilience’ beyond their normative framings that fail to consider youth perspectives. Provoking affective encounters, the drama methodology activated a youth critique of structural inequalities and a peer mentoring for developing tactics to confront incidents such as unwelcome police interactions. The authors propose the concept of creative resilience, which draws from the idea of the ‘ensemble’ in drama, to collectively devise and rehearse strategies of survival and resistance for application in the real world. Such creative and critical improvised encounters catalyse, they further argue, a critical-affective stance in participants, facilitators, and researchers. Such a critical-affective stance demands theoretical sophistication in the analysis of empirical accounts because it values affect as constitutive to knowledge production. Using an illustrative case, the authors put forward a new theoretical frame for youth resilience as an ensemble practice. The findings of this study support the experiential and cultural knowledge of youth as critics and agents of resistance in the face of growing global socio-spatial inequality.

18.
Recent developments have made model-based imputation of network data feasible in principle, but the extant literature provides few practical examples of its use. In this paper, we consider 14 schools from the widely used In-School Survey of Add Health (Harris et al., 2009), applying an ERGM-based estimation and simulation approach to impute the network missing data for each school. Add Health's complex study design leads to multiple types of missingness, and we introduce practical techniques for handling each. We also develop a cross-validation based method – Held-Out Predictive Evaluation (HOPE) – for assessing this approach. Our results suggest that ERGM-based imputation of edge variables is a viable approach to the analysis of complex studies such as Add Health, provided that care is used in understanding and accounting for the study design.

19.
Few college students meet recommended fruit and vegetable intake requirements, and most receive no information from their institutions about this issue. The avoidable disease burden among students is large, the necessary information infrastructure exists, and Healthy People 2010 objectives indicate efforts should be taken to increase intake. Objective: The authors examined the association of high-risk behaviors and fruit and vegetable intake to inform design of multiple risk factor interventions. Participants and Methods: The authors obtained data from a sample of 40,209 18- to 25-year-old college students who completed the American College Health Association-National College Health Assessment during the spring 2002 and 2003 semesters. Results: Predictors of high fruit and vegetable intake for men and women included better seatbelt and helmet use, physical activity, perceived health, sleep, self-care behaviors, and grades. Other notable predictors of high intake were reduced likelihood of cigarette smoking, alcohol use, drinking and driving, and feeling hopeless in both sexes; reduced likelihood of drinking and driving among men; and a greater likelihood of anorexia among women. Conclusions: The authors discuss implications of these findings.

20.
Typically authors explain how they conduct interpretative phenomenological analysis (IPA), but fail to explain how they ensured that their analytical process was trustworthy. For example, a minority mention that they ‘reached consensus’ after having engaged in a shared analysis of the data, but do not explain how they did so. In this article, we report on our experience of engaging in a shared analysis and aim to stimulate discussion about the process of ensuring the trustworthiness of one’s data when employing IPA. Our key recommendation is that all researchers involved in analysis should listen to the audio recordings; failure to do so increases the potential for researchers to superimpose their own presuppositions or interpretative bias onto the data. We also suggest that audio recordings should be kept for a longer duration in case secondary analysis is required. We finish our article with a series of tips developed from our experience of shared analysis.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417