Similar literature
1.
k-POD: A Method for k-Means Clustering of Missing Data   Total citations: 1 (self-citations: 0, citations by others: 1)
The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.

[Received November 2014. Revised August 2015.]

2.
The aim of this study is to determine the effect of informative priors for variables with missing values and to compare Bayesian Cox regression with standard Cox regression. For this purpose, simulated data sets with different sample sizes and different missing rates were first generated, and each data set was analysed by Cox regression and by Bayesian Cox regression with informative priors. Second, a lung cancer data set was analysed as a real-data example. Using informative priors for variables with missing values resolved the missing data problem.

3.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with a penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function, respectively. Because classical model selection procedures such as Akaike's information criterion (AIC) become invalid for the considered models with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms: missing at random (MAR) and missing not at random (MNAR). The most attractive feature of the method is that it is rather general and can be extended to various situations with missing observations based on the EM algorithm; in particular, when no missing data are involved, the new model selection criteria reduce to the classical AIC. We can therefore not only compare models with missing observations under MAR/MNAR settings, but also compare missing-data models with complete-data models. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

4.
In the presence of missing values, researchers may be interested in the rates of missing information. The rates of missing information are (a) important for assessing how the missing information contributes to inferential uncertainty about Q, the population quantity of interest, (b) an important component in deciding the number of imputations, and (c) useful for testing model uncertainty and model fit. In this article I derive the asymptotic distribution of the rates of missing information in two scenarios: conventional multiple imputation (MI) and two-stage MI. I show numerically that the proposed asymptotic distribution agrees with the simulated one. Based on the asymptotic distribution, I also suggest the number of imputations needed to obtain reliable estimates of the missing information rate for each method.
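
As a point of reference, the rate of missing information in conventional MI is usually defined through Rubin's combining rules. The block below is a sketch of that standard definition, not the article's derivation; the symbols $m$, $\hat{Q}_j$, $U_j$, $\bar{U}$, $B$, $T$, and $\lambda$ are assumed notation and may differ from the article's.

```latex
% m completed data sets give point estimates \hat{Q}_j with variance estimates U_j.
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, &
\bar{U} &= \frac{1}{m}\sum_{j=1}^{m}U_j, &
B &= \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j-\bar{Q}\bigr)^2, \\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, &
\lambda &= \frac{\bigl(1+\frac{1}{m}\bigr)B}{T}.
\end{align*}
```

Here $\lambda$ is the share of the total variance $T$ that is attributable to between-imputation variability $B$, which is why it is read as a rate of missing information.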

5.
A general nonparametric imputation procedure, based on kernel regression, is proposed to estimate points as well as set- and function-indexed parameters when the data are missing at random (MAR). The proposed method works by imputing a specific function of a missing value (and not the missing value itself), where the form of this specific function is dictated by the parameter of interest. Both single and multiple imputations are considered. The associated empirical processes provide the right tool to study the uniform convergence properties of the resulting estimators. Our estimators include, as special cases, the imputation estimator of the mean, the estimator of the distribution function proposed by Cheng and Chu [1996. Kernel estimation of distribution functions and quantiles with missing data. Statist. Sinica 6, 63–78], imputation estimators of a marginal density, and imputation estimators of regression functions.

6.
Missing data form a ubiquitous problem in scientific research, especially since most statistical analyses require complete data. To evaluate the performance of methods dealing with missing data, researchers perform simulation studies. An important aspect of these studies is the generation of missing values in a simulated, complete data set: the amputation procedure. We investigated the methodological validity and statistical nature of both the current amputation practice and a newly developed and implemented multivariate amputation procedure. We found that the current way of practice may not be appropriate for the generation of intuitive and reliable missing data problems. The multivariate amputation procedure, on the other hand, generates reliable amputations and allows for a proper regulation of missing data problems. The procedure has additional features to generate any missing data scenario precisely as intended. Hence, the multivariate amputation procedure is an efficient method to accurately evaluate missing data methodology.

7.
In this article, we compare alternative missing-data imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other imputation methods. We then use real datasets to compare our approach with some general imputation methods under various missing data mechanisms.

8.
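Jin Jiao et al., 《统计研究》 (Statistical Research), 2021, 38(11): 150-160
Regression models are widely applied in economics, biomedicine, epidemiology, industrial and agricultural production, and many other fields. In practice, however, exact or complete data on the variables often cannot be obtained during data collection, so complex data such as measurement-error data and missing data are frequently encountered. When measurement error is present in a regression model and is not corrected for in parameter estimation, the estimates tend to be biased and less precise. Likewise, missing data that are not handled appropriately lead to poor model results. This paper therefore studies robust parameter estimation for linear measurement-error models and partially linear measurement-error models in which the explanatory variables are measured with error and are missing at random. We propose a corrected-loss estimator of the parameters for the case where the measurement error follows a Laplace distribution. Monte Carlo simulations and an empirical analysis from medical research show that the proposed estimator has small bias, high precision, and strong robustness.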

9.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered.

10.
Although Fan showed that the mixed-effects model for repeated measures (MMRM) is appropriate for analyzing complete longitudinal binary data in terms of the rate difference, that work focused on using generalized estimating equations (GEE) for statistical inference. The current article emphasizes the validity of the MMRM when the normal-distribution-based pseudo-likelihood approach is used to make inference for complete longitudinal binary data. For incomplete longitudinal binary data under a missing-at-random mechanism, however, the MMRM, using either the GEE or the normal-distribution-based pseudo-likelihood inferential procedure, gives biased results in general and should not be used for analysis.

11.
Missing data, a common but challenging issue in most studies, may lead to biased and inefficient inferences if handled inappropriately. As a natural and powerful way of dealing with missing data, the Bayesian approach has received much attention in the literature. This paper reviews recent developments and applications of Bayesian methods for dealing with ignorable and non-ignorable missing data. We first introduce missing data mechanisms and the Bayesian framework for dealing with missing data, and then introduce missing data models under ignorable and non-ignorable missingness based on the literature. After that, important issues in Bayesian inference, including prior construction, posterior computation, model comparison and sensitivity analysis, are discussed. Finally, several issues that deserve further research are summarized.

12.
A controlled clinical trial was conducted to investigate the efficacy of a chemical compound in the treatment of Premenstrual Dysphoric Disorder (PMDD). The data from the trial showed a non-monotone pattern of missing data and an ante-dependence covariance structure. A new analytical method for imputing the missing data under the ante-dependence covariance structure is proposed. The PMDD data are analysed by the non-imputation method and two imputation methods: the proposed method and the MCMC method.

13.
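Research on Missing Data Handling Based on Clustering and Association Rules   Total citations: 2 (self-citations: 1, citations by others: 2)
This paper proposes a new method for handling missing data based on clustering and association rules. A clustering method first groups similar records of the data set containing missing values into the same class; an improved association-rule method then mines the associations among variables within each sub-data set, and these associations are used to fill in the missing data. A case analysis shows that the method handles missing data well, particularly for massive data sets.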

14.
Inverse probability weighting (IPW) can deal with confounding in non-randomized studies. The inverse weights are probabilities of treatment assignment (propensity scores), estimated by regressing assignment on predictors. Problems arise if predictors can be missing. Solutions previously proposed include assuming assignment depends only on observed predictors and multiple imputation (MI) of missing predictors. For the MI approach, it was recommended that missingness indicators be used with the other predictors. We determine when the two MI approaches (with and without missingness indicators) yield consistent estimators and compare their efficiencies. We find that, although including indicators can reduce bias when predictors are missing not at random, it can induce bias when they are missing at random. We propose a consistent variance estimator and investigate performance of the simpler Rubin’s Rules variance estimator. In simulations we find both estimators perform well. IPW is also used to correct bias when an analysis model is fitted to incomplete data by restricting to complete cases. Here, weights are inverse probabilities of being a complete case. We explain how the same MI methods can be used in this situation to deal with missing predictors in the weight model, and illustrate this approach using data from the National Child Development Survey.
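
The complete-case use of IPW described in this abstract can be illustrated with a minimal sketch. It is not the authors' estimator: the function name `ipw_mean` is hypothetical, the weight model is simply the observed fraction within each level of one fully observed categorical predictor (an assumption made to keep the example free of dependencies), and no MI of missing predictors is attempted.

```python
from collections import defaultdict


def ipw_mean(y, x):
    """Inverse-probability-weighted mean of y, where y[i] is None when missing.

    x[i] is a fully observed categorical predictor. The probability of being
    a complete case is estimated as the observed fraction within each level
    of x (a deliberately simple stand-in for a fitted weight model).
    """
    total = defaultdict(int)
    observed = defaultdict(int)
    for yi, xi in zip(y, x):
        total[xi] += 1
        if yi is not None:
            observed[xi] += 1

    num = 0.0
    den = 0.0
    for yi, xi in zip(y, x):
        if yi is None:
            continue  # incomplete cases carry no outcome information
        p_complete = observed[xi] / total[xi]  # estimated P(complete | x)
        weight = 1.0 / p_complete
        num += weight * yi
        den += weight
    return num / den


# Half of group "a" is missing, so its observed records are weighted by 2.
x = ["a", "a", "a", "a", "b", "b", "b", "b"]
y = [1.0, 1.0, None, None, 3.0, 3.0, 3.0, 3.0]
print(ipw_mean(y, x))
```

In this toy data the unweighted complete-case mean is pulled toward group "b", because group "a" loses half of its records; the weights restore each group's share of the full sample.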

15.
Under an assumption that missing values occur randomly in a matrix, formulae are developed for the expected value and variance of six statistics that summarize the number and location of the missing values. For a seventh statistic, a regression model based on simulated data yields an estimate of the expected value. The results can be used in the development of methods to control the Type I error and approximate power and sample size for multilevel and longitudinal studies with missing data.

16.
Summary. Social data often contain missing information. The problem is inevitably severe when analysing historical data. Conventionally, researchers analyse complete records only. Listwise deletion not only reduces the effective sample size but also may result in biased estimation, depending on the missingness mechanism. We analyse household types by using population registers from ancient China (618–907 AD) by comparing a simple classification, a latent class model of the complete data and a latent class model of the complete and partially missing data assuming four types of ignorable and non-ignorable missingness mechanisms. The findings show that either a frequency classification or a latent class analysis using the complete records only yielded biased estimates and incorrect conclusions in the presence of partially missing data of a non-ignorable mechanism. Although simply assuming ignorable or non-ignorable missing data consistently produced similarly higher estimates of the proportion of complex households, a specification of the relationship between the latent variable and the degree of missingness by a row effect uniform association model helped to capture the missingness mechanism better and improved the model fit.

17.
When data are missing, analyzing records that are completely observed may cause bias or inefficiency. Existing approaches in handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769–782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computation and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.

18.
Fractional regression hot deck imputation (FRHDI) imputes multiple values for each instance of a missing dependent variable. The imputed values are equal to the predicted value plus multiple random residuals. Fractional weights enable variance estimation and preserve correlations. In some circumstances with some starting weight values, existing procedures for computing FRHDI weights can produce negative values. We discuss procedures for constructing non-negative adjusted fractional weights for FRHDI and study performance of the algorithm using simulation. The algorithm can be used effectively with FRHDI procedures for handling missing data in the context of a complex sample survey.
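
The basic idea in this abstract (predicted value plus several donor residuals, each carrying a fractional weight) can be sketched as follows. This is a simplified illustration under stated assumptions, not the article's algorithm: the helper names `fit_line` and `fractional_impute` are hypothetical, a one-predictor least-squares line stands in for the regression model, and every imputed value gets the equal weight 1/m rather than the adjusted non-negative weights the article studies.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fractional_impute(x, y, m=3, seed=0):
    """Return (x, y, weight) rows; each missing y becomes m weighted rows.

    Each imputed value is the regression prediction plus a residual drawn
    without replacement from the respondents' residuals (requires m to be
    at most the number of respondents). Weights are the equal share 1/m.
    """
    rng = random.Random(seed)
    respondents = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line([r[0] for r in respondents],
                                [r[1] for r in respondents])
    residuals = [yi - (intercept + slope * xi) for xi, yi in respondents]

    rows = []
    for xi, yi in zip(x, y):
        if yi is not None:
            rows.append((xi, yi, 1.0))  # observed value keeps full weight
        else:
            for res in rng.sample(residuals, m):
                rows.append((xi, intercept + slope * xi + res, 1.0 / m))
    return rows


rows = fractional_impute([1, 2, 3, 4, 5], [2.1, 3.9, None, 8.2, 9.8])
weighted_mean = sum(v * w for _, v, w in rows) / sum(w for _, _, w in rows)
print(len(rows), weighted_mean)
```

Because the fractional weights for each missing record sum to one, weighted totals over the expanded rows still count every sampled unit exactly once.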

19.
We propose an extension of parametric product partition models. We name our proposal nonparametric product partition models because we associate a random measure instead of a parametric kernel to each set within a random partition. Our methodology does not impose any specific form on the marginal distribution of the observations, allowing us to detect shifts of behaviour even when dealing with heavy-tailed or skewed distributions. We propose a suitable loss function and find the partition of the data having minimum expected loss. We then apply our nonparametric procedure to multiple change-point analysis and compare it with PPMs and with other methodologies that have recently appeared in the literature. Also, in the context of missing data, we exploit the product partition structure in order to estimate the distribution function of each missing value, allowing us to detect change points using the loss function mentioned above. Finally, we present applications to financial as well as genetic data.

20.
The analysis of incomplete contingency tables is a practical and an interesting problem. In this paper, we provide characterizations for the various missing mechanisms of a variable in terms of response and non-response odds for two and three dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for the above tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed based on simulation studies.
