Similar Articles
Found 20 similar articles; search took 31 ms.
1.
Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because an MNAR mechanism is not verifiable from the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted both to all of the data and to a sub-sample of the data, and the discrepancy between the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap-type method, which relies on heuristic input from the researcher, to test for the discrepancy of the parameter estimates. Instead of using the bootstrap, the current article obtains confidence intervals for the parameter differences between the two samples based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids the convergence problems that bootstrap methods are prone to; it requires no heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test for missing at random (MAR) in addition to MCAR. An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies, using two methods of missing data generation, are performed as well and show promise for the proposed sensitivity method. One of the methods of missing data generation is also new and interesting in its own right.
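A minimal sketch of the sub-sample comparison idea behind this kind of sensitivity analysis, reduced to a single mean parameter: split the cases by whether another variable is missing and compare estimates across the two groups with an asymptotic (Wald) confidence interval. The data, variable names, and MCAR setup are invented for illustration; this is not the authors' exact procedure, which compares full measurement-model estimates between the full data and a sub-sample.

```python
# Simplified sub-sample comparison for checking MCAR: does the distribution of x
# differ between cases with y observed and cases with y missing?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = x + rng.normal(size=n)
y[rng.random(n) < 0.3] = np.nan          # MCAR missingness imposed on y

miss = np.isnan(y)
x_obs, x_mis = x[~miss], x[miss]

diff = x_obs.mean() - x_mis.mean()
se = np.sqrt(x_obs.var(ddof=1) / len(x_obs) + x_mis.var(ddof=1) / len(x_mis))
z = stats.norm.ppf(0.975)
print(f"mean difference {diff:.3f}, "
      f"95% CI [{diff - z * se:.3f}, {diff + z * se:.3f}]")
# Under MCAR the interval should cover 0; a clear exclusion of 0 signals
# that missingness in y depends on x, i.e. the mechanism is not MCAR.
```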

2.
Inequality-restricted hypothesis testing methods, which include multivariate one-sided tests, are useful in practice, especially in multiple comparison problems. In practice, multivariate and longitudinal data often contain missing values, since it may be difficult to observe all values for each variable. Although missing values are common in multivariate data, statistical methods for multivariate one-sided tests with missing values are quite limited. In this article, motivated by a dataset in a recent collaborative project, we develop two likelihood-based methods for multivariate one-sided tests with missing values, where the missing data patterns can be arbitrary and the missing data mechanisms may be non-ignorable. Although non-ignorable missingness is not testable based on observed data, statistical methods addressing this issue can be used for sensitivity analysis and may lead to more reliable results, since ignoring informative missingness can lead to biased analysis. We analyse the real dataset in detail under various possible missing data mechanisms and report interesting findings that were previously unavailable. We also derive some asymptotic results and evaluate our new tests using simulations.

3.
Missing data in clinical trials are a well-known problem, and the classical statistical methods used can be overly simple. This case study shows how well-established missing data theory can be applied to efficacy data collected in a long-term open-label trial with a discontinuation rate of almost 50%. Satisfaction with treatment in chronically constipated patients was the efficacy measure, assessed at baseline and every 3 months post-baseline. The improvement in treatment satisfaction from baseline was originally analyzed with a paired t-test, ignoring missing data and discarding the correlation structure of the longitudinal data. As the original analysis started from missing completely at random assumptions regarding the missing data process, the satisfaction data were re-examined, and several missing at random (MAR) and missing not at random (MNAR) techniques yielded adjusted estimates of the improvement in satisfaction over 12 months. Throughout the different sensitivity analyses, the effect sizes remained significant and clinically relevant. Thus, even for an open-label trial design, sensitivity analysis with different assumptions about the nature of dropouts (MAR or MNAR) and with different classes of models (selection, pattern-mixture, or multiple imputation models) has been found useful and provides evidence for the robustness of the original analyses; additional sensitivity analyses could be undertaken to further qualify robustness. Copyright © 2012 John Wiley & Sons, Ltd.
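One common way to stress-test MAR results against MNAR alternatives is a delta-adjustment (tipping-point) analysis: impute under MAR, shift the imputed values by increasingly pessimistic deltas, and re-test. The sketch below, on simulated satisfaction scores with roughly 50% dropout, uses a single imputation for brevity (a full analysis would pool multiple imputations via Rubin's rules) and is an assumption-laden illustration, not the paper's own analysis.

```python
# Delta-adjustment sensitivity analysis: penalize MAR-imputed dropout values by
# delta and track when the change-from-baseline test loses significance.
import numpy as np
from scipy import stats
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
n = 200
baseline = rng.normal(3.0, 1.0, n)             # satisfaction at baseline (invented scale)
month12 = baseline + rng.normal(0.8, 1.0, n)   # true mean improvement ~0.8
month12[rng.random(n) < 0.5] = np.nan          # ~50% dropout

data = np.column_stack([baseline, month12])
was_missing = np.isnan(data[:, 1])
for delta in [0.0, -0.25, -0.5, -1.0]:         # delta = 0 is the pure MAR analysis
    imputed = IterativeImputer(random_state=0).fit_transform(data)
    imputed[was_missing, 1] += delta           # assume dropouts do worse by |delta|
    change = imputed[:, 1] - imputed[:, 0]
    t, p = stats.ttest_1samp(change, 0.0)
    print(f"delta={delta:+.2f}: mean change {change.mean():.3f}, p={p:.2g}")
# The "tipping point" is the delta at which the effect stops being significant.
```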

4.
Statistics, as an applied science, has great impact across a vast range of other sciences. Prediction of protein structures, with emphasis on their geometrical features as captured by dihedral angles, has motivated the branch of statistics known as directional statistics. One of the available biological techniques for such prediction is molecular dynamics simulation, which produces high-dimensional molecular structure data. Hence, principal component analysis (PCA) is expected to address some of the related statistical problems, particularly reducing the dimension of the involved variables. Since dihedral angles are variables on a non-Euclidean space (their locus is the torus), direct implementation of PCA is not expected to be very informative in this case. Principal geodesic analysis is one of the recent methods for reducing dimension in the non-Euclidean case. A procedure for utilizing this technique to reduce the dimension of a set of dihedral angles is highlighted in this paper. We further propose an extension of this tool, implemented in such a way that the torus is approximated by the product of two unit circles, and evaluate its application to a real data set. A comparison of this technique with some previous methods is also undertaken.
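A minimal sketch of the "product of two unit circles" approximation: embed each dihedral pair (φ, ψ) as (cos, sin) coordinates and run ordinary PCA in the four-dimensional embedding. The von Mises sampling is an invented stand-in for real dihedral-angle data, and this simple embedding PCA illustrates the idea only; it is not the principal geodesic analysis developed in the paper.

```python
# Angular PCA via circular embedding: each angle lives on a unit circle, so the
# pair (phi, psi) lives on the torus S^1 x S^1, embedded here in R^4.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n = 300
phi = rng.vonmises(mu=-1.0, kappa=4.0, size=n)   # backbone dihedral angles (radians)
psi = rng.vonmises(mu=2.0, kappa=4.0, size=n)

embedded = np.column_stack([np.cos(phi), np.sin(phi),
                            np.cos(psi), np.sin(psi)])

pca = PCA(n_components=4).fit(embedded)
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
# Naive PCA on the raw angles is distorted by the 2*pi wrap-around; the
# circular embedding avoids that, at the cost of doubling the dimension.
```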

5.
Principal component analysis (PCA) is a popular technique for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has motivated the development of new approaches. The effects of replacing outliers with estimates obtained by expectation–maximization (EM) and by multiple imputation (MI) were examined on an artificial and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA with outliers replaced by EM estimates, and PCA with outliers replaced by MI estimates were compared with the results of CPCA. In this study, we show the effects of replacing outliers with MI and EM estimates, depending on the proportion of outliers in the data set. Finally, when the proportion of outliers exceeds 20%, we suggest replacing outliers with estimates obtained by MI and EM as an alternative approach.
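A minimal sketch of the MCD route: substitute the minimum covariance determinant estimate for the classical covariance matrix before the eigendecomposition, so outliers carry little weight. The data and the 10% contamination level are invented for illustration.

```python
# Robust PCA via MCD: eigendecompose a robust covariance estimate instead of
# the classical sample covariance.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0, 0], np.diag([3.0, 1.0, 0.3]), size=200)
X[:20] += rng.normal(15, 1, size=(20, 3))        # 10% gross outliers

def pca_from_cov(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]               # largest eigenvalue first
    return vals[order], vecs[:, order]

classical_vals, _ = pca_from_cov(np.cov(X, rowvar=False))
robust_vals, _ = pca_from_cov(MinCovDet(random_state=0).fit(X).covariance_)

print("classical eigenvalues:", np.round(classical_vals, 2))
print("MCD-based eigenvalues:", np.round(robust_vals, 2))
# The classical first eigenvalue is inflated by the outliers, while the
# MCD-based spectrum stays close to the true diag(3, 1, 0.3) structure.
```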

6.
Missing data often complicate the analysis of scientific data. Multiple imputation is a general-purpose technique for the analysis of datasets with missing values. The approach is applicable to a variety of missing data patterns, but is often complicated by restrictions such as the type of variables to be imputed and the mechanism underlying the missing data. In this paper, the authors compare the performance of two multiple imputation methods, namely fully conditional specification and multivariate normal imputation, in the presence of ordinal outcomes with monotone missing data patterns. Through a simulation study and an empirical example, the authors show that the two methods are indeed comparable, meaning either may be used when faced with scenarios at least similar to the ones presented here.
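A minimal sketch of the fully conditional specification side of this comparison, assuming scikit-learn's IterativeImputer as a chained-equations-style stand-in: impute an ordinal outcome with monotone-style missingness and round the imputed draws back to the ordinal scale (the multivariate normal route would impute from a joint normal model and round similarly). Illustration only, not the paper's full multiple-imputation pipeline.

```python
# FCS-style imputation of an ordinal outcome, then rounding back to the scale.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)
latent = 0.8 * x + rng.normal(size=n)
y = np.clip(np.round(latent + 2), 0, 4).astype(float)   # 5-point ordinal outcome
y[x + rng.normal(size=n) > 1.2] = np.nan                # monotone-style missingness

data = np.column_stack([x, y])
fcs = IterativeImputer(sample_posterior=True, random_state=0).fit_transform(data)
fcs_y = np.clip(np.round(fcs[:, 1]), 0, 4)              # back to the ordinal scale

print("mean of observed y:", round(np.nanmean(y), 3))
print("mean after FCS imputation:", round(fcs_y.mean(), 3))
```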

7.
We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data, as well as after the deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. The populations compared were: (1) multivariate normal with covariance matrices Σ1 = Σ2, (2) multivariate normal with Σ1 ≠ Σ2, and (3) multivariate non-normal with Σ1 = Σ2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly, but all were better than non-replacement.
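A minimal sketch of this comparison for two of the replacement methods: delete 20% of values at random, fill them in with (1) variable means or (2) an iterative conditional-mean fill-in standing in for the EM approach, and check the apparent correct-classification rate of a linear discriminant analysis. Population settings are invented and only loosely mirror the paper's designs.

```python
# Compare mean replacement vs an EM-style iterative fill-in before LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(5)
n = 300
cov = [[1.0, 0.6], [0.6, 1.0]]                      # common covariance: Sigma1 = Sigma2
X = np.vstack([rng.multivariate_normal([0, 0], cov, n),
               rng.multivariate_normal([1.5, 1.5], cov, n)])
y = np.repeat([0, 1], n)

X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.2] = np.nan          # 20% of values deleted at random

for name, imp in [("mean", SimpleImputer(strategy="mean")),
                  ("EM-style", IterativeImputer(random_state=0))]:
    X_fill = imp.fit_transform(X_miss)
    acc = LinearDiscriminantAnalysis().fit(X_fill, y).score(X_fill, y)
    print(f"{name:8s} replacement: apparent correct-classification rate {acc:.3f}")
```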

8.
The need to use rigorous, transparent, clearly interpretable, and scientifically justified methodology for preventing and dealing with missing data in clinical trials has been a focus of much attention from regulators, practitioners, and academicians in recent years. New guidelines and recommendations emphasize the importance of minimizing the amount of missing data and carefully selecting primary analysis methods on the basis of assumptions regarding the missingness mechanism suitable for the study at hand, as well as the need to stress-test the results of the primary analysis under different sets of assumptions through a range of sensitivity analyses. Some methods that could be effectively used for dealing with missing data have not yet gained widespread usage, partly because of their underlying complexity and partly because of the lack of relatively easy approaches to their implementation. In this paper, we explore several strategies for handling missing data on the basis of pattern-mixture models that embody clear and realistic clinical assumptions. Pattern-mixture models provide a statistically reasonable yet transparent framework for translating clinical assumptions into statistical analyses. Implementation details for some specific strategies are provided in an Appendix (available online as Supporting Information), whereas the general principles of the approach discussed in this paper can be used to implement various other analyses with different sets of assumptions regarding missing data. Copyright © 2013 John Wiley & Sons, Ltd.
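A minimal sketch of the pattern-mixture logic: estimate the outcome within each missing-data pattern, identify the unobserved pattern by an explicit clinical assumption (here a simple delta shift relative to completers, chosen purely for illustration), and average the pattern-specific estimates with the observed pattern probabilities.

```python
# Pattern-mixture estimate: overall mean = weighted mixture of pattern means.
import numpy as np

rng = np.random.default_rng(6)
n = 500
y = rng.normal(1.0, 1.0, n)                 # change from baseline, all patients
dropout = rng.random(n) < 0.4               # 40% drop out; their y is unobserved

completer_mean = y[~dropout].mean()
delta = -0.5                                # clinical assumption: dropouts do worse
dropout_mean = completer_mean + delta       # identified only via that assumption

w = dropout.mean()                          # observed dropout-pattern probability
overall = (1 - w) * completer_mean + w * dropout_mean
print(f"completers {completer_mean:.3f}, assumed dropouts {dropout_mean:.3f}, "
      f"pattern-mixture overall {overall:.3f}")
```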

9.
Non-response (or missing data) is often encountered in large-scale surveys. To enable behavioural analysis of these data sets, statistical treatments are commonly applied to complete or remove such data. However, the correctness of such procedures critically depends on the nature of the underlying missingness generation process: the efficacy of applying either case deletion or imputation procedures rests on the unknown missingness generation mechanism. The contribution of this paper is twofold. First, it is the first study to propose a simple sequential method to attempt to identify the form of missingness. Second, the effectiveness of the tests is assessed by experimentally generating nine missing data sets under imposed MCAR, MAR and NMAR processes.
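A minimal sketch of the three missingness generation processes such experiments impose on a variable y with covariate x: MCAR deletes independently of everything, MAR deletes based on the observed x, and NMAR deletes based on y itself. Rates and functional forms are arbitrary choices for illustration.

```python
# Generate MCAR, MAR, and NMAR missingness in y and compare observed means.
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

def apply_missingness(values, mechanism, rate=0.3):
    out = values.copy()
    if mechanism == "MCAR":
        drop = rng.random(n) < rate                           # independent of the data
    elif mechanism == "MAR":
        drop = rng.random(n) < rate * 2 * (x > np.median(x))  # depends on observed x
    else:  # NMAR
        drop = rng.random(n) < rate * 2 * (out > np.median(out))  # depends on y itself
    out[drop] = np.nan
    return out

for mech in ["MCAR", "MAR", "NMAR"]:
    y_m = apply_missingness(y, mech)
    print(f"{mech}: {np.isnan(y_m).mean():.1%} missing, observed mean "
          f"{np.nanmean(y_m):+.3f} (complete-data mean {y.mean():+.3f})")
# MCAR leaves the observed mean roughly unbiased; MAR and NMAR shift it.
```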

10.
Considerable statistical research has been performed in recent years to develop sophisticated statistical methods for handling missing data and dropouts in the analysis of clinical trial data. However, if statisticians and other study team members proactively set out at the trial initiation stage to assess the impact of missing data and investigate ways to reduce dropouts, there is considerable potential to improve the clarity and quality of trial results and also to increase efficiency. This paper presents a Human Immunodeficiency Virus (HIV) case study where statisticians led a project to reduce dropouts. The first step was to perform a pooled analysis of past HIV trials investigating which patient subgroups are more likely to drop out. The second step was to educate internal and external trial staff at all levels about the patient types more likely to drop out, and the impact this has on data quality and the sample sizes required. The final step was to work collaboratively with clinical trial teams to create proactive plans for focused retention efforts, identifying ways to increase retention, particularly among the patients most at risk. It is acknowledged that identifying the specific impact of new patient retention efforts and tools is difficult, because patient retention can be influenced by overall study design, the investigational product's tolerability profile, and the current standard of care and treatment access for the disease under study, which may vary over time. However, the implementation of new retention strategies and efforts within clinical trial teams attests to the influence of the analyses described in this case study. Copyright © 2012 John Wiley & Sons, Ltd.
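A minimal sketch of the first step described above: pool (here, simulate) patient-level data and fit a logistic regression of dropout on baseline characteristics to flag the subgroups most likely to discontinue. The covariates and effect sizes are invented placeholders, not findings from the actual pooled HIV analysis.

```python
# Logistic regression of dropout on baseline subgroup indicators.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 2000
age = rng.normal(38, 10, n)
baseline_vl = rng.normal(4.5, 0.8, n)            # hypothetical log10 viral load
young = (age < 25).astype(float)
high_vl = (baseline_vl > 5).astype(float)

p_drop = 1 / (1 + np.exp(-(-2.0 + 1.0 * young + 0.5 * high_vl)))
dropout = (rng.random(n) < p_drop).astype(float)

X = sm.add_constant(np.column_stack([young, high_vl]))
fit = sm.Logit(dropout, X).fit(disp=0)
print(fit.summary2().tables[1])   # log-odds of dropping out by subgroup indicator
```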

11.
Missing data are a prevalent and widespread data-analytic issue, and previous studies have performed simulations to compare the performance of missing data methods in various contexts and for various models. However, one context that has yet to receive much attention in the literature is the handling of missing data in small samples, particularly when the missingness is arbitrary. Prior studies have either compared methods for small samples with the monotone missingness commonly found in longitudinal studies, or have investigated the performance of a single method for handling arbitrary missingness in small samples; studies have yet to compare the relative performance of commonly implemented missing data methods for small samples with arbitrary missingness. This article conducts a simulation study to compare and assess the small-sample performance of maximum likelihood, listwise deletion, joint multiple imputation, and fully conditional specification multiple imputation for a single-level regression model with a continuous outcome. Results showed that, provided assumptions are met, joint multiple imputation unanimously performed best of the methods examined in the conditions under study.

12.
Previous simulations have reported second-order missing data estimators to be superior to the more straightforward first-order procedures such as mean value replacement. These simulations, however, were based on deterministic comparisons between regression criteria, even though simulated sampling is a random procedure. In this paper, a simulation structured as an experimental design allows statistical testing of the various missing data estimators for the various regression criteria as well as for different regression specifications. Our results indicate that although no missing data estimator is globally best, many of the computationally simpler first-order methods perform as well as the more expensive higher-order estimators, contrary to some previous findings.

13.
The Points to Consider Document on Missing Data was adopted by the Committee for Medicinal Products for Human Use (CHMP) in December 2001. In September 2007 the CHMP issued a recommendation to review the document, with particular emphasis on summarizing and critically appraising the pattern of drop-outs, explaining the role and limitations of the 'last observation carried forward' method, and describing the CHMP's cautionary stance on the use of mixed models. In preparation for the release of the updated guidance document, Statisticians in the Pharmaceutical Industry held a one-day expert group meeting in September 2008. Topics debated included minimizing the extent of missing data, understanding the missing data mechanism, defining the principles for handling missing data, and understanding the assumptions underlying different analysis methods. A clear message from the meeting was that, at present, biostatisticians tend only to react to missing data; limited proactive planning is undertaken when designing clinical trials. Missing data mechanisms for a trial need to be considered during the planning phase and their impact on the objectives assessed. Another area for improvement is the understanding of the pattern of missing data observed during a trial, and thus of the missing data mechanism, via plotting of the data; for example, using Kaplan–Meier curves of time to withdrawal. Copyright © 2009 John Wiley & Sons, Ltd.
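A minimal sketch of 'last observation carried forward' (LOCF), whose role and limitations the review was asked to explain: within each subject, a missing visit value is replaced by the most recent observed one. The toy visit schedule is invented for illustration.

```python
# LOCF: carry each subject's last observed score forward over missing visits.
import numpy as np
import pandas as pd

visits = pd.DataFrame({
    "subject": [1, 1, 1, 1, 2, 2, 2, 2],
    "visit":   [0, 3, 6, 12, 0, 3, 6, 12],
    "score":   [5.0, 6.0, np.nan, np.nan, 4.0, np.nan, 7.0, np.nan],
})

visits["score_locf"] = visits.groupby("subject")["score"].ffill()
print(visits)
# Subject 1's months 6 and 12 both carry the month-3 value forward -- convenient,
# but it freezes a trajectory that may in truth have kept changing, which is
# exactly the limitation the guidance highlights.
```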

14.
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation encompasses a broad range of techniques developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations has frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well lie in the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
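A minimal sketch of evaluating imputations at the level of the individual value rather than through downstream parameter estimates: mask entries whose truth is known, impute, and score the imputed values against the held-out truth. The two imputers shown are arbitrary stand-ins for the routines a full comparison would cover.

```python
# Mask-and-score evaluation of per-value imputation quality.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(8)
n = 400
X = rng.multivariate_normal([0, 0, 0],
                            [[1, .7, .4], [.7, 1, .5], [.4, .5, 1]], n)

mask = rng.random(X.shape) < 0.2             # hide 20% of known values (MCAR)
X_miss = X.copy()
X_miss[mask] = np.nan

for name, imp in [("mean fill", SimpleImputer(strategy="mean")),
                  ("chained equations", IterativeImputer(random_state=0))]:
    X_hat = imp.fit_transform(X_miss)
    rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))
    print(f"{name:18s}: per-value RMSE on held-out entries = {rmse:.3f}")
```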

15.
In longitudinal data, missing observations occur commonly in both responses and covariates. Missing data can have a 'missing not at random' mechanism and a non-monotone missing pattern, and, moreover, responses and covariates need not be missing simultaneously. To avoid complexities in both modelling and computation, a two-stage estimation method and a pairwise-likelihood method are proposed. The two-stage estimation method enjoys computational simplicity but incurs a more severe efficiency loss. On the other hand, the pairwise approach leads to estimators with better efficiency but can be cumbersome in computation. In this paper, we develop a compromise method using a hybrid pairwise-likelihood framework. Our proposed approach has better efficiency than the two-stage method, but its computational cost is still reasonable compared to the pairwise approach. The performance of the methods is evaluated empirically by means of simulation studies. Our methods are used to analyse longitudinal data obtained from the National Population Health Study.

16.
Dynamic principal component analysis (DPCA), also known as frequency-domain principal component analysis, was developed by Brillinger [Time Series: Data Analysis and Theory, Vol. 36, SIAM, 1981] to decompose multivariate time-series data into a few principal component series. A primary advantage of DPCA is its capability of extracting essential components from the data by reflecting their serial dependence. It is also used to estimate the common component in a dynamic factor model, which is frequently used in econometrics. However, this beneficial property cannot be exploited when missing values are present, and such values should not simply be ignored when estimating the spectral density matrix in the DPCA procedure. Based on a novel combination of conventional DPCA and the self-consistency concept, we propose a DPCA method for data with missing values. We demonstrate the advantage of the proposed method over some existing imputation methods through Monte Carlo experiments and real data analysis.
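A minimal, complete-data sketch of the frequency-domain machinery behind DPCA: estimate the spectral density matrix of a multivariate series at each frequency (here via Welch cross-spectra) and eigendecompose it, so the leading eigenvalue share measures how much of the dynamics one principal component series captures. The missing-value handling via self-consistency, which is the paper's contribution, is not shown.

```python
# Frequency-domain PCA: eigendecompose the estimated spectral density matrix
# at each frequency.
import numpy as np
from scipy import signal

rng = np.random.default_rng(9)
T, p = 2048, 3
common = np.convolve(rng.normal(size=T), np.ones(8) / 8, mode="same")  # smooth factor
X = np.column_stack([common + 0.3 * rng.normal(size=T) for _ in range(p)])

# Cross-spectral density matrix S(f): p x p and Hermitian at each Welch frequency.
freqs, _ = signal.csd(X[:, 0], X[:, 0], nperseg=256)
S = np.empty((len(freqs), p, p), dtype=complex)
for i in range(p):
    for j in range(p):
        _, S[:, i, j] = signal.csd(X[:, i], X[:, j], nperseg=256)

share = []
for k in range(len(freqs)):
    vals = np.linalg.eigvalsh(S[k])      # real eigenvalues, ascending order
    share.append(vals[-1] / vals.sum())
print(f"median share of the leading dynamic component: {np.median(share):.2f}")
# A high share across frequencies says one component series carries most of
# the serial dynamics, which is what DPCA exploits.
```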

17.
Differential analysis techniques are commonly used to offer scientists a dimension reduction procedure and an interpretable gateway to variable selection, especially when confronting high-dimensional genomic data. Huang et al. used a gene expression profile of breast cancer cell lines to identify genomic markers that are highly correlated with in vitro sensitivity to the drug Dasatinib. They considered three statistical methods to identify differentially expressed genes and finally used the intersection of the results. However, the statistical methods used in that paper are not sufficient for selecting the genomic markers. In this paper, we use three alternative statistical methods to select a combined list of genomic markers and compare them with the genes proposed by Huang et al. We then propose using sparse principal component analysis (Sparse PCA) to identify a final list of genomic markers. Sparse PCA takes the correlation among genes into account and helps to achieve successful genomic marker discovery. We present a new, small set of genomic markers that effectively separates out the group of patients who are sensitive to Dasatinib. The analysis procedure should also encourage scientists to identify genomic markers that can help to separate two groups.
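A minimal sketch of sparse PCA for marker selection: the L1 penalty drives most loadings to exactly zero, so genes with non-zero loadings on the leading components form a small candidate marker set. The expression matrix and the correlated block of "signal" genes are simulated placeholders, not the Dasatinib data.

```python
# Sparse PCA: select the genes carrying non-zero loadings on the leading components.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(10)
n_samples, n_genes = 60, 200
expr = rng.normal(size=(n_samples, n_genes))
signal_strength = rng.normal(size=n_samples)
expr[:, :10] += signal_strength[:, None]        # 10 genes share a common signal

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(expr)
selected = np.where(np.abs(spca.components_).max(axis=0) > 1e-8)[0]
print(f"{len(selected)} genes with non-zero loadings:", selected[:15])
# Unlike ordinary PCA, whose components mix all 200 genes, the sparse components
# concentrate on the correlated block, yielding an interpretable marker list.
```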

18.
Analyzing incomplete data to infer the structure of gene regulatory networks (GRNs) is a challenging task in bioinformatics. Bayesian networks can be successfully used in this field. k-nearest neighbor, singular value decomposition (SVD)-based, and multiple imputation by chained equations are three fundamental imputation methods for dealing with missing values. The path consistency (PC) algorithm based on conditional mutual information (PCA–CMI) is a well-known algorithm for inferring GRNs. This algorithm requires the data set to be complete. However, PCA–CMI is not a stable algorithm: when applied to permuted gene orders, different networks are obtained. We propose an order-independent algorithm, PCA–CMI–OI, for inferring GRNs. After imputation of the missing data, the performances of PCA–CMI and PCA–CMI–OI are compared. Results show that networks constructed from data imputed by the SVD-based method together with the PCA–CMI–OI algorithm outperform the other imputation methods and PCA–CMI. PC-based algorithms yield an undirected or partially directed network. The mutual information test (MIT) score, which can deal with discrete data, is one of the well-known methods for directing the edges of the resulting networks. We also propose a new score, ConMIT, which is appropriate for analyzing continuous data. Results show that the precision of directing the edges of the skeleton is improved by applying the ConMIT score.
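A minimal sketch of two of the imputation methods named above applied to an expression matrix: k-nearest-neighbor imputation and an iterative SVD-based low-rank fill-in. The low-rank data are simulated, and the downstream GRN inference step (PCA–CMI) is not shown.

```python
# kNN imputation vs an iterative SVD (low-rank) fill-in for expression data.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(11)
genes, samples, rank = 100, 40, 3
expr = rng.normal(size=(genes, rank)) @ rng.normal(size=(rank, samples))
mask = rng.random(expr.shape) < 0.1
expr_miss = np.where(mask, np.nan, expr)

knn_filled = KNNImputer(n_neighbors=5).fit_transform(expr_miss)

def svd_impute(X, rank, n_iter=50):
    """Iteratively refit missing entries with a rank-r SVD reconstruction."""
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=1, keepdims=True), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[miss] = low_rank[miss]
    return filled

svd_filled = svd_impute(expr_miss, rank=3)
for name, filled in [("kNN", knn_filled), ("SVD", svd_filled)]:
    rmse = np.sqrt(np.mean((filled[mask] - expr[mask]) ** 2))
    print(f"{name} imputation RMSE on masked entries: {rmse:.3f}")
```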

19.
Traditional factor analysis (FA) rests on the assumption of multivariate normality. However, in some practical situations the data do not meet this assumption, and thus the statistical inference made from such data may be misleading. This paper aims to provide some new tools for the skew-normal (SN) FA model when missing values occur in the data. In such a model, the latent factors are assumed to follow a restricted version of the multivariate SN distribution with additional shape parameters for accommodating skewness. We develop an analytically feasible expectation conditional maximization algorithm for carrying out parameter estimation and imputation of missing values under missing at random mechanisms. The practical utility of the proposed methodology is illustrated with two real data examples, and the results are compared with those obtained from the traditional FA counterparts.

20.
When missing data occur in studies designed to compare the accuracy of diagnostic tests, a common, though naive, practice is to base the comparison of sensitivity and specificity, as well as of positive and negative predictive values, on some subset of the data that fits the methods implemented in standard statistical packages. Such methods are usually valid only under the strong missing completely at random (MCAR) assumption and may generate biased and less precise estimates. We review some models that use the dependence structure of the completely observed cases to incorporate the information from the partially categorized observations into the analysis, and show how they may be fitted via a two-stage hybrid process involving maximum likelihood in the first stage and weighted least squares in the second. We indicate how computational subroutines written in R may be used to fit the proposed models, and illustrate the different analysis strategies with observational data collected to compare the accuracy of three distinct non-invasive diagnostic methods for endometriosis. The results indicate that even when the MCAR assumption is plausible, the naive partial analyses should be avoided.
