1.
《Journal of Statistical Computation and Simulation》2012,82(5):547-560
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a newly proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more appropriately.
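A minimal sketch of the simplest of the three identifiers mentioned in the abstract: flag observations whose squared Mahalanobis distance from the sample mean exceeds a chi-square quantile. The function name, the 2.5% cutoff, and the planted outlier are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X, alpha=0.025):
    """Flag rows whose squared Mahalanobis distance from the sample
    mean exceeds the chi-square(1 - alpha) quantile with p d.f."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return d2, d2 > chi2.ppf(1 - alpha, df=X.shape[1])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[0] = [8.0, 8.0, 8.0]                # planted extreme observation
d2, flags = mahalanobis_outliers(X)   # flags[0] should be True
```

The robust variant in the paper would replace the sample mean and covariance with high-breakdown estimates (e.g. MCD), which changes only the two estimator lines.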
2.
The two-way two-levels crossed factorial design is a design commonly used by practitioners at the exploratory phase of industrial experiments. The F-test in the usual linear model for analysis of variance (ANOVA) is a key instrument to assess the impact of each factor and of their interactions on the response variable. However, if assumptions such as normal distribution and homoscedasticity of errors are violated, the conventional wisdom is to resort to nonparametric tests. Nonparametric methods, rank-based as well as permutation, have been a subject of recent investigations to make them effective in testing the hypotheses of interest and to improve their performance in small sample situations. In this study, we assess the performances of some nonparametric methods and, more importantly, we compare their powers. Specifically, we examine three permutation methods (Constrained Synchronized Permutations, Unconstrained Synchronized Permutations and the Wald-Type Permutation Test), a rank-based method (Aligned Rank Transform) and a parametric method (ANOVA-Type Test). In the simulations, we generate datasets with different configurations of error distribution, variance, factor effects and number of replicates. The objective is to elicit practical advice and guidance for practitioners regarding the sensitivity of the tests in the various configurations, the conditions under which some tests cannot be used, the tradeoff between power and type I error, and the bias of the power for one main factor due to the presence of an effect of the other factor. A dataset from an industrial engineering experiment for thermoformed packaging production is used to illustrate the application of the various methods of analysis, taking into account the power of the test suggested by the objective of the experiment.
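A minimal sketch of the permutation idea behind the tests compared above: compute a between-level sum of squares for one factor and calibrate it by unrestricted permutation of the responses. This is a deliberate simplification (the synchronized schemes in the abstract restrict the permutations); the function names and simulated data are assumptions for illustration.

```python
import numpy as np

def ss_factor(y, labels):
    """Between-level sum of squares for one factor of a balanced
    two-way layout; used here as a permutation test statistic."""
    grand = y.mean()
    return sum((y[labels == lv].mean() - grand) ** 2 * (labels == lv).sum()
               for lv in np.unique(labels))

def perm_pvalue(y, labels, n_perm=1999, seed=1):
    """Unrestricted permutation p-value for the main effect."""
    rng = np.random.default_rng(seed)
    obs = ss_factor(y, labels)
    hits = sum(ss_factor(rng.permutation(y), labels) >= obs
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
a = np.repeat([0, 1], 20)                 # factor A, two levels
b = np.tile([0, 1], 20)                   # factor B, two levels
y = 3.0 * a + rng.normal(0, 1, 40)        # strong A effect, no B effect
p_a = perm_pvalue(y, a)                   # should be very small
p_b = perm_pvalue(y, b)                   # should be larger than p_a
```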
3.
4.
Computer simulation techniques were employed to investigate the Type I and Type II error rates (experiment-wise and comparison-wise) of three nonparametric multiple comparison procedures. Three different underlying distributions were considered. It was found that the nonparametric analog to Fisher’s LSD (a Kruskal-Wallis test, followed by pairwise Mann-Whitney U tests if a significant overall effect is detected) appeared to be superior to the Nemenyi-Dunn and Steel-Dwass procedures, because of the extreme conservatism of these latter methods.
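The protected procedure described above (Kruskal-Wallis gate, then pairwise Mann-Whitney U tests) can be sketched directly with `scipy.stats`; the significance level and simulated groups are illustrative assumptions.

```python
from itertools import combinations
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

def nonparametric_lsd(groups, alpha=0.05):
    """Kruskal-Wallis gate, then unadjusted pairwise Mann-Whitney U
    tests only if the overall test is significant (the nonparametric
    analog of Fisher's protected LSD)."""
    _, p_overall = kruskal(*groups)
    if p_overall >= alpha:
        return p_overall, {}          # no overall effect: stop here
    pairwise = {(i, j): mannwhitneyu(groups[i], groups[j]).pvalue
                for i, j in combinations(range(len(groups)), 2)}
    return p_overall, pairwise

rng = np.random.default_rng(2)
groups = [rng.normal(0, 1, 15), rng.normal(0, 1, 15), rng.normal(3, 1, 15)]
p, pairs = nonparametric_lsd(groups)   # group 2 should separate clearly
```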
5.
Imputation methods that assign a selection of respondents’ values for missing item nonresponses give rise to an additional source of sampling variation, which we term imputation variance. We examine the effect of imputation variance on the precision of the mean, and propose four procedures for sampling the respondents that reduce this additional variance. Two of the procedures employ improved sample designs through selection of respondents by sampling without replacement and by stratified sampling. The other two increase the sample base by the use of multiple imputations.
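One of the devices described above, selecting donor respondents without replacement, can be illustrated with a hot-deck sketch: repeating the imputation many times shows the smaller spread of the imputed mean. The sample sizes and seed are assumptions for illustration.

```python
import numpy as np

def hot_deck_impute(y, missing, rng, replace=True):
    """Fill missing entries with donor values sampled from the
    respondents; sampling donors WITHOUT replacement spreads the
    donated values and shrinks the extra imputation variance."""
    out = y.copy()
    out[missing] = rng.choice(y[~missing], size=missing.sum(),
                              replace=replace)
    return out

rng = np.random.default_rng(3)
y = rng.normal(50, 10, 200)
missing = np.zeros(200, dtype=bool)
missing[rng.choice(200, 40, replace=False)] = True

# compare the variability of the imputed mean across repeated imputations
means_wr = [hot_deck_impute(y, missing, rng, replace=True).mean()
            for _ in range(500)]
means_wor = [hot_deck_impute(y, missing, rng, replace=False).mean()
             for _ in range(500)]
# var(means_wor) should come out below var(means_wr)
```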
6.
7.
8.
Recent research has made clear that missing values in datasets are inevitable. Imputation is one of several methods introduced to overcome this issue: imputation techniques address missing data by permanently replacing the missing values with reasonable estimates. The benefits of these procedures generally outweigh their drawbacks. However, how these methods behave has not been fully clarified, which breeds mistrust in the analytical results. One approach to evaluating the outcome of the imputation process is to estimate the uncertainty in the imputed data. Nonparametric methods are appropriate for estimating this uncertainty when the data do not follow any particular distribution. This paper presents a nonparametric method, based on the Wilcoxon test statistic, for estimating and testing the significance of imputation uncertainty; it can be employed to assess the precision of the values created by imputation methods. The proposed procedure can be used to judge the suitability of the imputation process for a dataset, and to evaluate the influence of competing imputation methods when they are applied to the same dataset. The approach is compared with other nonparametric resampling methods, including the bootstrap and the jackknife, for estimating uncertainty in data imputed under the Bayesian bootstrap imputation method. The ideas supporting the proposed method are explained in detail, and a simulation study illustrating how the approach works in practical situations is presented.
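The Bayesian bootstrap imputation used as the benchmark above can be sketched as follows: Dirichlet-weight the donor pool, draw imputations with those weights, and measure the spread of the imputed mean across repetitions as a crude uncertainty estimate. This sketches only the resampling baseline, not the paper's Wilcoxon-based test; all numbers are illustrative assumptions.

```python
import numpy as np

def bayesian_bootstrap_impute(donors, m, rng):
    """One Bayesian bootstrap imputation: Dirichlet-weight the donor
    pool (Rubin's Bayesian bootstrap), then draw m values with those
    weights."""
    w = rng.dirichlet(np.ones(len(donors)))
    return rng.choice(donors, size=m, p=w)

rng = np.random.default_rng(4)
donors = rng.normal(10, 2, 150)
imputed_means = np.array([bayesian_bootstrap_impute(donors, 30, rng).mean()
                          for _ in range(400)])
uncertainty = imputed_means.std()   # spread of the imputed mean
```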
9.
Federica Cugnata 《Communications in Statistics - Simulation and Computation》2017,46(1):315-330
In this article, we compare alternative missing imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other methods of missing imputation. We use real datasets on which to base the comparison between our approach and some general methods of missing imputation for various missing data mechanisms.
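The CUB model named above is a two-component mixture, which makes simulating the ordinal ratings straightforward; a minimal generator, with parameter values chosen purely for illustration (the standard convention where the shifted binomial has success probability 1 − ξ is assumed):

```python
import numpy as np

def rcub(n, m, pi, xi, rng):
    """Draw n ratings on {1,...,m} from a CUB model: with weight pi a
    shifted Binomial(m-1, 1-xi) + 1 component, with weight 1-pi a
    discrete uniform on {1,...,m}."""
    use_binom = rng.random(n) < pi
    shifted = rng.binomial(m - 1, 1.0 - xi, size=n) + 1
    uniform = rng.integers(1, m + 1, size=n)
    return np.where(use_binom, shifted, uniform)

rng = np.random.default_rng(5)
r = rcub(5000, m=7, pi=0.8, xi=0.3, rng=rng)
# theoretical mean: 0.8 * (6 * 0.7 + 1) + 0.2 * 4 = 4.96
```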
10.
Vera R. Eastwood 《Revue canadienne de statistique》1993,21(2):209-222
A general model for changepoint problems is discussed from a nonparametric viewpoint. The test statistics introduced are based on Cramér-von Mises functionals of certain processes and are shown to converge in distribution to corresponding Gaussian functionals (under the assumption of no change in distribution, H0). We also demonstrate how the distribution of the limiting Gaussian functionals may be tabulated. Finally, properties of the tests under the alternative hypothesis of exactly one changepoint occurring are studied, and some examples are given.
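A rough sketch of the Cramér-von Mises idea above: at each candidate split, compare the empirical CDFs of the two segments with a weighted squared-difference functional. The weighting and the minimum segment length are illustrative assumptions, and no attempt is made to reproduce the paper's tabulated critical values.

```python
import numpy as np

def cvm_changepoint(x):
    """Scan candidate changepoints k; at each, compare the empirical
    CDFs of x[:k] and x[k:] with a Cramér-von Mises-type functional
    weighted by k(n-k)/n^2. Returns (argmax k, statistic profile)."""
    x = np.asarray(x, float)
    n = len(x)
    grid = np.sort(x)
    stats = np.full(n, -np.inf)
    for k in range(5, n - 5):          # keep both segments non-trivial
        f1 = np.searchsorted(np.sort(x[:k]), grid, side='right') / k
        f2 = np.searchsorted(np.sort(x[k:]), grid, side='right') / (n - k)
        stats[k] = (k * (n - k) / n**2) * np.mean((f1 - f2) ** 2)
    return int(np.argmax(stats)), stats

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 1, 60), rng.normal(2, 1, 60)])
k_hat, profile = cvm_changepoint(x)    # should land near 60
```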
11.
B.B. Winter 《Statistics》2013,47(3):339-355
Two different approaches to the design of optimal observation networks are compared. One approach is based on the traditional experimental design theory; the other essentially uses the covariance analysis methodology of observed fields. It is found that for random fields generated by regression models with random parameters both approaches lead to similar solutions.
12.
This paper develops a new test statistic for testing hypotheses of the form H0: M0=M1=⋯=Mk versus Ha: M0≤Mi, i=1,2,…,k, where at least one inequality is strict. M0 is the median of the control population and Mi is the median of the population receiving treatment i, i=1,2,…,k. The population distributions are assumed to be unknown but to differ only in their location parameters, if at all. A simulation study is done comparing the new test statistic with the Chacko and the Kruskal-Wallis tests when the underlying population distributions are normal, uniform, exponential, or Cauchy. Sample sizes of five, eight, ten, and twenty were considered. The new test statistic did better than the Chacko and the Kruskal-Wallis tests when the medians of the populations receiving the treatments were approximately the same.
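A permutation-calibrated analog of the control-versus-treatments problem above can be sketched by counting (treatment, control) pairs in which the treatment observation is larger; this is an illustrative statistic in the same spirit, not the paper's actual test, and all simulated quantities are assumptions.

```python
import numpy as np

def pairs_stat(control, treatments):
    """Total count of (treatment, control) pairs where the treatment
    observation is larger; large values favour the ordered alternative
    that every treatment median is at least the control median."""
    return sum(int((t[:, None] > control[None, :]).sum())
               for t in treatments)

def perm_test(control, treatments, n_perm=2000, seed=12):
    """Calibrate pairs_stat by permuting group labels."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([control] + list(treatments))
    cuts = np.cumsum([len(control)] + [len(t) for t in treatments])[:-1]
    obs = pairs_stat(control, treatments)
    hits = 0
    for _ in range(n_perm):
        parts = np.split(rng.permutation(pooled), cuts)
        hits += pairs_stat(parts[0], parts[1:]) >= obs
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(13)
control = rng.normal(0, 1, 10)
treatments = [rng.normal(2, 1, 10), rng.normal(2, 1, 10)]
p = perm_test(control, treatments)     # should be small
```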
13.
《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementation of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect. 
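The individual-level quality notion discussed above can be made concrete in a toy simulation: when the true values are known, the per-value RMSE of the imputations separates a naive strategy (mean imputation) from one that uses the covariate structure (regression imputation). The data-generating model and missingness rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(0, 1, n)
y = 2.0 * x + rng.normal(0, 1, n)      # y depends strongly on x
miss = rng.random(n) < 0.3             # ~30% of y missing (MCAR)
y_true = y[miss]                       # held out to score the imputations

# mean imputation ignores the covariate entirely
y_mean_imp = np.full(miss.sum(), y[~miss].mean())
# regression imputation predicts y from x using the complete cases
slope, intercept = np.polyfit(x[~miss], y[~miss], 1)
y_reg_imp = intercept + slope * x[miss]

rmse_mean = np.sqrt(np.mean((y_mean_imp - y_true) ** 2))
rmse_reg = np.sqrt(np.mean((y_reg_imp - y_true) ** 2))
# rmse_reg should be clearly smaller than rmse_mean
```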
14.
Raffaele Argiento Alessandra Guglielmi Antonio Pievatolo 《Journal of statistical planning and inference》2009,139(12):3989-4005
We will pursue a Bayesian nonparametric approach in the hierarchical mixture modelling of lifetime data in two situations: density estimation, when the distribution is a mixture of parametric densities with a nonparametric mixing measure, and accelerated failure time (AFT) regression modelling, when the same type of mixture is used for the distribution of the error term. The Dirichlet process is a popular choice for the mixing measure, yielding a Dirichlet process mixture model for the error; as an alternative, we also allow the mixing measure to be equal to a normalized inverse-Gaussian prior, built from normalized inverse-Gaussian finite dimensional distributions, as recently proposed in the literature. Markov chain Monte Carlo techniques will be used to estimate the predictive distribution of the survival time, along with the posterior distribution of the regression parameters.  A comparison between the two models will be carried out on the grounds of their predictive power and their ability to identify the number of components in a given mixture density.
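The Dirichlet process mixing measure mentioned above admits a compact constructive sampler via stick-breaking; a truncated sketch (the truncation tolerance and base measure are illustrative assumptions):

```python
import numpy as np

def dp_stick_breaking(alpha, base_sampler, rng, tol=1e-6):
    """Truncated stick-breaking draw from DP(alpha, G0): weights
    w_k = v_k * prod_{j<k}(1 - v_j) with v_k ~ Beta(1, alpha),
    atoms drawn independently from the base measure G0."""
    weights, atoms, remaining = [], [], 1.0
    while remaining > tol:
        v = rng.beta(1.0, alpha)
        weights.append(remaining * v)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - v           # stick left to break
    return np.array(weights), np.array(atoms)

rng = np.random.default_rng(8)
w, a = dp_stick_breaking(2.0, lambda r: r.normal(0, 1), rng)
# w sums to ~1; (w, a) is one discrete random measure from the DP
```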
15.
《Journal of Statistical Computation and Simulation》2012,82(12):2384-2405
Given two independent samples of size n and m drawn from univariate distributions with unknown densities f and g, respectively, we are interested in identifying subintervals where the two empirical densities deviate significantly from each other. The solution is built by turning the nonparametric density comparison problem into a comparison of two regression curves. Each regression curve is created by binning the original observations into many small size bins, followed by a suitable form of root transformation to the binned data counts. Turned as a regression comparison problem, several nonparametric regression procedures for detection of sparse signals can be applied. Both multiple testing and model selection methods are explored. Furthermore, an approach for estimating larger connected regions where the two empirical densities are significantly different is also derived, based on a scale-space representation. The proposed methods are applied on simulated examples as well as real-life data from biology.
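The binning-plus-root-transform step above can be sketched directly; here the Anscombe transform sqrt(count + 3/8) is assumed as "a suitable form of root transformation" (it approximately stabilizes the variance of Poisson-like bin counts), and the bin grid and mixture data are illustrative.

```python
import numpy as np

def density_to_regression(x, edges):
    """Bin a sample and apply the Anscombe root transform, turning
    the density estimation problem into a regression problem with
    roughly constant noise variance."""
    counts, _ = np.histogram(x, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, np.sqrt(counts + 3.0 / 8.0)

rng = np.random.default_rng(9)
f_sample = rng.normal(0, 1, 4000)
g_sample = np.concatenate([rng.normal(0, 1, 3000),
                           rng.normal(3, 0.3, 1000)])  # extra bump at 3
edges = np.linspace(-4, 5, 60)
c, yf = density_to_regression(f_sample, edges)
_, yg = density_to_regression(g_sample, edges)
diff = yg - yf    # large entries flag where the densities deviate
```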
16.
17.
A comparison of multiple imputation and doubly robust estimation for analyses with missing data
James R. Carpenter Michael G. Kenward Stijn Vansteelandt 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(3):571-584
Summary.  Multiple imputation is now a well-established technique for analysing data sets where some units have incomplete observations. Provided that the imputation model is correct, the resulting estimates are consistent. An alternative, weighting by the inverse probability of observing complete data on a unit, is conceptually simple and involves fewer modelling assumptions, but it is known to be both inefficient (relative to a fully parametric approach) and sensitive to the choice of weighting model. Over the last decade, there has been a considerable body of theoretical work to improve the performance of inverse probability weighting, leading to the development of 'doubly robust' or 'doubly protected' estimators. We present an intuitive review of these developments and contrast these estimators with multiple imputation from both a theoretical and a practical viewpoint.
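A minimal sketch of a doubly robust (augmented IPW) estimator of a mean with missing outcomes, under assumed MCAR missingness with a constant propensity and a linear outcome model; the model choices and data are illustrative, not taken from the paper.

```python
import numpy as np

def aipw_mean(y, observed, x):
    """Augmented IPW estimate of E[y]: an outcome regression (y on x,
    complete cases) plus inverse-probability-weighted residuals, so
    the estimate is consistent if either the outcome model or the
    response model is correct (double robustness)."""
    # outcome model: linear regression fit on complete cases
    slope, intercept = np.polyfit(x[observed], y[observed], 1)
    m = intercept + slope * x
    # response model: constant propensity (MCAR assumed here)
    p = np.full(len(x), observed.mean())
    resid = np.zeros(len(x))
    resid[observed] = (y[observed] - m[observed]) / p[observed]
    return np.mean(m + resid)

rng = np.random.default_rng(10)
x = rng.normal(0, 1, 2000)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 2000)   # true E[y] = 1
observed = rng.random(2000) < 0.6            # ~40% of y missing
est = aipw_mean(y, observed, x)              # should be near 1
```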
18.
In the causal analysis of survival data a time-based response is related to a set of explanatory variables. Defining the relation between the time and the covariates can be a difficult task, particularly in the preliminary stage, when information is limited. Through a nonparametric approach, we propose to estimate the survival function in a way that allows the relative importance of each potential explanatory variable to be evaluated in a simple and exploratory fashion. To achieve this aim, each of the explanatory variables is used to partition the observed survival times. The observations are assumed to be partially exchangeable according to such a partition. We then consider, conditionally on each partition, a hierarchical nonparametric Bayesian model on the hazard functions. We define and compare different prior distributions for the hazard functions.
19.
《Journal of Statistical Computation and Simulation》2012,82(5):527-544
To estimate the effective dose level EDα in the common binary response model, several parametric and nonparametric estimators have been proposed in the literature. In the present article, we focus on nonparametric methods and present a detailed numerical comparison of four different approaches to estimate the EDα nonparametrically. The methods are briefly reviewed and their finite sample properties are studied by means of a detailed simulation study. Moreover, a data example is presented to illustrate the different concepts.
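One common nonparametric route to EDα (not necessarily one of the four compared in the paper) is to isotonize the observed response proportions with the pool-adjacent-violators algorithm and then invert the monotone fit; a sketch with an invented dose-response dataset:

```python
import numpy as np

def pava(y, w):
    """Pool-adjacent-violators: weighted isotonic (nondecreasing)
    regression; each block also counts how many points it absorbed
    so the fit can be expanded back to full length."""
    blocks = [[float(yi), float(wi), 1] for yi, wi in zip(y, w)]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0] + 1e-12:   # violation: pool
            v1, w1, c1 = blocks[i]
            v2, w2, c2 = blocks[i + 1]
            blocks[i:i + 2] = [[(v1 * w1 + v2 * w2) / (w1 + w2),
                                w1 + w2, c1 + c2]]
            i = max(i - 1, 0)
        else:
            i += 1
    return np.repeat([b[0] for b in blocks], [b[2] for b in blocks])

def ed_alpha(doses, responded, totals, alpha=0.5):
    """Isotonize the response proportions, then return the smallest
    dose whose isotonized response rate reaches alpha."""
    p = pava(responded / totals, totals)
    idx = int(np.searchsorted(p, alpha))
    return doses[min(idx, len(doses) - 1)]

doses = np.array([0, 1, 2, 3, 4, 5])
responded = np.array([1, 3, 4, 10, 14, 19])   # out of 20 at each dose
totals = np.full(6, 20)
ed50 = ed_alpha(doses, responded, totals)     # first dose with rate >= 0.5
```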
20.
Haiko Luepsen 《Communications in Statistics - Simulation and Computation》2013,42(9):2547-2576
For two-way layouts in a between-subjects analysis of variance design, the parametric F-test is compared with seven nonparametric methods: rank transform (RT), inverse normal transform (INT), aligned rank transform (ART), a combination of ART and INT, Puri & Sen's L statistic, Van der Waerden's test, and Akritas and Brunner's ANOVA-type statistic (ATS). The type I error rates and the power are computed for 16 normal and nonnormal distributions, with and without homogeneity of variances, for balanced and unbalanced designs, as well as for several models including the null and the full model. The aim of this study is to identify a method that is applicable without extensive preliminary testing of the attributes of the data. The Van der Waerden test shows the best overall performance, though there are some situations in which it is disappointing. The Puri & Sen and ATS tests show generally very low power. These two and the other methods fail to keep the type I error rate under control in too many situations. Especially in the case of lognormal distributions, the use of any of the rank-based procedures can be dangerous for cell sizes above 10. As many other authors have shown, nonnormal distributions do not invalidate the parametric F-test, but unequal variances do, and heterogeneity of variances inflates the error rate, to varying degrees, for the nonparametric methods as well. Finally, it should be noted that some procedures show rising error rates with increasing cell sizes: the ART, especially for discrete variables, and the RT, Puri & Sen, and ATS in cases of heteroscedasticity.
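The simplest of the procedures compared above, the rank transform, amounts to ranking the pooled data and running the usual ANOVA F-test on the ranks; a one-way simplification of the two-way setting in the abstract, with illustrative heavy-tailed data:

```python
import numpy as np
from scipy.stats import rankdata, f_oneway

rng = np.random.default_rng(11)
# heavy-tailed (Cauchy) errors, where the classical F-test loses power
groups = [rng.standard_cauchy(30) + shift for shift in (0.0, 0.0, 3.0)]

# rank transform: rank the pooled data, then apply the ordinary F-test
ranks = rankdata(np.concatenate(groups))
rank_groups = np.split(ranks, [30, 60])
p_rt = f_oneway(*rank_groups).pvalue
```

The other rank-based competitors (INT, ART) differ only in the transformation applied before the F-test: INT maps ranks through normal quantiles, and ART first removes (aligns out) the effects not under test.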