Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Six procedures which convert tests of homogeneity of variance into tests for mean equality for independent groups are compared. The tests are the analysis of variance (ANOVA) and Welch F statistics. The Welch statistics are included since it was anticipated that ANOVA would not provide a robust test when samples of unequal sizes are obtained from non-normal populations. However, the Welch tests are not found to be uniformly preferable. In addition, a prior recommendation for Miller's jackknife procedure is not supported for the unequal sample size case. The data indicate that the current tests for variance heterogeneity are either sensitive to non-normality or, if robust, lacking in power. Therefore, these tests cannot be recommended for the purpose of testing the validity of the ANOVA homogeneity assumption.

2.
SUMMARY: When the assumptions of parametric statistical tests for the difference between two means are violated, it is commonly advised that non-parametric tests are a more robust substitute. The history of the investigation of this issue is summarized. The robustness of the t-test was evaluated by repeated computer testing for differences between samples from two populations of equal means but non-normal distributions and with different variances and sample sizes. Two common alternatives to t, Welch's approximate t and the Mann-Whitney U-test, were evaluated in the same way. The t-test is sufficiently robust for use in all likely cases, except when skew is severe or when population variances and sample sizes both differ. The Welch test satisfactorily addressed the latter problem, but was itself sensitive to departures from normality. Contrary to its popular reputation, the U-test showed a dramatic 'lack of robustness' in many cases, largely because it is sensitive to population differences other than between means, so it is not properly a 'non-parametric analogue' of the t-test, as it is too often described.
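
For orientation, Welch's approximate t differs from the pooled t-test in two ways: it uses separate per-group variances in the standard error, and it uses approximate (Satterthwaite-type) degrees of freedom. The following is a minimal Python sketch working from summary statistics only. The function name `welch_t` and the example numbers are illustrative assumptions and are not taken from the paper.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for Welch's approximate two-sample t-test.

    var1 and var2 are sample variances (divisor n - 1).
    """
    # Squared standard error contributed by each group, kept separate
    # instead of being pooled.
    se2_1 = var1 / n1
    se2_2 = var2 / n2
    t = (mean1 - mean2) / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite approximate degrees of freedom.
    df = (se2_1 + se2_2) ** 2 / (
        se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1)
    )
    return t, df

# Hypothetical summary statistics: the smaller group has the larger variance.
t_stat, df = welch_t(5.0, 4.0, 10, 4.0, 1.0, 20)
```

The approximate degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2, so they move toward the smaller group's degrees of freedom when that group dominates the standard error.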

3.
Bayesian sample size estimation for equivalence and non-inferiority tests for diagnostic methods is considered. The goal of the study is to test whether a new screening test of interest is equivalent to, or not inferior to, the reference test, which may or may not be a gold standard. Sample sizes are chosen by the model performance criteria of average posterior variance, length and coverage probability. In the absence of a gold standard, sample sizes are evaluated by the ratio of marginal probabilities of the two screening tests, whereas in the presence of a gold standard, sample sizes are evaluated by the measures of sensitivity and specificity.

4.
The size of the two-sample t test is generally thought to be robust against nonnormal distributions if the sample sizes are large. This belief is based on central limit theory, and asymptotic expansions of the moments of the t statistic suggest that robustness may be improved for moderate sample sizes if the variance, skewness, and kurtosis of the distributions are matched, particularly if the sample sizes are also equal.

It is shown that asymptotic arguments such as these can be misleading and that, in fact, the size of the t test can be as large as unity if the distributions are allowed to be completely arbitrary. Restricting the distributions to be identical or symmetric (but otherwise arbitrary) does not guarantee that the size can be controlled either, but controlling the tail-heaviness of the distributions does. The last result is proved more generally for the k-sample F test.

5.
Consistency of some nonparametric tests with real variables has been studied by several authors under the assumption that population variance is finite and/or in the presence of some violations of the data exchangeability between samples. Since the main inferential conclusions of permutation tests concern the actual dataset, where sample sizes are held fixed, we consider the notion of consistency in the weak version (in probability). Here, we characterize weak consistency of permutation tests assuming the population mean is finite and without assuming existence of the population variance. Moreover, since permutation test statistics need not be standardized, we do not assume that data are homoscedastic in the alternative. Several application examples to the most commonly used test statistics are discussed. A simulation study and some hints for robust testing procedures are also presented.
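
To illustrate what an unstandardized permutation statistic looks like, the following minimal Python sketch runs a two-sample Monte Carlo permutation test on the raw difference in means. It is not the authors' procedure. The function name `permutation_p_value`, the number of permutations, and the seed are assumptions chosen for illustration.

```python
import random

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def mean_diff(values):
        # Unstandardized statistic: no variance estimate is involved.
        left, right = values[:n_x], values[n_x:]
        return sum(left) / len(left) - sum(right) / len(right)

    observed = abs(mean_diff(pooled))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)

p_far = permutation_p_value(range(1, 9), range(11, 19))
p_same = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4])
```

Because the sample sizes stay fixed across permutations, the reference distribution is conditional on the observed data, which is the setting the abstract describes.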

6.
We examine the asymptotic and small sample properties of model-based and robust tests of the null hypothesis of no randomized treatment effect based on the partial likelihood arising from an arbitrarily misspecified Cox proportional hazards model. When the distribution of the censoring variable is either conditionally independent of the treatment group given covariates or conditionally independent of covariates given the treatment group, the numerators of the partial likelihood treatment score and Wald tests have asymptotic mean equal to 0 under the null hypothesis, regardless of whether or how the Cox model is misspecified. We show that the model-based variance estimators used in the calculation of the model-based tests are not, in general, consistent under model misspecification, yet using analytic considerations and simulations we show that their true sizes can be as close to the nominal value as tests calculated with robust variance estimators. As a special case, we show that the model-based log-rank test is asymptotically valid. When the Cox model is misspecified and the distribution of censoring depends on both treatment group and covariates, the asymptotic distributions of the resulting partial likelihood treatment score statistic and maximum partial likelihood estimator do not, in general, have a zero mean under the null hypothesis. Here neither the fully model-based tests, including the log-rank test, nor the robust tests will be asymptotically valid, and we show through simulations that the distortion to test size can be substantial.

7.
A robust procedure is developed for testing the equality of means in the two-sample normal model. This is based on the weighted likelihood estimators of Basu et al. (1993). When the normal model is true, the tests proposed have the same asymptotic power as the two-sample Student's t-statistic in the equal variance case. However, when the normality assumptions are only approximately true, the proposed tests can be substantially more powerful than the classical tests. In a Monte Carlo study for the equal variance case under various outlier models, the proposed test using the Hellinger-distance-based weighted likelihood estimator compared favorably with the classical test as well as the robust test proposed by Tiku (1980).

8.
We study the properties of the quasi-maximum likelihood estimator (QMLE) and related test statistics in dynamic models that jointly parameterize conditional means and conditional covariances, when a normal log-likelihood is maximized but the assumption of normality is violated. Because the score of the normal log-likelihood has the martingale difference property when the first two conditional moments are correctly specified, the QMLE is generally consistent and has a limiting normal distribution. We provide easily computable formulas for asymptotic standard errors that are valid under nonnormality. Further, we show how robust LM tests for the adequacy of the jointly parameterized mean and variance can be computed from simple auxiliary regressions. An appealing feature of these robust inference procedures is that only first derivatives of the conditional mean and variance functions are needed. A Monte Carlo study indicates that the asymptotic results carry over to finite samples. Estimation of several AR and AR-GARCH time series models reveals that in most situations the robust test statistics compare favorably to the two standard (nonrobust) formulations of the Wald and LM tests. Also, for the GARCH models and the sample sizes analyzed here, the bias in the QMLE appears to be relatively small. An empirical application to stock return volatility illustrates the potential importance of computing robust statistics in practice.

9.
Research on tests for scale equality that are robust to violations of the distributional normality assumption has focused exclusively on an overall test statistic and has not examined procedures for identifying specific differences in multiple group designs. The present study compares four contrast analysis procedures for scale differences in the single factor four group design. Two data transformations are considered under several combinations of variance difference, sample sizes, and distributional forms. The results indicate that no single transformation or analysis procedure is uniformly superior in controlling the familywise error rate or in statistical power. The relationship between sample size and variances is a major factor in selecting a contrast analysis procedure.

10.
We study the properties of the quasi-maximum likelihood estimator (QMLE) and related test statistics in dynamic models that jointly parameterize conditional means and conditional covariances, when a normal log-likelihood is maximized but the assumption of normality is violated. Because the score of the normal log-likelihood has the martingale difference property when the first two conditional moments are correctly specified, the QMLE is generally consistent and has a limiting normal distribution. We provide easily computable formulas for asymptotic standard errors that are valid under nonnormality. Further, we show how robust LM tests for the adequacy of the jointly parameterized mean and variance can be computed from simple auxiliary regressions. An appealing feature of these robust inference procedures is that only first derivatives of the conditional mean and variance functions are needed. A Monte Carlo study indicates that the asymptotic results carry over to finite samples. Estimation of several AR and AR-GARCH time series models reveals that in most situations the robust test statistics compare favorably to the two standard (nonrobust) formulations of the Wald and LM tests. Also, for the GARCH models and the sample sizes analyzed here, the bias in the QMLE appears to be relatively small. An empirical application to stock return volatility illustrates the potential importance of computing robust statistics in practice.

11.
In a previous paper (Posten, Yeh and Owen, 1982), the robustness of the type I error for the two-tailed two-sample t-test was studied under departures from the assumption of equal variances. The level of robustness of this test was then quantified under the concept of regions of robustness. These results are extended here to the one-tailed test for the same problem. The high level of robustness for equal or nearly equal sample sizes observed in the previous study is again documented quantitatively.

12.
When the two-sample t-test has equal sample sizes, it is widely considered to be a robust procedure (with respect to the significance level) under violation of the assumption of equal variances. This paper is concerned with a quantification of the amount of robustness which this procedure has under such violations. The approach is through the concept of "regions of robustness" and the results show an extremely strong degree of robustness for the equal sample size t-test, probably more so than most statisticians realise. This extremely high level of robustness, however, reduces quickly as the sample sizes begin to vary from equality. The regions of robustness obtained show that while most users would likely be satisfied with the degree of robustness inherent when the two sample sizes each vary by 10% from equality, most would wish to be much more cautious when the variation is 20%. The study covers sample sizes n1 = n2 = 5(5)30(10)50 plus 10% and 20% variations thereof for the two-tailed test and nominal significance levels of 0.01 and 0.05.

13.
We propose a robust version of Cox-type test statistics for the choice between two non-nested hypotheses. We first show that the influence of small amounts of contamination in the data on the test decision can be very large. Secondly, we build a robust test statistic by using the results on robust parametric tests that are available in the literature and show that the level of the robust test is stable. Finally, we show numerically not only the robustness of this new test statistic but also that its asymptotic distribution is a good approximation of its sample distribution, unlike for the classical test statistic. We apply our results to the choice between a Pareto and an exponential distribution as well as between two competing regressors in the simple linear regression model without intercept.

14.
Lehmann & Stein (1948) proved the existence of non-similar tests which can be more powerful than best similar tests. They used Student's problem of testing for a non-zero mean given a random sample from the normal distribution with unknown variance as an example. This raises the question: should we use a non-similar test instead of Student's t test? Questions like this can be answered by comparing the power of the test with the power envelope. This paper discusses the difficulties involved in computing power envelopes. It reports an empirical comparison of the power of the t test and the power envelope and finds that the two are almost identical, especially for sample sizes greater than 20. These findings suggest that, as well as being uniformly most powerful (UMP) within the class of similar tests, Student's t test is approximately UMP within the class of all tests. For practical purposes it might also be regarded as UMP when moderate or large sample sizes are involved.

15.
The use of several robust estimators of location with their associated variance estimates in a modified T-method for pairwise multiple comparisons between treatment means was compared with the sample mean and variance and with the k-sample rank sum test. The methods were compared with respect to the stability of their experimentwise error rates under a variety of non-normal situations (robustness of validity) and their average confidence interval lengths (robustness of efficiency).

16.
Heterogeneity of variances of treatment groups influences the validity and power of significance tests of location in two distinct ways. First, if sample sizes are unequal, the Type I error rate and power are depressed if a larger variance is associated with a larger sample size, and elevated if a larger variance is associated with a smaller sample size. This well-established effect, which occurs in t and F tests, and to a lesser degree in nonparametric rank tests, results from unequal contributions of pooled estimates of error variance in the computation of test statistics. It is observed in samples from normal distributions, as well as non-normal distributions of various shapes. Second, transformation of scores from skewed distributions with unequal variances to ranks produces differences in the means of the ranks assigned to the respective groups, even if the means of the initial groups are equal, and a subsequent inflation of Type I error rates and power. This effect occurs for all sample sizes, equal and unequal. For the t test, the discrepancy diminishes, and for the Wilcoxon–Mann–Whitney test, it becomes larger, as sample size increases. The Welch separate-variance t test overcomes the first effect but not the second. Because of interaction of these separate effects, the validity and power of both parametric and nonparametric tests performed on samples of any size from unknown distributions with possibly unequal variances can be distorted in unpredictable ways.
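
The first effect can be reproduced with a small Monte Carlo sketch in Python. It is an illustration under assumed settings, not the authors' simulation design: normal data with equal means, group sizes of 20 and 80, standard deviations of 1 and 3, and a hardcoded approximate critical value for the pooled t test with 98 degrees of freedom. All names (`pooled_t`, `rejection_rate`, `T_CRIT`) are hypothetical.

```python
import math
import random

T_CRIT = 1.984  # approximate two-sided 5% critical value of t with 98 df

def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def pooled_t(x, y):
    """Student's two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * sample_variance(x)
                  + (n2 - 1) * sample_variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (sum(x) / n1 - sum(y) / n2) / se

def rejection_rate(n1, sd1, n2, sd2, reps=2000, seed=1):
    """Estimated Type I error rate when both population means are zero."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > T_CRIT:
            rejections += 1
    return rejections / reps

# Larger variance paired with the smaller sample: error rate is elevated.
inflated = rejection_rate(20, 3.0, 80, 1.0)
# Larger variance paired with the larger sample: error rate is depressed.
depressed = rejection_rate(20, 1.0, 80, 3.0)
```

Under these assumed settings the two estimated rates should fall on opposite sides of the nominal 0.05 level, because the pooled variance is dominated by the larger group and so misstates the true variance of the difference in means.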

17.
Because the usual F test for equal means is not robust to unequal variances, Brown and Forsythe (1974a) suggest replacing F with the statistics F* or W, which are based on the Satterthwaite and Welch adjusted degrees of freedom procedures. This paper reports practical situations where both F* and W give unsatisfactory results. In particular, both F* and W may not provide adequate control over Type I errors. Moreover, for equal variances, but unequal sample sizes, W should be avoided in favor of F (or F*), but for equal sample sizes, and possibly unequal variances, W was the only satisfactory statistic. New results on power are included as well. The paper also considers the effect of using F* or W only after a significant test for equal variances has been obtained, and new results on the robustness of the F test are described. It is found that even for equal sample sizes as large as 50 per treatment group, there are practical situations where the F test does not provide adequate control over the probability of a Type I error.

18.
A new jackknife test is proposed to test the equality of variances in several populations. The new test is based on jackknifing one group of observations at a time, instead of one observation in each group as recommended by Miller for a two sample case, and by Layard for several samples. The proposed test is examined, and compared with other tests, in terms of power and robustness with respect to a wide variety of non-normal distributions. It is found that the new test is robust and has reasonably high power for normal as well as for non-normal observations, irrespective of the sample size. Furthermore, the proposed test is certainly superior to all other tests considered here in small to moderate size samples, and is as good as or better than the other tests in large samples, irrespective of the distribution of sampling observations.

19.
A number of tests are available for testing the equality of several population variances. Some are claimed to be robust. We compared six of the procedures claimed to be robust by Monte Carlo simulated experiments, particularly for cases of small and unequal sample sizes. Our results show that the jackknife test compares favorably with the other tests.

20.
The problem of testing the similarity of two normal populations is reconsidered, in this article, from a nonclassical point of view. We introduce a test statistic based on the maximum likelihood estimate of Weitzman's overlapping coefficient. Simulated critical points are provided for the proposed test for various sample sizes and significance levels. Statistical powers of the proposed test are computed via simulation studies and compared to those of the existing tests. Furthermore, Type-I error robustness of the proposed and the existing tests are studied via simulation studies when the underlying distributions are non-normal. Two data sets are analyzed for illustration purposes. Finally, the proposed test has been implemented to assess the bioequivalence of two drug formulations.
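
As a rough illustration of the quantity involved, the overlapping coefficient of two normal densities can be estimated by plugging the maximum likelihood estimates of each mean and standard deviation (divisor n) into the overlap of the two fitted normals. The Python sketch below uses `statistics.NormalDist.overlap` from the standard library. The function name `ovl_mle` and the sample data are assumptions, and the abstract does not say that the authors compute their statistic this way.

```python
from statistics import NormalDist, fmean, pstdev

def ovl_mle(sample1, sample2):
    """Plug-in estimate of the overlapping coefficient of two normals.

    pstdev uses divisor n, which is the maximum likelihood estimate of
    the standard deviation under a normal model.
    """
    fitted1 = NormalDist(fmean(sample1), pstdev(sample1))
    fitted2 = NormalDist(fmean(sample2), pstdev(sample2))
    # Area shared by the two fitted density curves, between 0 and 1.
    return fitted1.overlap(fitted2)

# Reference case: unit-variance normals one standard deviation apart.
ovl_reference = NormalDist(0.0, 1.0).overlap(NormalDist(1.0, 1.0))

ovl_identical = ovl_mle([1, 2, 3, 4], [1, 2, 3, 4])
ovl_separated = ovl_mle([0, 1, 2, 3], [10, 11, 12, 13])
```

A value near 1 indicates nearly identical fitted populations and a value near 0 indicates almost no shared area, which is why small values of the estimate count as evidence against similarity.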
