Similar Documents
 20 similar documents found (search time: 31 ms)
1.
The Fisher exact test has been unjustly dismissed by some as ‘only conditional,’ whereas it is, unconditionally, the uniformly most powerful test among all unbiased tests, that is, tests of size α whose power is never below the nominal significance level α. The problem with this truly optimal test is that it requires randomization at the critical value(s) to be of size α. Obviously, in practice, one does not want to conclude that ‘with probability x we have a statistically significant result.’ Usually, the hypothesis is rejected only if the test statistic's outcome is more extreme than the critical value, reducing the actual size considerably.

The randomized unconditional Fisher exact test is constructed (using Neyman-structure arguments) by deriving a conditional randomized test that randomizes at critical values c(t) with probabilities γ(t), both of which depend on the total number of successes T (the complete sufficient statistic for the nuisance parameter, the common success probability) that is conditioned upon.

In this paper, the Fisher exact test is approximated by deriving nonrandomized conditional tests whose critical region includes the critical value only if γ(t) > γ0, for a fixed threshold value γ0, such that the size of the unconditional modified test is, for all values of the nuisance parameter (the common success probability), smaller than but as close as possible to α. It will be seen that this greatly improves the size of the test compared with the conservative nonrandomized Fisher exact test.

Size, power, and p-value comparisons with the (virtual) randomized Fisher exact test, the conservative nonrandomized Fisher exact test, Pearson's chi-square test, the more competitive mid-p value, McDonald's modification, and Boschloo's modification are performed under the assumption of two binomial samples.

2.
The mid-p-value is the standard p-value for a test minus half the difference between it and the nearest lower possible value. Its smaller size lends it an obvious appeal to users — it provides a more significant-looking summary of the evidence against the null hypothesis. This paper examines the possibility that the user might overstate the significance of the evidence by using the smaller mid-p in place of the standard p-value. Routine use of the mid-p is shown to control a quantity related to the Type I error rate. This related quantity is appropriate to consider when the decision to accept or reject the null hypothesis is not always firm. The natural, subjective interpretation of a p-value as the probability that the null hypothesis is true is also examined. The usual asymptotic correspondence between these two probabilities for one-sided hypotheses is shown to be strengthened when the standard p-value is replaced by the mid-p.

3.
The mid-p is defined as the sum of the probabilities of all outcomes more extreme than an observed value, plus half of the probabilities of all outcomes exactly as extreme. On the one hand, it offers greater power than the standard p-value, but on the other, tests based on the mid-p statistic may have greater Type I error than their nominal level. This article investigates the mid p-value's properties under the estimated truth paradigm, which views p-values as estimators of the truth. The mid-p is shown to minimize the maximum risk for one-sided and two-sided tests.
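The definition above translates directly into code. The following is a minimal sketch for a one-sided binomial test (the example numbers are illustrative, not from the article): outcomes more extreme than the observed value count in full, and the observed outcome itself at half weight.

```python
from math import comb

def binom_pmf(k, n, p):
    # exact binomial probability P(X = k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail_pvalues(k, n, p0):
    """Standard and mid p-values for the one-sided test H0: p = p0 vs p > p0."""
    # standard p-value: P(X >= k), all outcomes at least as extreme as observed
    std_p = sum(binom_pmf(i, n, p0) for i in range(k, n + 1))
    # mid-p: the observed (exactly-as-extreme) outcome contributes half weight
    mid_p = std_p - 0.5 * binom_pmf(k, n, p0)
    return std_p, mid_p
```

For example, with 8 successes in 10 trials under p0 = 0.5, the standard p-value is 56/1024 ≈ 0.055 while the mid-p is 33.5/1024 ≈ 0.033, illustrating the smaller (less conservative) size discussed above.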

4.
Abstract

Numerous methods—based on exact and asymptotic distributions—can be used to obtain confidence intervals for the odds ratio in 2 × 2 tables. We examine ten methods for generating these intervals based on coverage probability, closeness of coverage probability to target, and length of confidence intervals. Based on these criteria, Cornfield’s method, without the continuity correction, performed the best of the methods examined here. A drawback to its use is the significant possibility that the attained coverage probability will not meet the nominal confidence level. Use of a mid-P value greatly improves methods based on the “exact” distribution. When combined with the Wilson rule for selecting a rejection set, the resulting procedure performed very well. Crow’s method, with use of a mid-P, performed well, although it was only a slight improvement over the Wilson mid-P method; its cumbersome calculations preclude its general acceptance. Woolf's (logit) method—with the Haldane–Anscombe correction—performed well, especially with regard to length of confidence intervals, and is recommended based on ease of computation.
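As a sketch of the recommended easy-to-compute interval (assuming the standard form of Woolf's logit interval; the table entries below are hypothetical), the Haldane–Anscombe correction adds 0.5 to every cell before taking logarithms, which keeps the interval defined even when a cell is zero:

```python
from math import exp, log, sqrt

def woolf_ci(a, b, c, d, z=1.96):
    """Woolf (logit) confidence interval for the odds ratio of a 2x2 table
    [[a, b], [c, d]], with the Haldane-Anscombe correction (+0.5 per cell)."""
    a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = log(a * d / (b * c))          # corrected log odds ratio
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error on the log scale
    return exp(log_or - z * se), exp(log_or + z * se)
```

The default z = 1.96 gives a nominal 95% interval; the asymptotic normality is on the log-odds scale, which is why the interval is symmetric only after taking logs.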

5.
Fisher's exact test for two-by-two contingency tables has repeatedly been criticized as being too conservative. These criticisms arise most frequently in the context of a planned experiment for which the numbers of successes in each of two experimental groups are assumed to be binomially distributed. It is argued here that the binomial model is often unrealistic, and that the departures from the binomial assumptions reduce the conservatism in Fisher's exact test. Further discussion supports a recent claim of Barnard (1989) that the residual conservatism is attributable, not to any additional information used by the competing method, but to the discrete nature of the test, and can be drastically reduced through the use of Lancaster's mid-p-value. The binomial model is not recommended in that it depends on extra, questionable assumptions.

6.
The classical unconditional exact p-value test can be used to compare two multinomial distributions with small samples. This general hypothesis requires parameter estimation under the null, which makes the test severely conservative. A similar property has been observed for Fisher's exact test, with Barnard and Boschloo providing distinct adjustments that produce more powerful testing approaches. In this study, we develop a novel adjustment for the conservativeness of the unconditional multinomial exact p-value test that produces a nominal type I error rate and increased power in comparison to all alternative approaches. We used a large simulation study to empirically estimate the 5th percentiles of the distributions of the p-values of the exact test over a range of scenarios and implemented a regression model to predict the values for two-sample multinomial settings. Our results show that the new test is uniformly more powerful than Fisher's, Barnard's, and Boschloo's tests, with gains in power as large as several hundred percent in certain scenarios. Lastly, we provide a real-life data example where the unadjusted unconditional exact test wrongly fails to reject the null hypothesis and the corrected unconditional exact test rejects the null appropriately.

7.
Unconditional exact tests are increasingly used in practice for categorical data to increase the power of a study and to keep the data-analysis approach consistent with the study design. In a two-arm study with a binary endpoint, the p-value based on Barnard's exact unconditional test is computed by maximizing the tail probability over a nuisance parameter ranging from 0 to 1. The traditional grid-search method can find an approximate maximum over a partition of the parameter space, but it is not accurate, and the approach becomes computationally intensive for studies with more than two groups. We propose using a polynomial method that rewrites the tail probability as a polynomial. The roots of the polynomial's derivative contain the global maximum of the tail probability. We use an example from a double-blind randomized Phase II cancer clinical trial to illustrate the application of the proposed polynomial method to achieve an accurate p-value. We also compare the performance of the proposed method and the traditional grid-search method under various conditions. We recommend using this new polynomial method to compute accurate exact unconditional p-values.
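The traditional grid-search computation that the polynomial method improves upon can be sketched as follows (a minimal illustration using a pooled z statistic and a uniform grid; the statistic and grid resolution are illustrative choices, not the authors'): for each candidate value of the common success probability, sum the joint binomial probability of all outcomes at least as extreme as the observed one, then take the maximum over the grid.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def z_stat(x1, n1, x2, n2):
    """Pooled z statistic for the difference of two proportions."""
    p_hat = (x1 + x2) / (n1 + n2)
    if p_hat in (0.0, 1.0):
        return 0.0  # degenerate tables carry no evidence of a difference
    se = (p_hat * (1 - p_hat) * (1 / n1 + 1 / n2)) ** 0.5
    return (x1 / n1 - x2 / n2) / se

def barnard_p(x1, n1, x2, n2, grid=200):
    """Grid-search approximation to Barnard's unconditional p-value
    (one-sided, H1: p1 > p2), maximizing over the nuisance parameter."""
    t_obs = z_stat(x1, n1, x2, n2)
    best = 0.0
    for g in range(1, grid):
        pi = g / grid  # candidate common success probability under H0
        tail = sum(binom_pmf(i, n1, pi) * binom_pmf(j, n2, pi)
                   for i in range(n1 + 1) for j in range(n2 + 1)
                   if z_stat(i, n1, j, n2) >= t_obs)
        best = max(best, tail)
    return best
```

The nested loop over all outcome pairs is exactly what makes the grid search expensive as the number of groups grows, which motivates the polynomial reformulation described in the abstract.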

8.
There are two common methods for statistical inference on 2 × 2 contingency tables. One is the widely taught Pearson chi-square test, which uses the well-known χ² statistic. The chi-square test is appropriate for large-sample inference, and in the 2 × 2 case it is equivalent to the Z-test based on the difference between the two sample proportions. The other is Fisher’s exact test, which evaluates the likelihood of each table with the same marginal totals. This article mathematically justifies that the two methods' criteria for determining which tables are extreme do not completely agree. Our analysis obtains one-sided and two-sided conditions under which such a disagreement between the two tests can occur. We also address whether this discrepancy would lead the tests to different conclusions when testing homogeneity or independence. Our examination of the two tests casts light on which test should be trusted when they draw different conclusions.
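To make the comparison concrete, here is a minimal sketch (the table used in the test is hypothetical) computing both quantities for a 2 × 2 table [[a, b], [c, d]]: the one-sided Fisher p-value from the hypergeometric distribution with fixed margins, and Pearson's χ² statistic in its closed form.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: P(first cell >= a), margins fixed."""
    n = a + b + c + d
    r1, c1 = a + b, a + c  # first row total and first column total
    # hypergeometric tail: sum over tables at least as extreme as observed
    return sum(comb(r1, x) * comb(n - r1, c1 - x)
               for x in range(a, min(r1, c1) + 1)) / comb(n, c1)

def pearson_chi2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, closed form."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

The Fisher p-value orders tables by hypergeometric probability within fixed margins, while the χ² statistic orders them by squared standardized distance; the article's point is precisely that these two orderings of "extreme" tables need not coincide.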

9.
ABSTRACT

Various approaches can be used to construct a model from a null distribution and a test statistic. I prove that one such approach, originating with D. R. Cox, has the property that the p-value is never greater than the Generalized Likelihood Ratio (GLR). When combined with the general result that the GLR is never greater than any Bayes factor, we conclude that, under Cox’s model, the p-value is never greater than any Bayes factor. I also provide a generalization, illustrations for the canonical Normal model, and an alternative approach based on sufficiency. This result is relevant for the ongoing discussion about the evidential value of small p-values, and the movement among statisticians to “redefine statistical significance.”

10.
This article considers the problem of testing marginal homogeneity in a 2 × 2 contingency table. We first review some well-known conditional and unconditional p-values that have appeared in the statistical literature. We then treat the p-value as the test statistic and use the unconditional approach to obtain a modified p-value, which is shown to be valid. For a given nominal level, the rejection region of the modified p-value test contains that of the original p-value test. Some nice properties of the modified p-value are given. In particular, under mild conditions the rejection region of the modified p-value test is shown to be the Barnard convex set described by Barnard (1947). If the one-sided null hypothesis has two nuisance parameters, we show that this result can reduce the dimension of the nuisance parameter space from two to one for computing modified p-values and sizes of tests. Numerical studies, including an illustrative example, are given. Numerical comparisons show that the sizes of the modified p-value tests are closer to the nominal level than those of the original p-value tests in many cases, especially with small to moderate sample sizes.

11.
Exact unconditional tests for comparing two binomial probabilities are generally more powerful than conditional tests like Fisher's exact test. Their power can be further increased by the Berger and Boos confidence interval method, where a p-value is found by restricting the common binomial probability under H0 to a 1 − γ confidence interval. We studied the average test power for the exact unconditional z-pooled test for a wide range of cases with balanced and unbalanced sample sizes, and significance levels 0.05 and 0.01. The detailed results are available online. Among the values 10⁻³, 10⁻⁴, …, 10⁻¹⁰, the value γ = 10⁻⁴ gave the highest power, or close to the highest power, in all the cases we examined, and can be recommended as a generally optimal choice of γ.

12.
Current status and panel count data frequently arise in cancer and tumorigenicity studies, where event occurrences are observed only at examination times. A common and widely used class of two-sample tests for current status and panel count data is the permutation class. We adapt the double saddlepoint method to calculate the exact mid-p-values of the underlying permutation distributions of this class of tests. Permutation simulations are replaced by analytical saddlepoint computations, which provide extremely accurate mid-p-values that are exact for most practical purposes and almost always more accurate than normal approximations. The method is illustrated using two real tumorigenicity panel count datasets. To compare the saddlepoint approximation with the asymptotic normal approximation, a simulation study is conducted. The speed and accuracy of the saddlepoint method facilitate the calculation of the confidence interval for the treatment effect. The inversion of the mid-p-values to calculate the confidence interval for the mean rate of development of the recurrent event is discussed.

13.
ABSTRACT

When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a firestorm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value, which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true, given the sample, p-values nonetheless can provide evidential information toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent for almost 70% of the cases with the parameter’s true location for the sampled-from population. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used. Here, rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large or practically significant, in the researcher’s judgment and experience. The simulation compares the performances of several methods, from p-value and/or effect-size-based to confidence-interval-based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, like flagging whether a p-value is significant, the output of one single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.

14.
Abstract

In a 2-step monotone missing dataset drawn from a multivariate normal population, a T2-type test statistic (similar to Hotelling’s T2 test statistic) and the likelihood ratio (LR) are often used to test for a mean vector. For complete data, Hotelling’s T2 test and the LR test are equivalent; however, the T2-type test and the LR test are not equivalent for a 2-step monotone missing dataset. It is therefore of interest which statistic is preferable in terms of power. In this paper, we derive the asymptotic power functions of both statistics under a local alternative and obtain an explicit form for the difference in asymptotic power. Furthermore, under several parameter settings, we compare the LR and T2-type tests numerically using the differences in empirical power and in asymptotic power. Summarizing the results obtained, we recommend the LR test for testing a mean vector.

15.
This article proposes a modified p-value for the two-sided test of the location of the normal distribution when the parameter space is restricted. A commonly used procedure for this two-sided problem is the uniformly most powerful unbiased (UMPU) test, which is also the likelihood ratio test. The p-value of the test is used as evidence against the null hypothesis. Note that the usual p-value does not depend on the parameter space but only on the observation and the assumption of the null hypothesis. When the parameter space is known to be restricted, the usual p-value cannot fully utilize this information to make a more accurate decision. In this paper, a modified p-value (also called the rp-value) that depends on the parameter space is proposed, and the test derived from the modified p-value is shown to be the UMPU test.

16.
A p-value is developed for testing the equivalence of the variances of a bivariate normal distribution. The unknown correlation coefficient is a nuisance parameter in the problem. If the correlation is known, the proposed p-value provides an exact test. For large samples, the p-value can be computed by replacing the unknown correlation by the sample correlation, and the resulting test is quite satisfactory. For small samples, it is proposed to compute the p-value by replacing the unknown correlation by a scalar multiple of the sample correlation. However, a single scalar is not satisfactory, and it is proposed to use different scalars depending on the magnitude of the sample correlation coefficient. In order to implement this approach, tables are obtained providing sub-intervals for the sample correlation coefficient, and the scalars to be used if the sample correlation coefficient belongs to a particular sub-interval. Once such tables are available, the proposed p-value is quite easy to compute since it has an explicit analytic expression. Numerical results on the type I error probability and power are reported on the performance of such a test, and the proposed p-value test is also compared to another test based on a rejection region. The results are illustrated with two examples: an example dealing with the comparability of two measuring devices, and an example dealing with the assessment of bioequivalence.

17.
In this article, we point out some interesting relations between the exact test and the score test for a binomial proportion p. Based on the properties of the tests, we propose approximate as well as exact methods of computing the sample sizes required for the tests to attain a specified power. Sample sizes required for the tests are tabulated for various values of p to attain a power of 0.80 at level 0.05. We also propose approximate and exact methods of computing the sample sizes needed to construct confidence intervals with a given precision. Using the proposed exact methods, sample sizes required to construct 95% confidence intervals with various precisions are tabulated for p = 0.05(0.05)0.50. The approximate methods for computing sample sizes for score confidence intervals are very satisfactory, and the results coincide with those of the exact methods in many cases.
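A brute-force sketch of the exact power calculation (illustrative only; the search loop and the example values of p0 and p1 are assumptions, not the article's algorithm): for each n, find the smallest one-sided critical value whose exact size does not exceed α, then check whether the exact power at the alternative reaches the target.

```python
from math import comb

def binom_sf(k, n, p):
    # P(X >= k) under Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exact_sample_size(p0, p1, alpha=0.05, target=0.80):
    """Smallest n at which the level-alpha exact one-sided test of
    H0: p = p0 against the alternative p = p1 (> p0) attains the target power."""
    n = 1
    while True:
        # smallest critical value k whose exact size P(X >= k | p0) is <= alpha
        k = next(k for k in range(n + 2) if binom_sf(k, n, p0) <= alpha)
        if binom_sf(k, n, p1) >= target:
            return n
        n += 1
```

Because the exact size and power are step functions of n (the sawtooth effect of discreteness), the required n from this exact search can differ noticeably from the normal-approximation formula, which is the gap the article's tables quantify.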

18.
Guogen Shan, Statistics, 2018, 52(5): 1086–1095
In addition to the point estimate for the probability of response in a two-stage design (e.g. Simon's two-stage design for binary endpoints), confidence limits should be computed and reported. The current method of inverting the p-value function to compute the confidence interval does not guarantee coverage probability in a two-stage setting. The existing exact approach to calculating one-sided limits orders the sample space by the overall number of responses. This approach can be conservative because many sample points have the same limits. We propose a new exact one-sided interval that orders the sample space by the p-value. Exact intervals are computed using binomial distributions directly, instead of a normal approximation. Both exact intervals preserve the nominal confidence level. The proposed exact interval based on the p-value generally performs better than the other exact interval with regard to the expected length and simple average length of confidence intervals.

19.
ABSTRACT

Displaying data by means of contingency tables is used in different approaches to statistical inference, for example, to address the test of homogeneity of independent multinomial distributions. We develop a Bayesian procedure to test simple null hypotheses versus bilateral alternatives in contingency tables. Given independent samples from two binomial distributions and taking a mixed prior distribution, we calculate the posterior probability that the proportion of successes in the first population is the same as in the second. This posterior probability is compared with the p-value of the classical method, obtaining a reconciliation between the classical and Bayesian results. The results obtained are generalized to r × s tables.

20.
Abstract

In statistical hypothesis testing, a p-value is expected to be distributed as the uniform distribution on the interval (0, 1) under the null hypothesis. However, some p-values, such as the generalized p-value and the posterior predictive p-value, cannot be assured of this property. In this paper, we propose an adaptive p-value calibration approach and show that the calibrated p-value is asymptotically distributed as the uniform distribution. For the Behrens–Fisher problem and a goodness-of-fit test under a normal model, the calibrated p-values are constructed and their behavior is evaluated numerically. Simulations show that the calibrated p-values are superior to the original ones.
