Similar literature
20 similar articles found (search time: 62 ms)
1.
We propose a new meta-analysis method to pool univariate p-values across independent studies. Through simulations, we compare our method with those of Fisher, Stouffer, and George, identify the sub-spaces in which each method is optimal, and propose a strategy for choosing the best meta-analysis method in each sub-space. We then compare these meta-analysis approaches using p-values from periodicity tests of 4,940 S. pombe genes from 10 independent time-course experiments, and show that our new approach ranks the periodic, conserved, and cycling genes much higher, and detects at least as many genes among the top 1,000 genes, compared with the other methods.
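
For orientation, here is a minimal Python sketch of two of the classical baselines named in this abstract, Fisher's and Stouffer's methods. It does not implement the authors' new method, which the abstract does not specify. The function names are ours, and the inputs are assumed to be independent one-sided p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # half of the chi-square statistic
    # Closed-form chi-square survival function for an even number (2k) of df:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combined(pvalues):
    """Stouffer's method: average the z-scores, rescaled to unit variance."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


if __name__ == "__main__":
    example = [0.05, 0.05]  # hypothetical p-values from two studies
    print(fisher_combined(example), stouffer_combined(example))
```

With a single study, both functions return the input p-value unchanged, which is a quick sanity check on the implementation.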

2.
Unconditional exact tests are increasingly used in practice for categorical data to increase the power of a study and to make the data analysis approach consistent with the study design. In a two-arm study with a binary endpoint, the p-value based on the exact unconditional Barnard test is computed by maximizing the tail probability over a nuisance parameter that ranges from 0 to 1. The traditional grid search method finds an approximate maximum over a partition of the parameter space, but it is not accurate, and it becomes computationally intensive for a study with more than two groups. We propose a polynomial method that rewrites the tail probability as a polynomial. The roots of the derivative of the polynomial contain the solution for the global maximum of the tail probability. We use an example from a double-blind randomized Phase II cancer clinical trial to illustrate the application of the proposed polynomial method to achieve an accurate p-value. We also compare the performance of the proposed method and the traditional grid search method under various conditions. We recommend using this new polynomial method to compute accurate exact unconditional p-values.
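
The grid search baseline described in this abstract can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' code: we assume a one-sided alternative (arm 1 has the higher response rate), a pooled-variance z statistic for ordering the tables, and an evenly spaced grid over the interior of (0, 1). The names `z_pooled` and `barnard_grid_pvalue` are hypothetical.

```python
import math


def z_pooled(a, n1, b, n2):
    """Pooled-variance z statistic for the difference in response rates."""
    pooled = (a + b) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if var == 0:
        return 0.0  # all responders or no responders: no evidence either way
    return (a / n1 - b / n2) / math.sqrt(var)


def barnard_grid_pvalue(x1, n1, x2, n2, grid_points=1000):
    """Approximate the unconditional exact p-value by grid search over pi."""
    t_obs = z_pooled(x1, n1, x2, n2)
    # Each table at least as extreme as the observed one contributes a term
    # C(n1, a) * C(n2, b) * pi^(a+b) * (1 - pi)^(n - a - b),
    # so the tail probability is a polynomial in the nuisance parameter pi.
    terms = [
        (math.comb(n1, a) * math.comb(n2, b), a + b)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if z_pooled(a, n1, b, n2) >= t_obs - 1e-12
    ]
    n = n1 + n2
    best = 0.0
    for i in range(1, grid_points):  # interior grid points only
        pi = i / grid_points
        tail = sum(c * pi**s * (1 - pi) ** (n - s) for c, s in terms)
        best = max(best, tail)
    return best


if __name__ == "__main__":
    print(barnard_grid_pvalue(3, 3, 0, 3))  # hypothetical tiny trial
```

The `terms` list makes the polynomial structure explicit: the same coefficients could be passed to a root finder for the derivative instead of being evaluated on a grid, which is the idea the abstract proposes.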

3.
In statistical hypothesis testing, a p-value is expected to follow the uniform distribution on the interval (0, 1) under the null hypothesis. However, some p-values, such as the generalized p-value and the posterior predictive p-value, are not guaranteed to have this property. In this paper, we propose an adaptive p-value calibration approach and show that the calibrated p-value is asymptotically uniformly distributed. For the Behrens–Fisher problem and the goodness-of-fit test under a normal model, the calibrated p-values are constructed and their behavior is evaluated numerically. Simulations show that the calibrated p-values are superior to the original ones.

4.
This paper considers the problem of testing equality between two independent binomial proportions. Hwang and Yang (Statist. Sinica 11 (2001) 807) apply the Neyman–Pearson fundamental lemma and the estimated truth approach to derive optimal procedures, which they call expected p-values. The expected p-value has been shown to be identical to the mid p-value in Lancaster (J. Amer. Statist. Assoc. (1961) 223) for the one-sided test. For the two-sided test, the paper proves that the usual two-sided mid p-value is identical to the expected p-value in the balanced sample case.

5.
In this article, we introduce two goodness-of-fit tests for testing normality through the concept of the posterior predictive p-value. The discrepancy variables selected are the Kolmogorov-Smirnov (KS) and Berk-Jones (BJ) statistics and the prior chosen is Jeffreys’ prior. The constructed posterior predictive p-values are shown to be distributed independently of the unknown parameters under the null hypothesis, thus they can be taken as the test statistics. It emerges from the simulation that the new tests are more powerful than the corresponding classical tests against most of the alternatives concerned.

6.
This article has two objectives. The first and narrower is to formalize the p-value function, which records all possible p-values, each corresponding to a value of the scalar parameter of interest for the problem at hand, and to show how this p-value function directly provides full inference information for any corresponding user or scientist. The p-value function provides familiar inference objects: significance levels, confidence intervals, critical values for fixed-level tests, and the power function at all values of the parameter of interest. It thus gives an immediate, accurate, and visual summary of inference information for the parameter of interest. We show that the p-value function of the key scalar interest parameter records the statistical position of the observed data relative to that parameter, and we then describe an accurate approximation to that p-value function which is readily constructed.

7.
While it is often argued that a p-value is a probability (see Wasserstein and Lazar), we argue that a p-value is not defined as a probability. A p-value is a bijection of the sufficient statistic for a given test which maps to the same scale as the Type I error probability. As such, the use of p-values in a test should be no more a source of controversy than the use of a sufficient statistic. It is demonstrated that there is, in fact, no ambiguity about what a p-value is, contrary to what has been claimed in recent public debates in the applied statistics community. We give a simple example to illustrate that rejecting the use of p-values in testing for a normal mean parameter is conceptually no different from rejecting the use of a sample mean. The p-value is innocent; the problem arises from its misuse and misinterpretation. The way that p-values have been informally defined and interpreted appears to have led to tremendous confusion and controversy regarding their place in statistical analysis.

8.
The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: that P-values exhibit extreme variation across samples (and thus are “unreliable”), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate evidence against those hypotheses. These problems are not, however, properties of P-values but are faults of researchers who focus on null hypotheses and overstate evidence based on misperceptions that p = 0.05 represents enough evidence to reject hypotheses. Those problems are easily seen without use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) −log2(p).
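
As a small illustration of the S-value transformation mentioned at the end of this abstract, the Python sketch below converts a P-value into bits of information against the test hypothesis. The function name `s_value` is ours.

```python
import math


def s_value(p):
    """Shannon information (surprisal) of a P-value, in bits: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    for p in (0.5, 0.25, 0.05, 0.005):
        print(p, round(s_value(p), 2))
```

On this scale p = 0.05 carries roughly 4.3 bits, about as surprising as a little more than four heads in a row from a fair coin.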

9.
This article considers the problem of testing marginal homogeneity in a 2 × 2 contingency table. We first review some well-known conditional and unconditional p-values that have appeared in the statistical literature. Then we treat the p-value as the test statistic and use the unconditional approach to obtain the modified p-value, which is shown to be valid. For a given nominal level, the rejection region of the modified p-value test contains that of the original p-value test. Some useful properties of the modified p-value are given. In particular, under mild conditions the rejection region of the modified p-value test is shown to be the Barnard convex set as described by Barnard (1947; Significance tests for 2 × 2 tables, Biometrika 34: 123–138). If the one-sided null hypothesis has two nuisance parameters, we show that this result can reduce the dimension of the nuisance parameter space from two to one for computing modified p-values and sizes of tests. Numerical studies, including an illustrative example, are given. Numerical comparisons show that the sizes of the modified p-value tests are closer to the nominal level than those of the original p-value tests in many cases, especially for small to moderate sample sizes.

10.
The classical unconditional exact p-value test can be used to compare two multinomial distributions with small samples. This general hypothesis requires parameter estimation under the null, which makes the test severely conservative. A similar property has been observed for Fisher's exact test, with Barnard and Boschloo providing distinct adjustments that produce more powerful testing approaches. In this study, we develop a novel adjustment for the conservativeness of the unconditional multinomial exact p-value test that produces a nominal type I error rate and increased power in comparison to all alternative approaches. We used a large simulation study to empirically estimate the 5th percentiles of the distributions of the p-values of the exact test over a range of scenarios and implemented a regression model to predict the values for two-sample multinomial settings. Our results show that the new test is uniformly more powerful than Fisher's, Barnard's, and Boschloo's tests, with gains in power as large as several hundred percent in certain scenarios. Lastly, we provide a real-life data example where the unadjusted unconditional exact test wrongly fails to reject the null hypothesis and the corrected unconditional exact test rejects the null appropriately.

11.
Hypothesis tests of performance measures for an M/E_k/1 queueing system are considered. Using pivotal models deduced from sufficient statistics for the unknown parameters, a generalized p-value approach for deriving tests about parametric functions is proposed. The focus is on deriving the p-values of hypothesis tests for five popular performance measures of the system in the steady state. Given a sample T, let p(T) be the p-value we developed. We derive a closed-form expression to show that, for small samples, the probability P(p(T) ≤ γ) is approximately equal to γ, for 0 ≤ γ ≤ 1.

12.
This paper investigates methodologies for evaluating the probabilistic value (P-value) of the Kolmogorov–Smirnov (K–S) goodness-of-fit test using algorithmic program development implemented in Microsoft® Visual Basic® (VB). Six methods were examined for the one-sided one-sample and two methods for the two-sided one-sample cumulative sampling distributions in an investigative software implementation based on machine-precision arithmetic. For the sample sizes considered (n ≤ 2000), the Smirnov iterative method achieved optimal accuracy for K–S P-values ≥ 0.02, while the SmirnovD method was more accurate for lower P-values for the one-sided one-sample distribution statistics. Also, the Durbin matrix method gave better P-value results than the Durbin recursion method for the two-sided one-sample tests for sample sizes up to n ≤ 700. Based on these results, an algorithm for a Microsoft Excel® function was proposed, from which a model function was developed, and its implementation was used to test the performance of engineering students in a general engineering course across seven departments.

13.
This paper considers p-value based step-wise rejection procedures for testing multiple hypotheses. The existing procedures use constants as critical values at all steps. With the intention of incorporating the exact magnitude of the p-values at the earlier steps into the decisions at the later steps, this paper applies a different strategy in which the critical values at the later steps are determined as functions of the p-values from the earlier steps. As a result, we derive a new equality and develop a two-step rejection procedure based on it. The new procedure is a short-cut of a step-up procedure, and it possesses great simplicity. In terms of power, the proposed procedure is generally comparable to the existing ones and exceptionally superior when the largest p-value is anticipated to be less than 0.5.

14.
The mid-p is defined as the sum of the probabilities of all outcomes more extreme than an observed value, plus half of the probabilities of all outcomes exactly as extreme. On the one hand, it offers greater power than the standard p-value, but on the other, tests based on the mid-p statistic may have greater Type I error than their nominal level. This article investigates the mid p-value's properties under the estimated truth paradigm, which views p-values as estimators of the truth. The mid-p is shown to minimize the maximum risk for one-sided and two-sided tests.
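
The definition in the first sentence of this abstract can be made concrete with a short Python sketch. We assume, purely for illustration, a one-sided (upper-tail) binomial test of a success probability `p0`. The function names are ours and the data in the usage example are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_pvalues(x, n, p0):
    """Return (standard p-value, mid-p) for the upper-tail binomial test."""
    more_extreme = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    as_extreme = binom_pmf(x, n, p0)
    standard = more_extreme + as_extreme      # P(X >= x)
    mid_p = more_extreme + 0.5 * as_extreme   # P(X > x) + 0.5 * P(X = x)
    return standard, mid_p


if __name__ == "__main__":
    # Hypothetical data: 8 successes in 10 trials, null value p0 = 0.5.
    print(upper_tail_pvalues(8, 10, 0.5))
```

For 8 successes in 10 trials the standard p-value is 56/1024 (about 0.055) while the mid-p is 33.5/1024 (about 0.033), which shows how the mid-p can reject at the 5% level when the standard p-value does not.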

15.
Various approaches can be used to construct a model from a null distribution and a test statistic. I prove that one such approach, originating with D. R. Cox, has the property that the p-value is never greater than the Generalized Likelihood Ratio (GLR). When combined with the general result that the GLR is never greater than any Bayes factor, we conclude that, under Cox’s model, the p-value is never greater than any Bayes factor. I also provide a generalization, illustrations for the canonical Normal model, and an alternative approach based on sufficiency. This result is relevant for the ongoing discussion about the evidential value of small p-values, and the movement among statisticians to “redefine statistical significance.”

16.
Just as frequentist hypothesis tests have been developed to check model assumptions, prior predictive p-values and other Bayesian p-values check prior distributions as well as other model assumptions. These model checks not only suffer from the usual threshold dependence of p-values, but also from the suppression of model uncertainty in subsequent inference. One solution is to transform Bayesian and frequentist p-values for model assessment into a fiducial distribution across the models. Averaging the Bayesian or frequentist posterior distributions with respect to the fiducial distribution can reproduce results from Bayesian model averaging or classical fiducial inference.

17.
P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels 0.05, 0.01, and 0.001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.

18.
Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p < 0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p < 0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values.
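
The abstract does not state a formula for the Bayes factor bound. The Python sketch below assumes the commonly cited form 1 / (−e · p · ln p), taken to apply for p < 1/e, with the bound set to 1 otherwise. That formula is our assumption about what the authors intend, and the function name `bayes_factor_bound` is ours.

```python
import math


def bayes_factor_bound(p):
    """Upper bound on the odds for the alternative implied by a p-value.

    Assumed form: 1 / (-e * p * ln p) for p < 1/e, and 1 otherwise.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005, 0.001):
        print(p, round(bayes_factor_bound(p), 2))
```

Under this assumed formula, p = 0.05 corresponds to odds of at most roughly 2.5 to 1 in favor of the alternative, which illustrates the abstract's point that a given p-value provides weaker evidence than is typically assumed.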

19.
Two-treatment multicentre clinical trials are very common in practice. In cases where a non-parametric analysis is appropriate, a rank-sum test for grouped data called the van Elteren test can be applied. As an alternative approach, one may apply a combination test such as Fisher's combination test or the inverse normal combination test (also called Liptak's method) in order to combine centre-specific P-values. If there are no ties and no differences between centres with regard to the groups’ sample sizes, the inverse normal combination test using centre-specific Wilcoxon rank-sum tests is equivalent to the van Elteren test. In this paper, the van Elteren test is compared with Fisher's combination test based on Wilcoxon rank-sum tests. Data from two multicentre trials as well as simulated data indicate that Fisher's combination of P-values is more powerful than the van Elteren test in realistic scenarios, i.e. when there are large differences between the centres’ P-values, some quantitative interaction between treatment and centre, and/or heterogeneity in variability. The combination approach opens the possibility of using statistics other than the rank sum, and it is also a suitable method for more complicated designs, e.g. when covariates such as age or gender are included in the analysis.

20.
A class of bivariate symmetry tests for complete data and competing risks data is considered. A saddlepoint approximation for the exact p-values of the underlying permutation distribution of these tests is derived. Several simulation studies are conducted to evaluate the performance of the saddlepoint approximation and the asymptotic approximation. The saddlepoint approximation was found to be highly accurate and superior to the asymptotic approximation in replicating the exact permutation significance.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司). ICP filing no. 京ICP备09084417号