期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Practicing safe statistics with the mid-p

R.D. Routledge 《Revue canadienne de statistique》1994,22(1):103-110

The mid-p-value is the standard p-value for a test minus half the difference between it and the nearest lower possible value. Its smaller size lends it an obvious appeal to users — it provides a more significant-looking summary of the evidence against the null hypothesis. This paper examines the possibility that the user might overstate the significance of the evidence by using the smaller mid-p in place of the standard p-value. Routine use of the mid-p is shown to control a quantity related to the Type I error rate. This related quantity is appropriate to consider when the decision to accept or reject the null hypothesis is not always firm. The natural, subjective interpretation of a p-value as the probability that the null hypothesis is true is also examined. The usual asymptotic correspondence between these two probabilities for one-sided hypotheses is shown to be strengthened when the standard p-value is replaced by the mid-p. 相似文献

2.

Abandon Statistical Significance

Blakeley B. McShane David Gal Andrew Gelman Christian Robert Jennifer L. Tackett 《The American statistician》2019,73(1):235-245

ABSTRACT

We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to “ban” p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly. 相似文献

3.

Significance test for linear regression: how to test without P-values?

Paravee Maneejuk Woraphon Yamaka 《Journal of applied statistics》2021,48(5):827

The discussion on the use and misuse of p-values in 2016 by the American Statistician Association was a timely assertion that statistical concept should be properly used in science. Some researchers, especially the economists, who adopt significance testing and p-values to report their results, may felt confused by the statement, leading to misinterpretations of the statement. In this study, we aim to re-examine the accuracy of the p-value and introduce an alternative way for testing the hypothesis. We conduct a simulation study to investigate the reliability of the p-value. Apart from investigating the performance of p-value, we also introduce some existing approaches, Minimum Bayes Factors and Belief functions, for replacing p-value. Results from the simulation study confirm unreliable p-value in some cases and that our proposed approaches seem to be useful as the substituted tool in the statistical inference. Moreover, our results show that the plausibility approach is more accurate for making decisions about the null hypothesis than the traditionally used p-values when the null hypothesis is true. However, the MBFs of Edwards et al. [Bayesian statistical inference for psychological research. Psychol. Rev. 70(3) (1963), pp. 193–242]; Vovk [A logic of probability, with application to the foundations of statistics. J. Royal Statistical Soc. Series B (Methodological) 55 (1993), pp. 317–351] and Sellke et al. [Calibration of p values for testing precise null hypotheses. Am. Stat. 55(1) (2001), pp. 62–71] provide more reliable results compared to all other methods when the null hypothesis is false.KEYWORDS: Ban of P-value, Minimum Bayes Factors, belief functions 相似文献

4.

The adaptive calibration of testing p-values

Guimei Zhao Li Wang 《统计学通讯:理论与方法》2013,42(4):922-932

Abstract

In statistical hypothesis testing, a p-value is expected to be distributed as the uniform distribution on the interval (0, 1) under the null hypothesis. However, some p-values, such as the generalized p-value and the posterior predictive p-value, cannot be assured of this property. In this paper, we propose an adaptive p-value calibration approach, and show that the calibrated p-value is asymptotically distributed as the uniform distribution. For Behrens–Fisher problem and goodness-of-fit test under a normal model, the calibrated p-values are constructed and their behavior is evaluated numerically. Simulations show that the calibrated p-values are superior than original ones. 相似文献

5.

Two new estimators for the proportion of true null hypotheses in multiple test

Yunxuan Qiao Wei Yu 《Journal of Statistical Computation and Simulation》2017,87(4):712-723

In multiple hypothesis test, an important problem is estimating the proportion of true null hypotheses. Existing methods are mainly based on the p-values of the single tests. In this paper, we propose two new estimations for this proportion. One is a natural extension of the commonly used methods based on p-values and the other is based on a mixed distribution. Simulations show that the first method is comparable with existing methods and performs better under some cases. And the method based on a mixed distribution can get accurate estimators even if the variance of data is large or the difference between the null hypothesis and alternative hypothesis is very small. 相似文献

6.

Will the ASA's Efforts to Improve Statistical Practice be Successful? Some Evidence to the Contrary

Raymond Hubbard 《The American statistician》2019,73(1):31-35

ABSTRACT

Recent efforts by the American Statistical Association to improve statistical practice, especially in countering the misuse and abuse of null hypothesis significance testing (NHST) and p-values, are to be welcomed. But will they be successful? The present study offers compelling evidence that this will be an extraordinarily difficult task. Dramatic citation-count data on 25 articles and books severely critical of NHST's negative impact on good science, underlining that this issue was/is well known, did nothing to stem its usage over the period 1960–2007. On the contrary, employment of NHST increased during this time. To be successful in this endeavor, as well as restoring the relevance of the statistics profession to the scientific community in the 21st century, the ASA must be prepared to dispense detailed advice. This includes specifying those situations, if they can be identified, in which the p-value plays a clearly valuable role in data analysis and interpretation. The ASA might also consider a statement that recommends abandoning the use of p-values. 相似文献

7.

Three Recommendations for Improving the Use of p-Values

Daniel J. Benjamin 《The American statistician》2019,73(1):186-191

ABSTRACT

Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p?<?0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p?<?0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values. 相似文献

8.

A more powerful unconditional exact test of homogeneity for 2 × c contingency table analysis

Louis Ehwerhemuepha Heng Sok Cyril Rakovski 《Journal of applied statistics》2019,46(14):2572-2582

The classical unconditional exact p-value test can be used to compare two multinomial distributions with small samples. This general hypothesis requires parameter estimation under the null which makes the test severely conservative. Similar property has been observed for Fisher's exact test with Barnard and Boschloo providing distinct adjustments that produce more powerful testing approaches. In this study, we develop a novel adjustment for the conservativeness of the unconditional multinomial exact p-value test that produces nominal type I error rate and increased power in comparison to all alternative approaches. We used a large simulation study to empirically estimate the 5th percentiles of the distributions of the p-values of the exact test over a range of scenarios and implemented a regression model to predict the values for two-sample multinomial settings. Our results show that the new test is uniformly more powerful than Fisher's, Barnard's, and Boschloo's tests with gains in power as large as several hundred percent in certain scenarios. Lastly, we provide a real-life data example where the unadjusted unconditional exact test wrongly fails to reject the null hypothesis and the corrected unconditional exact test rejects the null appropriately. 相似文献

9.

Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels

Mark Andrew Gannon Carlos Alberto de Bragança Pereira Adriano Polpo 《The American statistician》2019,73(1):213-222

ABSTRACT

This article argues that researchers do not need to completely abandon the p-value, the best-known significance index, but should instead stop using significance levels that do not depend on sample sizes. A testing procedure is developed using a mixture of frequentist and Bayesian tools, with a significance level that is a function of sample size, obtained from a generalized form of the Neyman–Pearson Lemma that minimizes a linear combination of α, the probability of rejecting a true null hypothesis, and β, the probability of failing to reject a false null, instead of fixing α and minimizing β. The resulting hypothesis tests do not violate the Likelihood Principle and do not require any constraints on the dimensionalities of the sample space and parameter space. The procedure includes an ordering of the entire sample space and uses predictive probability (density) functions, allowing for testing of both simple and compound hypotheses. Accessible examples are presented to highlight specific characteristics of the new tests. 相似文献

10.

A note on multiple testing for composite null hypotheses

Stefano Cabras 《Journal of statistical planning and inference》2010

Multiple hypothesis testing literature has recently experienced a growing development with particular attention to the control of the false discovery rate (FDR) based on p-values. While these are not the only methods to deal with multiplicity, inference with small samples and large sets of hypotheses depends on the specific choice of the p-value used to control the FDR in the presence of nuisance parameters. In this paper we propose to use the partial posterior predictive p-value [Bayarri, M.J., Berger, J.O., 2000. p-values for composite null models. J. Amer. Statist. Assoc. 95, 1127–1142] that overcomes this difficulty. This choice is motivated by theoretical considerations and examples. Finally, an application to a controlled microarray experiment is presented. 相似文献

11.

Editorial Collaborators

Sander Greenland 《The American statistician》2019,73(1):106-108

Abstract

The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: That P-values exhibit extreme variation across samples (and thus are “unreliable”), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate evidence against those hypotheses. These problems are not however properties of P-values but are faults of researchers who focus on null hypotheses and overstate evidence based on misperceptions that p?=?0.05 represents enough evidence to reject hypotheses. Those problems are easily seen without use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) –log₂(p). 相似文献

12.

P-Value Precision and Reproducibility

《The American statistician》2013,67(4):213-221

P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels 0.05, 0.01, and 0.001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates. 相似文献

13.

Modified p-Value of Two-Sided Test for Normal Distribution with Restricted Parameter Space

Hsiuying Wang 《统计学通讯:理论与方法》2013,42(8):1361-1374

This article proposes a modified p-value for the two-sided test of the location of the normal distribution when the parameter space is restricted. A commonly used test for the two-sided test of the normal distribution is the uniformly most powerful unbiased (UMPU) test, which is also the likelihood ratio test. The p-value of the test is used as evidence against the null hypothesis. Note that the usual p-value does not depend on the parameter space but only on the observation and the assumption of the null hypothesis. When the parameter space is known to be restricted, the usual p-value cannot sufficiently utilize this information to make a more accurate decision. In this paper, a modified p-value (also called the rp-value) dependent on the parameter space is proposed, and the test derived from the modified p-value is also shown to be the UMPU test. 相似文献

14.

The p-value Function and Statistical Inference

D. A. S. Fraser 《The American statistician》2019,73(1):135-147

ABSTRACT

This article has two objectives. The first and narrower is to formalize the p-value function, which records all possible p-values, each corresponding to a value for whatever the scalar parameter of interest is for the problem at hand, and to show how this p-value function directly provides full inference information for any corresponding user or scientist. The p-value function provides familiar inference objects: significance levels, confidence intervals, critical values for fixed-level tests, and the power function at all values of the parameter of interest. It thus gives an immediate accurate and visual summary of inference information for the parameter of interest. We show that the p-value function of the key scalar interest parameter records the statistical position of the observed data relative to that parameter, and we then describe an accurate approximation to that p-value function which is readily constructed. 相似文献

15.

Why are p-Values Controversial?

Todd A. Kuffner Stephen G. Walker 《The American statistician》2019,73(1):1-3

While it is often argued that a p-value is a probability; see Wasserstein and Lazar, we argue that a p-value is not defined as a probability. A p-value is a bijection of the sufficient statistic for a given test which maps to the same scale as the Type I error probability. As such, the use of p-values in a test should be no more a source of controversy than the use of a sufficient statistic. It is demonstrated that there is, in fact, no ambiguity about what a p-value is, contrary to what has been claimed in recent public debates in the applied statistics community. We give a simple example to illustrate that rejecting the use of p-values in testing for a normal mean parameter is conceptually no different from rejecting the use of a sample mean. The p-value is innocent; the problem arises from its misuse and misinterpretation. The way that p-values have been informally defined and interpreted appears to have led to tremendous confusion and controversy regarding their place in statistical analysis. 相似文献

16.

The Use of Posterior Predictive P-Values in Testing Goodness-of-Fit

Daojiang He Xingzhong Xu Xuhua Liu 《统计学通讯:理论与方法》2013,42(23):4287-4297

In this article, we introduce two goodness-of-fit tests for testing normality through the concept of the posterior predictive p-value. The discrepancy variables selected are the Kolmogorov-Smirnov (KS) and Berk-Jones (BJ) statistics and the prior chosen is Jeffreys’ prior. The constructed posterior predictive p-values are shown to be distributed independently of the unknown parameters under the null hypothesis, thus they can be taken as the test statistics. It emerges from the simulation that the new tests are more powerful than the corresponding classical tests against most of the alternatives concerned. 相似文献

17.

Statistical evidence and surprise unified under possibility theory

David R. Bickel 《Scandinavian Journal of Statistics》2023,50(3):923-928

Sander Greenland argues that reported results of hypothesis tests should include the surprisal, the base-2 logarithm of the reciprocal of a p-value. The surprisal measures how many bits of evidence in the data warrant rejecting the null hypothesis. A generalization of surprisal also can measure how much the evidence justifies rejecting a composite hypothesis such as the complement of a confidence interval. That extended surprisal, called surprise, quantifies how many bits of astonishment an agent believing a hypothesis would experience upon observing the data. While surprisal is a function of a point in hypothesis space, surprise is a function of a subset of hypothesis space. Satisfying the conditions of conditional min-plus probability, surprise inherits a wealth of tools from possibility theory. The equivalent compatibility function has been recently applied to the replication crisis, to adjusting p-values for prior information, and to comparing scientific theories. 相似文献

18.

Adaptive procedure for generalized familywise error rate control

Li Wang 《统计学通讯:模拟与计算》2017,46(10):8140-8151

This article considers multiple hypotheses testing with the generalized familywise error rate k-FWER control, which is the probability of at least k false rejections. We first assume the p-values corresponding to the true null hypotheses are independent, and propose adaptive generalized Bonferroni procedure with k-FWER control based on the estimation of the number of true null hypotheses. Then, we assume the p-values are dependent, satisfying block dependence, and propose adaptive procedure with k-FWER control. Extensive simulations compare the performance of the adaptive procedures with different estimators. 相似文献

19.

The discrepancy of P-values and posterior probability: Two-sided hypothesis of variance of normal distribution with unknown mean

Rahim Chinipardaz Behzad Falahifard Mohammad Reza Akhoond 《统计学通讯:理论与方法》2017,46(11):5544-5555

This article is concerned with the comparison of P-value and Bayesian measure in point null hypothesis for the variance of Normal distribution with unknown mean. First, using fixed prior for test parameter, the posterior probability is obtained and compared with the P-value when an appropriate prior is used for the mean parameter. In the second, lower bounds of the posterior probability of H₀ under a reasonable class of prior are compared with the P-value. It has been shown that even in the presence of nuisance parameters, these two approaches can lead to different results in the statistical inference. 相似文献

20.

Bonferroni-type Plug-in Procedure Controlling Generalized Familywise Error Rate

Li Wang 《统计学通讯:理论与方法》2013,42(14):3042-3055

Consider the multiple hypotheses testing problem controlling the generalized familywise error rate k-FWER, the probability of at least k false rejections. We propose a plug-in procedure based on the estimation of the number of true null hypotheses. Under the independence assumption of the p-values corresponding to the true null hypotheses, we first introduce the least favorable configuration (LFC) of k-FWER for Bonferroni-type plug-in procedure, then we construct a plug-in k-FWER-controlled procedure based on LFC. For dependent p-values, we establish the asymptotic k-FWER control under some mild conditions. Simulation studies suggest great improvement over generalized Bonferroni test and generalized Holm test. 相似文献