Similar Literature
 20 similar documents found
1.
ABSTRACT

We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to “ban” p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly.

2.
ABSTRACT

Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p < 0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p < 0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values.
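As a rough illustration of the Bayes factor bound idea (our sketch, not code from the article), the calibration most often quoted for this bound is 1/(-e·p·ln p) for p < 1/e, due to Sellke, Bayarri, and Berger; the article's precise definition may differ, and the function name below is ours.

```python
import math

def bayes_factor_bound(p):
    """Largest odds in favor of the alternative over the null consistent with a
    p-value, under the widely used -e*p*ln(p) calibration (valid for p < 1/e)."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("The calibration applies for 0 < p < 1/e (about 0.37).")
    return 1.0 / (-math.e * p * math.log(p))

for p in (0.05, 0.01, 0.005):
    print(f"p = {p}: Bayes factor bound ~ {bayes_factor_bound(p):.1f}")
```

Under this calibration an isolated p = 0.05 corresponds to a bound of only about 2.5, which is the sense in which a "significant" p-value typically overstates the evidence against the null.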

3.
ABSTRACT

When comparing two treatment groups, the objectives are often to (1) determine if the difference between groups (the effect) is of scientific interest, or nonnegligible, and (2) determine if the effect is positive or negative. In practice, a p-value corresponding to the null hypothesis that no effect exists is used to accomplish the first objective and a point estimate for the effect is used to accomplish the second objective. This article demonstrates that this approach is fundamentally flawed and proposes a new approach. The proposed method allows for claims regarding the size of an effect (nonnegligible vs. negligible) and its nature (positive vs. negative) to be made, and provides measures of statistical significance associated with each claim.

4.
ABSTRACT

Various approaches can be used to construct a model from a null distribution and a test statistic. I prove that one such approach, originating with D. R. Cox, has the property that the p-value is never greater than the Generalized Likelihood Ratio (GLR). When combined with the general result that the GLR is never greater than any Bayes factor, we conclude that, under Cox’s model, the p-value is never greater than any Bayes factor. I also provide a generalization, illustrations for the canonical Normal model, and an alternative approach based on sufficiency. This result is relevant for the ongoing discussion about the evidential value of small p-values, and the movement among statisticians to “redefine statistical significance.”
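For reference, the chain of inequalities described above can be written compactly as follows. Reading the GLR as the ratio of likelihoods maximized under the null and under the alternative, and BF as a Bayes factor in favor of the null, is our assumption about the notation, not a definition taken from the article.

```latex
% Compact restatement of the abstract's result (notation is an assumption):
% under Cox's model, p-value <= GLR <= any Bayes factor in favor of the null.
\[
  p \;\le\; \mathrm{GLR}
    \;=\; \frac{\sup_{\theta \in \Theta_0} L(\theta)}{\sup_{\theta \in \Theta_1} L(\theta)}
    \;\le\; \mathrm{BF}_{01}.
\]
```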

5.
ABSTRACT

In response to growing concern about the reliability and reproducibility of published science, researchers have proposed adopting measures of “greater statistical stringency,” including suggestions to require larger sample sizes and to lower the highly criticized “p < 0.05” significance threshold. While pros and cons are vigorously debated, there has been little to no modeling of how adopting these measures might affect what type of science is published. In this article, we develop a novel optimality model that, given current incentives to publish, predicts a researcher’s most rational use of resources in terms of the number of studies to undertake, the statistical power to devote to each study, and the desirable prestudy odds to pursue. We then develop a methodology that allows one to estimate the reliability of published research by considering a distribution of preferred research strategies. Using this approach, we investigate the merits of adopting measures of “greater statistical stringency” with the goal of informing the ongoing debate.
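The article's optimality model is not reproduced here, but the quantity it ultimately bears on, the probability that a published positive finding is true, is commonly expressed through the standard relationship between prestudy odds, power, and the significance threshold (as popularized by Ioannidis). A minimal sketch under that standard formula, with hypothetical numbers:

```python
def positive_predictive_value(prestudy_odds, power, alpha):
    """Probability that a statistically significant finding reflects a true effect,
    given prestudy odds R, power (1 - beta), and threshold alpha
    (standard PPV relationship; not the article's full optimality model)."""
    return (power * prestudy_odds) / (power * prestudy_odds + alpha)

# Hypothetical example: exploratory research with prestudy odds 0.1 and 40% power,
# under the conventional threshold versus a stricter one.
for alpha in (0.05, 0.005):
    ppv = positive_predictive_value(prestudy_odds=0.1, power=0.4, alpha=alpha)
    print(f"alpha = {alpha}: PPV ~ {ppv:.2f}")
```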

6.
ABSTRACT

Recent efforts by the American Statistical Association to improve statistical practice, especially in countering the misuse and abuse of null hypothesis significance testing (NHST) and p-values, are to be welcomed. But will they be successful? The present study offers compelling evidence that this will be an extraordinarily difficult task. Dramatic citation-count data on 25 articles and books severely critical of NHST's negative impact on good science show that this issue was, and remains, well known; yet this criticism did nothing to stem NHST's usage over the period 1960–2007. On the contrary, employment of NHST increased during this time. To be successful in this endeavor, as well as restoring the relevance of the statistics profession to the scientific community in the 21st century, the ASA must be prepared to dispense detailed advice. This includes specifying those situations, if they can be identified, in which the p-value plays a clearly valuable role in data analysis and interpretation. The ASA might also consider a statement that recommends abandoning the use of p-values.

7.
ABSTRACT

A statistical test can be seen as a procedure to produce a decision based on observed data, where some decisions consist of rejecting a hypothesis (yielding a significant result) and some do not, and where one controls the probability of making a wrong rejection at some prespecified significance level. Whereas traditional hypothesis testing involves only two possible decisions (to reject a null hypothesis or not), Kaiser’s directional two-sided test as well as the more recently introduced testing procedure of Jones and Tukey, each equivalent to running two one-sided tests, involve three possible decisions to infer the value of a unidimensional parameter. The latter procedure assumes that a point null hypothesis is impossible (e.g., that two treatments cannot have exactly the same effect), allowing a gain of statistical power. There are, however, situations where a point hypothesis is indeed plausible, for example, when considering hypotheses derived from Einstein’s theories. In this article, we introduce a five-decision rule testing procedure, equivalent to running a traditional two-sided test in addition to two one-sided tests, which combines the advantages of the testing procedures of Kaiser (no assumption that a point hypothesis is impossible) and Jones and Tukey (higher power), allowing for a nonnegligible (typically 20%) reduction of the sample size needed to reach a given statistical power to obtain a significant result, compared to the traditional approach.
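To make the mechanics concrete, here is one plausible reading of such a five-decision rule for a single parameter theta with point null theta = 0: the two-sided test at level alpha supports the strong claims "theta > 0" or "theta < 0", while the two one-sided tests at level alpha support the weaker claims "theta >= 0" or "theta <= 0". This sketch is our interpretation; the authors' exact decision labels and levels may differ.

```python
from scipy import stats

def five_decision(z, alpha=0.05):
    """Illustrative five-decision rule for H0: theta = 0 based on a standard normal
    test statistic z (our reading of the procedure, not the authors' exact rule)."""
    p_two = 2 * stats.norm.sf(abs(z))   # traditional two-sided p-value
    p_right = stats.norm.sf(z)          # one-sided p-value for theta > 0
    p_left = stats.norm.cdf(z)          # one-sided p-value for theta < 0

    if p_two <= alpha:                  # strong directional claim
        return "theta > 0" if z > 0 else "theta < 0"
    if p_right <= alpha:                # weaker, Jones-Tukey-style claims
        return "theta >= 0"
    if p_left <= alpha:
        return "theta <= 0"
    return "no decision"

for z in (2.5, 1.8, 0.4, -1.8, -2.5):
    print(f"z = {z:+.1f}: {five_decision(z)}")
```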

8.
Abstract

It is widely acknowledged that the biomedical literature suffers from a surfeit of false positive results. Part of the reason for this is the persistence of the myth that observation of p < 0.05 is sufficient justification to claim that you have made a discovery. It is hopeless to expect users to change their reliance on p-values unless they are offered an alternative way of judging the reliability of their conclusions. If the alternative method is to have a chance of being adopted widely, it will have to be easy to understand and to calculate. One such proposal is based on calculation of the false positive risk (FPR). It is suggested that p-values and confidence intervals should continue to be given, but that they should be supplemented by a single additional number that conveys the strength of the evidence better than the p-value. This number could be the minimum FPR (that calculated on the assumption of a prior probability of 0.5, the largest value that can be assumed in the absence of hard prior data). Alternatively, one could specify the prior probability that it would be necessary to believe in order to achieve an FPR of, say, 0.05.
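As a rough illustration (our sketch, not the article's t-test-based calculation), a minimum FPR can be approximated by converting the observed p-value into the largest Bayes factor consistent with it and then applying a prior probability of 0.5; the calibration and the function name below are assumptions.

```python
import math

def min_false_positive_risk(p):
    """Approximate minimum false positive risk for an observed p-value, assuming a
    prior probability of 0.5 that a real effect exists and the generic -e*p*ln(p)
    Bayes factor bound (not the article's t-test-based calculation)."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("Calibration applies for 0 < p < 1/e.")
    bf_bound = 1.0 / (-math.e * p * math.log(p))  # max odds for a real effect vs. none
    return 1.0 / (1.0 + bf_bound)                 # posterior P(no effect) at prior odds 1

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: minimum FPR ~ {min_false_positive_risk(p):.2f}")
```

Under this calibration an isolated p = 0.05 corresponds to a minimum FPR of roughly 0.3, far higher than the 5% that is often, wrongly, read off the p-value itself.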

9.
A common misconception in hypothesis testing
Through a theoretical analysis of a common error in hypothesis testing, this article points out that hypothesis testing can, under certain conditions, only reject the null hypothesis and can never confirm it, and proposes a correct and concise method for specifying the null and alternative hypotheses.
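A small simulation (our example, not from the article) illustrates why a non-significant result cannot be read as confirming the null hypothesis: with a real but modest effect and a small sample, the test usually fails to reject.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, n, true_mean = 10_000, 20, 0.3      # true mean is 0.3, so H0: mean = 0 is false
fail_to_reject = 0
for _ in range(n_sim):
    sample = rng.normal(loc=true_mean, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    fail_to_reject += p > 0.05
print(f"Share of non-significant results despite a real effect: {fail_to_reject / n_sim:.2f}")
```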

10.
One important type of question in statistical inference is how to interpret data as evidence. The law of likelihood provides a satisfactory answer in interpreting data as evidence for simple hypotheses, but remains silent for composite hypotheses. This article examines how the law of likelihood can be extended to composite hypotheses within the scope of the likelihood principle. From a system of axioms, we conclude that the strength of evidence for composite hypotheses should be represented by an interval between the lower and upper profile likelihoods. This article is intended to reveal the connection between profile likelihoods and the law of likelihood under the likelihood principle rather than to argue in favor of the use of profile likelihoods in addressing general questions of statistical inference. The interpretation of the result is also discussed.

11.
万东华  周晶 《统计研究》2020,37(6):119-128
Drawing on an extensive review of the literature, this article systematically traces the development of statistics in China after modern Western statistical theory was introduced. It covers the introduction of Western statistical theory and its subsequent localization in China; the debate over whether statistics is "one discipline" or "two disciplines," the rise of the "grand statistics" perspective, and the establishment of statistics as a first-level discipline; and the founding and growth of statistical academic organizations and their role in advancing the discipline. The aim is to review and present this history as objectively as possible, so that readers gain a basic understanding of the development of modern statistics in China.

12.
The likelihood ratio test for the equality of k univariate normal populations is extended to include the case in which the means are expressed as simple linear regression functions involving two parameters. The moments of the likelihood ratio statistic are derived, and an approximation to the null distribution is obtained.

13.
ABSTRACT

Both philosophically and in practice, statistics is dominated by frequentist and Bayesian thinking. Under those paradigms, our courses and textbooks talk about the accuracy with which true model parameters are estimated or the posterior probability that they lie in a given set. In nonparametric problems, they talk about convergence to the true function (density, regression, etc.) or the probability that the true function lies in a given set. But the usual paradigms' focus on learning the true model and parameters can distract the analyst from another important task: discovering whether there are many sets of models and parameters that describe the data reasonably well. When we discover many good models we can see in what ways they agree. Points of agreement give us more confidence in our inferences, but points of disagreement give us less. Further, the usual paradigms’ focus seduces us into judging and adopting procedures according to how well they learn the true values. An alternative is to judge models and parameter values, not procedures, and judge them by how well they describe data, not how close they come to the truth. The latter is especially appealing in problems without a true model.

14.
Considering the Wald, score, and likelihood ratio asymptotic test statistics, we analyze a multivariate null intercept errors-in-variables regression model, where the explanatory and the response variables are subject to measurement errors, and a possible structure of dependency between the measurements taken within the same individual is incorporated, representing a longitudinal structure. This model was proposed by Aoki et al. (2003b) and analyzed under the Bayesian approach. In this article, considering the classical approach, we analyze the asymptotic test statistics and present a simulation study to compare the behavior of the three test statistics for different sample sizes, parameter values, and nominal levels of the test. Also, closed-form expressions for the score function and the Fisher information matrix are presented. We consider two real numerical illustrations: the odontological data set from Hadgu and Koch (1999), and a quality control data set.

15.
ABSTRACT

It is now widely accepted that the techniques of null hypothesis significance testing (NHST) are routinely misused and misinterpreted by researchers seeking insight from data. There is, however, no consensus on acceptable alternatives, leaving researchers with little choice but to continue using NHST, regardless of its failings. I examine the potential for the Analysis of Credibility (AnCred) to resolve this impasse. Using real-life examples, I assess the ability of AnCred to provide researchers with a simple but robust framework for assessing study findings that goes beyond the standard dichotomy of statistical significance/nonsignificance. By extracting more insight from standard summary statistics while offering more protection against inferential fallacies, AnCred may encourage researchers to move toward the post p < 0.05 era.

16.
Given the usual normal multivariate linear regression model Y = BX + E, with B subjected to double linear restrictions GBF' = T, a likelihood ratio test criterion for testing the composite linear null hypothesis HBJ' = U, with G, F, T, H, J, U specified, is provided. The applications of such tests are discussed by Timm (1980).

17.
Two consistent estimators for the non-null variance of the Wilcoxon-Mann-Whitney statistic applied to grouped ordered data are considered. The first is based on U-statistics and the second is obtained by the Delta method. Some examples are given to demonstrate the extent of error when using a null variance estimate for constructing confidence intervals. It appears that the two consistent estimates are very close, but may both be distinguishably larger or smaller than the null variance estimate.
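For context (our sketch, not the two estimators studied in the article), the snippet below contrasts the tie-corrected null variance of the Wilcoxon-Mann-Whitney estimate with one standard consistent non-null variance estimate based on placement values (DeLong-type structural components), for heavily tied grouped ordered data. All data and names here are hypothetical.

```python
import numpy as np

def wmw_theta_and_variances(x, y):
    """For two samples of (possibly heavily tied) ordinal scores, return the
    Wilcoxon-Mann-Whitney estimate theta = P(X < Y) + 0.5*P(X = Y), its
    tie-corrected null variance, and a consistent non-null variance estimate
    based on placement values (DeLong-type structural components)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, n = len(x), len(y)

    # Pairwise comparison kernel, counting ties as 1/2.
    psi = (y[None, :] > x[:, None]).astype(float) + 0.5 * (y[None, :] == x[:, None])
    theta = psi.mean()

    # Tie-corrected null variance of theta (derived from the null variance of U).
    N = m + n
    _, counts = np.unique(np.concatenate([x, y]), return_counts=True)
    tie_term = (counts**3 - counts).sum() / (N * (N - 1))
    var_null = ((N + 1) - tie_term) / (12.0 * m * n)

    # Consistent (non-null) variance via placement values.
    v_x = psi.mean(axis=1)                      # placement of each x among the y's
    v_y = psi.mean(axis=0)                      # placement of each y among the x's
    var_consistent = v_x.var(ddof=1) / m + v_y.var(ddof=1) / n

    return theta, var_null, var_consistent

# Hypothetical grouped ordered data: 5-point ratings for two treatment groups.
rng = np.random.default_rng(0)
x = rng.integers(1, 6, size=60)                        # control ratings
y = np.clip(rng.integers(1, 6, size=60) + 1, 1, 5)     # shifted treatment ratings
theta, v0, v1 = wmw_theta_and_variances(x, y)
print(f"theta = {theta:.3f}, null SE = {np.sqrt(v0):.3f}, consistent SE = {np.sqrt(v1):.3f}")
```

The gap between the two standard errors is what drives the error described above when null-variance confidence intervals are used away from the null.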

18.
Summary.  The evaluation of handwritten characters that are selected from an anonymous letter and written material from a suspect is an open problem in forensic science. The individualization of handwriting is largely dependent on examiners who evaluate the characteristics in a qualitative and subjective way. Precise individual characterization of the shape of handwritten characters is possible through Fourier analysis: each handwritten character can be described through a set of variables such as the surface and harmonics as demonstrated by Marquis and co-workers in 2005. The assessment of the value of the evidence is performed through the derivation of a likelihood ratio for multivariate data. The methodology allows the forensic scientist to take into account the correlation between variables, and the non-constant variability within sources (i.e. individuals). Numerical procedures are implemented to handle the complexity and to compute the marginal likelihood under competing propositions.

19.
Evaluation of trace evidence in the form of multivariate data
Summary.  The evaluation of measurements on characteristics of trace evidence found at a crime scene and on a suspect is an important part of forensic science. Five methods of assessment for the value of the evidence for multivariate data are described. Two are based on significance tests and three on the evaluation of likelihood ratios. The likelihood ratio which compares the probability of the measurements on the evidence assuming a common source for the crime scene and suspect evidence with the probability of the measurements on the evidence assuming different sources for the crime scene and suspect evidence is a well-documented measure of the value of the evidence. One of the likelihood ratio approaches transforms the data to a univariate projection based on the first principal component. The other two versions of the likelihood ratio for multivariate data account for correlation among the variables and for two levels of variation: that between sources and that within sources. One version assumes that between-source variability is modelled by a multivariate normal distribution; the other version models the variability with a multivariate kernel density estimate. Results are compared from the analysis of measurements on the elemental composition of glass.

20.
ABSTRACT

As the debate over best statistical practices continues in academic journals, conferences, and the blogosphere, working researchers (e.g., psychologists) need to figure out how much time and effort to invest in attending to experts' arguments, how to design their next project, and how to craft a sustainable long-term strategy for data analysis and inference. The present special issue of The American Statistician promises help. In this article, we offer a modest proposal for a continued and informed use of the conventional p-value without the pitfalls of statistical rituals. Other statistical indices should complement reporting, and extra-statistical (e.g., theoretical) judgments ought to be made with care and clarity.
