Similar Articles

20 similar articles found (search time: 15 ms)
1.
A Common Misconception in Hypothesis Testing   (Total citations: 8; self-citations: 0; citations by others: 8)
Through a theoretical analysis of a common error in hypothesis testing, this article points out that a hypothesis test can, under the right circumstances, only reject the null hypothesis; it can never confirm it. The article also proposes a correct and concise method for setting up the null and alternative hypotheses.
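The asymmetry described above is easy to demonstrate by simulation. The following is a minimal Python sketch (not from the article, which presents no code): even when the null hypothesis is false, a small study frequently fails to reject it, so a non-significant result cannot be read as confirmation of H0.

```python
# Minimal sketch (not from the article): a non-significant test does not
# confirm the null hypothesis. With a small sample, a true effect of 0.3
# frequently fails to reach significance, so "p > 0.05" cannot be read
# as evidence that mu == 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, true_mean = 20, 0.3          # the null H0: mu = 0 is actually false
misses = 0
for _ in range(1000):
    sample = rng.normal(loc=true_mean, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    misses += p > 0.05          # fail to reject despite a real effect

print(f"H0 not rejected in {misses / 10:.1f}% of samples")  # typically ~75%
```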

2.
ABSTRACT

We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to “ban” p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly.

3.
ABSTRACT

When comparing two treatment groups, the objectives are often to (1) determine if the difference between groups (the effect) is of scientific interest, or nonnegligible, and (2) determine if the effect is positive or negative. In practice, a p-value corresponding to the null hypothesis that no effect exists is used to accomplish the first objective and a point estimate for the effect is used to accomplish the second objective. This article demonstrates that this approach is fundamentally flawed and proposes a new approach. The proposed method allows for claims regarding the size of an effect (nonnegligible vs. negligible) and its nature (positive vs. negative) to be made, and provides measures of statistical significance associated with each claim.
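The article's own procedure is not reproduced in the abstract. As a hedged illustration of how both kinds of claims can be operationalized, the sketch below uses a generic confidence-interval-versus-margin rule with a hypothetical negligibility margin delta; this is a conventional device, not necessarily the authors' construction.

```python
# Illustrative sketch only -- not the article's proposed method. Compare a
# (1 - alpha) confidence interval for the group difference against a
# prespecified negligibility margin delta to make both claims at once.
import numpy as np
from scipy import stats

def classify_effect(x, y, delta, alpha=0.05):
    """Classify the mean difference as nonnegligible (+/-), negligible,
    or undetermined, by holding the CI against the margin delta."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(np.var(x, ddof=1) / nx + np.var(y, ddof=1) / ny)
    df = nx + ny - 2                     # crude df; Welch df would be finer
    half = stats.t.ppf(1 - alpha / 2, df) * se
    lo, hi = diff - half, diff + half
    if lo > delta:
        return "nonnegligible, positive"
    if hi < -delta:
        return "nonnegligible, negative"
    if -delta < lo and hi < delta:
        return "negligible"
    return "no claim"

rng = np.random.default_rng(1)
print(classify_effect(rng.normal(1.0, 1, 80), rng.normal(0.2, 1, 80), delta=0.3))
```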

4.
ABSTRACT

Recent efforts by the American Statistical Association to improve statistical practice, especially in countering the misuse and abuse of null hypothesis significance testing (NHST) and p-values, are to be welcomed. But will they be successful? The present study offers compelling evidence that this will be an extraordinarily difficult task. Citation-count data on 25 articles and books severely critical of NHST's negative impact on good science show that, although the issue was and is well known, this criticism did nothing to stem NHST's usage over the period 1960–2007. On the contrary, employment of NHST increased during this time. To be successful in this endeavor, as well as in restoring the relevance of the statistics profession to the scientific community in the 21st century, the ASA must be prepared to dispense detailed advice. This includes specifying those situations, if they can be identified, in which the p-value plays a clearly valuable role in data analysis and interpretation. The ASA might also consider a statement recommending that the use of p-values be abandoned.

5.
ABSTRACT

A statistical test can be seen as a procedure to produce a decision based on observed data, where some decisions consist of rejecting a hypothesis (yielding a significant result) and some do not, and where one controls the probability of making a wrong rejection at some prespecified significance level. Whereas traditional hypothesis testing involves only two possible decisions (to reject a null hypothesis or not to reject it), Kaiser’s directional two-sided test as well as the more recently introduced testing procedure of Jones and Tukey, each equivalent to running two one-sided tests, involve three possible decisions to infer the value of a unidimensional parameter. The latter procedure assumes that a point null hypothesis is impossible (e.g., that two treatments cannot have exactly the same effect), allowing a gain in statistical power. There are, however, situations where a point hypothesis is indeed plausible, for example, when considering hypotheses derived from Einstein’s theories. In this article, we introduce a five-decision rule testing procedure, equivalent to running a traditional two-sided test in addition to two one-sided tests, which combines the advantages of the testing procedures of Kaiser (no assumption that a point hypothesis is impossible) and of Jones and Tukey (higher power), allowing for a nonnegligible (typically 20%) reduction of the sample size needed to reach a given statistical power to obtain a significant result, compared to the traditional approach.
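A sketch of the general idea in Python follows; the decision labels and the exact combination rule are illustrative assumptions, since the abstract only specifies that the procedure is equivalent to running a traditional two-sided test alongside two one-sided tests.

```python
# Hedged sketch of the general idea (decision labels are illustrative, not
# necessarily the authors' exact formulation): combine a two-sided test of
# H0: mu = mu0 with two one-sided tests, all at level alpha.
import numpy as np
from scipy import stats

def five_decision(x, mu0=0.0, alpha=0.05):
    t, p_two = stats.ttest_1samp(x, popmean=mu0)
    p_right = stats.ttest_1samp(x, popmean=mu0, alternative="greater")[1]
    p_left = stats.ttest_1samp(x, popmean=mu0, alternative="less")[1]
    if p_two <= alpha:                   # two-sided test rejects mu = mu0
        return "mu > mu0" if t > 0 else "mu < mu0"
    if p_right <= alpha:                 # only the one-sided test rejects
        return "mu >= mu0"
    if p_left <= alpha:
        return "mu <= mu0"
    return "no decision"

rng = np.random.default_rng(5)
print(five_decision(rng.normal(0.3, 1.0, 50)))
```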

6.
ABSTRACT

It is widely recognized by statisticians, though not as widely by other researchers, that the p-value cannot be interpreted in isolation, but rather must be considered in the context of certain features of the design and substantive application, such as sample size and meaningful effect size. I consider the setting of the normal mean and highlight the information contained in the p-value in conjunction with the sample size and meaningful effect size. The p-value and sample size jointly yield 95% confidence bounds for the effect of interest, which can be compared to the predetermined meaningful effect size to make inferences about the true effect. I provide simple examples to demonstrate that although the p-value is calculated under the null hypothesis, and thus seemingly may be divorced from the features of the study from which it arises, its interpretation as a measure of evidence requires its contextualization within the study. This implies that any proposal for improved use of the p-value as a measure of the strength of evidence cannot simply be a change to the threshold for significance.
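The back-calculation described above is straightforward in the known-variance normal-mean setting. The sketch below (sigma = 1 is an assumption for illustration) recovers the implied sample mean and 95% confidence bounds from a two-sided p-value and n, showing how the same p-value carries very different implications at different sample sizes.

```python
# Sketch of the back-calculation described above, in the known-variance
# normal-mean setting (sigma = 1 is assumed for illustration): a two-sided
# p-value and the sample size n jointly determine |xbar| and hence a 95%
# CI, which can be held against a meaningful effect size.
from scipy import stats

def ci_from_p(p, n, sigma=1.0, conf=0.95):
    se = sigma / n ** 0.5
    xbar = stats.norm.ppf(1 - p / 2) * se      # |mean| implied by p and n
    half = stats.norm.ppf(1 - (1 - conf) / 2) * se
    return xbar - half, xbar + half

# Same p = 0.04, very different implications depending on n:
print(ci_from_p(0.04, n=25))     # wide interval around a sizable effect
print(ci_from_p(0.04, n=2500))   # narrow interval around a tiny effect
```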

7.
ABSTRACT

In response to growing concern about the reliability and reproducibility of published science, researchers have proposed adopting measures of “greater statistical stringency,” including suggestions to require larger sample sizes and to lower the highly criticized “p < 0.05” significance threshold. While pros and cons are vigorously debated, there has been little to no modeling of how adopting these measures might affect what type of science is published. In this article, we develop a novel optimality model that, given current incentives to publish, predicts a researcher’s most rational use of resources in terms of the number of studies to undertake, the statistical power to devote to each study, and the desirable prestudy odds to pursue. We then develop a methodology that allows one to estimate the reliability of published research by considering a distribution of preferred research strategies. Using this approach, we investigate the merits of adopting measures of “greater statistical stringency” with the goal of informing the ongoing debate.
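As a hedged illustration in the spirit of such models (not the authors' actual optimality model), the sketch below computes the reliability of published positive findings, their positive predictive value, from prestudy odds, power, and the significance threshold.

```python
# Illustration in the spirit of such models (not the authors' model): the
# reliability of published positive findings -- the positive predictive
# value -- as a function of prestudy odds, power, and the significance
# threshold, following the standard Ioannidis-style calculation.
def ppv(prestudy_odds, power, alpha):
    prior = prestudy_odds / (1 + prestudy_odds)   # P(hypothesis is true)
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

for alpha in (0.05, 0.005):                       # "greater stringency"
    print(alpha, round(ppv(prestudy_odds=0.1, power=0.8, alpha=alpha), 3))
```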

8.
ABSTRACT

Replication is complicated in psychological research because studies of a given psychological phenomenon can never be direct or exact replications of one another, and thus effect sizes vary from one study of the phenomenon to the next—an issue of clear importance for replication. Current large-scale replication projects represent an important step forward for assessing replicability, but provide only limited information because they have thus far been designed in a manner such that heterogeneity either cannot be assessed or is intended to be eliminated. Consequently, the nontrivial degree of heterogeneity found in these projects represents a lower bound on the true degree of heterogeneity. We recommend enriching large-scale replication projects going forward by embracing heterogeneity. We argue this is the key for assessing replicability: if effect sizes are sufficiently heterogeneous—even if the sign of the effect is consistent—the phenomenon in question does not seem particularly replicable and the theory underlying it seems poorly constructed and in need of enrichment. Uncovering why and revising theory in light of it will lead to improved theory that explains heterogeneity and increases replicability. Given this, large-scale replication projects can play an important role not only in assessing replicability but also in advancing theory.
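As generic background for the heterogeneity the authors discuss (this is standard meta-analytic machinery, not their proposal), the sketch below estimates the between-study variance tau^2 from a set of replication effect estimates using the DerSimonian-Laird method.

```python
# Hedged sketch: one standard way to quantify heterogeneity across
# replications is the DerSimonian-Laird estimate of the between-study
# variance tau^2 from effect estimates y_i and within-study variances v_i.
# (Generic meta-analysis machinery, not the authors' own method.)
import numpy as np

def dersimonian_laird_tau2(y, v):
    w = 1.0 / np.asarray(v)
    y = np.asarray(y)
    mu_fixed = np.sum(w * y) / np.sum(w)          # fixed-effect mean
    q = np.sum(w * (y - mu_fixed) ** 2)           # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - df) / c)                 # truncate at zero

effects = [0.31, 0.12, 0.45, 0.02, 0.28]          # hypothetical replications
variances = [0.02, 0.03, 0.02, 0.04, 0.03]
print(dersimonian_laird_tau2(effects, variances))
```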

9.
ABSTRACT

It is now widely accepted that the techniques of null hypothesis significance testing (NHST) are routinely misused and misinterpreted by researchers seeking insight from data. There is, however, no consensus on acceptable alternatives, leaving researchers with little choice but to continue using NHST, regardless of its failings. I examine the potential for the Analysis of Credibility (AnCred) to resolve this impasse. Using real-life examples, I assess the ability of AnCred to provide researchers with a simple but robust framework for assessing study findings that goes beyond the standard dichotomy of statistical significance/nonsignificance. By extracting more insight from standard summary statistics while offering more protection against inferential fallacies, AnCred may encourage researchers to move toward the post p < 0.05 era.

10.
This article presents evidence that published results of scientific investigations are not a representative sample of the results of all scientific studies. Research studies from 11 major journals demonstrate the existence of biases that favor studies observing effects that, on statistical evaluation, have a low probability of erroneously rejecting the so-called null hypothesis (H0). This practice makes the probability of erroneously rejecting H0 different for the reader than for the investigator. It introduces two biases into the interpretation of the scientific literature: one due to multiple repetition of studies with false hypotheses, and one due to failure to publish smaller and less significant outcomes of tests of true hypotheses. These practices distort the results of literature surveys and of meta-analyses. The results also indicate that the practices leading to publication bias have not changed over a period of 30 years.

11.
The “What If” analysis is applicable in research and heuristic situations that utilize statistical significance testing. One utility of the “What If” analysis is pedagogical: it provides professors an interactive tool that visually represents what statistical significance testing entails and the variables that affect the commonly misinterpreted p_CALCULATED value. In order to develop a strong understanding of what affects the p_CALCULATED value, students tangibly manipulate data within the Excel sheet to create a visual representation that explicitly demonstrates how those variables affect the p_CALCULATED value. The second utility is primarily applicable to researchers. The “What If” analysis contributes to research in two ways: (1) it can be run a priori to estimate the sample size a researcher may wish to use for a study; and (2) it can be run a posteriori to aid in the interpretation of results. If used, the “What If” analysis provides researchers with another utility that enables them to conduct high-quality research and disseminate their results in an accurate manner.
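The flavor of such a "What If" exercise can be reproduced outside Excel. The hedged Python sketch below, with an assumed fixed standardized effect of 0.3, shows how strongly p_CALCULATED depends on sample size alone.

```python
# Sketch of what a "What If" exercise makes tangible (re-implemented here in
# Python rather than the article's Excel sheet): holding the observed effect
# fixed and varying n shows how strongly p_CALCULATED depends on sample size.
from scipy import stats

d = 0.3                                  # fixed standardized mean difference
for n in (10, 30, 100, 300):
    t = d * (n ** 0.5)                   # one-sample t statistic for effect d
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    print(f"n = {n:4d}  p_CALCULATED = {p:.4f}")
```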

12.
ABSTRACT

As the debate over best statistical practices continues in academic journals, conferences, and the blogosphere, working researchers (e.g., psychologists) need to figure out how much time and effort to invest in attending to experts' arguments, how to design their next project, and how to craft a sustainable long-term strategy for data analysis and inference. The present special issue of The American Statistician promises help. In this article, we offer a modest proposal for a continued and informed use of the conventional p-value without the pitfalls of statistical rituals. Other statistical indices should complement reporting, and extra-statistical (e.g., theoretical) judgments ought to be made with care and clarity.

13.
The extensive research literature in many areas of behavioral, medical, and social sciences has led some reviewers to the use of quantitative methods for research synthesis. Typically, these analyses have involved estimating the effect magnitude from each of a series of studies and averaging the estimates to obtain a single index of effect magnitude. This article provides some statistical methods for estimating and testing linear models for effect magnitude and can be used for determining the effect of variations in experimental conditions. These methods can be used for synthesizing the results of studies where the index of effect magnitude is a correlation coefficient or standardized mean difference. A natural test for model specification is also given.
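A minimal sketch of this style of fixed-effects linear model for effect magnitudes follows (the data and single moderator are hypothetical): effect estimates are regressed on study-level covariates by weighted least squares, and the residual weighted sum of squares serves as the chi-square model-specification test mentioned above.

```python
# Sketch of the kind of fixed-effects linear model described above (in the
# style of Hedges' meta-regression; data and moderator are hypothetical):
# effects y_i with variances v_i are regressed on study-level covariates by
# weighted least squares, and the residual weighted sum of squares gives a
# chi-square test of model specification.
import numpy as np
from scipy import stats

y = np.array([0.41, 0.30, 0.55, 0.12, 0.26])   # standardized mean differences
v = np.array([0.02, 0.03, 0.02, 0.05, 0.04])   # their sampling variances
X = np.column_stack([np.ones(5), [1, 1, 0, 0, 0]])  # intercept + moderator

W = np.diag(1.0 / v)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)    # WLS estimates
resid = y - X @ beta
Q_E = resid @ W @ resid                             # specification statistic
df = len(y) - X.shape[1]
print("beta:", beta, " Q_E:", Q_E, " p:", stats.chi2.sf(Q_E, df))
```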

14.
A Statistical Testing Framework for Correspondence Analysis   (Total citations: 4; self-citations: 0; citations by others: 4)
Because its results are easy to interpret, correspondence analysis has seen increasingly wide application in recent years. To support better use of the method, this article proposes establishing a statistical testing framework for correspondence analysis, comprising tests of the method's applicability as well as tests of the quality of its results, and also notes other issues that deserve attention when applying correspondence analysis.

15.
ABSTRACT

In this article, we assess the 31 articles published in Basic and Applied Social Psychology (BASP) in 2016, which is one full year after the BASP editors banned the use of inferential statistics. We discuss how the authors collected their data, how they reported and summarized their data, and how they used their data to reach conclusions. We found multiple instances of authors overstating conclusions beyond what the data would support if statistical significance had been considered. Readers would be largely unable to recognize this because the necessary information to do so was not readily available.

16.
The first two stages in modelling time series are hypothesis testing and estimation. For long-memory time series, the second stage was studied in [M. Boutahar et al., Estimation methods of the long memory parameter: Monte Carlo analysis and application, J. Appl. Statist. 34(3), pp. 261–301], in which we presented some estimation methods for the long-memory parameter. The present paper addresses the first stage, and hence completes the former, by exploring some tests for detecting long memory in time series. We consider two kinds of tests: a non-parametric class and a semi-parametric one. We derive the limiting distribution of the non-parametric tests under the null of short memory and show that they are consistent against the alternative of long memory. We also perform Monte Carlo simulations to analyze the size distortion and the power of all proposed tests. We conclude that for large sample sizes the two classes are equivalent, but for small sample sizes the non-parametric class performs better than the semi-parametric one.
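The paper's specific test statistics are not given in the abstract. As a hedged illustration of the non-parametric family, the sketch below computes the classical rescaled-range (R/S) statistic, which grows roughly like n^0.5 under short memory and faster under long memory.

```python
# Hedged sketch: the paper's exact statistics are not reproduced here; the
# classical rescaled-range (R/S) statistic below is one simple non-parametric
# diagnostic of the same kind, with R/S growing roughly like n^0.5 under
# short memory and faster under long memory.
import numpy as np

def rescaled_range(x):
    x = np.asarray(x, dtype=float)
    z = np.cumsum(x - x.mean())         # cumulative deviations from the mean
    r = z.max() - z.min()               # range of the partial sums
    s = x.std(ddof=1)                   # sample standard deviation
    return r / s

rng = np.random.default_rng(2)
short_memory = rng.normal(size=1000)               # i.i.d. noise
print(rescaled_range(short_memory) / 1000 ** 0.5)  # O(1) under short memory
```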

17.
DSTAT, Version 1.10: Available from Lawrence Erlbaum Associates, Inc., 10 Industrial Ave., Mahwah, NJ 07430-2262; phone: 800-926-6579

TRUE EPISTAT, Version 4.0: Available from Epistat Services, 2011 Cap Rock Circle, Richardson, TX 75080; phone: 214-680-1376; fax: 214-680-1303.

FAST*PRO, Version 1.0: Available from Academic Press, Inc., 955 Massachusetts Avenue, Cambridge, MA 02139; phone: 800-321-5068; fax: 800-336-7377.

Meta-analysts conduct studies in which the responses are analytic summary measurements, such as risk differences, effect sizes, p values, or z statistics, obtained from a series of independent studies. The motivation for conducting a meta-analysis is to integrate research findings over studies in order to summarize the evidence about treatment efficacy or risk factors. This article presents a comparative review of three meta-analytic software packages: DSTAT, TRUE EPISTAT, and FAST*PRO.

18.
Data screening plays a critical role in big-data processing, and how to use a suitable screening algorithm to extract valuable data from a large volume of data is one of the important problems to be solved. Using statistical hypothesis-testing methods, this article designs a systematic data-screening algorithm based on differences between an experimental group and a control group, and implements the algorithm in MATLAB. The algorithm is then applied to gene expression profiles for autism (23,520 genes), screening out 244 genes whose expression profiles differ substantially between the experimental and control groups as candidate autism-related genes. Gene annotation shows that genes already reported in the literature as autism-related (FIGF, MED13, NDRG4, POU3F2, and USP8) are among the 244 screened genes, demonstrating the effectiveness of the algorithm.
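The article's implementation is in MATLAB and its exact statistic is not given in the abstract; the following is a hedged Python sketch of the same screening idea, with placeholder random data standing in for the real expression profiles: test each gene for a group difference and keep the most divergent ones.

```python
# Sketch of the screening idea in Python (the article's implementation is in
# MATLAB and its exact statistic is not reproduced): run a two-sample test
# per gene and keep the genes whose expression differs most between groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_genes, n_case, n_ctrl = 23520, 20, 20      # gene count as in the abstract;
case = rng.normal(size=(n_genes, n_case))    # group sizes and data are
ctrl = rng.normal(size=(n_genes, n_ctrl))    # placeholders for illustration

_, p = stats.ttest_ind(case, ctrl, axis=1)   # one p-value per gene
selected = np.argsort(p)[:244]               # keep the 244 most divergent
print(selected[:10], p[selected].max())
```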

19.
A large-sample test for testing the equality of two effect sizes is presented. The null and non-null distributions of the proposed test statistic are derived. Further, the problem of estimating the effect size is considered when it is a priori suspected that the two effect sizes may be close to each other. The combined data from all the samples lead to a more efficient estimator of the effect size. We propose a basis for optimally combining estimation problems when there is uncertainty concerning the appropriate statistical model-estimator to use in representing the sampling process. The objective here is to produce natural adaptive estimators with good statistical properties. In the context of two bivariate statistical models, expressions for the asymptotic mean squared error of the proposed estimators are derived and compared with the parallel expressions for the benchmark estimators. We demonstrate that the suggested preliminary test estimator has superior asymptotic mean squared error performance relative to the benchmark and pooled estimators. A simulation study and an application of the methodology to real data are presented.
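A hedged sketch of the two ingredients described above follows, using the standard large-sample variance of a standardized mean difference d (the authors' exact construction may differ): a z-test for equality of the two effect sizes, and a preliminary-test estimator that pools the estimates only when equality is not rejected.

```python
# Hedged sketch using the standard large-sample variance of a standardized
# mean difference d (the authors' exact construction may differ): a z-test
# for equality of two effect sizes, plus a preliminary-test estimator that
# pools only when equality is not rejected.
import numpy as np
from scipy import stats

def d_var(d, n1, n2):
    # Large-sample variance of a standardized mean difference
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

def pretest_estimate(d1, n11, n12, d2, n21, n22, alpha=0.05):
    v1, v2 = d_var(d1, n11, n12), d_var(d2, n21, n22)
    z = (d1 - d2) / np.sqrt(v1 + v2)          # large-sample equality test
    if abs(z) < stats.norm.ppf(1 - alpha / 2):
        return (d1 / v1 + d2 / v2) / (1 / v1 + 1 / v2)   # pooled estimate
    return d1                                 # keep the unpooled estimate

print(pretest_estimate(0.42, 50, 50, 0.35, 60, 60))
```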

20.
There are two distinct definitions of “P-value” for evaluating a proposed hypothesis or model for the process generating an observed dataset. The original definition starts with a measure of the divergence of the dataset from what was expected under the model, such as a sum of squares or a deviance statistic. A P-value is then the ordinal location of the measure in a reference distribution computed from the model and the data, and is treated as a unit-scaled index of compatibility between the data and the model. In the other definition, a P-value is a random variable on the unit interval whose realizations can be compared to a cutoff α to generate a decision rule with known error rates under the model and specific alternatives. It is commonly assumed that realizations of such decision P-values always correspond to divergence P-values. But this need not be so: Decision P-values can violate intuitive single-sample coherence criteria where divergence P-values do not. It is thus argued that divergence and decision P-values should be carefully distinguished in teaching, and that divergence P-values are the relevant choice when the analysis goal is to summarize evidence rather than implement a decision rule.
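The divergence definition lends itself to a direct simulation sketch: the P-value is simply the ordinal location of an observed divergence measure in a reference distribution generated from the model. The Python below uses a sum-of-squares divergence and a hypothetical N(0, 1) model.

```python
# Sketch of a divergence P-value as defined above: the ordinal location of
# an observed divergence statistic (here, a sum of squares) in a reference
# distribution simulated from the model itself. Model and data are
# hypothetical.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=0.4, size=30)          # observed dataset
model_mean = 0.0                             # hypothesized model: N(0, 1)

def divergence(x):
    return np.sum((x - model_mean) ** 2)     # sum-of-squares divergence

obs = divergence(data)
ref = [divergence(rng.normal(loc=model_mean, size=30)) for _ in range(10000)]
p = np.mean([r >= obs for r in ref])         # compatibility index in [0, 1]
print(p)
```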
