Similar articles
20 similar articles found (search time: 46 ms)
1.
We consider the problem of comparing step-down and step-up multiple test procedures for testing n hypotheses when independent p-values or independent test statistics are available. The defining critical values of these procedures for independent test statistics are asymptotically equal, which yields a theoretical argument for the numerical observation that the step-up procedure is usually more powerful than the step-down procedure. The main aim of this paper is to quantify the differences between the critical values more precisely. As a by-product, we also obtain more information about the gain when we consider two subsequent steps of these procedures. Moreover, we investigate how liberal the step-up procedure becomes when the step-up critical values are replaced by their step-down counterparts or by more refined approximate values. The results for independent p-values are the basis for obtaining corresponding results when independent real-valued test statistics are at hand. It turns out that the differences between step-down and step-up critical values, as well as the differences between subsequent steps, tend to zero for many distributions, except for heavy-tailed distributions. The Cauchy distribution yields an example where the critical values of both procedures are nearly linearly increasing in n.
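As a minimal stdlib-only sketch (our own illustration, not the paper's derivation), the two stepwise decision rules can be contrasted on a common set of critical values; the function names and the linear critical values below are our choices:

```python
# Illustrative contrast of the two stepwise rules on sorted p-values
# p_(1) <= ... <= p_(n) and common critical values c_1 <= ... <= c_n.

def step_down(pvals, crit):
    """Step-down rule: reject H_(1), ..., H_(k) where k is the largest
    index such that p_(i) <= c_i for ALL i <= k (stop at first failure)."""
    k = 0
    for i, (p, c) in enumerate(zip(sorted(pvals), crit), start=1):
        if p <= c:
            k = i
        else:
            break
    return k  # number of rejected hypotheses

def step_up(pvals, crit):
    """Step-up rule: reject H_(1), ..., H_(k) where k is the LARGEST
    index with p_(k) <= c_k (scan from the least significant p-value)."""
    p = sorted(pvals)
    for i in range(len(p), 0, -1):
        if p[i - 1] <= crit[i - 1]:
            return i
    return 0

pvals = [0.001, 0.030, 0.032, 0.040]
crit = [0.0125, 0.025, 0.0375, 0.05]  # linearly increasing critical values
print(step_down(pvals, crit), step_up(pvals, crit))
```

With identical critical values the step-up rule never rejects fewer hypotheses than the step-down rule, which is the numerical observation the abstract refers to.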

2.
In this paper we present a modification of the Benjamini and Hochberg false discovery rate controlling procedure for testing non-positive dependent test statistics. The new testing procedure makes use of the same series of linearly increasing critical values. Yet, in the new procedure the set of p-values is divided into subsets of positively dependent p-values, and each subset of p-values is separately sorted and compared to the series of critical values. In the first part of the paper we introduce the new testing methodology, discuss the technical issues needed to apply the new approach, and apply it to data from a genetic experiment.
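For reference, the unmodified Benjamini–Hochberg step-up procedure that the paper builds on can be sketched as follows (a stdlib-only illustration; the function name is ours):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Standard Benjamini-Hochberg step-up procedure: sort the p-values,
    find k = max{i : p_(i) <= i*q/m}, and reject the hypotheses with the
    k smallest p-values. Returns the set of rejected (0-based) indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank
    return {order[i] for i in range(k)}

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]))
```

The modification described in the abstract applies this comparison separately within each subset of positively dependent p-values rather than to the pooled set.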

3.
This paper considers the problem of testing equality between two independent binomial proportions. Hwang and Yang (Statist. Sinica 11 (2001) 807) apply the Neyman–Pearson fundamental lemma and the estimated truth approach to derive optimal procedures, named expected p-values. This p-value has been shown to be identical to the mid p-value in Lancaster (J. Amer. Statist. Assoc. (1961) 223) for the one-sided test. For the two-sided test, the paper proves the usual two-sided mid p-value is identical to the expected p-value in the balanced sample case.
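As a quick illustration of the mid p-value idea in a one-sample binomial setting (a stdlib-only sketch; the setting and names are ours, not from the paper):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) probability mass at k."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mid_p(x, n, p0=0.5):
    """One-sided mid p-value for H0: p = p0 vs. H1: p > p0: count only
    half the probability of the observed outcome, P(X > x) + 0.5*P(X = x),
    instead of the full P(X >= x) of the ordinary exact p-value."""
    return (sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
            + 0.5 * binom_pmf(x, n, p0))

print(mid_p(8, 10))  # smaller than the ordinary exact p-value 56/1024
```

Halving the observed outcome's probability makes the discrete p-value less conservative, which is what motivates its use as an "expected p-value" analogue.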

4.
The discussion on the use and misuse of p-values in 2016 by the American Statistical Association was a timely assertion that statistical concepts should be properly used in science. Some researchers, especially economists, who adopt significance testing and p-values to report their results may feel confused by the statement, leading to misinterpretations of it. In this study, we aim to re-examine the accuracy of the p-value and introduce an alternative way of testing the hypothesis. We conduct a simulation study to investigate the reliability of the p-value. Apart from investigating the performance of the p-value, we also introduce some existing approaches, minimum Bayes factors and belief functions, as replacements for the p-value. Results from the simulation study confirm that the p-value is unreliable in some cases and that the proposed approaches seem useful as substitute tools for statistical inference. Moreover, our results show that the plausibility approach is more accurate for making decisions about the null hypothesis than the traditionally used p-values when the null hypothesis is true. However, the MBFs of Edwards et al. [Bayesian statistical inference for psychological research. Psychol. Rev. 70(3) (1963), pp. 193–242], Vovk [A logic of probability, with application to the foundations of statistics. J. Royal Statistical Soc. Series B (Methodological) 55 (1993), pp. 317–351] and Sellke et al. [Calibration of p values for testing precise null hypotheses. Am. Stat. 55(1) (2001), pp. 62–71] provide more reliable results compared to all other methods when the null hypothesis is false.
KEYWORDS: ban of p-values, minimum Bayes factors, belief functions
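One of the minimum Bayes factors cited above, the Edwards et al.-style bound for a z-statistic, has a simple closed form; this is a hedged sketch of that formula (the function name is ours):

```python
from math import exp

def mbf_edwards(z):
    """Minimum Bayes factor in the style of Edwards et al. (1963) for a
    z-statistic: exp(-z**2 / 2), the smallest Bayes factor for H0
    relative to any point alternative (values < 1 favor the alternative)."""
    return exp(-z * z / 2)

# e.g. z = 1.96 (two-sided p ~ 0.05) gives a minimum Bayes factor of
# about 0.15, i.e. at best roughly 7:1 odds against the null.
print(mbf_edwards(1.96))
```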

5.
ABSTRACT

This article has two objectives. The first and narrower is to formalize the p-value function, which records all possible p-values, each corresponding to a value for whatever the scalar parameter of interest is for the problem at hand, and to show how this p-value function directly provides full inference information for any corresponding user or scientist. The p-value function provides familiar inference objects: significance levels, confidence intervals, critical values for fixed-level tests, and the power function at all values of the parameter of interest. It thus gives an immediate accurate and visual summary of inference information for the parameter of interest. We show that the p-value function of the key scalar interest parameter records the statistical position of the observed data relative to that parameter, and we then describe an accurate approximation to that p-value function which is readily constructed.
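For the simplest case, a normal mean with known standard deviation, the p-value function and its confidence-interval reading can be sketched in a few lines (a stdlib-only illustration; the example numbers are ours):

```python
from math import erf, sqrt

def p_value_function(xbar, sigma, n, mu_grid):
    """One-sided p-value function p(mu) = Phi((xbar - mu) / se) for a
    normal mean with known sigma (se = sigma / sqrt(n)). It decreases
    from 1 to 0 as mu grows; thresholding it at alpha/2 and 1 - alpha/2
    reads off the endpoints of a (1 - alpha) confidence interval."""
    se = sigma / sqrt(n)
    return [0.5 * (1 + erf((xbar - mu) / (se * sqrt(2)))) for mu in mu_grid]

# Read a 95% confidence interval off the p-value function.
grid = [i / 100 for i in range(-100, 300)]
pvals = p_value_function(1.0, 2.0, 25, grid)
ci = [mu for mu, p in zip(grid, pvals) if 0.025 <= p <= 0.975]
print(min(ci), max(ci))
```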

6.
Abstract

In statistical hypothesis testing, a p-value is expected to be distributed as the uniform distribution on the interval (0, 1) under the null hypothesis. However, some p-values, such as the generalized p-value and the posterior predictive p-value, cannot be assured of this property. In this paper, we propose an adaptive p-value calibration approach, and show that the calibrated p-value is asymptotically distributed as the uniform distribution. For the Behrens–Fisher problem and a goodness-of-fit test under a normal model, the calibrated p-values are constructed and their behavior is evaluated numerically. Simulations show that the calibrated p-values are superior to the original ones.
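The uniformity property being calibrated for can be checked numerically; below is a hedged stdlib-only sketch (our own setup, not the paper's) that simulates exact z-test p-values under the null and measures their distance from Uniform(0, 1):

```python
import random
from math import erf, sqrt

def null_pvalues(n_sim=2000, n=30, seed=7):
    """Simulate two-sided z-test p-values under the null (standard normal
    data, known variance); a well-calibrated p-value is ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_sim):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = abs(xbar) * sqrt(n)
        out.append(2 * (1 - 0.5 * (1 + erf(z / sqrt(2)))))
    return out

def ks_uniform(pvals):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    p-values and the Uniform(0, 1) CDF; small values indicate good
    calibration."""
    p = sorted(pvals)
    n = len(p)
    return max(max((i + 1) / n - p[i], p[i] - i / n) for i in range(n))

print(ks_uniform(null_pvalues()))
```

A generalized or posterior predictive p-value run through the same check would show a larger deviation, which is what motivates the calibration.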

7.
The multiple hypothesis testing literature has recently seen considerable development, with particular attention to control of the false discovery rate (FDR) based on p-values. While these are not the only methods to deal with multiplicity, inference with small samples and large sets of hypotheses depends on the specific choice of the p-value used to control the FDR in the presence of nuisance parameters. In this paper we propose to use the partial posterior predictive p-value [Bayarri, M.J., Berger, J.O., 2000. p-values for composite null models. J. Amer. Statist. Assoc. 95, 1127–1142], which overcomes this difficulty. This choice is motivated by theoretical considerations and examples. Finally, an application to a controlled microarray experiment is presented.

8.
In the framework of null hypothesis significance testing for functional data, we propose a procedure able to select intervals of the domain imputable for the rejection of a null hypothesis. An unadjusted p-value function and an adjusted one are the output of the procedure, namely interval-wise testing. Depending on the sort and level α of type-I error control, significant intervals can be selected by thresholding the two p-value functions at level α. We prove that the unadjusted (adjusted) p-value function point-wise (interval-wise) controls the probability of type-I error and it is point-wise (interval-wise) consistent. To enlighten the gain in terms of interpretation of the phenomenon under study, we applied the interval-wise testing to the analysis of a benchmark functional data set, i.e. Canadian daily temperatures. The new procedure provides insights that current state-of-the-art procedures do not, supporting similar advantages in the analysis of functional data with less prior knowledge.

9.
ABSTRACT

Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p < 0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p < 0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values.
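A commonly quoted closed form for such a bound, the Sellke–Bayarri–Berger-type calibration, is easy to compute; this is a hedged sketch (the function name is ours, and the abstract does not specify which bound the article uses):

```python
from math import e, log

def bayes_factor_bound(p):
    """Sellke-Bayarri-Berger-type Bayes factor bound: the largest odds in
    favor of H1 over H0 consistent with p-value p is 1 / (-e * p * ln p),
    valid for p < 1/e."""
    if not 0.0 < p < 1.0 / e:
        raise ValueError("bound applies for 0 < p < 1/e")
    return 1.0 / (-e * p * log(p))

# p = 0.05 caps the odds at about 2.5:1; p = 0.005 at about 14:1 --
# far weaker evidence than "1 in 20" language suggests.
print(bayes_factor_bound(0.05), bayes_factor_bound(0.005))
```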

10.
While it is often argued that a p-value is a probability (see Wasserstein and Lazar), we argue that a p-value is not defined as a probability. A p-value is a bijection of the sufficient statistic for a given test which maps to the same scale as the Type I error probability. As such, the use of p-values in a test should be no more a source of controversy than the use of a sufficient statistic. It is demonstrated that there is, in fact, no ambiguity about what a p-value is, contrary to what has been claimed in recent public debates in the applied statistics community. We give a simple example to illustrate that rejecting the use of p-values in testing for a normal mean parameter is conceptually no different from rejecting the use of a sample mean. The p-value is innocent; the problem arises from its misuse and misinterpretation. The way that p-values have been informally defined and interpreted appears to have led to tremendous confusion and controversy regarding their place in statistical analysis.

11.
In this paper, we translate variable selection for linear regression into multiple testing and select significant variables according to the testing result. New variable selection procedures are proposed based on the optimal discovery procedure (ODP) in multiple testing. Due to the ODP's optimality, if we guarantee the number of significant variables included, it will include fewer non-significant variables than marginal p-value based methods. Consistency of our procedures is established in theory and confirmed in simulation. Simulation results suggest that procedures based on multiple testing improve over procedures based on selection criteria, and that our new procedures perform better than marginal p-value based procedures.

12.
Combining p-values from statistical tests across different studies is the most commonly used approach in meta-analysis for evolutionary biology. The most commonly used p-value combination methods mainly comprise the z-transform tests (e.g., the un-weighted z-test and the weighted z-test) and the gamma-transform tests (e.g., the CZ method [Z. Chen, W. Yang, Q. Liu, J.Y. Yang, J. Li, and M.Q. Yang, A new statistical approach to combining p-values using gamma distribution and its application to genomewide association study, Bioinformatics 15 (2014), p. S3]). However, among these existing p-value combination methods, none is uniformly most powerful in all situations [Chen et al. 2014]. In this paper, we propose a meta-analysis method based on the gamma distribution, MAGD, which pools the p-values from independent studies. The newly proposed test, MAGD, flexibly accommodates different levels of heterogeneity of effect sizes across individual studies. MAGD simultaneously retains all the characteristics of the z-transform tests and the gamma-transform tests. We also propose an easy-to-implement resampling approach for estimating the empirical p-values of MAGD at finite sample sizes. Simulation studies and two data applications show that the proposed method MAGD is essentially as powerful as the z-transform tests (the gamma-transform tests) under the circumstance of homogeneous (heterogeneous) effect sizes across studies.
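The two baseline families named in the abstract can be sketched with the standard library (our own illustration; MAGD itself is not reproduced here). Fisher's method is the classical gamma-transform, since a chi-squared with 2k degrees of freedom is a Gamma(k, 2):

```python
from math import log, sqrt
from statistics import NormalDist

_nd = NormalDist()

def stouffer(pvals):
    """Un-weighted z-transform combination (Stouffer):
    Z = sum_i Phi^{-1}(1 - p_i) / sqrt(k); combined p = 1 - Phi(Z)."""
    z = sum(_nd.inv_cdf(1 - p) for p in pvals) / sqrt(len(pvals))
    return 1 - _nd.cdf(z)

def fisher_stat(pvals):
    """Gamma-transform (Fisher) combination statistic: -2 * sum_i ln p_i,
    chi-squared with 2k degrees of freedom under the global null."""
    return -2 * sum(log(p) for p in pvals)

print(stouffer([0.05, 0.05]), fisher_stat([0.05, 0.05]))
```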

13.
This paper reviews global and multiple tests for the combination of n hypotheses using the ordered p-values of the n individual tests. In 1987, Röhmel and Streitberg presented a general method to construct global level α tests based on ordered p-values when there exists no prior knowledge regarding the joint distribution of the corresponding test statistics. In the case of independent test statistics, construction of global tests is available by means of recursive formulae presented by Bicher (1989), Kornatz (1994) and Finner and Roters (1994). Multiple test procedures can be developed by applying the closed test principle using these global tests as building blocks. Liu (1996) proposed representing closed tests by means of “critical matrices” which contain the critical values of the global tests. Within the framework of these theoretical concepts, well-known global tests and multiple test procedures are classified and the relationships between the different tests are characterised.

14.
ABSTRACT

Longstanding concerns with the role and interpretation of p-values in statistical practice prompted the American Statistical Association (ASA) to make a statement on p-values. The ASA statement spurred a flurry of responses and discussions by statisticians, with many wondering about the steps necessary to expand the adoption of these principles. Introductory statistics classrooms are key locations to introduce and emphasize the nuance related to p-values; in part because they engrain appropriate analysis choices at the earliest stages of statistics education, and also because they reach the broadest group of students. We propose a framework for statistics departments to conduct a content audit for p-value principles in their introductory curriculum. We then discuss the process and results from applying this course audit framework within our own statistics department. We also recommend meeting with client departments as a complement to the course audit. Discussions about analyses and practices common to particular fields can help to evaluate if our service courses are meeting the needs of client departments and to identify what is needed in our introductory courses to combat the misunderstanding and future misuse of p-values.

15.
A p-value is developed for testing the equivalence of the variances of a bivariate normal distribution. The unknown correlation coefficient is a nuisance parameter in the problem. If the correlation is known, the proposed p-value provides an exact test. For large samples, the p-value can be computed by replacing the unknown correlation by the sample correlation, and the resulting test is quite satisfactory. For small samples, it is proposed to compute the p-value by replacing the unknown correlation by a scalar multiple of the sample correlation. However, a single scalar is not satisfactory, and it is proposed to use different scalars depending on the magnitude of the sample correlation coefficient. In order to implement this approach, tables are obtained providing sub-intervals for the sample correlation coefficient, and the scalars to be used if the sample correlation coefficient belongs to a particular sub-interval. Once such tables are available, the proposed p-value is quite easy to compute since it has an explicit analytic expression. Numerical results on the type I error probability and power are reported on the performance of such a test, and the proposed p-value test is also compared to another test based on a rejection region. The results are illustrated with two examples: an example dealing with the comparability of two measuring devices, and an example dealing with the assessment of bioequivalence.
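For context, a classical competitor for this problem is the Pitman–Morgan test, which exploits the fact that for a bivariate normal pair, Var(X) = Var(Y) exactly when X + Y and X − Y are uncorrelated; this is a hedged stdlib-only sketch of that test (not the paper's p-value):

```python
from math import sqrt

def pitman_morgan_t(x, y):
    """Classical Pitman-Morgan statistic for H0: Var(X) = Var(Y) with
    paired normal data: compute the sample correlation r of the sums
    s = x + y and differences d = x - y, then
    t = r * sqrt(n - 2) / sqrt(1 - r**2), referred to a t with n-2 df."""
    n = len(x)
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    ms, md = sum(s) / n, sum(d) / n
    num = sum((a - ms) * (b - md) for a, b in zip(s, d))
    den = sqrt(sum((a - ms) ** 2 for a in s) * sum((b - md) ** 2 for b in d))
    r = num / den
    return r * sqrt(n - 2) / sqrt(1 - r * r)
```

A t near zero is consistent with equal variances; a large positive (negative) t suggests Var(X) is larger (smaller) than Var(Y).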

16.
We consider the problem of m simultaneous statistical test problems with composite null hypotheses. Usually, marginal p-values are computed under least favorable parameter configurations (LFCs), thus being over-conservative under non-LFCs. Our proposed randomized p-value leads to a tighter exhaustion of the marginal (local) significance level. In turn, it is stochastically larger than the LFC-based p-value under alternatives. While these distributional properties are typically nonsensical for m = 1, the exhaustion of the local significance level is extremely helpful for cases with m > 1 in connection with data-adaptive multiple tests, as we demonstrate by considering multiple one-sided tests for Gaussian means.

17.
P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels 0.05, 0.01, and 0.001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.
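The scale of this variability is easy to see by simulation; below is a hedged stdlib-only sketch (our own setup, not the paper's study design) that draws replicate one-sided z-test p-values for a fixed true effect and summarizes their spread on the log10 scale:

```python
import random
from math import log10, sqrt
from statistics import NormalDist, stdev

def replicate_pvalues(effect=0.5, n=30, reps=1000, seed=3):
    """Simulate replicate one-sided z-test p-values for the SAME true
    effect size, showing their sample-to-sample variability."""
    rng = random.Random(seed)
    nd = NormalDist()
    ps = []
    for _ in range(reps):
        xbar = sum(rng.gauss(effect, 1.0) for _ in range(n)) / n
        ps.append(1 - nd.cdf(xbar * sqrt(n)))
    return ps

ps = replicate_pvalues()
# Spread of log10(p) across replicates: typically around one order of
# magnitude or more, i.e. "p = 0.01" could easily have been 0.001 or 0.1.
print(round(stdev(log10(p) for p in ps), 2))
```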

18.
This article considers the problem of testing marginal homogeneity in a 2 × 2 contingency table. We first review some well-known conditional and unconditional p-values that have appeared in the statistical literature. Then we treat the p-value as the test statistic and use the unconditional approach to obtain the modified p-value, which is shown to be valid. For a given nominal level, the rejection region of the modified p-value test contains that of the original p-value test. Some nice properties of the modified p-value are given. Especially, under mild conditions the rejection region of the modified p-value test is shown to be the Barnard convex set as described by Barnard [G. A. Barnard, Significance tests for 2 × 2 tables, Biometrika 34 (1947), 123–138]. If the one-sided null hypothesis has two nuisance parameters, we show that this result can reduce the dimension of the nuisance parameter space from two to one for computing modified p-values and sizes of tests. Numerical studies including an illustrative example are given. Numerical comparisons show that the sizes of the modified p-value tests are closer to the nominal level than those of the original p-value tests in many cases, especially for small to moderate sample sizes.
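One of the well-known conditional p-values for this problem is the exact McNemar test, which conditions on the discordant cells; a stdlib-only sketch (our illustration of the baseline, not the article's modified p-value):

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Exact conditional p-value for marginal homogeneity in a 2x2 table
    of paired outcomes: condition on the number of discordant pairs
    n = b + c and test b ~ Binomial(n, 1/2); two-sided by doubling the
    smaller tail (capped at 1)."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

print(mcnemar_exact_p(8, 2))
```

The unconditional approach in the article instead evaluates such a p-value over the nuisance parameter space, which is what yields the modified (and less conservative) p-value.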

19.
This article considers multiple hypotheses testing with the generalized familywise error rate k-FWER control, which is the probability of at least k false rejections. We first assume the p-values corresponding to the true null hypotheses are independent, and propose an adaptive generalized Bonferroni procedure with k-FWER control based on an estimate of the number of true null hypotheses. Then, we assume the p-values are dependent, satisfying block dependence, and propose an adaptive procedure with k-FWER control. Extensive simulations compare the performance of the adaptive procedures with different estimators.
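The non-adaptive starting point, the generalized Bonferroni (Lehmann–Romano-type) procedure, is a one-liner; this sketch is our own illustration of that baseline, not the article's adaptive procedure:

```python
def generalized_bonferroni(pvals, k=2, alpha=0.05):
    """Non-adaptive generalized Bonferroni procedure: reject H_i whenever
    p_i <= k * alpha / m. This controls the k-FWER (probability of at
    least k false rejections) at level alpha; the adaptive versions in
    the article replace m by an estimate of the number of true nulls.
    Returns the list of rejected (0-based) indices."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= k * alpha / m]

print(generalized_bonferroni([0.001, 0.02, 0.3]))
```

Note that k = 1 recovers the ordinary Bonferroni correction.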

20.
Abstract

The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: that P-values exhibit extreme variation across samples (and thus are “unreliable”), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate evidence against those hypotheses. These problems are not, however, properties of P-values but are faults of researchers who focus on null hypotheses and overstate evidence based on misperceptions that p = 0.05 represents enough evidence to reject hypotheses. Those problems are easily seen without use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) −log2(p).
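The S-value translation at the end of the abstract is a one-line computation (the function name is ours):

```python
from math import log2

def s_value(p):
    """Shannon information (surprisal) of a p-value: s = -log2(p), the
    number of consecutive heads in fair coin tosses that would be about
    as surprising as the observed p."""
    return -log2(p)

# p = 0.05 corresponds to only about 4.3 bits -- roughly as surprising
# as four heads in a row.
print(s_value(0.05))
```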
