Similar Literature
20 similar documents found; search took 718 ms.
1.
Conventional clinical trial design involves considerations of power, and sample size is typically chosen to achieve a desired power conditional on a specified treatment effect. In practice, there is considerable uncertainty about what the true underlying treatment effect may be, and so power does not give a good indication of the probability that the trial will demonstrate a positive outcome. Assurance is the unconditional probability that the trial will yield a ‘positive outcome’. A positive outcome usually means a statistically significant result, according to some standard frequentist significance test. The assurance is then the prior expectation of the power, averaged over the prior distribution for the unknown true treatment effect. We argue that assurance is an important measure of the practical utility of a proposed trial, and indeed that it will often be appropriate to choose the size of the sample (and perhaps other aspects of the design) to achieve a desired assurance, rather than to achieve a desired power conditional on an assumed treatment effect. We extend the theory of assurance to two‐sided testing and equivalence trials. We also show that assurance is straightforward to compute in some simple problems of normal, binary and gamma distributed data, and that the method is not restricted to simple conjugate prior distributions for parameters. Several illustrations are given. Copyright © 2005 John Wiley & Sons, Ltd.
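As a quick illustration of the assurance calculation, here is a minimal Monte Carlo sketch for a two-arm trial with normally distributed data, known common standard deviation, and a normal prior on the true treatment effect; the function name and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import stats

def assurance(n_per_arm, sigma, prior_mean, prior_sd,
              alpha=0.05, n_sims=100_000, seed=0):
    """Monte Carlo estimate of assurance: the power of a two-sided
    z-test averaged over a normal prior on the true effect delta."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(prior_mean, prior_sd, n_sims)   # draws from the prior
    se = sigma * np.sqrt(2.0 / n_per_arm)              # SE of the mean difference
    z_crit = stats.norm.ppf(1 - alpha / 2)
    # conditional power of the two-sided z-test at each sampled delta
    power = (stats.norm.sf(z_crit - delta / se)
             + stats.norm.cdf(-z_crit - delta / se))
    return power.mean()

# Conditional power at delta = 0.5 is high here, but the assurance is
# noticeably lower because the prior puts mass on small and negative effects.
print(assurance(n_per_arm=64, sigma=1.0, prior_mean=0.5, prior_sd=0.3))
```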

2.
The independence assumption in statistical significance testing becomes increasingly crucial and unforgiving as sample size increases. Seemingly inconsequential violations of this assumption can substantially increase the probability of a Type I error if sample sizes are large. In the case of Student's t test, it is found that correlations within samples in the range from 0.01 to 0.05 can lead to rejection of a true null hypothesis with high probability if N is 50, 100 or larger.
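A short simulation along these lines (a sketch, not the authors' code) reproduces the qualitative effect: equicorrelated observations, generated through a shared latent factor, inflate the one-sample t-test's Type I error as N grows.

```python
import numpy as np
from scipy import stats

def type1_rate(n, rho, alpha=0.05, n_sims=20_000, seed=0):
    """Empirical Type I error of the one-sample t-test (H0: mu = 0)
    when observations share a common equicorrelation rho."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n_sims, 1))             # shared latent factor
    noise = rng.standard_normal((n_sims, n))
    x = np.sqrt(rho) * common + np.sqrt(1 - rho) * noise  # Corr(x_i, x_j) = rho
    t, p = stats.ttest_1samp(x, 0.0, axis=1)
    return (p < alpha).mean()

for n in (50, 100):
    for rho in (0.0, 0.01, 0.05):
        print(n, rho, type1_rate(n, rho))   # rejection rate grows with n for rho > 0
```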

3.
When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a fire-storm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value, which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true, given the sample, p-values nonetheless can provide evidential information toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent with the parameter’s true location in the sampled-from population for almost 70% of the cases. The success rate increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used. Here, rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large or practically significant, in the researcher’s judgment and experience. The simulation compares the performance of several methods, from p-value and/or effect-size based to confidence-interval based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, like flagging whether a p-value is significant, the output of one single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.
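The MESP rule itself is easy to state in code. The sketch below is an illustrative one-sample version with hypothetical names and thresholds; the paper's simulation covers many more conditions.

```python
import numpy as np
from scipy import stats

def mesp_decision(sample, mu0, min_effect, alpha=0.05):
    """Hybrid MESP rule (sketch): reject H0 only if the t-test is
    significant AND the observed difference from mu0 is at least the
    researcher-specified minimum meaningful effect size."""
    t, p = stats.ttest_1samp(sample, mu0)
    significant = p < alpha
    meaningful = abs(np.mean(sample) - mu0) >= min_effect
    return significant and meaningful

rng = np.random.default_rng(1)
x = rng.normal(0.2, 1.0, size=100)                 # true mean 0.2
print(mesp_decision(x, mu0=0.0, min_effect=0.3))   # may be significant yet not meaningful
```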

4.
This article presents evidence that published results of scientific investigations are not a representative sample of the results of all scientific studies. Research studies from 11 major journals demonstrate the existence of biases that favor studies that observe effects that, on statistical evaluation, have a low probability of erroneously rejecting the so-called null hypothesis (H0). This practice makes the probability of erroneously rejecting H0 different for the reader than for the investigator. It introduces two biases in the interpretation of the scientific literature: one due to multiple repetition of studies with false hypotheses, and one due to failure to publish smaller and less significant outcomes of tests of true hypotheses. These practices distort the results of literature surveys and of meta-analyses. The results also indicate that the practices leading to publication bias have not changed over a period of 30 years.

5.
A statistical test can be seen as a procedure to produce a decision based on observed data, where some decisions consist of rejecting a hypothesis (yielding a significant result) and some do not, and where one controls the probability of making a wrong rejection at some prespecified significance level. Whereas traditional hypothesis testing involves only two possible decisions (to reject a null hypothesis or not), Kaiser’s directional two-sided test as well as the more recently introduced testing procedure of Jones and Tukey, each equivalent to running two one-sided tests, involve three possible decisions to infer the value of a unidimensional parameter. The latter procedure assumes that a point null hypothesis is impossible (e.g., that two treatments cannot have exactly the same effect), allowing a gain in statistical power. There are, however, situations where a point hypothesis is indeed plausible, for example, when considering hypotheses derived from Einstein’s theories. In this article, we introduce a five-decision-rule testing procedure, equivalent to running a traditional two-sided test in addition to two one-sided tests, which combines the advantages of the testing procedures of Kaiser (no assumption that a point hypothesis is impossible) and Jones and Tukey (higher power), allowing a nonnegligible (typically 20%) reduction in the sample size needed to reach a given statistical power, compared to the traditional approach.
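One plausible way to code such a rule is sketched below; the decision labels and the exact mapping from p-values to the five decisions are illustrative, not the authors' precise procedure.

```python
import numpy as np
from scipy import stats

def five_decision(sample, mu0, alpha=0.05):
    """Illustrative five-decision rule (sketch): combine the traditional
    two-sided t-test with the two one-sided tests. Labels and thresholds
    are one plausible mapping, not the paper's exact procedure."""
    n = len(sample)
    t = (np.mean(sample) - mu0) / (np.std(sample, ddof=1) / np.sqrt(n))
    p_two = 2 * stats.t.sf(abs(t), df=n - 1)    # two-sided p-value
    p_right = stats.t.sf(t, df=n - 1)           # one-sided, H1: mu > mu0
    p_left = stats.t.cdf(t, df=n - 1)           # one-sided, H1: mu < mu0
    if p_two < alpha:                           # point null rejected
        return "mu > mu0" if t > 0 else "mu < mu0"
    if p_right < alpha:
        return "mu >= mu0 (point null not excluded)"
    if p_left < alpha:
        return "mu <= mu0 (point null not excluded)"
    return "no directional conclusion"

rng = np.random.default_rng(2)
print(five_decision(rng.normal(0.15, 1.0, 200), mu0=0.0))
```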

6.
This article argues that researchers do not need to completely abandon the p-value, the best-known significance index, but should instead stop using significance levels that do not depend on sample sizes. A testing procedure is developed using a mixture of frequentist and Bayesian tools, with a significance level that is a function of sample size, obtained from a generalized form of the Neyman–Pearson Lemma that minimizes a linear combination of α, the probability of rejecting a true null hypothesis, and β, the probability of failing to reject a false null, instead of fixing α and minimizing β. The resulting hypothesis tests do not violate the Likelihood Principle and do not require any constraints on the dimensionalities of the sample space and parameter space. The procedure includes an ordering of the entire sample space and uses predictive probability (density) functions, allowing for testing of both simple and compound hypotheses. Accessible examples are presented to highlight specific characteristics of the new tests.
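For the simple-vs-simple normal case the idea can be made concrete: rejecting when b·f1(x̄) > a·f0(x̄) minimizes aα + bβ, and the implied significance level shrinks automatically as n grows. The sketch below instantiates only this simplest case with illustrative values, not the article's full predictive-probability procedure.

```python
import numpy as np
from scipy import stats

def adaptive_alpha(n, mu0=0.0, mu1=1.0, sigma=1.0, a=1.0, b=1.0):
    """Sketch: test H0: mu = mu0 vs H1: mu = mu1 by minimizing
    a*alpha + b*beta (generalized Neyman-Pearson) instead of fixing
    alpha. Rejects when b*f1(xbar) > a*f0(xbar); returns the implied
    significance level, which shrinks with n."""
    se = sigma / np.sqrt(n)
    # solve a*f0(x) = b*f1(x) for the critical mean (normal densities)
    x_crit = (mu0 + mu1) / 2 + se**2 * np.log(a / b) / (mu1 - mu0)
    alpha = stats.norm.sf(x_crit, loc=mu0, scale=se)
    beta = stats.norm.cdf(x_crit, loc=mu1, scale=se)
    return x_crit, alpha, beta

for n in (10, 50, 200):
    x, al, be = adaptive_alpha(n)
    print(f"n={n}: reject if xbar > {x:.3f}, alpha={al:.4f}, beta={be:.4f}")
```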

7.
It is challenging to estimate the statistical power when a complicated testing strategy is used to adjust for the type I error in multiple comparisons in a clinical trial. In this paper, we use the Bonferroni inequality to estimate the lower bound of the statistical power, assuming that the test statistics are approximately normally distributed and the correlation structure among them is unknown or only partially known. The method was applied to the design of a clinical study for sample size and statistical power estimation.
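The bound itself is elementary: P(all m tests reject) ≥ 1 − Σi P(test i fails), whatever the correlation structure. A sketch for m normally distributed endpoint statistics with illustrative effect sizes (not the paper's study):

```python
import numpy as np
from scipy import stats

def power_lower_bound(deltas, n_per_arm, sigma=1.0, alpha=0.05):
    """Bonferroni-inequality lower bound (sketch) for the probability
    that ALL endpoint tests are significant, valid without knowing the
    correlation among the test statistics:
    P(all reject) >= 1 - sum_i P(test i fails)."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    se = sigma * np.sqrt(2.0 / n_per_arm)
    # marginal two-sided power, ignoring the negligible opposite tail
    marginal_powers = stats.norm.sf(z_crit - np.asarray(deltas) / se)
    return max(0.0, 1.0 - np.sum(1.0 - marginal_powers))

print(power_lower_bound(deltas=[0.4, 0.35], n_per_arm=150))
```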

8.
We propose a new goodness-of-fit test for normal and lognormal distributions with unknown parameters and type-II censored data. This test is a generalization of Michael's test for censored samples, which is based on the empirical distribution and a variance stabilizing transformation. We estimate the parameters of the model by using maximum likelihood and Gupta's methods. The quantiles of the distribution of the test statistic under the null hypothesis are obtained through Monte Carlo simulations. The power of the proposed test is estimated and compared with that of the Kolmogorov–Smirnov test, also using simulations. The new test is more powerful than the Kolmogorov–Smirnov test in most of the studied cases. Acceptance regions for the PP, QQ and Michael's stabilized probability plots are derived, making it possible to visualize which data contribute to the decision of rejecting the null hypothesis. Finally, an illustrative example is presented.

9.
The Benjamini–Hochberg procedure is widely used in multiple comparisons. Previous power results for this procedure have been based on simulations. This article produces theoretical expressions for expected power. To derive them, we make assumptions about the number of hypotheses being tested, which null hypotheses are true, which are false, and the distributions of the test statistics under each null and alternative. We use these assumptions to derive bounds for multidimensional rejection regions. With these bounds and a permanent-based representation of the joint density function of the largest p-values, we use the law of total probability to derive the distribution of the total number of rejections. We derive the joint distribution of the total number of rejections and the number of rejections when the null hypothesis is true. We give an analytic expression for the expected power for a false discovery rate procedure that assumes the hypotheses are independent.
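For readers who want to check such expressions numerically, the following sketch estimates the expected power of the Benjamini–Hochberg procedure by simulation under independent normal test statistics; the sizes and effect values are illustrative.

```python
import numpy as np
from scipy import stats

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: boolean rejection mask
    at FDR level q (reject the k smallest p-values, where k is the
    largest index with p_(k) <= q*k/m)."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    below = np.nonzero(pvals[order] <= q * np.arange(1, m + 1) / m)[0]
    k = below.max() + 1 if below.size else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

def expected_power_sim(m=100, m1=20, mu=3.0, q=0.05, n_sims=5000, seed=0):
    """Monte Carlo expected power: mean fraction of the m1 false nulls
    rejected, assuming independent normal test statistics."""
    rng = np.random.default_rng(seed)
    power = 0.0
    for _ in range(n_sims):
        z = rng.standard_normal(m)
        z[:m1] += mu                         # false nulls get shifted means
        p = 2 * stats.norm.sf(np.abs(z))
        power += bh_reject(p, q)[:m1].mean()
    return power / n_sims

print(expected_power_sim())
```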

10.
In many engineering problems it is necessary to draw statistical inferences on the mean of a lognormal distribution based on a complete sample of observations. Statistical demonstration of mean time to repair (MTTR) is one example. Although optimum confidence intervals and hypothesis tests for the lognormal mean have been developed, they are difficult to use, requiring extensive tables and/or a computer. In this paper, simplified conservative methods for calculating confidence intervals or hypothesis tests for the lognormal mean are presented. In this paper, “conservative” refers to confidence intervals (hypothesis tests) whose infimum coverage probability (supremum probability of rejecting the null hypothesis taken over parameter values under the null hypothesis) equals the nominal level. The term “conservative” has obvious implications to confidence intervals (they are “wider” in some sense than their optimum or exact counterparts). Applying the term “conservative” to hypothesis tests should not be confusing if it is remembered that this implies that their equivalent confidence intervals are conservative. No implication of optimality is intended for these conservative procedures. It is emphasized that these are direct statistical inference methods for the lognormal mean, as opposed to the already well-known methods for the parameters of the underlying normal distribution. The method currently employed in MIL-STD-471A for statistical demonstration of MTTR is analyzed and compared to the new method in terms of asymptotic relative efficiency. The new methods are also compared to the optimum methods derived by Land (1971, 1973).
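The paper's conservative procedures require its tables. As a simpler point of reference only, the sketch below computes an approximate large-sample interval for the lognormal mean by Cox's method, which is not conservative in the paper's sense; the repair-time data are hypothetical.

```python
import numpy as np
from scipy import stats

def lognormal_mean_ci(x, conf=0.95):
    """Illustrative large-sample CI for the lognormal mean
    E[X] = exp(mu + sigma^2/2), via Cox's method on y = log x.
    An approximation, not the paper's conservative procedure."""
    y = np.log(np.asarray(x))
    n = len(y)
    ybar, s2 = y.mean(), y.var(ddof=1)
    theta = ybar + s2 / 2                           # estimates log E[X]
    se = np.sqrt(s2 / n + s2**2 / (2 * (n - 1)))    # Cox's variance approximation
    z = stats.norm.ppf(0.5 + conf / 2)
    return np.exp(theta - z * se), np.exp(theta + z * se)

rng = np.random.default_rng(3)
repair_times = rng.lognormal(mean=1.0, sigma=0.6, size=40)  # hypothetical MTTR data
print(lognormal_mean_ci(repair_times))  # true mean = exp(1.18) ~ 3.25
```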

11.
Sander Greenland argues that reported results of hypothesis tests should include the surprisal, the base-2 logarithm of the reciprocal of a p-value. The surprisal measures how many bits of evidence in the data warrant rejecting the null hypothesis. A generalization of surprisal also can measure how much the evidence justifies rejecting a composite hypothesis such as the complement of a confidence interval. That extended surprisal, called surprise, quantifies how many bits of astonishment an agent believing a hypothesis would experience upon observing the data. While surprisal is a function of a point in hypothesis space, surprise is a function of a subset of hypothesis space. Satisfying the conditions of conditional min-plus probability, surprise inherits a wealth of tools from possibility theory. The equivalent compatibility function has been recently applied to the replication crisis, to adjusting p-values for prior information, and to comparing scientific theories.
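The surprisal itself is a one-liner; the values and comments below are illustrative.

```python
import numpy as np

def surprisal_bits(p):
    """Greenland's surprisal (S-value): -log2(p), the bits of evidence
    against the null. p = 0.05 gives about 4.3 bits, i.e. no more
    surprising than seeing 4-5 heads in a row from a fair coin."""
    return -np.log2(p)

for p in (0.05, 0.005, 0.25):
    print(p, round(surprisal_bits(p), 2))
```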

12.
In this article, we propose a new goodness-of-fit test for Type I or Type II censored samples from a completely specified distribution. This test is a generalization of Michael's test for censored data, which is based on the empirical distribution and a variance stabilizing transformation. Using Monte Carlo methods, the distributions of the test statistics are analyzed under the null hypothesis. Tables of quantiles of these statistics are also provided. The power of the proposed test is studied and compared to that of other well-known tests also using simulation. The proposed test is more powerful in most of the considered cases. Acceptance regions for the PP, QQ, and Michael's stabilized probability plots are derived, which enable one to visualize which data contribute to the decision of rejecting the null hypothesis. Finally, an application in quality control is presented as illustration.
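Obtaining null quantiles by Monte Carlo is straightforward when the distribution is completely specified. The sketch below uses a plain KS-type statistic on the uncensored order statistics as a simple stand-in for Michael's statistic (which it is not); the sample sizes are illustrative.

```python
import numpy as np
from scipy import stats

def ks_type_stat(sample, r, cdf):
    """KS-type statistic on the r smallest order statistics (type-II
    censoring), for a completely specified cdf. A simple stand-in for
    Michael's stabilized statistic."""
    n = len(sample)
    x = np.sort(sample)[:r]
    i = np.arange(1, r + 1)
    return np.max(np.abs(cdf(x) - (i - 0.5) / n))

def null_quantiles(n, r, n_sims=20_000, probs=(0.90, 0.95, 0.99), seed=0):
    """Monte Carlo quantiles of the statistic under H0: standard normal."""
    rng = np.random.default_rng(seed)
    stats_null = [ks_type_stat(rng.standard_normal(n), r, stats.norm.cdf)
                  for _ in range(n_sims)]
    return np.quantile(stats_null, probs)

print(null_quantiles(n=30, r=20))   # critical values; reject when exceeded
```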

13.
Kh. Fazli. Statistics, 2013, 47(5): 407–428.
We observe a realization of an inhomogeneous Poisson process whose intensity function depends on an unknown multidimensional parameter. We consider the asymptotic behaviour of the Rao score test for a simple null hypothesis against the multilateral alternative. By using an Edgeworth-type expansion (under the null hypothesis) for a vector of stochastic integrals with respect to the Poisson process, we refine the (classic) threshold of the test (obtained by the central limit theorem), which improves the type I error probability. The expansion also allows us to describe the power of the test under local alternatives, i.e., a sequence of alternatives that converges to the null hypothesis at a certain rate; the rates can differ across components of the parameter.

14.
Bayesian predictive power, the expectation of the power function with respect to a prior distribution for the true underlying effect size, is routinely used in drug development to quantify the probability of success of a clinical trial. Choosing the prior is crucial for the properties and interpretability of Bayesian predictive power. We review recommendations on the choice of prior for Bayesian predictive power and explore its features as a function of the prior. The density of power values induced by a given prior is derived analytically and its shape characterized. We find that for a typical clinical trial scenario, this density has a u‐shape very similar, but not equal, to a β‐distribution. Alternative priors are discussed, and practical recommendations to assess the sensitivity of Bayesian predictive power to its input parameters are provided. Copyright © 2016 John Wiley & Sons, Ltd.
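The induced density is easy to visualize by sampling: draw effect sizes from the prior, map each through the power function, and inspect the histogram. A sketch with illustrative inputs (normal prior, two-arm z-test), not the paper's analytic derivation:

```python
import numpy as np
from scipy import stats

def power_draws(n_per_arm, sigma, prior_mean, prior_sd,
                alpha=0.05, n_draws=100_000, seed=0):
    """Sample the power of a two-sided, two-arm z-test at effect sizes
    drawn from a normal prior; the histogram of these draws is the
    prior-induced density of power values (often u-shaped)."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(prior_mean, prior_sd, n_draws)
    se = sigma * np.sqrt(2.0 / n_per_arm)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.sf(z_crit - delta / se) + stats.norm.cdf(-z_crit - delta / se)

pw = power_draws(n_per_arm=64, sigma=1.0, prior_mean=0.4, prior_sd=0.3)
print("predictive power (mean of draws):", pw.mean())
hist, _ = np.histogram(pw, bins=10, range=(0, 1))
print("mass piles up near 0 and 1:", hist)   # the u-shape
```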

15.
The usual tests for trends are conducted under a null hypothesis. This article presents a test of a non-null hypothesis for linear trends in proportions. A weighted least squares method is used to estimate the regression coefficient of the proportions. The non-null hypothesis sets the expectation of this coefficient equal to a prescribed regression-coefficient margin, and its variance is used to construct the basic relationship for linear trends in proportions via the asymptotic normal method. Derivations of the sample-size formula, the power function, and the test statistic follow. The expected power is obtained from the power function, and the observed power is estimated by the Monte Carlo method; the agreement between the two is excellent. Setting the margin equal to zero reduces the procedure to the classical test for linear trends in proportions. The non-null-hypothesis test thus complements the classical test: it can be applied to assess the clinical significance of trends among several proportions, whereas the classical test is restricted to testing statistical significance. A data set from a website is used to illustrate the methodology.
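A compact sketch of such a test, with the weighted least squares slope tested against a margin rather than against zero; the weights, data, and margin below are illustrative, and margin=0 recovers the classical trend test.

```python
import numpy as np
from scipy import stats

def trend_test_with_margin(events, totals, scores, margin=0.0):
    """Sketch of a non-null trend test: WLS slope of proportions on
    group scores, compared against a prescribed slope margin b0
    instead of zero. margin=0 recovers the classical test."""
    scores = np.asarray(scores, dtype=float)
    p = np.asarray(events) / np.asarray(totals)
    w = np.asarray(totals) / np.clip(p * (1 - p), 1e-12, None)  # inverse-variance weights
    xbar = np.average(scores, weights=w)
    sxx = np.sum(w * (scores - xbar) ** 2)
    b = np.sum(w * (scores - xbar) * p) / sxx    # WLS slope estimate
    se = np.sqrt(1.0 / sxx)                      # its asymptotic SE
    z = (b - margin) / se
    return b, z, 2 * stats.norm.sf(abs(z))

events, totals = [10, 18, 28], [100, 100, 100]
print(trend_test_with_margin(events, totals, scores=[0, 1, 2], margin=0.05))
```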

16.
The Benjamini–Hochberg procedure is widely used in multiple comparisons. Previous power results for this procedure have been based on simulations. This article produces theoretical expressions for expected power. To derive them, we make assumptions about the number of hypotheses being tested, which null hypotheses are true, which are false, and the distributions of the test statistics under each null and alternative. We use these assumptions to derive bounds for multidimensional rejection regions. With these bounds and a permanent-based representation of the joint density function of the largest p-values, we use the law of total probability to derive the distribution of the total number of rejections. We derive the joint distribution of the total number of rejections and the number of rejections when the null hypothesis is true. We give an analytic expression for the expected power for a false discovery rate procedure that assumes the hypotheses are independent.

17.
In drug development, after completion of phase II proof‐of‐concept trials, the sponsor needs to make a go/no‐go decision to start expensive phase III trials. The probability of statistical success (PoSS) of the phase III trials based on data from earlier studies is an important factor in that decision‐making process. Instead of statistical power, the predictive power of a phase III trial, which takes into account the uncertainty in the estimation of treatment effect from earlier studies, has been proposed to evaluate the PoSS of a single trial. However, regulatory authorities generally require statistical significance in two (or more) trials for marketing licensure. We show that the predictive statistics of two future trials are statistically correlated through use of the common observed data from earlier studies. Thus, the joint predictive power should not be evaluated as a simplistic product of the predictive powers of the individual trials. We develop the relevant formulae for the appropriate evaluation of the joint predictive power and provide numerical examples. Our methodology is further extended to the more complex phase III development scenario comprising more than two (K > 2) trials, that is, the evaluation of the PoSS of at least k0 (k0 ≤ K) trials from a program of K total trials. Copyright © 2013 John Wiley & Sons, Ltd.
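The correlation can be made tangible by simulation: conditionally on the true effect the two phase III trials are independent, so the joint predictive power is the expectation of the squared conditional power over the phase II uncertainty, which exceeds the square of the single-trial predictive power by Jensen's inequality. A sketch with a flat-prior posterior and illustrative numbers, not the paper's formulae:

```python
import numpy as np
from scipy import stats

def joint_predictive_power(delta_hat, se_phase2, n3_per_arm, sigma=1.0,
                           alpha=0.025, n_sims=200_000, seed=0):
    """Sketch: probability that BOTH phase III trials succeed, averaging
    over the uncertainty in the phase II estimate delta_hat (SE
    se_phase2). The trials are conditionally independent given the true
    effect but marginally correlated through the shared phase II data."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(delta_hat, se_phase2, n_sims)   # flat-prior posterior draws
    se3 = sigma * np.sqrt(2.0 / n3_per_arm)
    z_crit = stats.norm.ppf(1 - alpha)
    cond_power = stats.norm.sf(z_crit - delta / se3)   # one-sided, per trial
    both = (cond_power ** 2).mean()                    # joint predictive power
    single = cond_power.mean()                         # single-trial predictive power
    return both, single ** 2                           # note: both > single^2

print(joint_predictive_power(delta_hat=0.4, se_phase2=0.15, n3_per_arm=150))
```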

18.
This article is concerned with the comparison of the P-value and a Bayesian measure in testing a point null hypothesis for the variance of a normal distribution with unknown mean. First, using a fixed prior for the test parameter, the posterior probability is obtained and compared with the P-value when an appropriate prior is used for the mean parameter. Second, lower bounds of the posterior probability of H0 under a reasonable class of priors are compared with the P-value. It is shown that, even in the presence of nuisance parameters, these two approaches can lead to different results in statistical inference.
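A numerical comparison is easy to set up with Monte Carlo marginal likelihoods; the priors below (normal on the mean, inverse-gamma on the variance under H1) are illustrative choices, not the paper's class of priors.

```python
import numpy as np
from scipy import stats

def pvalue_vs_posterior(x, sigma0_sq, pi0=0.5, n_draws=50_000, seed=0):
    """Sketch: chi-square p-value for H0: sigma^2 = sigma0^2 (mean
    unknown) versus P(H0 | x) under illustrative priors -- mu ~ N(0, 1)
    on both hypotheses, sigma^2 ~ InvGamma(2, scale=2) under H1.
    Marginal likelihoods are estimated by simple Monte Carlo."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n, s2 = len(x), x.var(ddof=1)
    q = (n - 1) * s2 / sigma0_sq                      # chi-square statistic
    p_val = 2 * min(stats.chi2.sf(q, n - 1), stats.chi2.cdf(q, n - 1))

    def lik(mu, var):                                 # likelihood per prior draw
        logp = stats.norm.logpdf(x[None, :], mu[:, None],
                                 np.sqrt(var)[:, None]).sum(axis=1)
        return np.exp(logp)

    mu = rng.normal(0.0, 1.0, n_draws)
    m0 = lik(mu, np.full(n_draws, sigma0_sq)).mean()  # marginal under H0
    var1 = stats.invgamma.rvs(2, scale=2, size=n_draws, random_state=rng)
    m1 = lik(mu, var1).mean()                         # marginal under H1
    post_h0 = pi0 * m0 / (pi0 * m0 + (1 - pi0) * m1)
    return p_val, post_h0

rng = np.random.default_rng(4)
print(pvalue_vs_posterior(rng.normal(0.0, 1.3, 30), sigma0_sq=1.0))
```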

19.
A common approach to analysing clinical trials with multiple outcomes is to control the probability for the trial as a whole of making at least one incorrect positive finding under any configuration of true and false null hypotheses. Popular approaches are to use Bonferroni corrections or structured approaches such as, for example, closed-test procedures. As is well known, such strategies, which control the family-wise error rate, typically reduce the type I error for some or all the tests of the various null hypotheses to below the nominal level. In consequence, there is generally a loss of power for individual tests. What is less well appreciated, perhaps, is that depending on the approach and circumstances, the test-wise loss of power does not necessarily lead to a family-wise loss of power. In fact, it may be possible to increase the overall power of a trial by carrying out tests on multiple outcomes without increasing the probability of making at least one type I error when all null hypotheses are true. We examine two types of problems to illustrate this. Unstructured testing problems arise typically (but not exclusively) when many outcomes are being measured. We consider the case of more than two hypotheses when a Bonferroni approach is being applied, while for illustration we assume compound symmetry to hold for the correlations among all variables. Using the device of a latent variable, it is easy to show that power is not reduced as the number of variables tested increases, provided that the common correlation coefficient is not too high (say, less than 0.75). Afterwards, we consider structured testing problems. Here, multiplicity problems arising from the comparison of more than two treatments, as opposed to more than one measurement, are typical. We conduct a numerical study and conclude again that power is not reduced as the number of tested variables increases.
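The latent-variable construction makes this easy to check by simulation: with common correlation ρ, the disjunctive power to reject at least one of k endpoints under Bonferroni can stay flat or even rise with k. A sketch with illustrative inputs:

```python
import numpy as np
from scipy import stats

def disjunctive_power(k, rho, delta=0.25, n_per_arm=100,
                      alpha=0.05, n_sims=20_000, seed=0):
    """Probability of at least one Bonferroni-significant endpoint out
    of k, all with true standardized effect delta and common correlation
    rho (latent-variable construction for the z-statistics)."""
    rng = np.random.default_rng(seed)
    se = np.sqrt(2.0 / n_per_arm)
    common = rng.standard_normal((n_sims, 1))          # shared latent variable
    noise = rng.standard_normal((n_sims, k))
    z = delta / se + np.sqrt(rho) * common + np.sqrt(1 - rho) * noise
    z_crit = stats.norm.ppf(1 - alpha / (2 * k))       # Bonferroni threshold
    return (np.abs(z) > z_crit).any(axis=1).mean()

for k in (1, 2, 5, 10):
    print(k, disjunctive_power(k, rho=0.5))
```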

20.
The normalized maximum likelihood (NML) is a recent penalized likelihood that has properties that justify defining the amount of discrimination information (DI) in the data supporting an alternative hypothesis over a null hypothesis as the logarithm of an NML ratio, namely, the alternative hypothesis NML divided by the null hypothesis NML. The resulting DI, like the Bayes factor but unlike the P‐value, measures the strength of evidence for an alternative hypothesis over a null hypothesis such that the probability of misleading evidence vanishes asymptotically under weak regularity conditions and such that evidence can support a simple null hypothesis. Instead of requiring a prior distribution, the DI satisfies a worst‐case minimax prediction criterion. Replacing a (possibly pseudo‐) likelihood function with its weighted counterpart extends the scope of the DI to models for which the unweighted NML is undefined. The likelihood weights leverage side information, either in data associated with comparisons other than the comparison at hand or in the parameter value of a simple null hypothesis. Two case studies, one involving multiple populations and the other involving multiple biological features, indicate that the DI is robust to the type of side information used when that information is assigned the weight of a single observation. Such robustness suggests that very little adjustment for multiple comparisons is warranted if the sample size is at least moderate. The Canadian Journal of Statistics 39: 610–631; 2011. © 2011 Statistical Society of Canada
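For the simplest case, a Bernoulli model against a point null, the DI can be computed exactly by summing the maximized likelihood over all possible counts. The sketch below only illustrates the definition; it is not the paper's weighted-NML machinery.

```python
import numpy as np
from scipy.special import gammaln, xlogy

def log_max_lik(k, n):
    """log of the maximized binomial likelihood
    C(n,k) * thetahat^k * (1 - thetahat)^(n-k), with thetahat = k/n."""
    k = np.asarray(k, dtype=float)
    log_comb = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    return log_comb + xlogy(k, k / n) + xlogy(n - k, 1 - k / n)

def discrimination_info_bits(k, n, theta0=0.5):
    """DI (sketch): log2 of the alternative's NML over the simple null's
    likelihood at theta0. Positive values favour the alternative;
    negative values support the point null."""
    comp = np.logaddexp.reduce(log_max_lik(np.arange(n + 1), n))  # log normalizer
    log_nml_alt = log_max_lik(k, n) - comp
    log_null = (gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
                + xlogy(k, theta0) + xlogy(n - k, 1 - theta0))
    return float((log_nml_alt - log_null) / np.log(2))

print(discrimination_info_bits(k=65, n=100))  # tilted data: DI > 0
print(discrimination_info_bits(k=50, n=100))  # balanced data: DI < 0, supports the null
```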
