Similar Documents
20 similar documents found.
1.
Phase II clinical trials investigate whether a new drug or treatment has sufficient evidence of effectiveness against the disease under study. Two-stage designs are popular for phase II since they can stop at the first stage if the drug is ineffective. Investigators often face difficulties in determining the target response rates, and adaptive designs can help to set the target response rate tested in the second stage based on the number of responses observed in the first stage. Popular adaptive designs consider two alternative response rates, and they generally minimise the expected sample size at the maximum uninteresting response rate. Moreover, these designs consider only futility as the reason for early stopping and have high expected sample sizes if the drug is effective. Motivated by this problem, we propose an adaptive design that enables the single-arm trial to be terminated at the first stage for efficacy and a conclusion to be drawn about which alternative response rate to choose. Comparing the proposed design with a popular adaptive design from the literature reveals that the expected sample size decreases notably if either of the two target response rates is correct. In contrast, the expected sample size remains almost the same under the null hypothesis.
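The operating characteristics discussed above can be explored numerically. The following is a minimal sketch for a classic Simon-style, futility-only two-stage single-arm design (not the authors' efficacy-stopping design); the stage sizes and cut-offs `n1`, `r1`, `n`, `r` and the response rates 0.2/0.4 are illustrative assumptions only.

```python
from scipy.stats import binom

def two_stage_characteristics(p, n1, r1, n, r):
    """Early-termination probability, expected sample size and rejection
    probability for a Simon-style two-stage design: stop after n1 patients
    if at most r1 responses are seen; otherwise continue to n patients and
    reject H0 if the total number of responses exceeds r."""
    pet = binom.cdf(r1, n1, p)               # probability of early termination
    ess = n1 + (1 - pet) * (n - n1)          # expected sample size
    # probability of rejecting H0 (declaring the drug promising)
    reject = sum(binom.pmf(x1, n1, p) * binom.sf(r - x1, n - n1, p)
                 for x1 in range(r1 + 1, n1 + 1))
    return pet, ess, reject

# Illustrative values only: p0 = 0.2 (uninteresting rate), p1 = 0.4 (target).
for p in (0.2, 0.4):
    pet, ess, rej = two_stage_characteristics(p, n1=13, r1=3, n=43, r=12)
    print(f"p={p}: PET={pet:.3f}, E[N]={ess:.1f}, P(reject H0)={rej:.3f}")
```

Running the same function under both response rates makes the trade-off the abstract describes visible: the futility-only design saves patients when the drug is ineffective but not when it works.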

2.
Randomised controlled trials are considered the gold standard in trial design. However, phase II oncology trials with a binary outcome are often single-arm. Although a number of reasons exist for choosing a single-arm trial, the primary reason is that single-arm designs require fewer participants than their randomised equivalents. Therefore, the development of novel methodology that makes randomised designs more efficient is of value to the trials community. This article introduces a randomised two-arm binary outcome trial design that includes stochastic curtailment (SC), allowing for the possibility of stopping a trial before the final conclusions are known with certainty. In addition to SC, the proposed design involves the use of a randomised block design, which allows investigators to control the number of interim analyses. This approach is compared with existing designs that also use early stopping, through the use of a loss function comprising a weighted sum of design characteristics. Comparisons are also made using an example from a real trial. The comparisons show that for many possible loss functions, the proposed design is superior to existing designs. Further, the proposed design may be more practical, as it allows a flexible number of interim analyses. One existing design produces superior design realisations when the anticipated response rate is low. However, when using this design, the probability of rejecting the null hypothesis is sensitive to misspecification of the null response rate. Therefore, when considering randomised designs in phase II, we recommend the proposed approach be preferred over other sequential designs.

3.
We consider testing the significance of the coefficients in the linear model. Unlike in the classical approach, there is no alternative hypothesis to accept when the null hypothesis is rejected. When there is a substantial deviation from the null hypothesis, we reject the null hypothesis and identify, based on the data, the alternative hypotheses associated with the independent variables or the levels that contributed most to the deviation from the null hypothesis.

4.
Statistical models are often based on normal distributions, and procedures for testing this distributional assumption are needed. Many goodness-of-fit tests suffer from the presence of outliers, in the sense that they may reject the null hypothesis even in the case of a single extreme observation. We show a possible extension of the Shapiro-Wilk test that is not affected by this problem. The presented method is inspired by the forward search (FS), a recently proposed diagnostic tool. An application to univariate observations shows how the procedure is able to capture the structure of the data, even in the presence of outliers. Other properties are also investigated.
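To illustrate the forward-search idea in its simplest form (a sketch of the general FS mechanism, not the authors' specific extension, and using a fixed ordering by distance from the median rather than refitting at each step), one can grow the subset one observation at a time and monitor the Shapiro-Wilk statistic along the search; a sharp drop at the very end typically signals that an outlier has just entered.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
# Clean normal data contaminated with a single extreme observation.
x = np.concatenate([rng.normal(0, 1, 99), [8.0]])

# Simplified forward search: start from the half of the data closest to the
# median, then add the remaining observations one at a time by closeness.
order = np.argsort(np.abs(x - np.median(x)))
m0 = len(x) // 2
w_path = []
for m in range(m0, len(x) + 1):
    w, _ = shapiro(x[order[:m]])
    w_path.append((m, w))

# The W statistic stays high until the outlier enters the subset at the end.
for m, w in w_path[-5:]:
    print(f"subset size {m}: W = {w:.4f}")
```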

5.
There are no exact fixed-level tests for testing the null hypothesis that the difference of two exponential means is less than or equal to a prespecified value θ0. For this testing problem, there are several approximate testing procedures available in the literature. Using an extended definition of p-values, Tsui and Weerahandi (1989) gave an exact significance test for this problem. In this paper, the performance of that procedure is investigated and compared with the approximate procedures. A size and power comparison is carried out using a simulation study. Its findings show that the test based on the generalized p-value guarantees the intended size and matches or outperforms the approximate procedures available in the literature, in both power and size.
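A Monte Carlo sketch of the generalized p-value construction for this problem is given below. It relies on the standard exponential pivot 2*n*x̄/θ ~ χ²(2n); the data, sample sizes and θ0 are hypothetical, and this illustrates the general fiducial/generalized-pivot idea rather than reproducing the exact Tsui-Weerahandi procedure.

```python
import numpy as np

def generalized_p_value(x1, x2, theta0, n_mc=100_000, seed=1):
    """Monte Carlo generalized p-value for H0: theta1 - theta2 <= theta0,
    based on the exponential pivot 2*sum(x)/theta ~ chi-square(2n).
    r1 - r2 acts as a fiducial draw of theta1 - theta2."""
    rng = np.random.default_rng(seed)
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.sum(x1), np.sum(x2)
    r1 = 2.0 * s1 / rng.chisquare(2 * n1, n_mc)   # fiducial draws of theta1
    r2 = 2.0 * s2 / rng.chisquare(2 * n2, n_mc)   # fiducial draws of theta2
    return np.mean(r1 - r2 <= theta0)             # fiducial probability of H0

# Hypothetical data: true means 10 and 6, testing theta0 = 0.
rng = np.random.default_rng(2)
x1 = rng.exponential(scale=10, size=15)
x2 = rng.exponential(scale=6, size=20)
print("generalized p-value:", generalized_p_value(x1, x2, theta0=0.0))
```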

6.
Simultaneously testing a family of n null hypotheses can arise in many applications. A common problem in multiple hypothesis testing is to control the Type-I error. The probability of at least one false rejection, referred to as the familywise error rate (FWER), is one of the earliest error rate measures. Many FWER-controlling procedures have been proposed. The ability to control the FWER and achieve higher power is often used to evaluate the performance of a controlling procedure. However, when testing multiple hypotheses, FWER and power are not sufficient for evaluating a controlling procedure's performance. Furthermore, the performance of a controlling procedure is also governed by experimental parameters such as the number of hypotheses, the sample size, the number of true null hypotheses and the data structure. This paper evaluates, under various experimental settings, the performance of some FWER-controlling procedures in terms of five indices: the FWER, the false discovery rate, the false non-discovery rate, the sensitivity and the specificity. The results can provide guidance on how to select an appropriate FWER-controlling procedure to meet a study's objective.
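A small simulation in this spirit (a sketch with made-up settings, not the paper's exact configuration) generates m hypotheses with a known set of true nulls, applies Bonferroni and Holm corrections via statsmodels, and estimates the five indices.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
m, m0, effect, n_rep, alpha = 50, 40, 3.0, 2000, 0.05
truth_null = np.arange(m) < m0                      # first m0 hypotheses are true nulls

def simulate(method):
    """Estimate FWER, FDR, FNR, sensitivity and specificity for one procedure."""
    fwer = fdr = fnr = sens = spec = 0.0
    for _ in range(n_rep):
        z = rng.normal(size=m) + np.where(truth_null, 0.0, effect)
        p = 2 * norm.sf(np.abs(z))                  # two-sided p-values
        reject = multipletests(p, alpha=alpha, method=method)[0]
        v = np.sum(reject & truth_null)             # number of false rejections
        fwer += v > 0
        fdr  += v / max(np.sum(reject), 1)
        fnr  += np.sum(~reject & ~truth_null) / max(np.sum(~reject), 1)
        sens += np.sum(reject & ~truth_null) / (m - m0)
        spec += np.sum(~reject & truth_null) / m0
    return [x / n_rep for x in (fwer, fdr, fnr, sens, spec)]

for method in ("bonferroni", "holm"):
    print(method, [round(v, 3) for v in simulate(method)])
```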

7.

When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a firestorm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value, which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true given the sample, p-values nonetheless can provide evidential information toward making an inference about a parameter. In a 10,000-case simulation described in this article, the authors found that p-values' inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent with the parameter's true location in the sampled-from population for almost 70% of the cases. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used. Here, rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large or practically significant, in the researcher's judgment and experience. The simulation compares the performances of several methods, from p-value and/or effect-size-based to confidence-interval-based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, like flagging whether a p-value is significant, the output of one single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.
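A condensed sketch of the MESP decision rule described above follows; the sample size, minimum meaningful distance and population parameters are hypothetical, and the notion of a "real effect" used to score agreement is an assumption made here for illustration, not the authors' 10,000-case setup.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(4)
mu0, mesd, alpha, n, n_rep = 100.0, 2.0, 0.05, 30, 10_000   # mesd = minimum meaningful distance

def agreement(true_mu, sigma=5.0):
    """Fraction of simulated samples in which each rule's reject / not-reject
    signal matches the assumed true state (|true_mu - mu0| >= mesd counts as
    a real effect)."""
    real_effect = abs(true_mu - mu0) >= mesd
    ok_p = ok_mesp = 0
    for _ in range(n_rep):
        x = rng.normal(true_mu, sigma, n)
        reject_p = ttest_1samp(x, mu0).pvalue < alpha
        reject_mesp = reject_p and abs(x.mean() - mu0) >= mesd   # p-value AND effect size
        ok_p += reject_p == real_effect
        ok_mesp += reject_mesp == real_effect
    return ok_p / n_rep, ok_mesp / n_rep

for mu in (100.0, 101.0, 103.0):
    acc_p, acc_mesp = agreement(mu)
    print(f"true mean {mu}: p-value rule {acc_p:.3f}, MESP rule {acc_mesp:.3f}")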

8.
The mid-p-value is the standard p-value for a test minus half the difference between it and the nearest lower possible value. Its smaller size lends it an obvious appeal to users: it provides a more significant-looking summary of the evidence against the null hypothesis. This paper examines the possibility that the user might overstate the significance of the evidence by using the smaller mid-p in place of the standard p-value. Routine use of the mid-p is shown to control a quantity related to the Type I error rate. This related quantity is appropriate to consider when the decision to accept or reject the null hypothesis is not always firm. The natural, subjective interpretation of a p-value as the probability that the null hypothesis is true is also examined. The usual asymptotic correspondence between these two probabilities for one-sided hypotheses is shown to be strengthened when the standard p-value is replaced by the mid-p.
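For a concrete case, the definition above applies directly to an exact one-sided binomial test: the nearest lower attainable p-value differs from the standard one by the probability of the observed count, so the mid-p equals the standard p-value minus half that probability. A minimal sketch with hypothetical numbers:

```python
from scipy.stats import binom

def one_sided_binomial_p(x, n, p0):
    """Standard and mid p-values for H0: p <= p0 vs H1: p > p0."""
    p_std = binom.sf(x - 1, n, p0)            # P(X >= x | p0)
    p_mid = p_std - 0.5 * binom.pmf(x, n, p0) # subtract half of P(X = x | p0)
    return p_std, p_mid

# Hypothetical data: 9 successes in 20 trials, null success probability 0.3.
p_std, p_mid = one_sided_binomial_p(9, 20, 0.3)
print(f"standard p = {p_std:.4f}, mid-p = {p_mid:.4f}")
```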

9.
This article considers a simple test for the correct specification of linear spatial autoregressive models, assuming that the chosen weight matrix Wn is correct. We derive the limiting distributions of the test under the null hypothesis of correct specification and under a sequence of local alternatives. We show that the test is asymptotically free of nuisance parameters under the null and prove the consistency of our test. To improve the finite sample performance of our test, we also propose a residual-based wild bootstrap and justify its asymptotic validity. We conduct a small set of Monte Carlo simulations to investigate the finite sample properties of our tests. Finally, we apply the test to two empirical datasets: one on votes cast and one on the economic growth rate. We reject the linear spatial autoregressive model in the votes-cast example but fail to reject it in the economic growth rate example. Supplementary materials for this article are available online.

10.
Sequential designs can be used to save computation time in implementing Monte Carlo hypothesis tests. The motivation is to stop resampling if the early resamples provide enough information on the significance of the p-value of the original Monte Carlo test. In this paper, we consider a sequential design called the B-value design, proposed by Lan and Wittes, and construct the sequential design bounding the resampling risk, the probability that the accept/reject decision differs from the decision based on complete enumeration. For the B-value design, whose exact implementation can be carried out using the algorithm proposed by Fay, Kim and Hachey, we first compare the expected resample size for different designs with comparable resampling risk. We show that the B-value design has considerable savings in expected resample size compared to a fixed resample or simple curtailed design, and a comparable expected resample size to the iterative push out design of Fay and Follmann. The B-value design is more practical than the iterative push out design in that it is tractable even for small values of resampling risk, which was a challenge with the iterative push out design. We also propose an approximate B-value design that can be constructed without specially developed software and that provides analytic insights on the choice of parameter values in constructing the exact B-value design.
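The simple curtailed design used as a comparator above can be sketched directly (this is the curtailed comparator, not the B-value design itself): resampling stops as soon as the accept/reject decision at the planned maximum of N resamples can no longer change. The test statistic, reference distribution and parameter values below are illustrative assumptions.

```python
import numpy as np

def curtailed_mc_test(stat_observed, draw_null_stat, n_max=999, alpha=0.05, rng=None):
    """Curtailed Monte Carlo test. draw_null_stat(rng) draws one test statistic
    under H0; the full MC p-value would be (1 + #exceedances) / (1 + n_max).
    Stop early once the decision at n_max is already determined."""
    rng = rng or np.random.default_rng()
    threshold = alpha * (1 + n_max) - 1          # reject H0 iff #exceedances <= threshold
    exceed = 0
    for i in range(1, n_max + 1):
        exceed += draw_null_stat(rng) >= stat_observed
        if exceed > threshold:                   # rejection is no longer possible
            return "accept H0", i
        if exceed + (n_max - i) <= threshold:    # rejection is already guaranteed
            return "reject H0", i
    return ("reject H0" if exceed <= threshold else "accept H0"), n_max

# Illustrative use: observed z = 2.5 against a standard normal reference distribution.
decision, n_used = curtailed_mc_test(2.5, lambda r: r.normal())
print(decision, "after", n_used, "resamples")
```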

11.
Several researchers have proposed solutions to control the type I error rate in sequential designs. The use of Bayesian sequential designs is becoming more common; however, these designs are subject to inflation of the type I error rate. We propose a Bayesian sequential design for a binary outcome using an alpha-spending function to control the overall type I error rate. Algorithms are presented for calculating critical values and power for the proposed designs. We also propose a new stopping rule for futility. A sensitivity analysis is implemented for assessing the effects of varying the parameters of the prior distribution and the maximum total sample size on the critical values. Alpha-spending functions are compared using power and actual sample size through simulations. Further simulations show that, when the total sample size is fixed, the proposed design has greater power than the traditional Bayesian sequential design, which sets equal stopping bounds at all interim analyses. We also find that the proposed design with the new stopping rule for futility results in greater power and can stop earlier with a smaller actual sample size, compared with the traditional stopping rule for futility when all other conditions are held constant. Finally, we apply the proposed method to a real data set and compare the results with traditional designs.
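Alpha-spending functions of the kind compared above can be written down directly. A minimal sketch of two standard Lan-DeMets spending functions (O'Brien-Fleming-type and Pocock-type), evaluated at equally spaced interim looks; the specific functions and look schedule used in the paper may differ.

```python
import numpy as np
from scipy.stats import norm

def obf_spending(t, alpha=0.05):
    """O'Brien-Fleming-type spending: alpha(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t))."""
    t = np.asarray(t, dtype=float)
    return 2 - 2 * norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t))

def pocock_spending(t, alpha=0.05):
    """Pocock-type spending: alpha(t) = alpha * ln(1 + (e - 1) * t)."""
    t = np.asarray(t, dtype=float)
    return alpha * np.log(1 + (np.e - 1) * t)

# Cumulative type I error spent at four equally spaced looks.
looks = np.array([0.25, 0.5, 0.75, 1.0])
print("OBF-type:   ", np.round(obf_spending(looks), 4))
print("Pocock-type:", np.round(pocock_spending(looks), 4))
```

The OBF-type function spends almost no alpha early and saves most of it for the final analysis, whereas the Pocock-type function spends it more evenly, which is the basic trade-off such comparisons examine.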

12.
Occasionally, investigators collect auxiliary marks at the time of failure in a clinical study. Because the failure event may be censored at the end of the follow-up period, these marked endpoints are subject to induced censoring. We propose two new families of two-sample tests for the null hypothesis of no difference in mark-scale distribution that allow for arbitrary associations between mark and time. One family of proposed tests is a nonparametric extension of an existing semi-parametric linear test of the same null hypothesis, while the second family of tests is based on novel marked rank processes. Simulation studies indicate that the proposed tests have the desired size and possess adequate statistical power to reject the null hypothesis under a simple change of location in the marginal mark distribution. When the marginal mark distribution has heavy tails, the proposed rank-based tests can be nearly twice as powerful as the linear tests.

13.
This article develops statistical inference for general linear models in order restricted randomized (ORR) designs. The ORR designs use the heterogeneity among experimental units to induce a negative correlation structure among responses obtained from different treatment regimes. This negative correlation structure acts as a variance reduction technique for treatment contrasts. The parameters of the general linear model are estimated, and a generalized F-test is constructed for its components. It is shown that the null distribution of the test statistic can be approximated reasonably well by an F-distribution for moderate sample sizes. It is also shown that the empirical power of the proposed test is substantially higher than the powers of its competitors in the literature. The proposed test and estimator are applied to a data set from a clinical trial to illustrate how one can improve such an experiment.

14.
Consider a longitudinal experiment where subjects are allocated to one of two treatment arms and are subjected to repeated measurements over time. Two non-parametric group sequential procedures, based on the Wilcoxon rank sum test and fitted with asymptotically efficient allocation rules, are derived to test the equality of the rates of change over time of the two treatments when the distribution of responses is unknown. The procedures are designed to allow for early stopping to reject the null hypothesis while allocating fewer subjects to the inferior treatment. Simulations based on the normal, the logistic and the exponential distributions showed that the proposed allocation rules substantially reduce allocations to the inferior treatment, but at the expense of a relatively small increase in the total sample size and a moderate decrease in power as compared to the pairwise allocation rule.

15.

A statistical test can be seen as a procedure to produce a decision based on observed data, where some decisions consist of rejecting a hypothesis (yielding a significant result) and some do not, and where one controls the probability of making a wrong rejection at some prespecified significance level. Whereas traditional hypothesis testing involves only two possible decisions (to reject a null hypothesis or not), Kaiser's directional two-sided test as well as the more recently introduced testing procedure of Jones and Tukey, each equivalent to running two one-sided tests, involve three possible decisions to infer the value of a unidimensional parameter. The latter procedure assumes that a point null hypothesis is impossible (e.g., that two treatments cannot have exactly the same effect), allowing a gain in statistical power. There are, however, situations where a point hypothesis is indeed plausible, for example, when considering hypotheses derived from Einstein's theories. In this article, we introduce a five-decision rule testing procedure, equivalent to running a traditional two-sided test in addition to two one-sided tests, which combines the advantages of the testing procedures of Kaiser (no assumption that a point hypothesis is impossible) and Jones and Tukey (higher power), allowing for a nonnegligible (typically 20%) reduction of the sample size needed to reach a given statistical power to obtain a significant result, compared to the traditional approach.
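A sketch of the underlying mechanism of combining a two-sided test with two one-sided tests, here for a one-sample mean, is shown below. It only reports which component tests reject; the five decision labels themselves are defined in the article and are not reproduced here. It assumes a recent SciPy (the `alternative` argument of `ttest_1samp`), and the data and α are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_1samp

def directional_tests(x, mu0, alpha=0.05):
    """Run the traditional two-sided test plus both one-sided tests on one
    sample and report which component tests reject at level alpha."""
    two_sided = ttest_1samp(x, mu0).pvalue
    greater   = ttest_1samp(x, mu0, alternative="greater").pvalue
    less      = ttest_1samp(x, mu0, alternative="less").pvalue
    return {
        "reject point null (two-sided)": two_sided < alpha,
        "conclude mu > mu0 (one-sided)": greater < alpha,
        "conclude mu < mu0 (one-sided)": less < alpha,
    }

rng = np.random.default_rng(5)
print(directional_tests(rng.normal(0.4, 1.0, 50), mu0=0.0))
```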

16.
Tests on multivariate means that are hypothesized to be in a specified direction have received attention from both theoretical and applied points of view. One of the most common procedures used to test this cone alternative is the likelihood ratio test (LRT) assuming a multivariate normal model for the data. However, the resulting test for an ordered alternative is biased in that the only usable critical values are bounds on the null distribution. The present paper provides empirical evidence that bootstrapping the null distribution of the likelihood ratio statistic results in a bootstrap test (BT) with comparable power properties, without the additional burden of assuming multivariate normality. Additionally, the tests based on the LRT statistic can reject the null hypothesis in favor of the alternative even though the true means are far from the alternative region, and the BT shows similar behaviour for normal and nonnormal data. This anomalous behavior is due to the formulation of the null hypothesis, and a possible remedy is to reformulate the null to be the complement of the alternative hypothesis. We discuss properties of a BT for the modified set of hypotheses (MBT) based on a simulation study. The resulting test is conservative in general and in some specific cases has power estimates comparable to those for existing methods. The BT has higher sensitivity but relatively lower specificity, whereas the MBT has higher specificity but relatively lower sensitivity.

17.
In this paper we evaluate the performance of three methods for testing the existence of a unit root in a time series when the models under consideration in the null hypothesis do not display autocorrelation in the error term. In such cases, simple versions of the Dickey-Fuller test should be used as the most appropriate ones, instead of the well-known augmented Dickey-Fuller or Phillips-Perron tests. Through Monte Carlo simulations we show that, apart from a few cases, tests for the existence of a unit root attain actual type I error and power very close to their nominal levels. Additionally, when the random walk null hypothesis is true, we observe that, as the sample size gradually increases, the p-values for the drift in the unrestricted model fluctuate at low levels with small variance and the Durbin-Watson (DW) statistic approaches 2 in both the unrestricted and restricted models. If, however, the null hypothesis of a random walk is false, then with a larger sample the DW statistic in the restricted model starts to deviate from 2 while in the unrestricted model it continues to approach 2. It is also shown that the probability of not rejecting the hypothesis that the errors are uncorrelated, when they are indeed uncorrelated, is higher when the DW test is applied at the 1% nominal level of significance.
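The quantities discussed above can be reproduced with standard tools. A minimal sketch, assuming statsmodels is available, runs the simple (non-augmented) Dickey-Fuller test via `adfuller` with `maxlag=0` on a simulated random walk, then reports the DW statistic of the residuals from the unrestricted and restricted regressions; the simulated series and model choices are illustrative, not the paper's exact setup.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(size=500))              # a pure random walk (H0 true)

# Simple Dickey-Fuller test (no augmentation lags), regression with drift.
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, maxlag=0, regression="c")
print(f"DF statistic = {stat:.3f}, p-value = {pvalue:.3f}")

# Unrestricted model: regress Delta y_t on a constant and y_{t-1}.
dy, ylag = np.diff(y), y[:-1]
fit = sm.OLS(dy, sm.add_constant(ylag)).fit()
print(f"DW (unrestricted) = {durbin_watson(fit.resid):.3f}")

# Restricted model under the random-walk null: Delta y_t is pure error.
print(f"DW (restricted)   = {durbin_watson(dy):.3f}")
```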

18.
Optimal three-stage designs with equal sample sizes at each stage are presented and compared to fixed sample designs, fully sequential designs, designs restricted to use the fixed sample critical value at the final stage, and to modifications of other group sequential designs previously proposed in the literature. Typically, the greatest savings realized with interim analyses are obtained by the first interim look. More than 50% of the savings possible with a fully sequential design can be realized with a simple two-stage design. Three-stage designs can realize as much as 75% of the possible savings. Without much loss in efficiency, the designs can be modified so that the critical value at the final stage equals the usual fixed sample value while maintaining the overall level of significance, alleviating some potential confusion should a final stage be necessary. Some common group sequential designs, modified to allow early acceptance of the null hypothesis, are shown to be nearly optimal in some settings while performing poorly in others. An example is given to illustrate the use of several three-stage plans in the design of clinical trials.

19.
Phase II clinical trials designed for evaluating a drug's treatment effect can be either single-arm or double-arm. A single-arm design tests the null hypothesis that the response rate of a new drug is lower than a fixed threshold, whereas a double-arm scheme takes a more objective comparison of the response rate between the new treatment and the standard of care through randomization. Although the randomized design is the gold standard for efficacy assessment, various situations may arise where a single-arm pilot study prior to a randomized trial is necessary. To combine the single- and double-arm phases and pool the information together for better decision making, we propose a Single-To-double ARm Transition design (START) with switching hypothesis tests, where the first stage compares the new drug's response rate with a minimum required level and imposes a continuation criterion, and the second stage uses randomization to determine the treatment's superiority. We develop a software package in R to calibrate the frequentist error rates and perform simulation studies to assess the trial characteristics. Finally, a metastatic pancreatic cancer trial is used to illustrate the decision rules under the proposed START design.

20.

A basic graphical approach for checking normality is the Q-Q plot, which compares sample quantiles against the population quantiles. In the univariate setting, the probability plot correlation coefficient test for normality has been studied extensively. We consider testing multivariate normality by using the correlation coefficient of the Q-Q plot. When multivariate normality holds, the squared sample distances should follow a chi-square distribution for large samples, and the plot should resemble a straight line. A correlation coefficient test can be constructed by using the pairs of points in the probability plot. When the correlation coefficient test does not reject the null hypothesis, the sample data may come from a multivariate normal distribution or from some other distribution. We therefore use the following two steps to test multivariate normality. First, we check multivariate normality by using the probability plot correlation coefficient test. If the test does not reject the null hypothesis, then we test the symmetry of the distribution and determine whether multivariate normality holds. This test procedure is called the combination test. The size and power of this test are studied, and it is found that the combination test is, in general, more powerful than other tests for multivariate normality.
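The first step of the combination test can be sketched directly: compute squared Mahalanobis distances, pair their order statistics with chi-square quantiles, and use the correlation of that Q-Q plot as the test statistic. This is a sketch of the statistic only, with hypothetical data; in practice the critical value would come from simulation, and the follow-up symmetry test is not shown.

```python
import numpy as np
from scipy.stats import chi2

def qq_correlation_statistic(x):
    """Correlation between ordered squared Mahalanobis distances and the
    corresponding chi-square(p) quantiles: the Q-Q plot correlation
    coefficient used in the first stage of the combination test."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(x, rowvar=False))
    d2 = np.sort(np.einsum("ij,jk,ik->i", centered, s_inv, centered))
    probs = (np.arange(1, n + 1) - 0.5) / n          # plotting positions
    q = chi2.ppf(probs, df=p)
    return np.corrcoef(d2, q)[0, 1]

rng = np.random.default_rng(7)
normal_sample = rng.multivariate_normal(np.zeros(3), np.eye(3), size=200)
skewed_sample = rng.exponential(size=(200, 3))
print("normal data:", round(qq_correlation_statistic(normal_sample), 4))
print("skewed data:", round(qq_correlation_statistic(skewed_sample), 4))
```

Values close to 1 are consistent with multivariate normality; markedly smaller values indicate departure, which is what the correlation-based rejection rule exploits.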
