Similar Articles (20 results)
1.
Interest in confirmatory adaptive combined phase II/III studies with treatment selection has increased in the past few years. These studies start by comparing several treatments with a control. One (or more) treatment(s) is then selected after the first stage based on the available information at an interim analysis, including interim data from the ongoing trial, external information and expert knowledge. Recruitment continues, but now only for the selected treatment(s) and the control, possibly in combination with a sample size reassessment. The final analysis of the selected treatment(s) includes the patients from both stages and is performed such that the overall Type I error rate is strictly controlled, thus providing confirmatory evidence of efficacy at the final analysis. In this paper we describe two approaches to control the Type I error rate in adaptive designs with sample size reassessment and/or treatment selection. The first method adjusts the critical value using a simulation-based approach, which incorporates the number of patients at an interim analysis, the true response rates, the treatment selection rule, etc. We discuss the underlying assumptions of simulation-based procedures and give several examples where the Type I error rate is not controlled if some of the assumptions are violated. The second method is an adaptive Bonferroni-Holm test procedure based on conditional error rates of the individual treatment-control comparisons. We show that this procedure controls the Type I error rate even if a deviation from the pre-planned adaptation rule, or from the pre-planned time point of the adaptation decision, becomes necessary.

2.
Propensity score analysis (PSA) is a technique to correct for potential confounding in observational studies. Covariate adjustment, matching, stratification, and inverse weighting are the four most commonly used methods involving propensity scores. The main goal of this research is to determine which PSA method performs the best in terms of protecting against spurious association detection, as measured by Type I error rate, while maintaining sufficient power to detect a true association, if one exists. An examination of these PSA methods along with ordinary least squares regression was conducted under two cases: correct PSA model specification and incorrect PSA model specification. PSA covariate adjustment and PSA matching maintain the nominal Type I error rate when the PSA model is correctly specified, but only PSA covariate adjustment achieves adequate power levels. Other methods produced conservative Type I error rates in some scenarios, while liberal Type I error rates were observed in other scenarios.
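To illustrate one of the four methods mentioned, the following is a minimal, hypothetical sketch of inverse probability weighting with a single binary confounder whose propensity scores are known by design. It is not the simulation design used in this study; the data, propensities, and sample size are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical data: a binary confounder x drives both treatment assignment
# and the outcome; the true treatment effect is zero.
e = {0: 0.2, 1: 0.8}                    # known propensity scores P(T=1 | x)
data = []
for _ in range(20000):
    x = random.randint(0, 1)
    t = 1 if random.random() < e[x] else 0
    y = 2.0 * x + random.gauss(0, 1)    # outcome depends on x, not on t
    data.append((x, t, y))

# Naive difference in means is confounded (biased away from zero)
treated = [y for _, t, y in data if t == 1]
control = [y for _, t, y in data if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

# Inverse-probability weighting removes the confounding
w1 = sum(t / e[x] for x, t, y in data)
w0 = sum((1 - t) / (1 - e[x]) for x, t, y in data)
ipw = (sum(t * y / e[x] for x, t, y in data) / w1
       - sum((1 - t) * y / (1 - e[x]) for x, t, y in data) / w0)
```

Here the naive estimate is pulled well away from the true null effect, while the weighted estimate is close to zero; in practice the propensities would be estimated (e.g. by logistic regression) rather than known.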

3.
The term 'futility' is used to refer to the inability of a clinical trial to achieve its objectives. In particular, stopping a clinical trial when the interim results suggest that it is unlikely to achieve statistical significance can save resources that could be used on more promising research. There are various approaches that have been proposed to assess futility, including stochastic curtailment, predictive power, predictive probability, and group sequential methods. In this paper, we describe and contrast these approaches, and discuss several issues associated with futility analyses, such as ethical considerations, whether or not type I error can or should be reclaimed, one-sided vs two-sided futility rules, and the impact of futility analyses on power.

4.
Sequential monitoring of efficacy and safety data has become a vital component of modern clinical trials. It affords companies the opportunity to stop studies early in cases when it appears as if the primary objective will not be achieved or when there is clear evidence that the primary objective has already been met. This paper introduces a new concept of the backward conditional hypothesis test (BCHT) to evaluate clinical trial success. Unlike the regular conditional power approach that relies on the probability that the final study result will be statistically significant based on the current interim look, the BCHT was constructed based on the hypothesis test framework. The framework comprises a significant test level as opposed to the arbitrary fixed futility index utilized in the conditional power method. Additionally, the BCHT has proven to be a uniformly most powerful test. Noteworthy features of the BCHT method compared with the conditional power method will be presented. Copyright © 2012 John Wiley & Sons, Ltd.

5.
A Monte Carlo simulation was conducted to compare the type I error rate and test power of the analysis of means (ANOM) test to the one-way analysis of variance F-test (ANOVA-F). Simulation results showed that, as long as the homogeneity of variance assumption was satisfied, both the ANOVA-F and the ANOM test displayed similar type I error rates regardless of the shape of the distribution, the number of groups, and the combination of observations. However, both tests were negatively affected by heterogeneity of the variances, and this effect became more pronounced as the variance ratios increased. The test power values of both tests changed with respect to the effect size (Δ), the variance ratio, and the sample size combinations. As long as the variances were homogeneous, the ANOVA-F and ANOM tests had similar power except in unbalanced cases; under unbalanced conditions, the ANOVA-F was observed to be more powerful than the ANOM test. On the other hand, an increase in the total number of observations caused the power values of the ANOVA-F and ANOM tests to approach each other. The relationship between the effect size (Δ) and the variance ratios affected the test power, especially when the sample sizes were not equal. While the ANOVA-F was superior under some of the experimental conditions considered, the ANOM test was superior under others. In general, when populations with large means also had large variances, the ANOM test was superior; when populations with large means had small variances, the ANOVA-F was generally superior. This pattern became clearer when the number of groups was 4 or 5.

6.
In an environment where (i) potential risks to subjects participating in clinical studies need to be managed carefully, (ii) trial costs are increasing, and (iii) there are limited research resources available, it is necessary to prioritize research projects and sometimes re-prioritize if early indications suggest that a trial has low probability of success. Futility designs allow this re-prioritization to take place. This paper reviews a number of possible futility methods available and presents a case study from a late-phase study of an HIV therapeutic, which utilized conditional power-based stopping thresholds. The two most challenging aspects of incorporating a futility interim analysis into a trial design are the selection of optimal stopping thresholds and the timing of the analysis, both of which require the balancing of various risks. The paper outlines a number of graphical aids that proved useful in explaining the statistical risks involved to the study team. Further, the paper outlines a decision analysis undertaken which combined expectations of drug performance with conditional power calculations in order to produce probabilities of different interim and final outcomes, and which ultimately led to the selection of the final stopping thresholds.

7.
For the case of a one-sample experiment with known variance σ² = 1, it has been shown that at interim analysis the sample size (SS) may be increased by any arbitrary amount provided: (1) the conditional power (CP) at interim is ≥50% and (2) there can be no decision to decrease the SS (stop the trial early). In this paper we verify this result for the case of a two-sample experiment with proportional SS in the treatment groups and an arbitrary common variance. Numerous authors have presented the formula for the CP at interim for a two-sample test with equal SS in the treatment groups and an arbitrary common variance, for both the one- and two-sided hypothesis tests. In this paper we derive the corresponding formula for the case of unequal, but proportional, SS in the treatment groups for both one-sided superiority and two-sided hypothesis tests. Finally, we present an SAS macro for doing this calculation and provide a worked-out hypothetical example. In discussion we note that this type of trial design trades the ability to stop early (for lack of efficacy) for the elimination of the Type I error penalty. The loss of early stopping requires that such a design employ a data monitoring committee, blinding of the sponsor to the interim calculations, and pre-planning of how much and under what conditions to increase the SS, and that this all be formally written into an interim analysis plan before the start of the study. Copyright © 2009 John Wiley & Sons, Ltd.
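The conditional power computation this abstract refers to can be sketched as follows. This is a simplified one-sided z-test version assuming a known common standard deviation and possibly unequal (proportional) group sizes; the paper's exact formula and SAS macro may differ in detail, and all numeric inputs below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_interim, t, delta, sigma, n1, n2, alpha=0.025):
    """Conditional power for a one-sided two-sample z-test.

    z_interim : observed z-statistic at the interim look
    t         : information fraction (interim / planned information)
    delta     : assumed true mean difference for the rest of the trial
    sigma     : known common standard deviation
    n1, n2    : planned final group sizes (may be unequal but proportional)
    alpha     : one-sided significance level
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)
    # drift = expected final z-statistic under the assumed effect delta
    drift = delta / (sigma * sqrt(1 / n1 + 1 / n2))
    num = z_interim * sqrt(t) - z_alpha + drift * (1 - t)
    return nd.cdf(num / sqrt(1 - t))

cp = conditional_power(z_interim=2.0, t=0.5, delta=0.5,
                       sigma=1.0, n1=64, n2=64)
```

With these inputs the conditional power is roughly 0.89, comfortably above the ≥50% threshold under which the sample size may be increased.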

8.
In monitoring clinical trials, the question of futility, or whether the data thus far suggest that the results at the final analysis are unlikely to be statistically successful, is regularly of interest over the course of a study. However, the opposite viewpoint of whether the study is sufficiently demonstrating proof of concept (POC) and should continue is a valuable consideration and ultimately should be addressed with high POC power so that a promising study is not prematurely terminated. Conditional power is often used to assess futility, and this article interconnects the ideas of assessing POC for the purpose of study continuation with conditional power, while highlighting the importance of the POC type I error and the POC type II error for study continuation or not at the interim analysis. Methods for analyzing subgroups motivate the interim analyses to maintain high POC power via an adjusted interim POC significance level criterion for study continuation or testing against an inferiority margin. Furthermore, two versions of conditional power based on the assumed effect size or the observed interim effect size are considered. Graphical displays illustrate the relationship of the POC type II error for premature study termination to the POC type I error for study continuation and the associated conditional power criteria.

9.
When independent variables are measured with error, ordinary least squares regression can yield parameter estimates that are biased and inconsistent. This article documents an inflation of Type I error rate that can also occur. In addition to analytic results, a large-scale Monte Carlo study shows unacceptably high Type I error rates under circumstances that could easily be encountered in practice. A set of smaller-scale simulations indicate that the problem applies to various types of regression and various types of measurement error. The Canadian Journal of Statistics 37: 33-46; 2009 © 2009 Statistical Society of Canada

10.
In this article, it is shown how to compute, in an approximated way, probabilities of Type I error and Type II error of sequential Bayesian procedures for testing one-sided null hypotheses. First, some theoretical results are obtained, and then an algorithm is developed for applying these results. The prior predictive density plays a central role in this study.

11.
Likelihood ratio tests for a change in mean in a sequence of independent, normal random variables are based on the maximum two-sample t-statistic, where the maximum is taken over all possible changepoints. The maximum t-statistic has the undesirable characteristic that Type I errors are not uniformly distributed across possible changepoints. False positives occur more frequently near the ends of the sequence and occur less frequently near the middle of the sequence. In this paper we describe an alternative statistic that is based upon a minimum p-value, where the minimum is taken over all possible changepoints. The p-value at any particular changepoint is based upon both the two-sample t-statistic at that changepoint and the probability that the maximum two-sample t-statistic is achieved at that changepoint. The new statistic has a more uniform distribution of Type I errors across potential changepoints and it compares favorably with respect to statistical power, false discovery rates, and the mean square error of changepoint estimates.
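For reference, the maximum two-sample t-statistic over all candidate changepoints, the classical statistic that the minimum p-value approach modifies, can be computed with a simple scan. This is an illustrative pure-Python sketch with made-up data, not code from the paper.

```python
from math import sqrt

def max_changepoint_t(x):
    """Maximum absolute two-sample t-statistic over all changepoints.

    For each split point k, compare x[:k] against x[k:] with a
    pooled-variance two-sample t-statistic; return the best split
    and its statistic.
    """
    n = len(x)
    best_k, best_t = None, -1.0
    for k in range(2, n - 1):           # need >= 2 points on each side
        a, b = x[:k], x[k:]
        ma, mb = sum(a) / k, sum(b) / (n - k)
        ssa = sum((v - ma) ** 2 for v in a)
        ssb = sum((v - mb) ** 2 for v in b)
        sp2 = (ssa + ssb) / (n - 2)     # pooled variance
        if sp2 == 0:
            continue
        t = abs(ma - mb) / sqrt(sp2 * (1 / k + 1 / (n - k)))
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t

x = [0.0, 0.1, -0.1, 0.05, -0.05, 5.0, 5.1, 4.9, 5.05, 4.95]
print(max_changepoint_t(x))   # best split found at k = 5
```

The end-of-sequence bias the paper describes arises because splits near the boundaries involve very small samples on one side, inflating the variability of the t-statistic there.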

12.
The effect of restricted randomisations on the validity and efficiency of using spatial models as well as more common analysis of variance methods for analysing field trials was examined by simulating yields in agricultural fields with known spatial variation and analysing them using eight different statistical models. Some of the models took into account the restrictions on the randomisation and some matched the true generated spatial variation. Two different types of spatial variation were generated. The results showed that the type I error was controlled if the correct spatial model was applied or if the model took into account the restrictions on the randomisation. If the model neither included the correct spatial model nor reflected the randomisation, the type I error could be much higher than the nominal value. The power of the tests could be increased if the model accounted for the spatial variation. However, it may not always be possible to find the correct and true model. It is therefore recommended that the random part of the model both reflect the randomisation process and include terms that account for spatial variation in the data, in order to increase power.

13.
We compared the robustness of univariate and multivariate statistical procedures in controlling Type I error rates when the normality and homoscedasticity assumptions were not fulfilled. The procedures we evaluated were the mixed model fitted with the SAS Proc Mixed module, the Bootstrap-F approach, the Brown–Forsythe multivariate approach, the Welch–James multivariate approach, and the Welch–James multivariate approach with robust estimators. The results suggest that the Kenward–Roger, Brown–Forsythe, Welch–James, and Improved Generalized Approximate procedures satisfactorily kept Type I error rates within the nominal levels for both the main and interaction effects under most of the conditions assessed.

14.
Real world data often fail to meet the underlying assumption of population normality. The Rank Transformation (RT) procedure has been recommended as an alternative to the parametric factorial analysis of covariance (ANCOVA). The purpose of this study was to compare the Type I error and power properties of the RT ANCOVA to the parametric procedure in the context of a completely randomized balanced 3 × 4 factorial layout with one covariate. This study was concerned with tests of homogeneity of regression coefficients and interaction under conditional (non)normality. Both procedures displayed erratic Type I error rates for the test of homogeneity of regression coefficients under conditional nonnormality. With all parametric assumptions valid, the simulation results demonstrated that the RT ANCOVA failed as a test for either homogeneity of regression coefficients or interaction due to severe Type I error inflation. The error inflation was most severe when departures from conditional normality were extreme. Also associated with the RT procedure was a loss of power. It is recommended that the RT procedure not be used as an alternative to factorial ANCOVA despite its encouragement from SAS, IMSL, and other respected sources.

15.
In early drug development, especially when studying new mechanisms of action or in new disease areas, little is known about the targeted or anticipated treatment effect or variability estimates. Adaptive designs that allow for early stopping but also use interim data to adapt the sample size have been proposed as a practical way of dealing with these uncertainties. Predictive power and conditional power are two commonly mentioned techniques that allow predictions of what will happen at the end of the trial based on the interim data. Decisions about stopping or continuing the trial can then be based on these predictions. However, unless the user of these statistics has a deep understanding of their characteristics, important pitfalls may be encountered, especially with the use of predictive power. The aim of this paper is to highlight these potential pitfalls. It is critical that statisticians understand the fundamental differences between predictive power and conditional power, as they can have dramatic effects on decision making at the interim stage, especially if used to re-evaluate the sample size. The use of predictive power can lead to much larger sample sizes than either conditional power or standard sample size calculations. One crucial difference is that predictive power takes account of all uncertainty, parts of which are ignored by standard sample size calculations and conditional power. By comparing the characteristics of each of these statistics we highlight important characteristics of predictive power that experimenters need to be aware of when using this approach.
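The contrast between the two statistics can be illustrated for a normal-model one-sided z-test. This is a sketch under a flat prior; the closed-form predictive power below is a standard result for this model, not a formula taken from the paper, and the interim values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def conditional_power(z1, t, theta, alpha=0.025):
    """CP: P(final significance | interim z1), assuming the standardized
    drift `theta` (the expected final z-statistic under the assumed effect)."""
    z_a = nd.inv_cdf(1 - alpha)
    return nd.cdf((z1 * sqrt(t) + theta * (1 - t) - z_a) / sqrt(1 - t))

def predictive_power(z1, t, alpha=0.025):
    """PP: conditional power averaged over the flat-prior posterior of the
    drift; this has a closed form in the normal model."""
    z_a = nd.inv_cdf(1 - alpha)
    return nd.cdf((z1 / sqrt(t) - z_a) * sqrt(t / (1 - t)))

z1, t = 1.2, 0.5                       # hypothetical interim z, info fraction
cp_planned = conditional_power(z1, t, theta=2.8)            # design effect
cp_observed = conditional_power(z1, t, theta=z1 / sqrt(t))  # interim estimate
pp = predictive_power(z1, t)
```

With a lukewarm interim result, conditional power under the planned effect stays optimistic, conditional power at the observed effect is pessimistic, and predictive power lands between the observed-effect value and 0.5, reflecting the extra parameter uncertainty it carries.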

16.
Conditional power calculations are frequently used to guide the decision whether or not to stop a trial for futility or to modify planned sample size. These ignore the information in short-term endpoints and baseline covariates, and thereby do not make fully efficient use of the information in the data. We therefore propose an interim decision procedure based on the conditional power approach which exploits the information contained in baseline covariates and short-term endpoints. We will realize this by considering the estimation of the treatment effect at the interim analysis as a missing data problem. This problem is addressed by employing specific prediction models for the long-term endpoint which enable the incorporation of baseline covariates and multiple short-term endpoints. We show that the proposed procedure leads to an efficiency gain and a reduced sample size, without compromising the Type I error rate of the procedure, even when the adopted prediction models are misspecified. In particular, implementing our proposal in the conditional power approach enables earlier decisions relative to standard approaches, whilst controlling the probability of an incorrect decision. This time gain results in a lower expected number of recruited patients in case of stopping for futility, such that fewer patients receive the futile regimen. We explain how these methods can be used in adaptive designs with unblinded sample size re-assessment based on the inverse normal P-value combination method to control Type I error. We support the proposal by Monte Carlo simulations based on data from a real clinical trial.

17.
Research on tests for scale equality that are robust to violations of the distributional normality assumption has focused exclusively on an overall test statistic and has not examined procedures for identifying specific differences in multiple group designs. The present study compares four contrast analysis procedures for scale differences in the single-factor, four-group design. Two data transformations are considered under several combinations of variance differences, sample sizes, and distributional forms. The results indicate that no single transformation or analysis procedure is uniformly superior in controlling the familywise error rate or in statistical power. The relationship between sample sizes and variances is a major factor in selecting a contrast analysis procedure.

18.
One characterization of group sequential methods uses alpha spending functions to allocate the false positive rate throughout a study. We consider and evaluate several such spending functions as well as the time points of the interim analyses at which they apply. In addition, we evaluate the double triangular test as an alternative procedure that allows for early termination of the trial not only due to efficacy differences between treatments, but also due to lack of such differences. We motivate and illustrate our work by reference to the analysis of survival data from a proposed oncology study. Such group sequential procedures with one or two interim analyses are only slightly less powerful than fixed sample trials, but provide for the strong possibility of early stopping. Therefore, in all situations where they can practically be applied, we recommend their routine use in clinical trials. The double triangular test provides a suitable alternative to the group sequential procedures in that they do not provide for early stopping with acceptance of the null hypothesis. Again, there is only a modest loss in power relative to fixed sample tests. Copyright © 2004 John Wiley & Sons, Ltd.
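Two widely used spending-function families can be evaluated directly. These are the Lan–DeMets O'Brien–Fleming-type and Pocock-type forms, given here for illustration; they are not necessarily the specific functions this paper compares.

```python
from math import sqrt, log, e
from statistics import NormalDist

nd = NormalDist()

def obrien_fleming_spend(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function (two-sided alpha):
    spends very little alpha early, almost all of it near the end."""
    return 2 * (1 - nd.cdf(nd.inv_cdf(1 - alpha / 2) / sqrt(t)))

def pocock_spend(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending function: spends alpha more evenly."""
    return alpha * log(1 + (e - 1) * t)

# Cumulative alpha spent at each information fraction
for t in (0.25, 0.5, 0.75, 1.0):
    print(round(obrien_fleming_spend(t), 4), round(pocock_spend(t), 4))
```

Both functions spend the full alpha of 0.05 at information fraction 1, but the O'Brien–Fleming-type form spends far less at early looks, which is why it gives up so little power relative to a fixed-sample design.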

19.
The problem of predicting times to failure of units from the Exponential Distribution which are censored under a simple step-stress model is considered in this article. We discuss two types of censoring—regular and progressive Type I—and two kinds of predictors—the maximum likelihood predictors (MLP) and the conditional median predictors (CMP) for each type of censoring. Numerical examples are used to illustrate the prediction methods. Using simulation studies, mean squared prediction error (MSPE) and prediction intervals are generated for these examples. MLP and the CMP are then compared with respect to MSPE and the prediction interval.

20.
Multi-arm trials are an efficient way of simultaneously testing several experimental treatments against a shared control group. As well as reducing the sample size required compared to running each trial separately, they have important administrative and logistical advantages. There has been debate over whether multi-arm trials should correct for the fact that multiple null hypotheses are tested within the same experiment. Previous opinions have ranged from no correction is required, to a stringent correction (controlling the probability of making at least one type I error) being needed, with regulators arguing the latter for confirmatory settings. In this article, we propose that controlling the false-discovery rate (FDR) is a suitable compromise, with an appealing interpretation in multi-arm clinical trials. We investigate the properties of the different correction methods in terms of the positive and negative predictive value (respectively how confident we are that a recommended treatment is effective and that a non-recommended treatment is ineffective). The number of arms and proportion of treatments that are truly effective is varied. Controlling the FDR provides good properties. It retains the high positive predictive value of FWER correction in situations where a low proportion of treatments is effective. It also has a good negative predictive value in situations where a high proportion of treatments is effective. In a multi-arm trial testing distinct treatment arms, we recommend that sponsors and trialists consider use of the FDR.
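FDR control of the kind advocated here is typically implemented with the Benjamini–Hochberg step-up procedure. The following is a minimal sketch with hypothetical per-arm p-values; it is a generic implementation, not code from the paper.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= k*q/m,
    and reject the k hypotheses with the smallest p-values.
    Returns the (sorted) indices of the rejected hypotheses.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])

# Hypothetical five-arm trial: three promising arms, two clear nulls
pvals = [0.001, 0.008, 0.025, 0.41, 0.74]
print(benjamini_hochberg(pvals))   # -> [0, 1, 2]
```

For contrast, a Bonferroni (FWER) correction at 0.05/5 = 0.01 would reject only the first hypothesis here, illustrating the power the FDR approach recovers when several treatments are truly effective.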
