Similar Articles
20 similar articles found.
1.
Collecting individual patient data has been described as the 'gold standard' for undertaking meta-analysis. If studies involve time-to-event outcomes, conducting a meta-analysis based on aggregate data can be problematic. Two meta-analyses of randomized controlled trials with time-to-event outcomes are used to illustrate the practicality and value of several proposed methods for obtaining summary statistic estimates. In the first example the results suggest that further effort should be made to find unpublished trials. In the second example the use of aggregate data for trials where no individual patient data have been supplied allows the totality of evidence to be assessed and reveals previously unrecognized heterogeneity.
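As an illustration of working from aggregate rather than individual patient data, the sketch below implements one widely used approximation (in the style of Tierney et al.): recovering an approximate log hazard ratio from a reported two-sided logrank p-value and the total event count, assuming 1:1 randomization. The function name and inputs are hypothetical, not from the abstract above.

```python
import numpy as np
from scipy.stats import norm

def hr_from_logrank_p(p_two_sided, total_events, favors_treatment=True):
    """Approximate a hazard ratio from a reported two-sided logrank p-value
    and the total number of events, assuming 1:1 randomization.

    Uses lnHR ~ z / sqrt(V) with V ~ events / 4; the sign of lnHR is set
    by the reported direction of effect.
    """
    z = norm.isf(p_two_sided / 2.0)   # |z| corresponding to the p-value
    v = total_events / 4.0            # approximate variance of O - E
    log_hr = (-z if favors_treatment else z) / np.sqrt(v)
    return float(np.exp(log_hr))

# e.g. a trial reporting p = 0.03 on 100 events, favoring treatment
hr = hr_from_logrank_p(0.03, total_events=100)
```

The V ≈ events/4 shortcut only holds for balanced allocation; unbalanced trials need the more general variance formula.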

2.
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejecting the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is modest. Its use is illustrated with examples. Given the cost and complexity of many survival trials, we conclude that it is better to rely on the minimum p-value than on a single statistic, particularly when that single statistic is the logrank test. Copyright © 2013 John Wiley & Sons, Ltd.
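The minimum p-value idea can be sketched in a few lines: each permutation is scored by every candidate statistic, per-statistic permutation p-values are computed by ranking, and the observed minimum p-value is referred to its own permutation distribution. This is a generic illustration with two simple location statistics, not the authors' code.

```python
import numpy as np

def min_p_permutation_test(x, y, stats, n_perm=1000, seed=0):
    """Permutation test based on the minimum p-value over several statistics.

    x, y  : 1-D arrays for the two arms
    stats : list of callables (x, y) -> scalar statistic (large |T| = evidence)
    Returns the overall permutation p-value of the min-p combination.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n = len(x)
    # Row 0 holds the observed statistics; rows 1.. hold permuted ones.
    T = np.empty((n_perm + 1, len(stats)))
    T[0] = [s(x, y) for s in stats]
    for b in range(1, n_perm + 1):
        idx = rng.permutation(pooled.size)
        T[b] = [s(pooled[idx[:n]], pooled[idx[n:]]) for s in stats]
    A = np.abs(T)
    # Two-sided permutation p-value of every row, for every statistic.
    p = np.array([(A[:, j][None, :] >= A[:, j][:, None]).mean(axis=1)
                  for j in range(A.shape[1])]).T
    min_p = p.min(axis=1)
    # Compare the observed min-p against its own permutation distribution.
    return float((min_p[1:] <= min_p[0]).mean())

mean_diff = lambda a, b: a.mean() - b.mean()
median_diff = lambda a, b: np.median(a) - np.median(b)
rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, 30)   # shifted arm
y = rng.normal(0.0, 1.0, 30)
p_combined = min_p_permutation_test(x, y, [mean_diff, median_diff])
```

Because every candidate statistic is evaluated on the same permutations, their correlation is handled automatically and no multiplicity correction is needed.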

3.
A Monte Carlo simulation evaluated five pairwise multiple comparison procedures for controlling Type I error rates, any-pair power, and all-pairs power. Realistic conditions of non-normality were based on a previous survey. Variance ratios were varied from 1:1 to 64:1. The procedures evaluated included Tukey's honestly significant difference (HSD) preceded by an F test, the Hayter–Fisher, the Games–Howell preceded by an F test, the Peritz with F tests, and the Peritz with Alexander–Govern tests. Tukey's procedure shows the greatest robustness in Type I error control. Any-pair power is generally best with one of the Peritz procedures. All-pairs power is best with the Peritz F test procedure. However, Tukey's HSD preceded by the Alexander–Govern F test may provide the best combination for controlling Type I error and power rates in a variety of conditions of non-normality and variance heterogeneity.
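A miniature version of such a Monte Carlo check, for plain Tukey's HSD only (not the protected or Peritz variants compared above), estimates the familywise Type I error under the complete null with normal, equal-variance data using `scipy.stats.tukey_hsd`:

```python
import numpy as np
from scipy.stats import tukey_hsd

def tukey_fwer(k=3, n=15, n_sim=2000, seed=6):
    """Empirical familywise Type I error of Tukey's HSD under H0
    (all k group means equal, normal errors, equal variances)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        groups = rng.normal(size=(k, n))
        res = tukey_hsd(*groups)
        # Reject the complete null if any pairwise comparison is significant.
        off_diag = res.pvalue[~np.eye(k, dtype=bool)]
        hits += bool((off_diag < 0.05).any())
    return hits / n_sim

fwer = tukey_fwer()   # should sit near the nominal 0.05
```

Reproducing the study's harder scenarios only requires swapping the normal generator for skewed distributions and unequal variances.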

4.
ABSTRACT

Background: Many exposures in epidemiological studies have nonlinear effects, and the problem is to choose an appropriate functional relationship between such exposures and the outcome. One common approach is to investigate several parametric transformations of the covariate of interest and to select a posteriori the function that fits the data best. However, such an approach may result in an inflated Type I error. Methods: Through a simulation study, we generated data from Cox models with different transformations of a single continuous covariate. We investigated the Type I error rate and the power of the likelihood ratio test (LRT) corresponding to three different procedures that considered the same set of parametric dose-response functions. The first, unconditional, approach did not involve any model selection, while the second, conditional, approach was based on a posteriori selection of the parametric function. The proposed third approach was similar to the second except that it used a corrected critical value for the LRT to ensure a correct Type I error. Results: The Type I error rate of the second approach was twice the nominal size. For simple monotone dose-response functions, the corrected test had power similar to the unconditional approach, while for non-monotone dose-response functions it had higher power. A real-life application focusing on the effect of body mass index on the risk of coronary heart disease death illustrates the advantage of the proposed approach. Conclusion: Our results confirm that selecting the functional form of the dose-response a posteriori induces Type I error inflation. The corrected procedure, which can be applied in a wide range of situations, may provide a good trade-off between Type I error and power.
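The corrected-critical-value idea can be demonstrated outside the Cox setting. In this simplified sketch (Pearson correlation tests instead of Cox LRTs, and an illustrative transformation set, both assumptions of this example), the null distribution of the minimum p-value over candidate transformations yields both the inflated naive size and the corrected critical value:

```python
import numpy as np
from scipy.stats import pearsonr  # .pvalue attribute needs scipy >= 1.9

# Illustrative candidate transformations (not the paper's set):
transforms = [lambda x: x, np.log, lambda x: (x - 2.55) ** 2]

def min_p(x, y):
    """Smallest p-value over the candidate transformations of the exposure."""
    return min(pearsonr(f(x), y).pvalue for f in transforms)

rng = np.random.default_rng(7)
n, n_sim = 100, 2000
# Null simulation: the outcome is independent of the exposure.
null_minp = np.array([min_p(rng.uniform(0.1, 5.0, n), rng.normal(size=n))
                      for _ in range(n_sim)])
naive_size = (null_minp < 0.05).mean()           # inflated Type I error
alpha_corr = float(np.quantile(null_minp, 0.05)) # corrected critical value
```

Testing each min-p against `alpha_corr` instead of 0.05 restores the nominal size while keeping the best-fitting transformation.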

5.
We consider blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for evaluating the probability of rejecting the null hypothesis at a given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margins for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for a given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
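The core re-estimation step is simple: pool the internal pilot data without unblinding, estimate the variance, and plug it into the normal-approximation sample size formula. A minimal sketch (hypothetical function name; the paper's exact distributional results are not reproduced here):

```python
import numpy as np
from scipy.stats import norm

def blinded_reestimate_n(pilot_pooled, delta, alpha=0.05, power=0.9):
    """Blinded sample size re-estimation from an internal pilot study.

    pilot_pooled : all pilot observations pooled across (blinded) arms
    delta        : assumed treatment difference
    Returns the per-arm n for a two-sample comparison, using the simple
    one-sample variance estimator (which ignores the treatment split and
    therefore slightly overestimates sigma^2 when delta != 0).
    """
    s2 = np.var(pilot_pooled, ddof=1)
    z = norm.isf(alpha / 2) + norm.isf(1 - power)
    return int(np.ceil(2 * s2 * z**2 / delta**2))

rng = np.random.default_rng(10)
pilot = rng.normal(0.0, 1.0, 60)          # blinded internal pilot data
n_per_arm = blinded_reestimate_n(pilot, delta=0.5)
```

The abstract's point is that this naive plug-in needs an adjusted significance level at the final analysis to keep the type I error at its nominal value.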

6.
Moderated multiple regression provides a useful framework for understanding moderator variables. These variables can also be examined within multilevel datasets, although the literature is not clear on the best way to assess data for significant moderating effects, particularly within a multilevel modeling framework. This study explores potential ways to test moderation at the individual level (level one) within a two-level multilevel modeling framework, with varying effect sizes, cluster sizes, and numbers of clusters. The study examines five potential methods for testing interaction effects: the Wald test, F-test, likelihood ratio test, Bayesian information criterion (BIC), and Akaike information criterion (AIC). For each method, the simulation study examines Type I error rates and power. Following the simulation study, an applied study uses real data to assess interaction effects using the same five methods. Results indicate that the Wald test, F-test, and likelihood ratio test all perform similarly in terms of Type I error rates and power. Type I error rates for the AIC are more liberal, and for the BIC typically more conservative. A four-step procedure for applied researchers interested in examining interaction effects in multilevel models is provided.

7.
Most multivariate statistical techniques rely on the assumption of multivariate normality. The effects of nonnormality on multivariate tests are assumed to be negligible when variance–covariance matrices and sample sizes are equal. Therefore, in practice, investigators usually do not attempt to assess multivariate normality. In this simulation study, the effects of skewed and leptokurtic multivariate data on the Type I error and power of Hotelling's T² were examined by manipulating distribution, sample size, and variance–covariance matrix. The empirical Type I error rate and power of Hotelling's T² were calculated before and after the application of generalized Box–Cox transformation. The findings demonstrated that even when variance–covariance matrices and sample sizes are equal, small to moderate changes in power still can be observed.

8.
This paper presents the results of a small sample simulation study designed to evaluate the performance of a recently proposed test statistic for the analysis of correlated binary data. The new statistic is an adjusted Mantel-Haenszel test, which may be used in testing for association between a binary exposure and a binary outcome of interest across several fourfold tables when the data have been collected under a cluster sampling design. Although originally developed for the analysis of periodontal data, the proposed method may be applied to clustered binary data arising in a variety of settings, including longitudinal studies, family studies, and school-based research. The features of the simulation are intended to mimic those of a research study of periodontal health, in which a large number of observations is made on each of a relatively small number of patients. The simulation reveals that the adjusted test statistic performs well in finite samples, having empirical type I error rates close to nominal and empirical power similar to that of more complicated marginal regression methods. Software for computing the adjusted statistic is also provided.
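For orientation, here is the classical (unadjusted) Mantel-Haenszel chi-square across K fourfold tables; the paper's contribution is a cluster adjustment of the variance term, which is not reproduced in this sketch:

```python
import numpy as np

def mantel_haenszel_chi2(tables):
    """Standard (unadjusted) Mantel-Haenszel chi-square across K 2x2 tables.

    tables : array-like of shape (K, 2, 2), rows = exposure, cols = outcome.
    Returns the chi-square statistic on 1 df (no continuity correction).
    """
    t = np.asarray(tables, dtype=float)
    a = t[:, 0, 0]                          # exposed cases per stratum
    n1 = t[:, 0].sum(axis=1)                # exposed row totals
    n2 = t[:, 1].sum(axis=1)                # unexposed row totals
    m1 = t[:, :, 0].sum(axis=1)             # case column totals
    N = t.sum(axis=(1, 2))                  # stratum totals
    e = n1 * m1 / N                         # E[a] under H0, per stratum
    v = n1 * n2 * m1 * (N - m1) / (N**2 * (N - 1))
    return float((a.sum() - e.sum()) ** 2 / v.sum())

chi2 = mantel_haenszel_chi2([[[10, 5], [4, 11]],
                             [[8, 7], [3, 12]]])
```

Under cluster sampling the hypergeometric variance `v` is too small; the adjusted test in the abstract rescales it so the empirical size stays near nominal.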

9.
This study examined the influence of heterogeneity of variance on Type I error rates and power of the independent-samples Student's t-test of equality of means on samples of scores from normal and 10 non-normal distributions. The same test of equality of means was performed on corresponding rank-transformed scores. For many non-normal distributions, both versions produced anomalous power functions, resulting partly from the fact that the hypothesis test was biased, so that under some conditions, the probability of rejecting H0 decreased as the difference between means increased. In all cases where bias occurred, the t-test on ranks exhibited substantially greater bias than the t-test on scores. This anomalous result was independent of the more familiar changes in Type I error rates and power attributable to unequal sample sizes combined with unequal variances.
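The "more familiar" effect mentioned at the end is easy to reproduce: when the smaller group carries the larger variance, the pooled-variance t-test's Type I error inflates sharply. A hypothetical simulation harness (normal data only; the study's non-normal distributions are not reproduced):

```python
import numpy as np
from scipy.stats import ttest_ind, rankdata

def type1_rate(n1, sd1, n2, sd2, n_sim=2000, seed=3, on_ranks=False):
    """Empirical Type I error of the pooled-variance t-test under H0,
    optionally applied to rank-transformed scores."""
    rng = np.random.default_rng(seed)
    rej = 0
    for _ in range(n_sim):
        x = rng.normal(0, sd1, n1)
        y = rng.normal(0, sd2, n2)
        if on_ranks:
            r = rankdata(np.concatenate([x, y]))
            x, y = r[:n1], r[n1:]
        rej += ttest_ind(x, y).pvalue < 0.05   # equal_var=True by default
    return rej / n_sim

rate_bad = type1_rate(10, 4.0, 40, 1.0)        # small group, big variance
rate_ok = type1_rate(25, 1.0, 25, 1.0)         # balanced, homoscedastic
rate_bad_ranks = type1_rate(10, 4.0, 40, 1.0, on_ranks=True)
```

Running the same harness on rank-transformed scores shows that the rank version does not escape the problem, consistent with the abstract's finding of greater bias on ranks.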

10.
Paired binary data arise naturally when paired body parts are investigated in clinical trials. One of the widely used models for dealing with this kind of data is the equal correlation coefficients model. Before using this model, it is necessary to test whether the correlation coefficients in each group are actually equal. In this paper, three test statistics (likelihood ratio test, Wald-type test, and score test) are derived for this purpose. The simulation results show that the score test statistic maintains the type I error rate and has satisfactory power, and is therefore recommended among the three methods. The likelihood ratio test is overly conservative in most cases, and the Wald-type statistic is not robust with respect to empirical type I error. Three real examples, including a multi-centre Phase II double-blind placebo randomized controlled trial, are given to illustrate the three proposed test statistics.

11.
For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is a loss in power if the assumptions that ensure optimality at each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining p-values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of two p-value combining methods for group sequential trials. The emphasis is on time-to-event trials, although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine p-values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests.

12.
We aimed to determine the most appropriate change measure among simple difference, percent change, and symmetrized percent change in simple paired designs. For this purpose, we devised a computer simulation program. Since the distributions of percent and symmetrized percent change values are skewed and bimodal, the paired t-test did not give good results in terms of Type I error and power. To be able to use percent change or symmetrized percent change as the change measure, either the distribution of the test statistic should be transformed to a known theoretical distribution by transformation methods, or a new test statistic for these values should be developed.
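A stripped-down version of such a simulation compares the empirical size of a one-sample t-test applied to each change measure under the null of no systematic change. This is a hypothetical normal-data sketch, not the paper's design, and `sym_pct` uses one common definition of symmetrized percent change:

```python
import numpy as np
from scipy.stats import ttest_1samp

def type1_by_change_measure(n=20, n_sim=2000, seed=8):
    """Empirical Type I error of the one-sample t-test applied to three
    change measures under H0 (no systematic pre-to-post change)."""
    rng = np.random.default_rng(seed)
    rej = {"diff": 0, "pct": 0, "sym_pct": 0}
    for _ in range(n_sim):
        pre = rng.normal(100, 15, n)
        post = pre + rng.normal(0, 10, n)          # no true change
        d = post - pre
        measures = {
            "diff": d,                              # simple difference
            "pct": 100 * d / pre,                   # percent change
            "sym_pct": 200 * d / (pre + post),      # symmetrized percent change
        }
        for k, m in measures.items():
            rej[k] += ttest_1samp(m, 0.0).pvalue < 0.05
    return {k: v / n_sim for k, v in rej.items()}

rates = type1_by_change_measure()
```

With normal data the simple difference keeps its nominal 5% size exactly; swapping in skewed pre/post generators reproduces the distortions the abstract describes for the ratio-based measures.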

13.
The standard hypothesis testing procedure in meta-analysis (or multi-center clinical trials) in the absence of treatment-by-center interaction relies on approximating the null distribution of the standard test statistic by a standard normal distribution. For relatively small sample sizes, the standard procedure has been shown by various authors to have poor control of the type I error probability, leading to too many liberal decisions. In this article, two test procedures are proposed which rely on the t-distribution as the reference distribution. A simulation study indicates that the proposed procedures attain significance levels closer to the nominal level than the standard procedure.
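The effect of the reference distribution is easy to see on an inverse-variance combined estimate: with few centers, a t(K−1) reference demands more evidence than the normal reference, curbing liberal decisions. A generic sketch (hypothetical function and data, not the paper's exact procedures):

```python
import numpy as np
from scipy.stats import norm, t

def combined_test(theta, se, use_t=True):
    """Inverse-variance combined test of H0: common effect = 0 over K
    centers, with either a normal or a t(K-1) reference distribution."""
    theta = np.asarray(theta, float)
    se = np.asarray(se, float)
    w = 1.0 / se**2
    est = (w * theta).sum() / w.sum()       # pooled estimate
    z = est * np.sqrt(w.sum())              # standardized statistic
    ref = t(len(theta) - 1) if use_t else norm
    return float(est), float(2 * ref.sf(abs(z)))

centers = [0.4, 0.1, 0.5, 0.2]
ses = [0.2, 0.25, 0.3, 0.2]
est, p_t = combined_test(centers, ses, use_t=True)
_, p_norm = combined_test(centers, ses, use_t=False)
```

For the same statistic, `p_t` is always larger than `p_norm`, which is exactly the direction needed to correct the standard procedure's liberality in small samples.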

14.
Several procedures have been proposed for testing the hypothesis that all off-diagonal elements of the correlation matrix of a multivariate normal distribution are equal. If the hypothesis of equal correlation can be accepted, it is then of interest to estimate and perhaps test hypotheses for the common correlation. In this paper, two versions of five different test statistics are compared via simulation in terms of adequacy of the normal approximation, coverage probabilities of confidence intervals, control of Type I error, and power. The results indicate that two test statistics based on the average of the Fisher z-transforms of the sample correlations should be used in most cases. A statistic based on the sample eigenvalues also gives reasonable results for confidence intervals and lower-tailed tests.
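The recommended average-Fisher-z approach has a very compact form. A minimal sketch, assuming K independent sample correlations each computed from n observations (equal n is an assumption of this example, and Var(z) ≈ 1/(n−3) is the usual large-sample approximation):

```python
import numpy as np
from scipy.stats import norm

def common_corr_ci(r_list, n, conf=0.95):
    """Point estimate and CI for a common correlation from K independent
    sample correlations, via the average Fisher z-transform."""
    z = np.arctanh(np.asarray(r_list, float))   # Fisher z of each r
    zbar = z.mean()
    se = 1.0 / np.sqrt(len(z) * (n - 3))        # Var(z_k) ~ 1/(n - 3)
    half = norm.isf((1 - conf) / 2) * se
    return float(np.tanh(zbar)), (float(np.tanh(zbar - half)),
                                  float(np.tanh(zbar + half)))

r_hat, (lo, hi) = common_corr_ci([0.30, 0.35, 0.40], n=50)
```

Averaging on the z scale and back-transforming keeps the interval inside (−1, 1) automatically, one reason these statistics behave well in the simulations.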

15.
In terms of the risk of making a Type I error in evaluating a null hypothesis of equality, requiring two independent confirmatory trials with two-sided p-values less than 0.05 is equivalent to requiring one confirmatory trial with two-sided p-value less than 0.00125. Furthermore, the use of a single confirmatory trial is gaining acceptability, with discussion in both ICH E9 and a CPMP Points to Consider document. Given the growing acceptance of this approach, this note provides a formula for the sample size savings that are obtained with the single clinical trial approach depending on the levels of Type I and Type II errors chosen. For two replicate trials each powered at 90%, which corresponds to a single larger trial powered at 81%, an approximate 19% reduction in total sample size is achieved with the single trial approach. Alternatively, a single trial with the same sample size as the total sample size from two smaller trials will have much greater power. For example, in the case where two trials are each powered at 90% for two-sided α=0.05, yielding an overall power of 81%, a single trial using two-sided α=0.00125 would have 91% power. Copyright © 2004 John Wiley & Sons, Ltd.
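The abstract's numbers follow directly from the normal-approximation sample size formula, in which n is proportional to (z_{α/2} + z_β)². A few lines verify the ~19% reduction and the 91% power claim (a back-of-envelope sketch, not the note's exact derivation):

```python
from math import sqrt
from scipy.stats import norm

alpha_two, beta = 0.05, 0.10                  # per-trial: two-sided 5%, 90% power
alpha_single = 2 * (alpha_two / 2) ** 2       # 0.00125: same Type I risk as 2 trials

# Sample size is proportional to (z_{alpha/2} + z_beta)^2.
n_two = 2 * (norm.isf(alpha_two / 2) + norm.isf(beta)) ** 2       # two trials
n_one = (norm.isf(alpha_single / 2) + norm.isf(1 - 0.81)) ** 2    # one trial, 81% power
reduction = 1 - n_one / n_two                 # about 19%

# Power of one trial of the same *total* size at the stricter alpha:
ncp = sqrt(2) * (norm.isf(alpha_two / 2) + norm.isf(beta))
power_single = norm.sf(norm.isf(alpha_single / 2) - ncp)          # about 91%
```

The 0.00125 comes from requiring both one-sided 0.025 results to land in the same direction: 2 × 0.025² = 0.00125.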

16.
The borrowing of historical control data can be an efficient way to improve the treatment effect estimate of the current control group in a randomized clinical trial. When the historical and current control data are consistent, the borrowing of historical data can increase power and reduce Type I error rate. However, when these 2 sources of data are inconsistent, it may result in a combination of biased estimates, reduced power, and inflation of Type I error rate. In some situations, inconsistency between historical and current control data may be caused by a systematic variation in the measured baseline prognostic factors, which can be appropriately addressed through statistical modeling. In this paper, we propose a Bayesian hierarchical model that can incorporate patient‐level baseline covariates to enhance the appropriateness of the exchangeability assumption between current and historical control data. The performance of the proposed method is shown through simulation studies, and its application to a clinical trial design for amyotrophic lateral sclerosis is described. The proposed method is developed for scenarios involving multiple imbalanced prognostic factors and thus has meaningful implications for clinical trials evaluating new treatments for heterogeneous diseases such as amyotrophic lateral sclerosis.

17.
Bartlett's test (1937) for equality of variances is based on a χ2 distribution approximation. This approximation deteriorates either when the sample size is small (particularly < 4) or when the number of populations is large. In a simulation investigation, we find a similar trend in the mean differences between the empirical distributions of Bartlett's statistic and their χ2 approximations. Using the mean differences to represent the distributional departure, a simple adjustment of Bartlett's statistic is proposed on the basis of an equal-mean principle. Performance before and after adjustment is extensively investigated under equal and unequal sample sizes, with the number of populations varying from 3 to 100. Compared with the traditional Bartlett statistic, the adjusted statistic is distributed more closely to the χ2 distribution for homogeneous samples from normal populations. The type I error is well controlled and the power is slightly higher after adjustment. In conclusion, the adjustment gives good control of the type I error and higher power, and is thus recommended for small samples and large numbers of populations when the underlying distribution is normal.
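The small-sample behaviour of the unadjusted test can be probed directly with `scipy.stats.bartlett` (the paper's adjustment is not implemented here; this harness only estimates the empirical size of the classical statistic under homogeneity):

```python
import numpy as np
from scipy.stats import bartlett

def bartlett_type1(k, n, n_sim=2000, seed=5):
    """Empirical size of Bartlett's chi-square test under homogeneity,
    for k normal populations with n observations each."""
    rng = np.random.default_rng(seed)
    rej = sum(bartlett(*rng.normal(size=(k, n))).pvalue < 0.05
              for _ in range(n_sim))
    return rej / n_sim

rate_small = bartlett_type1(k=30, n=3)   # many groups, tiny samples
rate_large = bartlett_type1(k=5, n=30)   # comfortable setting
```

In the comfortable setting the empirical size sits near the nominal 5%; in the many-groups, tiny-sample setting it typically drifts away from nominal, which is the regime the proposed adjustment targets.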

18.
We address statistical issues involved in the partially clustered design, where clusters are employed only in the intervention arm and not in the control arm. We develop a cluster-adjusted t-test that compares group treatment effects with individual treatment effects for continuous outcomes, using individual-level data as the unit of analysis in both arms; we develop an approach for determining sample sizes using this cluster-adjusted t-test; and we use simulation to demonstrate the consistent accuracy of the proposed cluster-adjusted t-test and power estimation procedures. Two real examples illustrate how to use the proposed methods.
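The essential mechanics of such a test can be sketched by inflating the intervention-arm variance with the design effect 1 + (m − 1)·ICC while leaving the unclustered control arm untouched. This is a simplified sketch with Satterthwaite degrees of freedom, not the paper's exact statistic; the function name and data are hypothetical:

```python
import numpy as np
from scipy.stats import t as t_dist

def partially_clustered_ttest(ctrl, interv_clusters, icc):
    """t-test for a partially clustered design (clusters only in the
    intervention arm). Individual observations are the unit of analysis
    in both arms; the intervention-arm variance is inflated by the design
    effect 1 + (m - 1)*icc, where m is the mean cluster size."""
    x = np.concatenate(interv_clusters)
    m = np.mean([len(c) for c in interv_clusters])
    deff = 1.0 + (m - 1.0) * icc
    v_int = np.var(x, ddof=1) * deff / x.size
    v_ctl = np.var(ctrl, ddof=1) / len(ctrl)
    t_stat = (x.mean() - np.mean(ctrl)) / np.sqrt(v_int + v_ctl)
    df = (v_int + v_ctl) ** 2 / (v_int**2 / (x.size - 1)
                                 + v_ctl**2 / (len(ctrl) - 1))
    return float(t_stat), float(2 * t_dist.sf(abs(t_stat), df))

rng = np.random.default_rng(9)
ctrl = rng.normal(0.0, 1.0, 40)
clusters = [rng.normal(1.0, 1.0, 5) for _ in range(8)]   # 8 clusters of 5
t0, p0 = partially_clustered_ttest(ctrl, clusters, icc=0.0)
t5, p5 = partially_clustered_ttest(ctrl, clusters, icc=0.5)
```

With icc = 0 the statistic reduces to an ordinary Welch-style t-test; a positive ICC shrinks the statistic and enlarges the p-value, reflecting the reduced effective sample size in the clustered arm.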

19.
K correlated 2×2 tables with structural zeros are commonly encountered in infectious disease studies. This paper considers a hypothesis test for the risk difference in K independent 2×2 tables with structural zeros. Score, likelihood ratio, and Wald-type statistics are proposed to test the hypothesis on the basis of stratified data and pooled data. Sample size formulae are derived for controlling a pre-specified power or a pre-determined confidence interval width. Our empirical results show that the score and likelihood ratio statistics behave better than the Wald-type statistic in terms of type I error rate and coverage probability, and that sample sizes based on the stratified test are smaller than those based on the pooled test in the same design. A real example is used to illustrate the proposed methodologies. Copyright © 2009 John Wiley & Sons, Ltd.

20.
In rare diseases, typically only a small number of patients are available for a randomized clinical trial. Nevertheless, it is not uncommon that more than one study is performed to evaluate a (new) treatment. Scarcity of available evidence makes it particularly valuable to pool the data in a meta-analysis. When the primary outcome is binary, small sample sizes increase the chance of observing zero events. The frequentist random-effects model is known to induce bias and to result in improper interval estimation of the overall treatment effect in a meta-analysis with zero events. Bayesian hierarchical modeling could be a promising alternative. Bayesian models are known to be sensitive to the choice of the prior distribution for the between-study variance (heterogeneity) in sparse settings. In a rare disease setting only limited data will be available to base the prior on, so robustness of estimation is desirable. We performed an extensive and diverse simulation study, aiming to provide practitioners with advice on the choice of a sufficiently robust prior distribution shape for the heterogeneity parameter. Our results show that priors that place some concentrated mass on small τ values but do not restrict the density, for example the Uniform(−10, 10) heterogeneity prior on the log(τ2) scale, show robust 95% coverage combined with less overestimation of the overall treatment effect across varying degrees of heterogeneity. We illustrate the results with meta-analyses of a few small trials.
