Similar Documents
20 similar documents found (search time: 31 ms)
1.
P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-à-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels .05, .01, and .001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.
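A minimal simulation sketch of the phenomenon described above (not the authors' code): replicate a two-sample t-test under a fixed alternative and summarize the spread of the p-values on the log scale. The sample size, effect size, and replication count are illustrative choices.

```python
# Illustrative only: sample-to-sample variability of p-values under a
# fixed alternative, summarized on the log10 scale.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, reps = 50, 0.5, 2000   # hypothetical sample size and effect size

log10_p = np.empty(reps)
for i in range(reps):
    x = rng.normal(0.0, 1.0, n)        # control group
    y = rng.normal(effect, 1.0, n)     # treatment group, shifted mean
    _, p = stats.ttest_ind(x, y)
    log10_p[i] = np.log10(p)

print(f"median p = {10**np.median(log10_p):.4f}")
print(f"SD of log10(p) = {log10_p.std(ddof=1):.2f}")
# A spread near 1 on the log10 scale means replicate p-values routinely
# differ by an order of magnitude, echoing the paper's point.
```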

2.
3.
Traditional vaccine efficacy trials usually use fixed designs with fairly large sample sizes. Recruiting a large number of subjects requires more time and higher costs. Furthermore, vaccine developers are more than ever facing the need to accelerate vaccine development to fulfill the public's medical needs. A possible approach to accelerate development is to use the method of dynamic borrowing of historical controls in clinical trials. In this paper, we evaluate the feasibility and the performance of this approach in vaccine development by retrospectively analyzing two real vaccine studies: a relatively small immunological trial (typical early phase study) and a large vaccine efficacy trial (typical Phase 3 study) assessing prophylactic human papillomavirus vaccine. Results are promising, particularly for early development immunological studies, where the adaptive design is feasible, and control of type I error is less relevant.

4.
Traditionally, noninferiority hypotheses have been tested using a frequentist method with a fixed margin. Given that information for the control group is often available from previous studies, it is interesting to consider a Bayesian approach in which information is "borrowed" for the control group to improve efficiency. However, construction of an appropriate informative prior can be challenging. In this paper, we consider a hybrid Bayesian approach for testing noninferiority hypotheses in studies with a binary endpoint. To account for heterogeneity between the historical information and the current trial for the control group, a dynamic P value-based power prior parameter is proposed to adjust the amount of information borrowed from the historical data. This approach extends the simple test-then-pool method to allow a continuous discounting power parameter. An adjusted α level is also proposed to better control the type I error. Simulations are conducted to investigate the performance of the proposed method and to make comparisons with other methods including test-then-pool and hierarchical modeling. The methods are illustrated with data from vaccine clinical trials.
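The following hedged sketch illustrates the general shape of a dynamic, p-value-based power prior for a binary control endpoint. The identity mapping from the compatibility p-value to the discount weight, the pooled z-test, and the flat Beta(1, 1) prior are illustrative assumptions, not necessarily the paper's exact formulation.

```python
# Sketch of a dynamic power prior for a binary endpoint: historical
# control data are discounted by a weight derived from a test of
# historical vs. current control compatibility.
import numpy as np
from scipy import stats

def dynamic_power_prior_posterior(x_h, n_h, x_c, n_c, a0=1.0, b0=1.0):
    """Beta posterior parameters for the current control rate, borrowing
    from historical data with a p-value-based discount."""
    # Compatibility check: two-sided pooled two-proportion z-test.
    p_pool = (x_h + x_c) / (n_h + n_c)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_h + 1 / n_c))
    z = (x_h / n_h - x_c / n_c) / se
    p_value = 2 * stats.norm.sf(abs(z))
    delta = p_value  # illustrative weight map g(p) = p, in [0, 1]
    # Power prior: raise the historical likelihood to the power delta,
    # i.e. scale historical successes/failures by delta.
    a_post = a0 + delta * x_h + x_c
    b_post = b0 + delta * (n_h - x_h) + (n_c - x_c)
    return a_post, b_post, delta

a, b, d = dynamic_power_prior_posterior(x_h=120, n_h=400, x_c=25, n_c=100)
print(f"discount = {d:.3f}, posterior mean = {a / (a + b):.3f}")
```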

5.
Unfortunately, an uncorrected version of the above paper was inadvertently published. Although not altering the overall conclusions, a technical mistake (only discovered during revision) required many small modifications to the text. We now list these fairly substantial changes, which include the addition of an explanatory Appendix. Other minor improvements in presentation of the revised paper are omitted here, but copies of the fully corrected version of the paper will be sent to people who request offprints.

6.
A 3-arm trial design that includes an experimental treatment, an active reference treatment, and a placebo is useful for assessing the noninferiority of an experimental treatment. The inclusion of a placebo arm enables the assessment of assay sensitivity and internal validation, in addition to the testing of the noninferiority of the experimental treatment compared with the reference treatment. In 3-arm noninferiority trials, various statistical test procedures have been considered to evaluate the following 3 hypotheses: (i) superiority of the experimental treatment over the placebo, (ii) superiority of the reference treatment over the placebo, and (iii) noninferiority of the experimental treatment compared with the reference treatment. However, hypothesis (ii) can be insufficient and may not accurately assess the assay sensitivity for the noninferiority of the experimental treatment compared with the reference treatment. Thus, demonstrating that the superiority of the reference treatment over the placebo is greater than the noninferiority margin (the nonsuperiority of the reference treatment compared with the placebo) can be necessary. Here, we propose log-rank statistical procedures for evaluating data obtained from 3-arm noninferiority trials to assess assay sensitivity with a prespecified margin Δ. In addition, we derive the approximate sample size and the optimal allocation that hierarchically minimize the total sample size and the placebo group's sample size.

7.
Noninferiority testing in clinical trials is commonly understood in a Neyman-Pearson framework, and has been discussed in a Bayesian framework as well. In this paper, we discuss noninferiority testing in a Fisherian framework, in which the only assumption necessary for inference is the assumption of randomization of treatments to study subjects. Randomization plays an important role in not only the design but also the analysis of clinical trials, no matter the underlying inferential framework. The ability to utilize permutation tests depends on assumptions around exchangeability, and we discuss the possible uses of permutation tests in active control noninferiority analyses. The other practical implications of this paper are admittedly minor but lead to better understanding of the historical and philosophical development of active control noninferiority testing. The conclusion may also frame discussion of other complicated issues in noninferiority testing, such as the role of an intention to treat analysis.

8.
The FDA released the final guidance on noninferiority trials in November 2016. In noninferiority trials, validity of the assessment of the efficacy of the test treatment depends on the control treatment's efficacy. Therefore, it is critically important that there be a reliable estimate of the control treatment effect—which is generally obtained from historical trials, and often assumed to hold in the current setting (the assay constancy assumption). Validating the constancy assumption requires clinical data, which are typically lacking. The guidance acknowledges that "lack of constancy can occur for many reasons." We clarify the objectives of noninferiority trials. We conclude that correction for bias, rather than assay constancy, is critical to conducting valid noninferiority trials. We propose that assay constancy not be assumed and discounting or thresholds be used to address concern about loss of historical efficacy. Examples are provided for illustration.

9.
Test statistics from the class of two-sample linear rank tests are commonly used to compare a treatment group with a control group. Two independent random samples of sizes m and n are drawn from two populations. As a result, N = m + n observations in total are obtained. The aim is to test the null hypothesis of identical distributions. The alternative hypothesis is that the populations are of the same form but with a different measure of central tendency. This article examines mid p-values from the null permutation distributions of tests based on the class of two-sample linear rank statistics. The results obtained indicate that normal approximation-based computations are very close to the permutation simulations, and they provide p-values that are close to the exact mid p-values for all practical purposes.
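A short sketch of the comparison described above: the permutation mid p-value of the Wilcoxon rank-sum statistic against its normal approximation, on simulated data. The exact null permutation distribution is approximated here by random permutations, and the data are illustrative.

```python
# Compare the permutation mid p-value of a rank-sum statistic with its
# normal approximation (exact permutation mean and variance, no ties).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 12)   # control
y = rng.normal(0.8, 1.0, 10)   # treatment
m, n = len(x), len(y)
ranks = stats.rankdata(np.concatenate([x, y]))
w_obs = ranks[m:].sum()        # rank sum of the treatment group

# Monte Carlo permutation distribution of the rank sum.
B = 20000
w_perm = np.array([rng.permutation(ranks)[m:].sum() for _ in range(B)])
# One-sided mid p-value: P(W > w_obs) + 0.5 * P(W = w_obs).
mid_p = np.mean(w_perm > w_obs) + 0.5 * np.mean(w_perm == w_obs)

# Normal approximation using the exact permutation mean and variance;
# omitting the continuity correction targets the mid p-value.
mu = n * (m + n + 1) / 2
var = m * n * (m + n + 1) / 12
p_norm = stats.norm.sf((w_obs - mu) / np.sqrt(var))

print(f"permutation mid p = {mid_p:.4f}, normal approx = {p_norm:.4f}")
```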

10.
In testing for noninferiority of two binomial distributions, the hypothesis formulation most commonly considered defines equivalence in terms of a constant bound to the difference of the two parameters. In order to avoid some basic logical difficulty entailed in this formulation, we use an equivalence region whose boundary has a fixed vertical distance from the diagonal for all values of the reference responder rate above some cutoff point and, to the left of this point, coincides with the line joining it to the origin. For the corresponding noninferiority hypothesis we derive and compare two different testing procedures. The first one is based on an objective Bayesian decision rule. The other one is obtained through combining the score tests for noninferiority with respect to the difference and the ratio of the two proportions, respectively, by means of the intersection–union principle. Both procedures are extensively studied by means of exact computational methods.
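A minimal sketch of the equivalence boundary described above; the margin delta and cutoff p0 below are illustrative values, not the paper's choices.

```python
# Noninferiority boundary: fixed vertical distance delta below the
# diagonal above a cutoff p0, joined to the origin by a ray below p0.
def ni_boundary(p_ref, delta=0.10, p0=0.40):
    """Lowest acceptable experimental response rate for a reference rate."""
    if p_ref >= p0:
        return p_ref - delta               # constant-difference region
    return p_ref * (p0 - delta) / p0       # ray from the origin

for p in (0.2, 0.4, 0.6, 0.8):
    print(f"reference {p:.1f} -> boundary {ni_boundary(p):.3f}")
```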

11.
Many nonparametric tests for the one-sample problem, matched pairs, and competing risks under censoring have the same underlying permutation distribution. This article proposes a saddlepoint approximation to the exact p-values of these tests instead of the asymptotic approximations. The performance of the saddlepoint approximation is assessed by simulation studies that show the superiority of the saddlepoint methods over the asymptotic approximations in several settings. The use of the saddlepoint approximation for the p-values of a class of two-sample tests under a completely randomized design is also discussed.

12.
The author considers studies with multiple dependent primary endpoints. Testing hypotheses with multiple primary endpoints may require unmanageably large populations. Composite endpoints consisting of several binary events may be used to reduce a trial to a manageable size. The primary difficulties with composite endpoints are that different endpoints may have different clinical importance and that higher-frequency variables may overwhelm effects of smaller, but equally important, primary outcomes. To compensate for these inconsistencies, we weight each type of event, and the total number of weighted events is counted. To reflect the mutual dependency of primary endpoints and to make the weighting method effective in small clinical trials, we use the Bayesian approach. We assume a multinomial distribution of multiple endpoints with Dirichlet priors and apply the Bayesian test of noninferiority to the calculation of weighting parameters. We use composite endpoints to test hypotheses of superiority in single-arm and two-arm clinical trials. The composite endpoints have a beta distribution. We illustrate this technique with an example. The results provide a statistical procedure for creating composite endpoints. Published 2013. This article is a U.S. Government work and is in the public domain in the USA.
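A hedged sketch of the weighted composite endpoint idea under a Dirichlet-multinomial model: posterior draws of the event-type probabilities are combined with clinical-importance weights into a single composite rate. The counts, weights, and flat prior below are all illustrative assumptions, not the paper's procedure.

```python
# Dirichlet-multinomial posterior for event-type probabilities, combined
# with importance weights into a weighted composite rate.
import numpy as np

rng = np.random.default_rng(3)
counts = np.array([40, 15, 5])        # e.g. minor, moderate, severe events
weights = np.array([0.2, 0.5, 1.0])   # clinical-importance weights
prior = np.ones_like(counts)          # flat Dirichlet(1, 1, 1) prior

post = rng.dirichlet(prior + counts, size=10000)  # posterior draws
composite = post @ weights                        # weighted composite rate
print(f"posterior mean = {composite.mean():.3f}, "
      f"95% CrI = ({np.quantile(composite, 0.025):.3f}, "
      f"{np.quantile(composite, 0.975):.3f})")
```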

13.
We describe a model-based approach to analyse space–time surveillance data on meningococcal disease. Such data typically comprise a number of time series of disease counts, each representing a specific geographical area. We propose a hierarchical formulation, where latent parameters capture temporal, seasonal and spatial trends in disease incidence. We then add—for each area—a hidden Markov model to describe potential additional (autoregressive) effects of the number of cases at the previous time point. Different specifications for the functional form of this autoregressive term are compared which involve the number of cases in the same or in neighbouring areas. The two states of the Markov chain can be interpreted as representing an 'endemic' and a 'hyperendemic' state. The methodology is applied to a data set of monthly counts of the incidence of meningococcal disease in the 94 départements of France from 1985 to 1997. Inference is carried out by using Markov chain Monte Carlo simulation techniques in a fully Bayesian framework. We emphasize that a central feature of our model is the possibility of calculating—for each region and each time point—the posterior probability of being in a hyperendemic state, adjusted for global spatial and temporal trends, which we believe is of particular public health interest.

14.
A large-sample problem of illustrating noninferiority of an experimental treatment over a referent treatment for binary outcomes is considered. The methods of illustrating noninferiority involve constructing the lower two-sided confidence bound for the difference between binomial proportions corresponding to the experimental and referent treatments and comparing it with the negative value of the noninferiority margin. The three considered methods, Anbar, Falk–Koch, and Reduced Falk–Koch, handle the comparison in an asymmetric way, that is, only the referent proportion out of the two, experimental and referent, is directly involved in the expression for the variance of the difference between two sample proportions. Five continuity corrections (including zero) are considered with respect to each approach. The key properties of the corresponding methods are evaluated via simulations. First, the uncorrected two-sided confidence intervals can, potentially, have smaller coverage probability than the nominal level even for moderately large sample sizes, for example, 150 per group. Next, the 15 testing methods are discussed in terms of their Type I error rate and power. In the settings with a relatively small referent proportion (about 0.4 or smaller), the Anbar approach with Yates' continuity correction is recommended for balanced designs and the Falk–Koch method with Yates' correction is recommended for unbalanced designs. For relatively moderate (about 0.6) and large (about 0.8 or greater) referent proportion, the uncorrected Reduced Falk–Koch method is recommended, although in this case, all methods tend to be over-conservative. These results are expected to be used in the design stage of a noninferiority study when asymmetric comparisons are envisioned. Copyright © 2013 John Wiley & Sons, Ltd.
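For orientation, a hedged sketch of the generic decision rule these methods share: a lower two-sided confidence bound for the difference in proportions, with an optional Yates-type continuity correction, compared against the negative margin. The variance below is the ordinary symmetric Wald form, not the asymmetric Anbar or Falk–Koch variance studied in the paper; all inputs are illustrative.

```python
# Generic lower-bound noninferiority test for a difference of
# proportions, with an optional Yates-type continuity correction.
import math
from scipy import stats

def ni_test(x_e, n_e, x_r, n_r, margin, alpha=0.05, yates=True):
    p_e, p_r = x_e / n_e, x_r / n_r
    se = math.sqrt(p_e * (1 - p_e) / n_e + p_r * (1 - p_r) / n_r)
    cc = 0.5 * (1 / n_e + 1 / n_r) if yates else 0.0
    z = stats.norm.ppf(1 - alpha / 2)
    lower = (p_e - p_r) - cc - z * se
    return lower, lower > -margin   # True => noninferiority concluded

lb, ok = ni_test(x_e=162, n_e=200, x_r=168, n_r=200, margin=0.10)
print(f"lower bound = {lb:.3f}, noninferior: {ok}")
```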

15.
Two types of state-switching models for U.S. real output have been proposed: models that switch randomly between states and models that switch states deterministically, as in the threshold autoregressive model of Potter. These models have been justified primarily on how well they fit the sample data, yielding statistically significant estimates of the model coefficients. Here we propose a new approach to the evaluation of an estimated nonlinear time series model that provides a complement to existing methods based on in-sample fit or on out-of-sample forecasting. In this new approach, a battery of distinct nonlinearity tests is applied to the sample data, resulting in a set of p-values for rejecting the null hypothesis of a linear generating mechanism. This set of p-values is taken to be a "stylized fact" characterizing the nonlinear serial dependence in the generating mechanism of the time series. The effectiveness of an estimated nonlinear model for this time series is then evaluated in terms of the congruence between this stylized fact and a set of nonlinearity test results obtained from data simulated using the estimated model. In particular, we derive a portmanteau statistic based on this set of nonlinearity test p-values that allows us to test the proposition that a given model adequately captures the nonlinear serial dependence in the sample data. We apply the method to several estimated state-switching models of U.S. real output.

16.
In recent years, immunological science has evolved, and cancer vaccines are now approved and available for treating existing cancers. Because cancer vaccines require time to elicit an immune response, a delayed treatment effect is expected and is actually observed in drug approval studies. Accordingly, we propose the evaluation of survival endpoints by weighted log-rank tests with the Fleming–Harrington class of weights. We consider group sequential monitoring, which allows early efficacy stopping, and determine a semiparametric information fraction for the Fleming–Harrington family of weights, which is necessary for the error spending function. Moreover, we give a flexible survival model in cancer vaccine studies that considers not only the delayed treatment effect but also the long-term survivors. In a Monte Carlo simulation study, we illustrate that when the primary analysis is a weighted log-rank test emphasizing the late differences, the proposed information fraction can be a useful alternative to the surrogate information fraction, which is proportional to the number of events. Copyright © 2016 John Wiley & Sons, Ltd.
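A minimal numpy sketch (not the paper's code) of the Fleming–Harrington weighted log-rank statistic, with weights w(t) = S(t-)^ρ (1 - S(t-))^γ based on the pooled Kaplan–Meier estimate; ρ = 0, γ = 1 emphasizes late differences, matching the delayed-effect setting above.

```python
# Fleming-Harrington weighted log-rank z-statistic for two groups
# (group coded 0/1, event = 1 for an observed event, 0 for censoring).
import numpy as np

def fh_logrank(time, event, group, rho=0.0, gamma=1.0):
    time, event, group = map(np.asarray, (time, event, group))
    s_left = 1.0        # pooled KM estimate just before the current time
    u = var = 0.0
    for t in np.unique(time[event == 1]):   # sorted distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = s_left**rho * (1 - s_left)**gamma
        u += w * (d1 - d * n1 / n)          # observed minus expected
        if n > 1:
            var += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        s_left *= 1 - d / n                 # update pooled KM after t
    return u / np.sqrt(var)                 # approximately standard normal

# Example: z = fh_logrank(time, event, group, rho=0, gamma=1)
```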

17.
Yu, Tingting; Wu, Lang; Gilbert, Peter. Lifetime Data Analysis, 2019, 25(2):229-258.

In HIV vaccine studies, longitudinal immune response biomarker data are often left-censored due to lower limits of quantification of the employed immunological assays. The censoring information is important for predicting HIV infection, the failure event of interest. We propose two approaches to addressing left censoring in longitudinal data: one makes no distributional assumptions for the censored data—treating left-censored values as a "point mass" subgroup—and the other makes a distributional assumption for a subset of the censored data but not for the remaining subset. We develop these two approaches to handling censoring for joint modelling of longitudinal and survival data via a Cox proportional hazards model fit by h-likelihood. We evaluate the new methods via simulation and analyze an HIV vaccine trial data set, finding that longitudinal characteristics of the immune response biomarkers are highly associated with the risk of HIV infection.


18.
Clinical noninferiority trials with at least three groups have received much attention recently, perhaps due to the fact that regulatory agencies often require that a placebo group be evaluated along with a new experimental drug and an active control. The authors discuss likelihood ratio tests for binary endpoints and various noninferiority hypotheses. They find that, depending on the particular hypothesis, the test reduces asymptotically either to the intersection-union test or to a test which follows asymptotically a mixture of generalized chi-squared distributions. They investigate the performance of this asymptotic test and provide an exact modification. They show that this test considerably outperforms multiple testing methods such as the Bonferroni adjustment with respect to power. They illustrate their methods with a cancer study to compare antiemetic agents. Finally, they discuss the extension of the results to other settings, such as Gaussian endpoints.

19.
The authors use a Berry-Esseen type bound to identify the factors which influence the speed of convergence to the normal distribution of the indices of Gini, Piesch and Mehran. To empirically confirm the conclusions reached, a Monte Carlo experiment is performed for a log-logistic distribution.
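A small Monte Carlo sketch in the spirit of the experiment above: sample repeatedly from a log-logistic distribution, compute the Gini index, and check how close its standardized distribution is to normal. The sample size and shape parameter are illustrative choices.

```python
# Monte Carlo look at the normality of the sample Gini index under a
# log-logistic parent distribution (scipy's "fisk").
import numpy as np
from scipy import stats

def gini(x):
    """Sample Gini index via the sorted-data formula."""
    x = np.sort(x)
    n = x.size
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

rng = np.random.default_rng(4)
n, reps, shape = 50, 5000, 3.0   # illustrative sample size and shape c
g = np.array([gini(stats.fisk.rvs(shape, size=n, random_state=rng))
              for _ in range(reps)])
z = (g - g.mean()) / g.std(ddof=1)
# Near-zero skewness/excess kurtosis suggests fast convergence to normal.
print(f"skewness = {stats.skew(z):.3f}, "
      f"excess kurtosis = {stats.kurtosis(z):.3f}")
```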

20.
Often, the response variables on sampling units are observed repeatedly over time. The sampling units may come from different populations, such as treatment groups. This setting is routinely modeled by a random coefficients growth curve model, and the techniques of general linear mixed models are applied to address the primary research aim. An alternative approach is to reduce each subject's data to summary measures, such as within-subject averages or regression coefficients. One may then test for equality of means of the summary measures (or functions of them) among treatment groups. Here, we compare by simulation the performance characteristics of three approximate tests based on summary measures and one based on the full data, focusing mainly on accuracy of p-values. We find that performances of these procedures can be quite different for small samples in several different configurations of parameter values. The summary-measures approach performed at least as well as the full-data mixed models approach.
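A short sketch of the summary-measures approach described above: reduce each subject's repeated measurements to a within-subject regression slope, then compare mean slopes between two groups with a two-sample t-test. All data and parameter values below are simulated and illustrative.

```python
# Summary-measures analysis of growth-curve data: per-subject OLS slope,
# then a two-sample t-test on the slopes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
times = np.arange(5, dtype=float)   # common measurement times

def subject_slopes(n_subjects, true_slope):
    slopes = []
    for _ in range(n_subjects):
        b = true_slope + rng.normal(0, 0.3)              # random slope
        y = 1.0 + b * times + rng.normal(0, 0.5, times.size)
        slopes.append(np.polyfit(times, y, 1)[0])        # within-subject OLS
    return np.array(slopes)

slopes_a = subject_slopes(15, true_slope=0.5)   # treatment group
slopes_b = subject_slopes(15, true_slope=0.2)   # control group
t, p = stats.ttest_ind(slopes_a, slopes_b)
print(f"t = {t:.2f}, p = {p:.4f}")
```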
