Similar Documents
20 similar documents found (search time: 15 ms)
1.
We introduce a multi-step variance minimization algorithm for numerical estimation of Type I and Type II error probabilities in sequential tests. The algorithm can be applied to general test statistics and easily built into general design algorithms for sequential tests. Our simulation results indicate that the proposed algorithm is particularly useful for estimating tail probabilities, and may lead to significant computational efficiency gains over the crude Monte Carlo method.
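The crude Monte Carlo baseline mentioned above can be sketched as follows: simulate the test-statistic path under H0 and count boundary crossings. A minimal illustration assuming a two-look group-sequential Z-test with a Pocock-style critical value of 2.178; the multi-step variance minimization algorithm itself is not reproduced here.

```python
import numpy as np

def crude_mc_type1(n_sims=20_000, looks=(50, 100), crit=2.178, seed=0):
    """Crude Monte Carlo estimate of the Type I error of a two-look
    group-sequential Z-test (reject at the first look where |Z| > crit).
    Data are simulated under H0: i.i.d. standard normal observations."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.standard_normal(looks[-1])
        for n in looks:
            z = x[:n].sum() / np.sqrt(n)   # Z-statistic at this look
            if abs(z) > crit:
                rejections += 1
                break                       # stop at first crossing
    return rejections / n_sims
```

With the Pocock boundary 2.178 the estimate should hover near the nominal 0.05; variance-reduction methods aim to reach the same accuracy with far fewer simulations, especially for small tail probabilities.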

2.
In this article, we consider the two-factor unbalanced nested design model without the assumption of equal error variance. For the problem of testing ‘main effects’ of both factors, we propose a parametric bootstrap (PB) approach and compare it with the existing generalized F (GF) test. The Type I error rates of the tests are evaluated using Monte Carlo simulation. Our studies show that the PB test performs better than the GF test. The PB test performs very satisfactorily even for small samples, while the GF test exhibits poor Type I error properties when the number of factorial combinations or treatments goes up. It is also noted that the same tests can be used to test the significance of the random effect variance component in a two-factor mixed effects nested model under unequal error variances.

3.
In this article, we consider the three-factor unbalanced nested design model without the assumption of equal error variance. For the problem of testing “main effects” of the three factors, we propose a parametric bootstrap (PB) approach and compare it with the existing generalized F (GF) test. The Type I error rates of the tests are evaluated using Monte Carlo simulation. Our studies show that the PB test performs better than the GF test. The PB test performs very satisfactorily even for small samples, while the GF test exhibits poor Type I error properties when the number of factorial combinations or treatments goes up. It is also noted that the same tests can be used to test the significance of the random effect variance component in a three-factor mixed effects nested model under unequal error variances.

4.
In this article we consider the two-way ANOVA model without interaction under heteroscedasticity. For the problem of testing equal effects of factors, we propose a parametric bootstrap (PB) approach and compare it with the existing generalized F (GF) test. The Type I error rates and powers of the tests are evaluated using Monte Carlo simulation. Our studies show that the PB test performs better than the GF test. The PB test performs very satisfactorily even for small samples, while the GF test exhibits poor Type I error properties when the number of factorial combinations or treatments goes up. It is also noted that the same tests can be used to test the significance of the random effect variance component in a two-way mixed-effects model under unequal error variances.
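The PB idea recurring in the abstracts above can be illustrated in a simplified one-way analogue (hypothetical helper `pb_anova_pvalue`; the actual two-way, no-interaction statistic is more involved): estimate each group's variance, form a precision-weighted statistic for equality of means, and approximate its null distribution by resampling group means from their estimated sampling distributions.

```python
import numpy as np

def pb_anova_pvalue(groups, n_boot=2000, seed=0):
    """Parametric bootstrap (PB) test of equal group means under unequal
    error variances -- a simplified one-way sketch of the PB approach.
    Each group mean is weighted by its estimated precision n_i / s_i^2;
    under H0, bootstrap group means are drawn from N(0, s_i^2 / n_i)."""
    rng = np.random.default_rng(seed)
    ns = np.array([len(g) for g in groups])
    means = np.array([np.mean(g) for g in groups])
    vars_ = np.array([np.var(g, ddof=1) for g in groups])
    w = ns / vars_                          # precision weights

    def stat(m):
        grand = np.sum(w * m) / np.sum(w)   # precision-weighted grand mean
        return np.sum(w * (m - grand) ** 2)

    t_obs = stat(means)
    # Parametric bootstrap under H0: resample group means around zero
    boot_means = rng.standard_normal((n_boot, len(groups))) * np.sqrt(vars_ / ns)
    t_boot = np.array([stat(m) for m in boot_means])
    return np.mean(t_boot >= t_obs)
```

Because the null distribution is resampled with the plug-in variances, the test does not rely on a pooled error variance, which is what makes the PB approach attractive under heteroscedasticity.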

5.
Three modified tests for homogeneity of the odds ratio for a series of 2 × 2 tables are studied when the data are clustered. In the case of clustered data, the standard tests for homogeneity of odds ratios ignore the variance inflation caused by positive correlation among responses of subjects within the same cluster, and therefore have inflated Type I error. The modified tests adjust for the variance inflation in the three existing standard tests: Breslow–Day, Tarone and the conditional score test. The degree of clustering effect is measured by the intracluster correlation coefficient, ρ. A variance correction factor derived from ρ is then applied to the variance estimator in the standard tests of homogeneity of the odds ratio. The proposed tests are an application of the variance adjustment method commonly used in correlated data analysis and are shown to maintain the nominal significance level in a simulation study. Copyright © 2004 John Wiley & Sons, Ltd.
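In its simplest form, the variance-correction idea amounts to deflating a standard chi-square statistic by the design effect 1 + (m_bar − 1)ρ, where m_bar is the average cluster size. A sketch under that assumption (the exact Breslow–Day/Tarone adjustments differ in detail):

```python
def adjust_for_clustering(chisq, m_bar, rho):
    """Apply the design-effect correction commonly used for clustered
    data: the test statistic is divided by deff = 1 + (m_bar - 1) * rho,
    where m_bar is the average cluster size and rho the intracluster
    correlation coefficient.  A hypothetical helper illustrating the
    adjustment idea, not the paper's exact variance estimators."""
    deff = 1.0 + (m_bar - 1.0) * rho
    return chisq / deff
```

With ρ = 0 (independent responses) the statistic is unchanged; as ρ or the cluster size grows, the correction shrinks the statistic and restores the nominal Type I error rate.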

6.
Heterogeneity of variances of treatment groups influences the validity and power of significance tests of location in two distinct ways. First, if sample sizes are unequal, the Type I error rate and power are depressed if a larger variance is associated with a larger sample size, and elevated if a larger variance is associated with a smaller sample size. This well-established effect, which occurs in t and F tests, and to a lesser degree in nonparametric rank tests, results from unequal contributions of pooled estimates of error variance in the computation of test statistics. It is observed in samples from normal distributions, as well as non-normal distributions of various shapes. Second, transformation of scores from skewed distributions with unequal variances to ranks produces differences in the means of the ranks assigned to the respective groups, even if the means of the initial groups are equal, and a subsequent inflation of Type I error rates and power. This effect occurs for all sample sizes, equal and unequal. For the t test, the discrepancy diminishes, and for the Wilcoxon–Mann–Whitney test, it becomes larger, as sample size increases. The Welch separate-variance t test overcomes the first effect but not the second. Because of interaction of these separate effects, the validity and power of both parametric and nonparametric tests performed on samples of any size from unknown distributions with possibly unequal variances can be distorted in unpredictable ways.
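The second effect is easy to reproduce numerically: two skewed samples with equal means but unequal variances receive systematically different mean ranks after the rank transformation. A sketch assuming exponential distributions (a hypothetical example, not the authors' simulation design):

```python
import numpy as np

def mean_rank_gap(n=2000, seed=0):
    """Two skewed samples with equal means (both 1) but unequal variances
    (1 vs. 4): after ranking the combined sample, the group mean ranks
    differ even though the raw means are equal -- the rank-transformation
    effect described above."""
    rng = np.random.default_rng(seed)
    g1 = rng.exponential(1.0, n)                 # mean 1, variance 1
    g2 = 2.0 * rng.exponential(1.0, n) - 1.0     # mean 1, variance 4
    combined = np.concatenate([g1, g2])
    ranks = np.empty(2 * n)
    ranks[np.argsort(combined)] = np.arange(1, 2 * n + 1)
    return ranks[:n].mean() - ranks[n:].mean()   # gap in mean ranks
```

The gap grows linearly with the sample size, which is why the Wilcoxon–Mann–Whitney discrepancy worsens rather than vanishes as n increases.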

7.
This paper develops a test for comparing treatment effects when observations are missing at random for repeated measures data on independent subjects. It is assumed that missingness at any occasion follows a Bernoulli distribution. It is shown that the distribution of the vector of linear rank statistics depends on the unknown parameters of the probability law that governs missingness, which is absent in the existing conditional methods employing rank statistics. This dependence is through the variance–covariance matrix of the vector of linear ranks. The test statistic is a quadratic form in the linear rank statistics when the variance–covariance matrix is estimated. The limiting distribution of the test statistic is derived under the null hypothesis. Several methods of estimating the unknown components of the variance–covariance matrix are considered. The estimate that produces stable empirical Type I error rate while maintaining the highest power among the competing tests is recommended for implementation in practice. Simulation studies are also presented to show the advantage of the proposed test over other rank-based tests that do not account for the randomness in the missing data pattern. Our method is shown to have the highest power while also maintaining near-nominal Type I error rates. Our results clearly illustrate that even for an ignorable missingness mechanism, the randomness in the pattern of missingness cannot be ignored. A real data example is presented to highlight the effectiveness of the proposed method.

8.
An internal pilot with interim analysis (IPIA) design combines interim power analysis (an internal pilot) with interim data analysis (two-stage group sequential). We provide IPIA methods for single df hypotheses within the Gaussian general linear model, including one and two group t tests. The design allows early stopping for efficacy and futility while also re-estimating sample size based on an interim variance estimate. Study planning in small samples requires the exact and computable forms reported here. The formulation gives fast and accurate calculations of power, Type I error rate, and expected sample size.

9.
For comparison of multiple outcomes commonly encountered in biomedical research, Huang et al. (2005) improved O'Brien's (1984) rank-sum tests through the replacement of the ad hoc variance by the asymptotic variance of the test statistics. The improved tests control the Type I error rate at the desired level and gain power when the differences between the two comparison groups in each outcome variable fall in the same direction. However, they may lose power when the differences are in different directions (e.g., some are positive and some are negative). These tests and the popular Bonferroni correction failed to show a significant difference when applied to compare heart rates from a clinical trial evaluating the effect of a procedure to remove the cardioprotective solution HTK. We propose an alternative test statistic, taking the maximum of the individual rank-sum statistics, which controls the Type I error and maintains satisfactory power regardless of the directions of the differences. Simulation studies show the proposed test to be of higher power than other tests in certain alternative parameter space of interest. Furthermore, when used to analyze the heart rates data the proposed test yields more satisfactory results.
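A permutation-calibrated sketch of the max-type statistic follows (an assumed simplification: the paper derives the variance of the rank-sum statistics analytically rather than permuting labels):

```python
import numpy as np

def max_ranksum_test(x, y, n_perm=2000, seed=0):
    """Max-type rank-sum test for multiple outcomes: for each outcome,
    compute a standardized Wilcoxon rank-sum statistic, take the maximum
    absolute value across outcomes, and calibrate by permuting group
    labels.  x, y: (n_i, k) arrays of k outcomes per subject.  A sketch
    of the 'maximum of the individual rank-sum statistics' idea; ties are
    ignored for simplicity."""
    rng = np.random.default_rng(seed)
    data = np.vstack([x, y])
    n1, n = len(x), len(data)

    def stat(idx):
        zs = []
        for j in range(data.shape[1]):
            ranks = np.empty(n)
            ranks[np.argsort(data[:, j])] = np.arange(1, n + 1)
            w = ranks[idx[:n1]].sum()            # rank sum of group 1
            mu = n1 * (n + 1) / 2.0
            sd = np.sqrt(n1 * (n - n1) * (n + 1) / 12.0)
            zs.append(abs(w - mu) / sd)
        return max(zs)

    t_obs = stat(np.arange(n))
    perm = np.array([stat(rng.permutation(n)) for _ in range(n_perm)])
    return t_obs, np.mean(perm >= t_obs)         # statistic, p-value
```

Because only the largest standardized statistic enters, a strong difference in any single outcome drives rejection regardless of the directions of the other outcomes.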

10.
To improve the goodness of fit between a regression model and observations, the model can be made more complex; however, this can reduce statistical power when the added complexity does not significantly improve the model. In the context of two-phase (segmented) logistic regressions, the model evaluation needs to include testing for simple (one-phase) versus two-phase logistic regression models. In this article, we propose and examine a class of likelihood ratio type tests for detecting a change in logistic regression parameters that splits the model into two phases. We show that the proposed tests, based on Shiryayev–Roberts type statistics, are on average the most powerful. The article argues in favor of a new approach for fixing Type I errors of tests when the parameters of null hypotheses are unknown. Although the suggested approach is partly based on Bayes–Factor-type testing procedures, the classical significance levels of the proposed tests are under control. We demonstrate applications of the average most powerful tests to an epidemiologic study entitled “Time to pregnancy and multiple births.”

11.
The parametric bootstrap tests and the asymptotic or approximate tests for detecting a difference of two Poisson means are compared. The test statistics used are the Wald statistics with and without log-transformation, the Cox F statistic and the likelihood ratio statistic. It is found that the Type I error rate of an asymptotic/approximate test may deviate too much from the nominal significance level α under some situations. It is recommended that we should use the parametric bootstrap tests, under which the four test statistics are similarly powerful and their Type I error rates are all close to α. We apply the tests to breast cancer data and injurious motor vehicle crash data.
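The parametric bootstrap test for two Poisson means is straightforward to sketch: pool the counts under H0 and resample. A minimal version using the unstandardized Wald statistic |x − y| (the article also studies the log-transformed Wald, Cox F, and likelihood ratio statistics):

```python
import numpy as np

def pb_poisson_test(x, y, n_boot=5000, seed=0):
    """Parametric bootstrap test of H0: lambda1 == lambda2 for two
    Poisson counts x and y (equal exposures assumed).  The common rate
    under H0 is estimated by the pooled mean, bootstrap counts are drawn
    from Poisson(lam0), and the p-value is the fraction of bootstrap
    statistics at least as extreme as the observed |x - y|."""
    rng = np.random.default_rng(seed)
    lam0 = (x + y) / 2.0                 # pooled rate estimate under H0
    xb = rng.poisson(lam0, n_boot)
    yb = rng.poisson(lam0, n_boot)
    return np.mean(np.abs(xb - yb) >= abs(x - y))
```

Resampling from the fitted null model sidesteps the normal approximation, which is exactly where the asymptotic tests can miss the nominal α for small counts.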

12.
Some nonparametric methods have been proposed to compare survival medians. Most of them are based on the asymptotic null distribution to estimate the p-value. However, for small to moderate sample sizes, those tests may have an inflated Type I error rate, which limits their applicability. In this article, we propose a new nonparametric test that uses the bootstrap to estimate the sampling mean and variance of the median. Through comprehensive simulation, we show that the proposed approach can control Type I error rates well. A real data application is used to illustrate the use of the new test.
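The core device, bootstrapping the sampling mean and variance of the median, can be sketched as follows (a hypothetical z-type combination of the two groups, not necessarily the article's exact statistic):

```python
import numpy as np

def boot_median_se(sample, n_boot=2000, seed=0):
    """Bootstrap estimate of the sampling mean and standard error of the
    median of one sample -- the building block of the median-comparison
    test described above."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    meds = np.median(rng.choice(sample, size=(n_boot, n), replace=True),
                     axis=1)
    return meds.mean(), meds.std(ddof=1)

def median_z_test(x, y, seed=0):
    """Two-sample z-type statistic for equal medians, standardizing the
    difference of bootstrap median means by the combined bootstrap SEs."""
    mx, sx = boot_median_se(x, seed=seed)
    my, sy = boot_median_se(y, seed=seed + 1)
    return (mx - my) / np.hypot(sx, sy)
```

Replacing the asymptotic variance of the median (which depends on the unknown density at the median) with a bootstrap estimate is what lets the test keep its Type I error rate in small samples.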

13.
Serial P-values     
When a collection of hypotheses is to be tested it is necessary to maintain a bound on the simultaneous Type I error rate. Serial P-values are used to define a serial test that does provide such a bound. Moreover, serial P-values are meaningful in the context of multiple tests, with or without the ‘rejection-confirmation’ decisions. The method is particularly suited to the analysis of unbalanced data, especially contingency tables.

14.
We review sequential designs, including group sequential and two-stage designs, for testing or estimating a single binary parameter. We use this simple case to introduce ideas common to many sequential designs, which in this case can be explained without explicitly using stochastic processes. We focus on methods provided by our newly developed R package, binseqtest, which exactly bound the Type I error rate of tests and exactly maintain proper coverage of confidence intervals. Within this framework, we review some allowable practical adaptations of the sequential design. We explore issues such as the following: How should the design be modified if no assessment was made at one of the planned sequential stopping times? How should the parameter be estimated if the study needs to be stopped early? What reasons for stopping early are allowed? How should inferences be made when the study is stopped for crossing the boundary, but later information is collected about responses of subjects who had enrolled before the decision to stop but had not responded by that time? Answers to these questions are demonstrated using basic methods that are available in our binseqtest R package. Supplementary materials for this article are available online.
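The kind of exact Type I error bookkeeping that a package like binseqtest automates can be illustrated with a forward recursion over the binomial count among paths that have not yet stopped (a simplified upper-boundary-only sketch, not the package's API):

```python
from math import comb

def exact_cross_prob(p, looks, bounds):
    """Exact probability that a sequential binomial test crosses its
    upper stopping boundary: at each look n_i, stop if the cumulative
    number of successes is >= b_i.  Computed by forward recursion over
    the exact distribution of the count among continuing paths."""
    state = {0: 1.0}          # state[s] = P(s successes, not yet stopped)
    n_prev, crossed = 0, 0.0
    for n_i, b_i in zip(looks, bounds):
        step = n_i - n_prev   # new observations since the previous look
        new = {}
        for s, pr in state.items():
            for k in range(step + 1):
                ps = pr * comb(step, k) * p**k * (1 - p) ** (step - k)
                t = s + k
                if t >= b_i:
                    crossed += ps                  # boundary hit: stop
                else:
                    new[t] = new.get(t, 0.0) + ps  # path continues
        state, n_prev = new, n_i
    return crossed
```

Evaluating this at the null value of p gives the exact Type I error of the boundary, with no normal approximation, which is the guarantee the sequential designs above rely on.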

15.
We evaluated the properties of six statistical methods for testing equality among populations with zero-inflated continuous distributions. These tests are based on likelihood ratio (LR), Wald, central limit theorem (CLT), modified CLT (MCLT), parametric jackknife (PJ), and nonparametric jackknife (NPJ) statistics. We investigated their statistical properties using simulated data from mixed distributions with an unknown proportion of nonzero observations that have an underlying gamma, exponential, or log-normal density function and the remaining proportion that are excessive zeros. The six statistical tests are compared in terms of their empirical Type I errors and powers, estimated through 10,000 repeated simulated samples for carefully selected configurations of parameters. The LR, Wald, and PJ tests are preferred since their empirical Type I errors were close to the preset nominal 0.05 level and each demonstrated good power for rejecting null hypotheses when the sample sizes are at least 125 in each group. The NPJ test had unacceptable empirical Type I errors because it rejected far too often, while the CLT and MCLT tests had low power in some cases. Therefore, these three tests are not recommended for general use, but the LR, Wald, and PJ tests all performed well in large-sample applications.

16.
In the present paper an estimator of the error variance for a three-way layout in a random effects model, incorporating two preliminary tests of significance, is proposed. It is well recognized that estimation of parameters of interest under an asymmetric loss function (ASL) is generally better than under the squared error loss function (SELF), particularly when overestimation and underestimation are not equally penalised. Since neither overestimation nor underestimation of the error variance is desirable, the proposed estimator is studied under the LINEX loss function. It is shown that, with a proper choice of the degree of asymmetry and the level of significance, the proposed sometimes-pool estimator performs considerably better than the unbiased estimator. Recommendations regarding its application are given.

17.
In this paper, Anbar's (1983) approach for estimating a difference between two binomial proportions is discussed with respect to a hypothesis testing problem. Such an approach results in two possible testing strategies. While the results of the tests are expected to agree for a large sample size when two proportions are equal, the tests are shown to perform quite differently in terms of their probabilities of a Type I error for selected sample sizes. Moreover, the tests can lead to different conclusions, which is illustrated via a simple example; and the probability of such cases can be relatively large. In an attempt to improve the tests while preserving their relative simplicity, a modified test is proposed. The performance of this test and a conventional test based on normal approximation is assessed. It is shown that the modified Anbar's test better controls the probability of a Type I error for moderate sample sizes.

18.
Approximate t-tests of single degree of freedom hypotheses in generalized least squares (GLS) analyses of mixed linear models using restricted maximum likelihood (REML) estimates of variance components have been previously developed by Giesbrecht and Burns (GB), and by Jeske and Harville (JH), using method-of-moments approximations for the degrees of freedom (df) of the t-statistics. This paper proposes approximate F-statistics for tests of multiple df hypotheses using one-moment and two-moment approximations which may be viewed as extensions of the GB and JH methods. The paper focuses specifically on tests of hypotheses concerning the main-plot treatment factor in split-plot experiments with missing data. Simulation results indicate generally satisfactory control of Type I error rates.

19.
The r largest order statistics approach is widely used in extreme value analysis because it may use more information from the data than just the block maxima. In practice, the choice of r is critical. If r is too large, bias can occur; if too small, the variance of the estimator can be high. The limiting distribution of the r largest order statistics, denoted by GEV_r, extends that of the block maxima. Two specification tests are proposed to select r sequentially. The first is a score test for the GEV_r distribution. Due to the special characteristics of the GEV_r distribution, the classical chi-square asymptotics cannot be used. The simplest approach is to use the parametric bootstrap, which is straightforward to implement but computationally expensive. An alternative fast weighted bootstrap or multiplier procedure is developed for computational efficiency. The second test uses the difference in estimated entropy between the GEV_r and GEV_{r-1} models, applied to the r largest order statistics and the r−1 largest order statistics, respectively. The asymptotic distribution of the difference statistic is derived. In a large-scale simulation study, both tests held their size and had substantial power to detect various misspecification schemes. A new approach to address the issue of multiple, sequential hypothesis testing is adapted to this setting to control the false discovery rate or familywise error rate. The utility of the procedures is demonstrated with extreme sea level and precipitation data.

20.
Futility analysis reduces the opportunity to commit Type I error. For a superiority study testing a two‐sided hypothesis, an interim futility analysis can substantially reduce the overall Type I error while keeping the overall power relatively intact. In this paper, we quantify the extent of the reduction for both one‐sided and two‐sided futility analysis. We argue that, because of the reduction, we should be allowed to set the significance level for the final analysis at a level higher than the allowable Type I error rate for the study. We propose a method to find the significance level for the final analysis. We illustrate the proposed methodology and show that a design employing a futility analysis can reduce the sample size, and therefore reduce the exposure of patients to unnecessary risk and lower the cost of a clinical trial. Copyright © 2004 John Wiley & Sons, Ltd.
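The Type I error reduction from a futility look can be quantified by simulation. A sketch under H0 for a two-stage design with a final two-sided test at z = 1.96 and an interim one-sided futility rule that is always followed (an illustrative design, not the paper's exact setting):

```python
import numpy as np

def type1_with_futility(futility_z=0.0, n_sims=20_000, seed=0):
    """Simulated Type I error of a two-stage superiority trial under H0:
    at the halfway point, stop for futility if the interim Z-statistic
    falls below `futility_z`; otherwise test at the end with |Z| > 1.96.
    Shows how the futility look pulls the overall Type I error below
    the nominal 5%."""
    rng = np.random.default_rng(seed)
    n = 100                               # total sample size per trial
    x = rng.standard_normal((n_sims, n))  # data under H0 (mean 0)
    z_half = x[:, : n // 2].sum(axis=1) / np.sqrt(n // 2)
    z_full = x.sum(axis=1) / np.sqrt(n)
    cont = z_half >= futility_z           # trials continuing past interim
    reject = cont & (np.abs(z_full) > 1.96)
    return reject.mean()
```

Setting `futility_z` very low (so no trial ever stops) recovers the nominal two-sided 5% level; a realistic futility rule removes most of the lower rejection region, which is the reduction the paper proposes to "spend" by raising the final significance level.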
