Similar Literature
20 similar documents found.
1.
Pearson's chi-square (Pe), likelihood ratio (LR), and Fisher–Freeman–Halton (Fi) test statistics are commonly used to test for association in an unordered r×c contingency table. Asymptotically, these test statistics follow a chi-square distribution. For small samples, the asymptotic chi-square approximation is unreliable, so the exact p-value is frequently computed conditional on the row and column sums. One drawback of the exact p-value is that it is conservative. Different adjustments have been suggested, such as Lancaster's mid-p version and randomized tests. In this paper, we consider 3×2, 2×3, and 3×3 tables and compare the exact power and significance level of the standard, mid-p, and randomized versions of these tests. The mid-p and randomized versions have approximately the same power, and higher power than the standard versions. The mid-p type I error probability seldom exceeds the nominal level. For a given set of parameters, the power of Pe, LR, and Fi differs in approximately the same way across the standard, mid-p, and randomized versions. Although there is no general ranking of these tests, in some situations, especially when averaged over the parameter space, Pe and Fi have approximately the same power, slightly higher than that of LR. When the sample sizes (i.e., the row sums) are equal, the differences are small; otherwise the observed differences can be 10% or more. In some cases, perhaps characterized by poorly balanced designs, LR has the highest power.

2.
The Fisher exact test has been unjustly dismissed by some as 'only conditional,' whereas it is unconditionally the uniformly most powerful test among all unbiased tests, that is, tests of size α whose power is never below the nominal significance level α. The problem with this truly optimal test is that it requires randomization at the critical value(s) to attain size α. Obviously, in practice, one does not want to conclude that 'with probability x we have a statistically significant result.' Usually, the hypothesis is rejected only if the test statistic's outcome is more extreme than the critical value, which reduces the actual size considerably.

The randomized unconditional Fisher exact test is constructed (using Neyman-structure arguments) by deriving a conditional randomized test that randomizes at critical values c(t) with probabilities γ(t), both of which depend on the total number of successes T (the complete sufficient statistic for the nuisance parameter, the common success probability) that is conditioned upon.

In this paper, the Fisher exact test is approximated by deriving nonrandomized conditional tests whose critical region includes the critical value only if γ(t) > γ0, for a fixed threshold value γ0, such that the size of the unconditional modified test is, for all values of the nuisance parameter (the common success probability), smaller than but as close as possible to α. This greatly improves the size of the test compared with the conservative nonrandomized Fisher exact test.

Size, power, and p-value comparisons with the (virtual) randomized Fisher exact test, the conservative nonrandomized Fisher exact test, Pearson's chi-square test, the more competitive mid-p value, McDonald's modification, and Boschloo's modification are performed under the assumption of two binomial samples.

3.
The mid-p-value is the standard p-value for a test minus half the difference between it and the nearest lower attainable value. Its smaller size lends it an obvious appeal to users: it provides a more significant-looking summary of the evidence against the null hypothesis. This paper examines the possibility that a user might overstate the significance of the evidence by using the smaller mid-p in place of the standard p-value. Routine use of the mid-p is shown to control a quantity related to the Type I error rate; this quantity is the appropriate one to consider when the decision to accept or reject the null hypothesis is not always firm. The natural, subjective interpretation of a p-value as the probability that the null hypothesis is true is also examined. The usual asymptotic correspondence between these two probabilities for one-sided hypotheses is shown to be strengthened when the standard p-value is replaced by the mid-p.
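To make the definition concrete, here is a minimal sketch (Python with scipy assumed, using a one-sided exact binomial test purely for illustration) showing that subtracting half the gap to the next lower attainable p-value is the same as giving the observed outcome half weight:

```python
from scipy.stats import binom

def binomial_midp(x, n, p0):
    """Exact and mid-p-values for the one-sided test of H0: p = p0 vs H1: p > p0.

    The standard exact p-value gives the observed outcome full weight, P(X >= x);
    the mid-p gives it half weight, P(X > x) + 0.5 * P(X = x), which equals the
    standard p-value minus half the gap to the nearest lower attainable value.
    """
    p_exact = binom.sf(x - 1, n, p0)                        # P(X >= x)
    p_mid = binom.sf(x, n, p0) + 0.5 * binom.pmf(x, n, p0)  # P(X > x) + 0.5 P(X = x)
    return p_exact, p_mid

# Example: 14 successes in 20 trials, testing p0 = 0.5
print(binomial_midp(14, 20, 0.5))
```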

4.
Current status and panel count data frequently arise in cancer and tumorigenicity studies in which each subject's event history is observed only at discrete examination times. A common and widely used class of two-sample tests for current status and panel count data is the permutation class. We adapt the double saddlepoint method to calculate the exact mid-p-values of the underlying permutation distributions of this class of tests. Permutation simulations are replaced by analytical saddlepoint computations, which provide extremely accurate mid-p-values that are exact for most practical purposes and almost always more accurate than normal approximations. The method is illustrated using two real tumorigenicity panel count data sets. To compare the saddlepoint approximation with the asymptotic normal approximation, a simulation study is conducted. The speed and accuracy of the saddlepoint method facilitate the calculation of the confidence interval for the treatment effect. The inversion of the mid-p-values to calculate the confidence interval for the mean rate of development of the recurrent event is discussed.

5.
Nonparametric tests are proposed for the equality of two unknown p-variate distributions. Empirical probability measures are defined from samples from the two distributions and used to construct test statistics as the supremum of the absolute differences between empirical probabilities, the supremum being taken over all possible events. The test statistics are truly multivariate in not requiring the artificial ranking of multivariate observations, and they are distribution-free in the general p-variate case. Asymptotic null distributions are obtained. Powers of the proposed tests and a competitor are examined by Monte Carlo techniques.

6.
Exact unconditional tests for comparing two binomial probabilities are generally more powerful than conditional tests like Fisher's exact test. Their power can be further increased by the Berger and Boos confidence interval method, in which a p-value is found by restricting the common binomial probability under H0 to a 1 − γ confidence interval. We studied the average test power of the exact unconditional z-pooled test for a wide range of cases with balanced and unbalanced sample sizes and significance levels 0.05 and 0.01. The detailed results are available online. Among the values 10⁻³, 10⁻⁴, …, 10⁻¹⁰, the value γ = 10⁻⁴ gave the highest power, or close to the highest power, in all the cases we looked at, and can be given as a general recommendation for an optimal γ.
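As a rough illustration of the Berger and Boos construction described above, the sketch below (a simplified Python implementation assuming numpy/scipy; the grid search, tie tolerance, and example counts are implementation choices, not taken from the paper) computes a two-sided z-pooled exact unconditional p-value maximized over a 1 − γ Clopper–Pearson interval for the common success probability, plus γ:

```python
import numpy as np
from scipy.stats import binom, beta

def z_pooled(x1, n1, x2, n2):
    """Pooled-variance z statistic for H0: p1 = p2 (used two-sided via |z|)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    p_hat = (x1 + x2) / (n1 + n2)
    var = p_hat * (1 - p_hat) * (1 / n1 + 1 / n2)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (x1 / n1 - x2 / n2) / np.sqrt(var)
    return np.where(var > 0, z, 0.0)

def berger_boos_pvalue(x1, n1, x2, n2, gamma=1e-4, grid=2000):
    """Exact unconditional two-sided z-pooled p-value with the Berger-Boos adjustment."""
    a1 = np.arange(n1 + 1)[:, None]      # all possible outcomes in sample 1
    a2 = np.arange(n2 + 1)[None, :]      # all possible outcomes in sample 2
    z_all = np.abs(z_pooled(a1, n1, a2, n2))
    extreme = z_all >= np.abs(z_pooled(x1, n1, x2, n2)) - 1e-12

    # Clopper-Pearson (1 - gamma) interval for the common success probability
    s, n = x1 + x2, n1 + n2
    lo = 0.0 if s == 0 else beta.ppf(gamma / 2, s, n - s + 1)
    hi = 1.0 if s == n else beta.ppf(1 - gamma / 2, s + 1, n - s)

    # Maximize the tail probability of the rejection region over the interval
    ps = np.linspace(lo, hi, grid)
    pvals = [(binom.pmf(a1.ravel(), n1, p)[:, None]
              * binom.pmf(a2.ravel(), n2, p)[None, :])[extreme].sum() for p in ps]
    return min(1.0, max(pvals) + gamma)

print(berger_boos_pvalue(7, 12, 2, 15))
```

With γ = 0 the same routine reduces to the ordinary exact unconditional test maximized over the full range of the nuisance parameter.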

7.
This article considers the problem of choosing between two treatments that have binary outcomes with unknown success probabilities p1 and p2. The choice is based upon the information provided by two observations X1 ∼ B(n1, p1) and X2 ∼ B(n2, p2) from independent binomial distributions. Standard approaches to this problem utilize basic statistical inference methodologies such as hypothesis tests and confidence intervals for the difference p1 − p2 of the success probabilities. In this article, however, the analysis of win-probabilities is considered. If X1* represents a potential future observation from Treatment 1 while X2* represents a potential future observation from Treatment 2, win-probabilities are defined in terms of comparisons of X1* and X2*. These win-probabilities provide a direct assessment of the relative advantages and disadvantages of choosing either treatment for one future application, and their interpretation can be combined with other factors such as costs, side-effects, and the availability of the two treatments. It is shown how confidence intervals for the win-probabilities can be constructed, and examples of their use are provided. Computer code implementing this new methodology is available from the authors.
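The abstract does not spell out the exact definition of the win-probabilities or the interval construction, so the sketch below is only a hypothetical illustration (Python with numpy/scipy): it takes a win to mean a larger future count on a common future sample size m, splits ties evenly, and plugs in the observed success proportions.

```python
import numpy as np
from scipy.stats import binom

def win_probability(p1, p2, m):
    """Illustrative win-probability P(Y1 > Y2) + 0.5 * P(Y1 = Y2) for future
    observations Y1 ~ B(m, p1) and Y2 ~ B(m, p2) on a common future sample size m.
    (Not the paper's definition; the paper's confidence intervals are not reproduced.)"""
    y = np.arange(m + 1)
    f1, f2 = binom.pmf(y, m, p1), binom.pmf(y, m, p2)
    greater = np.sum(f1[:, None] * f2[None, :] * (y[:, None] > y[None, :]))
    tie = np.sum(f1 * f2)
    return greater + 0.5 * tie

# Plug-in estimate from observed data x1/n1 and x2/n2 (arbitrary example numbers)
x1, n1, x2, n2 = 18, 25, 12, 25
print(win_probability(x1 / n1, x2 / n2, m=10))
```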

8.
Despite the simplicity of the Bernoulli process, developing good confidence interval procedures for its parameter, the probability of success p, is deceptively difficult. The binary data yield a discrete number of successes from a discrete number of trials, n. This discreteness results in actual coverage probabilities that oscillate with n for fixed values of p (and with p for fixed n). Moreover, this oscillation necessitates a large sample size to guarantee a good coverage probability when p is close to 0 or 1.

It is well known that the Wilson procedure is superior to many existing procedures because it is less sensitive to p than other procedures and is therefore less costly. The procedures proposed in this article work as well as the Wilson procedure when 0.1 ≤ p ≤ 0.9 and are even less sensitive (i.e., more robust) than the Wilson procedure when p is close to 0 or 1. Specifically, when the nominal coverage probability is 0.95, the Wilson procedure requires a sample size of 1,021 to guarantee that the coverage probability stays above 0.92 for any 0.001 ≤ min{p, 1 − p} < 0.01. By contrast, our procedures guarantee the same coverage probabilities but need a sample size of only 177, without increasing either the expected interval width or the standard deviation of the interval width.
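For reference, here is a minimal Python sketch (numpy/scipy assumed) of the Wilson score interval used as the benchmark above, together with its exact coverage probability at a fixed (n, p), which makes the oscillation described in the abstract easy to reproduce; the article's own proposed procedures are not reproduced here.

```python
import numpy as np
from scipy.stats import binom, norm

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for a binomial proportion (the benchmark procedure)."""
    z = norm.ppf(1 - (1 - conf) / 2)
    p_hat = x / n
    center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

def coverage(n, p, conf=0.95):
    """Exact coverage probability of the Wilson interval at fixed n and p."""
    x = np.arange(n + 1)
    lo, hi = wilson_interval(x, n, conf)
    covered = (lo <= p) & (p <= hi)
    return binom.pmf(x[covered], n, p).sum()

print(wilson_interval(3, 50))   # interval for 3 successes in 50 trials
print(coverage(50, 0.05))       # coverage oscillates as n or p changes
```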

9.
Taku Moriyama, Statistics, 2018, 52(5): 1096–1115
We discuss smoothed rank statistics for testing the location shift parameter in the two-sample problem. They are based on discrete test statistics: the median and Wilcoxon's rank sum tests. For the one-sample problem, Maesono et al. [Smoothed nonparametric tests and their properties. arXiv preprint. 2016; arXiv:1610.02145] reported that some nonparametric discrete tests have a problem with their p-values because of their discreteness. The p-values of Wilcoxon's test are frequently smaller than those of the median test in the tail area, which leads to an arbitrary choice between the median and Wilcoxon's rank sum tests. To overcome this problem, we propose smoothed versions of these tests. The smoothed tests inherit the good properties of the original tests and are asymptotically equivalent to them. We study the significance probabilities and local asymptotic powers of the proposed tests.

10.
The proportional odds model (POM) is commonly used in regression analysis to predict the outcome of an ordinal response variable. The maximum likelihood estimation (MLE) approach is typically used to obtain the parameter estimates. The likelihood estimates do not exist when the number of parameters, p, is greater than the number of observations, n. The MLE also does not exist if there are no overlapping observations in the data. When the number of parameters is less than the sample size but p approaches n, the likelihood estimates may not exist, and if they do exist they may have quite large standard errors. An estimation method is proposed to address the last two issues, i.e., complete separation and the case where p approaches n, but not the case p > n. The proposed method does not use a penalty term but instead uses pseudo-observations to regularize the observed responses by downgrading their effect so that they become close to the underlying probabilities. The estimates can be computed easily with all commonly used statistical packages that support fitting POMs with weights. The estimates are compared with the MLE in a simulation study and in an application to real data.

11.
A simulation study was done to compare seven confidence interval methods, based on the normal approximation, for the difference of two binomial probabilities. Cases considered included minimum expected cell sizes ranging from 2 to 15 and smallest group sizes (NMIN) ranging from 6 to 100. Our recommendation is to use a continuity correction of 1/(2 NMIN) combined with the use of (N − 1) rather than N in the estimate of the standard error. For all cases considered with a minimum expected cell size of at least 3, this method gave coverage probabilities close to or greater than the nominal 90% and 95%. The Yates method is also acceptable, but it is slightly more conservative. At the other extreme, the usual method (with no continuity correction) does not provide adequate coverage even at the larger sample sizes. For the 99% intervals, our recommended method and the Yates correction performed equally well and are reasonable for minimum expected cell sizes of at least 5. None of the methods performed consistently well for a minimum expected cell size of 2.
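A sketch of the recommended interval as described above (Python, numpy/scipy assumed): the (N − 1) denominators enter the standard error and the continuity correction 1/(2 NMIN) is assumed here to be added to the half-width, in the usual Hauck–Anderson-type form; the exact placement of the correction in the original article may differ.

```python
import numpy as np
from scipy.stats import norm

def diff_ci(x1, n1, x2, n2, conf=0.95):
    """Normal-approximation CI for p1 - p2 with (N - 1) in the variance denominators
    and a continuity correction of 1 / (2 * min(n1, n2)) added to the half-width
    (assumed Hauck-Anderson-type construction)."""
    p1, p2 = x1 / n1, x2 / n2
    d = p1 - p2
    se = np.sqrt(p1 * (1 - p1) / (n1 - 1) + p2 * (1 - p2) / (n2 - 1))
    cc = 1.0 / (2 * min(n1, n2))
    z = norm.ppf(1 - (1 - conf) / 2)
    half = z * se + cc
    return max(-1.0, d - half), min(1.0, d + half)

print(diff_ci(12, 30, 7, 25))   # arbitrary example counts
```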

12.
ABSTRACT

This article examines the evidence contained in t statistics that are marginally significant in 5% tests. The bases for evaluating evidence are likelihood ratios and integrated likelihood ratios, computed under a variety of assumptions regarding the alternative hypotheses in null hypothesis significance tests. Likelihood ratios and integrated likelihood ratios provide a useful measure of the evidence in favor of competing hypotheses because they can be interpreted as representing the ratio of the probabilities that each hypothesis assigns to observed data. When they are either very large or very small, they suggest that one hypothesis is much better than the other in predicting observed data. If they are close to 1.0, then both hypotheses provide approximately equally valid explanations for observed data. I find that p-values that are close to 0.05 (i.e., that are “marginally significant”) correspond to integrated likelihood ratios that are bounded by approximately 7 in two-sided tests, and by approximately 4 in one-sided tests.

The modest magnitude of integrated likelihood ratios corresponding to p-values close to 0.05 clearly suggests that higher standards of evidence are needed to support claims of novel discoveries and new effects.
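A quick back-of-the-envelope check that is consistent in magnitude with the bounds quoted above (Python, numpy/scipy assumed): for a normally distributed test statistic, the likelihood ratio of the best-supported alternative against the null is exp(z²/2). This is the maximized ratio, not the integrated likelihood ratio computed in the article, but it lands near 7 for a two-sided and near 4 for a one-sided p-value of 0.05.

```python
import numpy as np
from scipy.stats import norm

# Maximized likelihood ratio for a z statistic: the alternative that best explains
# an observed z places its mean at z, giving LR = exp(z^2 / 2).
for sides in (2, 1):
    z = norm.ppf(1 - 0.05 / sides)
    print(f"{sides}-sided p = 0.05: z = {z:.3f}, max LR = {np.exp(z**2 / 2):.2f}")
```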

13.
Abstract

Numerous methods, based on exact and asymptotic distributions, can be used to obtain confidence intervals for the odds ratio in 2 × 2 tables. We examine ten methods for generating these intervals based on coverage probability, closeness of coverage probability to the target, and length of confidence intervals. Based on these criteria, Cornfield's method without the continuity correction performed the best of the methods examined here. A drawback to the use of this method is the significant possibility that the attained coverage probability will not meet the nominal confidence level. Use of a mid-P value greatly improves methods based on the "exact" distribution. When combined with the Wilson rule for selection of a rejection set, the resulting procedure performed very well. Crow's method with a mid-P also performed well, although it was only a slight improvement over the Wilson mid-P method, and its cumbersome calculations preclude its general acceptance. Woolf's (logit) method with the Haldane–Anscombe correction performed well, especially with regard to the length of confidence intervals, and is recommended based on ease of computation.
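Of the methods listed, Woolf's logit interval with the Haldane–Anscombe correction is simple enough to state in a few lines; here is a minimal Python sketch (numpy/scipy assumed; the example counts are arbitrary):

```python
import numpy as np
from scipy.stats import norm

def woolf_ci(a, b, c, d, conf=0.95):
    """Woolf (logit) confidence interval for the odds ratio of a 2x2 table
    [[a, b], [c, d]], with the Haldane-Anscombe correction (add 0.5 to every cell)."""
    a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    log_or = np.log(a * d / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = norm.ppf(1 - (1 - conf) / 2)
    return np.exp(log_or - z * se), np.exp(log_or + z * se)

print(woolf_ci(12, 5, 6, 12))
```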

14.
When the outcome of a screening test is expressed by the probabilities of k possible outcomes among individuals with a certain physiologic condition and by the corresponding probabilities among individuals without the condition, the screening usefulness of the test depends on the relative likelihood that its result will properly alter the management of a given patient. New statistical methods are introduced to apply the screening-usefulness concept to the assessment of combined or multivalued tests. The method is applied to assess the usefulness of genotypes at cytochrome P450 1A1 and glutathione-S-transferase-μ as biomarkers of susceptibility to developing lung cancer. The argument and methods developed should be widely applicable to the statistical assessment of screening tests for a wide range of physiologic conditions.

15.
Abstract

The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of "significance levels" and "P-values" in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: that P-values exhibit extreme variation across samples (and thus are "unreliable"), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate the evidence against those hypotheses. These problems are not, however, properties of P-values themselves but are faults of researchers who focus on null hypotheses and overstate evidence based on the misperception that p = 0.05 represents enough evidence to reject a hypothesis. Those problems are easily seen without the use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) −log2(p).
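The S-value transform mentioned at the end is a one-liner; a small Python example with illustrative values:

```python
import numpy as np

def s_value(p):
    """Shannon information (surprisal) carried by a p-value: s = -log2(p) bits."""
    return -np.log2(p)

# p = 0.05 corresponds to only about 4.3 bits of information against the null,
# roughly as surprising as 4 to 5 heads in a row from a fair coin.
for p in (0.05, 0.01, 0.005):
    print(p, round(s_value(p), 2))
```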

16.
The efficiency of a sequential test is related to the "importance" of the trials within the test. This relationship is used to find the optimal test for selecting the greater of two binomial probabilities, pa and pb: the stopping rule is "gambler's ruin," and the optimal discipline when pa + pb ≤ 1 (≥ 1) is play-the-winner (play-the-loser), i.e., an a-trial which results in a success is followed by an a-trial (b-trial), whereas an a-trial which results in a failure is followed by a b-trial (a-trial), and correspondingly for b-trials.
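A toy simulation of the play-the-winner discipline described above (Python; the stopping boundary on the lead in successes and the numeric values are illustrative assumptions, since the abstract does not specify the gambler's-ruin score):

```python
import random

def select_arm_ptw(pa, pb, lead=5, rng=None):
    """Play-the-winner sampling with a gambler's-ruin style stopping rule:
    stay on an arm after a success, switch after a failure, and stop when one
    arm leads the other by `lead` successes (boundary chosen for illustration)."""
    rng = rng or random.Random()
    wins = {"a": 0, "b": 0}
    probs = {"a": pa, "b": pb}
    arm = "a"
    while abs(wins["a"] - wins["b"]) < lead:
        if rng.random() < probs[arm]:
            wins[arm] += 1                        # success: play the same arm again
        else:
            arm = "b" if arm == "a" else "a"      # failure: switch to the other arm
    return "a" if wins["a"] > wins["b"] else "b"

# Estimated probability of selecting the better arm when pa = 0.7, pb = 0.5
rng = random.Random(0)
trials = 2000
correct = sum(select_arm_ptw(0.7, 0.5, rng=rng) == "a" for _ in range(trials))
print(correct / trials)
```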

17.
In this article, we consider the problem of testing the mean vector of a multivariate normal distribution when the dimension p is greater than the sample size N. We propose a new test, TBlock, and obtain its asymptotic distribution. We also compare the proposed test with two existing tests. The simulation results suggest that the performance of the new test is comparable to that of the two existing tests, and under some circumstances it may have higher power. Therefore, the new statistic can be employed in practice as an alternative choice.

18.
Combining p-values from statistical tests across different studies is the most commonly used approach to meta-analysis in evolutionary biology. The most commonly used p-value combination methods are the z-transform tests (e.g., the unweighted z-test and the weighted z-test) and the gamma-transform tests (e.g., the CZ method [Z. Chen, W. Yang, Q. Liu, J.Y. Yang, J. Li, and M.Q. Yang, A new statistical approach to combining p-values using gamma distribution and its application to genomewide association study, Bioinformatics 15 (2014), p. S3]). However, among these existing p-value combination methods, no method is uniformly most powerful in all situations [Chen et al. 2014]. In this paper, we propose a meta-analysis method based on the gamma distribution, MAGD, which pools the p-values from independent studies. The newly proposed test, MAGD, allows for flexible accommodation of different levels of heterogeneity of effect sizes across individual studies, and it simultaneously retains the characteristics of both the z-transform tests and the gamma-transform tests. We also propose an easy-to-implement resampling approach for estimating the empirical p-values of MAGD for finite sample sizes. Simulation studies and two data applications show that the proposed method MAGD is essentially as powerful as the z-transform tests (the gamma-transform tests) under homogeneous (heterogeneous) effect sizes across studies.
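For context, here is the standard weighted z-test (Stouffer) combination referred to above, as a short Python sketch (numpy/scipy assumed); this is one of the z-transform tests, not the MAGD method proposed in the paper:

```python
import numpy as np
from scipy.stats import norm

def weighted_z_combination(pvals, weights=None):
    """Weighted z-test (Stouffer) combination of independent one-sided p-values:
    Z = sum(w_i * z_i) / sqrt(sum(w_i^2)), with z_i = Phi^{-1}(1 - p_i)."""
    p = np.asarray(pvals, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    z = norm.isf(p)                      # Phi^{-1}(1 - p_i)
    z_comb = np.sum(w * z) / np.sqrt(np.sum(w**2))
    return norm.sf(z_comb)               # combined one-sided p-value

# Example: combine three study p-values, weighting by the square root of sample size
print(weighted_z_combination([0.03, 0.20, 0.08], weights=np.sqrt([100, 40, 60])))
```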

19.
The lognormal and Weibull distributions are the most popular distributions for modeling lifetime data. In practical applications, they usually fit the data at hand well. However, their predictions may differ substantially. The main purpose of this article is to investigate the impact of mis-specification between the lognormal and Weibull distributions on the interval estimation of a pth quantile of the distribution for complete data. The coverage probabilities of the confidence intervals (CIs) under mis-specification are evaluated. The results indicate that for both the lognormal and the Weibull distributions the coverage probabilities are significantly influenced by mis-specification, especially for a small or a large p in the lower or upper tail of the distribution. In addition, based on the coverage probabilities under correct specification and mis-specification, a maxmin criterion is proposed for choosing between these two distributions. The numerical results indicate that for p ≤ 0.05 and 0.6 ≤ p ≤ 0.8 the Weibull distribution is suggested for evaluating CIs of a pth quantile, while for 0.2 ≤ p ≤ 0.5 and p = 0.99 the lognormal distribution is suggested. For p = 0.9 and 0.95 the lognormal distribution is suggested if the sample size is large enough, while for p = 0.1 the Weibull distribution is suggested if the sample size is large enough. Finally, a simulation study is conducted to evaluate the efficiency of the proposed method.

20.
Estimation of the Pareto tail index from extreme order statistics is an important problem in many settings. The upper tail of the distribution, where data are sparse, is typically fitted with a model, such as the Pareto model, from which quantities such as probabilities associated with extreme events are deduced. The success of this procedure relies heavily not only on the choice of the estimator for the Pareto tail index but also on the procedure used to determine the number k of extreme order statistics used for the estimation. The authors develop a robust prediction-error criterion for choosing k and estimating the Pareto index. A Monte Carlo study shows the good performance of the new estimator, and the analysis of real data sets illustrates that a robust procedure for selection, and not just for estimation, is needed.
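For background, the classical Hill estimator that such procedures build on can be sketched in a few lines of Python (numpy assumed); the authors' robust prediction-error criterion for choosing k is not reproduced here.

```python
import numpy as np

def hill_estimator(data, k):
    """Classical Hill estimator of the Pareto tail index from the k largest
    order statistics (the choice of k is exactly what the paper's criterion addresses)."""
    x = np.sort(np.asarray(data, dtype=float))
    top = x[-(k + 1):]                        # the k + 1 largest observations
    logs = np.log(top[1:]) - np.log(top[0])   # log-excesses over the (k+1)-th largest
    gamma_hat = logs.mean()                   # extreme-value index (1 / tail index)
    return 1.0 / gamma_hat

# Example on simulated Pareto data with tail index 2
rng = np.random.default_rng(1)
sample = (1 / rng.uniform(size=5000)) ** (1 / 2.0)
print(hill_estimator(sample, k=200))
```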
