Similar Articles (20 results found)
1.
ABSTRACT

This article argues that researchers do not need to completely abandon the p-value, the best-known significance index, but should instead stop using significance levels that do not depend on sample sizes. A testing procedure is developed using a mixture of frequentist and Bayesian tools, with a significance level that is a function of sample size, obtained from a generalized form of the Neyman–Pearson Lemma that minimizes a linear combination of α, the probability of rejecting a true null hypothesis, and β, the probability of failing to reject a false null, instead of fixing α and minimizing β. The resulting hypothesis tests do not violate the Likelihood Principle and do not require any constraints on the dimensionalities of the sample space and parameter space. The procedure includes an ordering of the entire sample space and uses predictive probability (density) functions, allowing for testing of both simple and compound hypotheses. Accessible examples are presented to highlight specific characteristics of the new tests.
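The core idea, that minimizing a fixed linear combination aα + bβ makes the implied significance level shrink as n grows, can be illustrated on a toy simple-vs-simple normal problem. The sketch below is only that illustration, with assumed weights a = b and assumed means; it is not the authors' full predictive-probability procedure.

```python
# A minimal sketch of the "minimize a*alpha + b*beta" idea behind
# sample-size-dependent significance levels, for the toy problem
# H0: mu = 0 vs H1: mu = 1 with N(mu, 1) data (illustrative values,
# not the article's general predictive-probability construction).
import numpy as np
from scipy.stats import norm

def adaptive_test(xbar, n, a=1.0, b=1.0, mu0=0.0, mu1=1.0):
    """Reject H0 when b * L1 > a * L0, which minimizes a*alpha + b*beta."""
    # For normal likelihoods this reduces to a cutoff on the sample mean.
    cut = (mu0 + mu1) / 2 + np.log(a / b) / (n * (mu1 - mu0))
    alpha = 1 - norm.cdf((cut - mu0) * np.sqrt(n))   # implied level depends on n
    return xbar > cut, alpha

for n in (10, 50, 200):
    _, alpha = adaptive_test(0.0, n)
    print(f"n={n:4d}  implied significance level alpha = {alpha:.6f}")
```

As n increases, the implied α falls automatically, which is exactly the behaviour a fixed-α test lacks.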

2.
We consider the general one-sided hypothesis testing problem expressed as H0: θ1 ≥ h(θ2) versus H1: θ1 < h(θ2), where h(·) is not necessarily differentiable. The values of the left and right differential coefficients, h−(·) and h+(·), at nondifferentiable points play an essential role in constructing appropriate testing procedures with asymptotic size α on the basis of the likelihood ratio principle. The likelihood ratio testing procedure is related to an intersection–union testing procedure when h−(θ2) ≥ h+(θ2) for all θ2, and to a union–intersection testing procedure when there exists a θ2 such that h−(θ2) < h+(θ2).

3.
Testing the homogeneity of several rival models is of practical interest. In this article, we consider a nonparametric multiple test for non-nested distributions in the context of model selection. Based on the linear sign rank test and the known union–intersection principle, we use the magnitudes of the data to improve the performance of the test statistic. We treat the sample and the non-nested rival models as blocks and treatments, respectively, and introduce an extended version of the Friedman test to compare with the results of the test based on the linear sign rank test. A real dataset of waiting times to earthquakes is used to illustrate the results; a sketch of the blocks/treatments idea follows.
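The blocks/treatments framing can be sketched directly with the ordinary Friedman test: each observation is a block, each rival model a treatment. Below, per-point log-densities of fitted candidate models serve as the scores; both the candidate models and the use of log-densities are illustrative assumptions, not the article's exact statistic.

```python
# Rough sketch: observations as blocks, non-nested rival models as
# treatments, compared via Friedman's test on per-point log-densities.
# Candidate models and scoring are illustrative stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.weibull(1.5, size=100) * 30        # stand-in "waiting time" data

candidates = [stats.expon, stats.gamma, stats.lognorm]
logliks = []
for dist in candidates:
    params = dist.fit(x, floc=0)           # fit each rival model
    logliks.append(dist.logpdf(x, *params))  # one "treatment" per model

stat, p = stats.friedmanchisquare(*logliks)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4g}")
```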

4.
The Kolassa method implemented in the nQuery Advisor software has been widely used for approximating the power of the Wilcoxon–Mann–Whitney (WMW) test for ordered categorical data, in which an Edgeworth approximation is used to estimate the power of an unconditional test based on the WMW U statistic. When the sample size is small or when the sizes of the two groups are unequal, Kolassa's method may yield a quite poor approximation to the power of the conditional WMW test that is commonly implemented in statistical packages. Two modifications of Kolassa's formula are proposed and assessed by simulation studies.
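The benchmark against which such approximations are judged is easy to set up: simulate the power of the conditional WMW test directly. The sketch below uses assumed category probabilities and the small, unequal group sizes the article flags as problematic.

```python
# Monte Carlo power of the (conditional) WMW test for ordered
# categorical data; category probabilities are illustrative assumptions.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
p_ctrl  = [0.4, 0.3, 0.2, 0.1]          # control category probabilities
p_treat = [0.2, 0.3, 0.3, 0.2]          # shifted treatment probabilities
n1, n2, alpha, reps = 25, 15, 0.05, 5000  # small, unequal group sizes

def draw(probs, n):
    return rng.choice(len(probs), size=n, p=probs)  # ordinal scores 0..3

rejections = sum(
    mannwhitneyu(draw(p_treat, n1), draw(p_ctrl, n2),
                 alternative="two-sided").pvalue < alpha
    for _ in range(reps)
)
print(f"simulated power ≈ {rejections / reps:.3f}")
```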

5.
ABSTRACT

There is no established procedure for testing for trend with nominal outcomes that would provide both a global hypothesis test and outcome-specific inference. We derive a simple formula for such a test using a weighted sum of Cochran–Armitage test statistics evaluating the trend in each outcome separately. The test is shown to be equivalent to the score test for multinomial logistic regression; however, the new formulation enables the derivation of a sample size formula and multiplicity-adjusted inference for individual outcomes. The proposed methods are implemented in the R package multiCA.
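The building block being combined is the classical binary Cochran–Armitage trend test. A minimal sketch of that test is given below; the article's outcome-specific weighting and the multiCA machinery are not reproduced.

```python
# Classical Cochran-Armitage trend test for one binary outcome across
# ordered groups -- the per-outcome statistic the article's weighted
# sum combines. Counts below are illustrative.
import numpy as np
from scipy.stats import norm

def cochran_armitage(events, totals, scores=None):
    events = np.asarray(events, float)
    totals = np.asarray(totals, float)
    scores = np.arange(len(totals)) if scores is None else np.asarray(scores, float)
    N = totals.sum()
    pbar = events.sum() / N
    T = np.sum(scores * (events - totals * pbar))
    var = pbar * (1 - pbar) * (np.sum(scores**2 * totals)
                               - np.sum(scores * totals)**2 / N)
    z = T / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

z, p = cochran_armitage(events=[5, 10, 18, 25], totals=[50, 50, 50, 50])
print(f"z = {z:.3f}, two-sided p = {p:.4g}")
```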

6.
In the last few years, two adaptive tests for paired data have been proposed. One, proposed by Freidlin et al. [On the use of the Shapiro–Wilk test in two-stage adaptive inference for paired data from moderate to very heavy tailed distributions, Biom. J. 45 (2003), pp. 887–900], is a two-stage procedure that uses a selection statistic to determine which of three rank scores to use in the computation of the test statistic. Another statistic, proposed by O'Gorman [Applied Adaptive Statistical Methods: Tests of Significance and Confidence Intervals, Society for Industrial and Applied Mathematics, Philadelphia, 2004], uses a weighted t-test with the weights determined by the data. These two methods, and an earlier rank-based adaptive test proposed by Randles and Hogg [Adaptive Distribution-free Tests, Commun. Stat. 2 (1973), pp. 337–356], are compared with the t-test and Wilcoxon's signed-rank test. For sample sizes between 15 and 50, the results show that the adaptive tests proposed by Freidlin et al. and by O'Gorman have higher power than the other tests over a range of moderate to long-tailed symmetric distributions. The results also show that the test proposed by O'Gorman has greater power than the other tests for short-tailed distributions. For sample sizes greater than 50, and for small sample sizes, the adaptive test proposed by O'Gorman has the highest power for most distributions.
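The two-stage logic is simple to demonstrate: a selection statistic computed on the paired differences decides which test to run. The sketch below uses a Shapiro–Wilk selector choosing between the paired t-test and the signed-rank test; this is a simplified stand-in for the actual proposals, which select among three rank scores or data-driven weights, and the cutoff of 0.05 is an assumption.

```python
# Simplified two-stage adaptive test for paired data: a Shapiro-Wilk
# selector on the differences picks the paired t-test or Wilcoxon's
# signed-rank test. A sketch of the selector idea only; the published
# procedures differ in the selector and the candidate statistics.
import numpy as np
from scipy import stats

def adaptive_paired_test(x, y, sw_cutoff=0.05):
    d = np.asarray(x) - np.asarray(y)
    if stats.shapiro(d).pvalue > sw_cutoff:        # looks normal enough
        return "paired t-test", stats.ttest_rel(x, y).pvalue
    return "signed-rank", stats.wilcoxon(x, y).pvalue

rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=30)                  # heavy-tailed pairs
y = x + rng.standard_t(df=3, size=30) * 0.5 + 0.3
print(adaptive_paired_test(x, y))
```

Note that naive two-stage selection can distort the level; accounting for that is part of what the published procedures address.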

7.
In the formula of the likelihood ratio test on fourfold tables with matched pairs of binary data, only the two cells b and c, which represent changes, are considered; the remaining cells a and d, which represent concordant observations, are not included. To develop a test that uses all four cells and the mixture distribution of likelihood ratio chi-squares, a formula based on the entire sample is proposed. The revised formula reduces to the unrevised one when a + d is zero. The revised test is more valid than the revised McNemar's test in most cases.
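For reference, the standard (unrevised) likelihood ratio statistic that uses only the discordant counts b and c is G² = 2[b ln(2b/(b+c)) + c ln(2c/(b+c))], referred to χ² with 1 degree of freedom. The sketch below implements that baseline; the article's revised whole-table formula is not reproduced here.

```python
# Standard LR statistic for matched binary pairs using only the
# discordant cells b and c (the baseline the article revises):
#   G^2 = 2 [ b ln(2b/(b+c)) + c ln(2c/(b+c)) ]  ~  chi^2_1 under H0.
import numpy as np
from scipy.stats import chi2

def lr_matched_pairs(b, c):
    g2 = 0.0
    for k in (b, c):
        if k > 0:                      # convention: 0 * ln(0) = 0
            g2 += 2 * k * np.log(2 * k / (b + c))
    return g2, chi2.sf(g2, df=1)

print(lr_matched_pairs(b=15, c=5))    # e.g. 15 vs 5 discordant pairs
```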

8.
In this paper, tests for multivariate normality (MVN) of the Jarque–Bera type, based on skewness and kurtosis, are considered. The tests proposed by Mardia and Srivastava, and the combined tests based on skewness and kurtosis defined by Jarque and Bera, are taken into account. In the Monte Carlo simulations, for each combination of p = 2, 3, 4, 5 traits and sample sizes n = 10(5)50(10)100, 10,000 runs were performed to calculate the empirical Type I errors of the tests under consideration and their empirical power against different alternative distributions. The simulation results are compared with the Henze–Zirkler test. It should be stressed that no test yet proposed is uniformly better than all the others in every combination of conditions examined.
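Mardia's multivariate skewness and kurtosis measures are the building blocks of several of these tests and are short to implement. The sketch below computes both statistics and their usual asymptotic p-values; the Jarque–Bera-type combinations studied in the article are not reproduced.

```python
# Mardia's multivariate skewness and kurtosis statistics with their
# standard asymptotic reference distributions (chi-square and normal).
import numpy as np
from scipy.stats import chi2, norm

def mardia(x):
    x = np.asarray(x, float)
    n, p = x.shape
    d = x - x.mean(axis=0)
    s_inv = np.linalg.inv(d.T @ d / n)          # MLE covariance, as in Mardia
    g = d @ s_inv @ d.T                         # matrix of g_ij terms
    b1 = (g ** 3).sum() / n**2                  # multivariate skewness
    b2 = (np.diag(g) ** 2).sum() / n            # multivariate kurtosis
    skew_stat = n * b1 / 6
    skew_p = chi2.sf(skew_stat, df=p * (p + 1) * (p + 2) // 6)
    kurt_z = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return skew_p, 2 * norm.sf(abs(kurt_z))

rng = np.random.default_rng(0)
print(mardia(rng.normal(size=(100, 3))))        # MVN data: both p-values large
```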

9.
In this paper, the maximum likelihood (ML) and Bayesian (via Markov chain Monte Carlo, MCMC) methods are considered for estimating the parameters of the three-parameter modified Weibull distribution (MWD(β, τ, λ)) based on a right-censored sample of generalized order statistics (gos). Simulation experiments are conducted to demonstrate the efficiency of the proposed methods. Comparisons are carried out between the ML and Bayes methods by computing the mean squared errors (MSEs), Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) of the estimates. Three real data sets from the Weibull(α, β) distribution are introduced and analyzed using the MWD(β, τ, λ) as well as the Weibull(α, β) distribution. A comparison between the two models based on the corresponding Kolmogorov–Smirnov (KS) test statistic, AIC and BIC shows that the MWD(β, τ, λ) fits the data better than the other distribution. All parameters are estimated based on a type-II censored sample, censored upper record values and a progressively type-II censored sample generated from the real data sets.

10.
One important property of any drug product is its stability over time. Drug stability studies are routinely carried out in the pharmaceutical industry in order to measure the degradation of an active pharmaceutical ingredient of a drug product. One important study objective is to estimate the shelf-life of the drug; the estimated shelf-life is required by the US Food and Drug Administration to be printed on the package label of the drug. This involves a suitable definition of the true shelf-life and the construction of an appropriate estimate of it. In this paper, the true shelf-life Tβ is defined as the time point at which 100β% of all the individual dosage units (e.g. tablets) of the drug have an active ingredient content no less than the lowest acceptable limit L, where β and L are prespecified constants. The value of Tβ depends on the parameters of the assumed degradation model of the active ingredient content and so is unknown. A lower confidence bound for Tβ is then provided and used as the estimated shelf-life of the drug.
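With a linear degradation model Y(t) = a + bt + ε, ε ~ N(0, σ²), the defining equation P(Y(Tβ) ≥ L) = β can be solved in closed form. The sketch below computes this definitional Tβ for assumed, purely illustrative parameter values; the article's lower confidence bound construction, which accounts for parameter uncertainty, is not reproduced.

```python
# Definitional true shelf-life T_beta under an assumed linear
# degradation model Y(t) = a + b*t + N(0, sigma^2); parameter values
# are illustrative assumptions.
from scipy.stats import norm

def true_shelf_life(a, b, sigma, L, beta):
    """Time at which a fraction beta of units still have content >= L."""
    # P(Y(t) >= L) = beta  <=>  L - a - b*t = sigma * z_{1-beta}
    return (L - a - sigma * norm.ppf(1 - beta)) / b

# e.g. intercept 102%, slope -0.5%/month, sd 1%, limit L = 90%, beta = 0.95
print(f"T_0.95 = {true_shelf_life(102, -0.5, 1.0, 90, 0.95):.1f} months")
```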

11.
The importance of the normal distribution for fitting continuous data is well known. However, in many practical situations the data distribution departs from normality. For example, the sample skewness and the sample kurtosis may be far away from 0 and 3, respectively, the values they take under normality. So, it is important to have formal tests of normality against arbitrary alternatives. D'Agostino et al. [A suggestion for using powerful and informative tests of normality, Am. Statist. 44 (1990), pp. 316–321] review four procedures, Z²(g₁), Z²(g₂), D and K², for testing departure from normality. The first two of these are tests of normality against departures due to skewness and kurtosis, respectively; the other two are omnibus tests. An alternative to the normal distribution is the class of skew-normal distributions (see [A. Azzalini, A class of distributions which includes the normal ones, Scand. J. Statist. 12 (1985), pp. 171–178]). In this paper, we obtain a score test (W) and a likelihood ratio test (LR) of goodness of fit of the normal regression model against the skew-normal family of regression models. It turns out that the score test is based on the sample skewness and has a very simple form. The performance of these six procedures, in terms of size and power, is compared using simulations. The level properties of the three statistics LR, W and Z²(g₁) are similar and close to the nominal level for moderate to large sample sizes, and their power properties are similar for small departures from normality due to skewness (γ₁ ≤ 0.4). Of these, the score test statistic has a very simple form and is computationally much simpler than the other two statistics. The LR statistic, in general, has the highest power, although it is computationally more complex, as it requires parameter estimates under the normal model as well as under the skew-normal model. So, the score test may be used to test for normality against small departures due to skewness; otherwise, the likelihood ratio statistic LR should be used, as it detects general departures from normality (due to both skewness and kurtosis) with, in general, the largest power.
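Two of the comparator procedures have off-the-shelf implementations: D'Agostino's skewness test Z(g₁) (`scipy.stats.skewtest`) and the D'Agostino–Pearson omnibus K² (`scipy.stats.normaltest`). The sketch below runs both on simulated skewed "residuals"; the article's score test W against the skew-normal family is not itself reproduced, only the skewness-based idea it shares.

```python
# Skewness-based and omnibus normality tests on a skewed sample,
# illustrating the comparators reviewed by D'Agostino et al.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
resid = stats.skewnorm.rvs(a=4, size=200, random_state=rng)  # skewed data

z, p = stats.skewtest(resid)             # Z(g1): departure due to skewness
k2, p_omnibus = stats.normaltest(resid)  # K^2: omnibus (skewness + kurtosis)
print(f"Z(g1) = {z:.2f} (p = {p:.3g});  K^2 = {k2:.2f} (p = {p_omnibus:.3g})")
```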

12.
In this paper we examine failure-censored sampling plans for the two-parameter exponential distribution based on m random samples, each of size n. The suggested procedure is based on exact results, and only the first failure time of each sample is needed. The values of the acceptability constant are also tabulated for selected values of pα1, pβ1, α and β. Further, a comparison of the proposed sampling plans with ordinary sampling plans using a sample of size mn is made. When compared to ordinary sampling plans, the proposed plan has an advantage in terms of shorter test time and a saving of resources.

13.
The present paper has as its objective an accurate quantification of the robustness of the two-sample t-test over an extensive practical range of distributions. The method is a major Monte Carlo study over the Pearson system of distributions, and the details indicate that the results are quite accurate. The study was conducted over the range β₁ = 0.0(0.4)2.0 (negative and positive skewness) and β₂ = 1.4(0.4)7.8, with equal sample sizes and for both the one- and two-tailed t-tests. The significance level and power levels (for nominal values of 0.05, 0.50, and 0.95, respectively) were evaluated for each underlying distribution and for each sample size, with each probability evaluated from 100,000 generated values of the test statistic. The results precisely quantify the degree of robustness inherent in the two-sample t-test and indicate to a user the degree of confidence one can have in this procedure over various regions of the Pearson system. The results indicate that the equal-sample-size two-sample t-test is quite robust with respect to departures from normality, perhaps even more so than most people realize.
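The study's basic Monte Carlo loop is easy to reproduce in miniature: draw two equal-size samples from a non-normal parent under the null and count rejections. The sketch below uses a gamma parent as a single skewed stand-in (β₁ = skewness² = 2), not the full Pearson-system grid of the paper.

```python
# Empirical significance level of the equal-n two-sample t-test under
# a skewed parent distribution (gamma(shape=2): beta1 = 2), a miniature
# version of the paper's Monte Carlo study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, alpha, reps = 20, 0.05, 20000
parent = lambda size: rng.gamma(shape=2.0, size=size)   # skewed null parent

rejects = sum(
    stats.ttest_ind(parent(n), parent(n)).pvalue < alpha
    for _ in range(reps)
)
print(f"empirical level ≈ {rejects / reps:.4f} (nominal {alpha})")
```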

14.
Two-treatment multicentre clinical trials are very common in practice. In cases where a non-parametric analysis is appropriate, a rank-sum test for grouped data called the van Elteren test can be applied. As an alternative approach, one may apply a combination test such as Fisher's combination test or the inverse normal combination test (also called Liptak's method) in order to combine centre-specific P-values. If there are no ties and no differences between centres with regard to the groups’ sample sizes, the inverse normal combination test using centre-specific Wilcoxon rank-sum tests is equivalent to the van Elteren test. In this paper, the van Elteren test is compared with Fisher's combination test based on Wilcoxon rank-sum tests. Data from two multicentre trials as well as simulated data indicate that Fisher's combination of P-values is more powerful than the van Elteren test in realistic scenarios, i.e. when there are large differences between the centres’ P-values, some quantitative interaction between treatment and centre, and/or heterogeneity in variability. The combination approach opens the possibility of using statistics other than the rank sum, and it is also a suitable method for more complicated designs, e.g. when covariates such as age or gender are included in the analysis.
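Both combination methods are one-liners once the centre-specific P-values are in hand. The sketch below combines Wilcoxon rank-sum P-values from simulated centres with Fisher's method and the inverse normal (Liptak/Stouffer) method; the centre sizes and effect size are illustrative assumptions, and the van Elteren test itself is not reproduced.

```python
# Combining centre-specific Wilcoxon rank-sum P-values by Fisher's
# method and the inverse normal (Liptak/Stouffer) method.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
pvals = []
for n_c in (12, 20, 35):                       # unequal centre sizes
    treat = rng.normal(0.4, 1.0, n_c)          # small treatment shift
    ctrl = rng.normal(0.0, 1.0, n_c)
    pvals.append(stats.mannwhitneyu(treat, ctrl,
                                    alternative="greater").pvalue)

for method in ("fisher", "stouffer"):          # stouffer = inverse normal
    stat, p = stats.combine_pvalues(pvals, method=method)
    print(f"{method:9s}: combined p = {p:.4f}")
```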

15.
ABSTRACT

In this paper, the maximum value test is proposed for the two-sample problem with lifetime data. The test is distribution-free in the absence of censoring but not under censoring. A formula for the limiting distribution of the proposed maximum value test is given in the general case, and the distribution of the test statistic is studied experimentally. We also propose an estimate for the p-value of the maximum value test that avoids Monte Carlo simulation. The test is useful and applicable when choosing among the logrank test, the Cox–Mantel test, the Q test and generalized Wilcoxon tests such as Gehan's generalized Wilcoxon test and Peto and Peto's generalized Wilcoxon test.

16.
In a clinical trial comparing a drug with a placebo, where there are multiple primary endpoints, we consider testing problems where an efficacious drug effect can be claimed only if statistical significance is demonstrated at the nominal level for all endpoints. The multiple-endpoint testing problem is formulated under the assumption that the data are multivariate normal. The usual testing procedure involves testing each endpoint separately at the same significance level using two-sample t-tests, and claiming drug efficacy only if each t-statistic is significant. In this paper we investigate properties of this procedure. We show that it is identical to both an intersection–union test and the likelihood ratio test. A simple expression for the p-value is given. The level and power function are studied; it is shown that the test may be conservative and that it is biased. Computable bounds for the power function are established.
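The intersection–union decision rule itself is just "every endpoint's p-value must clear the nominal level", i.e. the largest p-value is compared with α. A minimal sketch on simulated correlated bivariate-normal endpoints:

```python
# Intersection-union rule for multiple endpoints: claim efficacy only
# if all endpoint t-tests are significant, i.e. max p-value < alpha.
# Endpoint means and correlation below are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, alpha = 40, 0.05
cov = [[1.0, 0.5], [0.5, 1.0]]                       # correlated endpoints
drug = rng.multivariate_normal([0.6, 0.5], cov, n)
placebo = rng.multivariate_normal([0.0, 0.0], cov, n)

pvals = [stats.ttest_ind(drug[:, j], placebo[:, j]).pvalue
         for j in range(drug.shape[1])]
print("endpoint p-values:", np.round(pvals, 4))
print("claim efficacy:", max(pvals) < alpha)         # all must be significant
```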

17.
Abstract. It is well known that curved exponential families can have multimodal likelihoods. We investigate the relationship between flat or multimodal likelihoods and model lack of fit, the latter measured by the score (Rao) test statistic WU of the curved model as embedded in the corresponding full model. When the data yield a locally flat or convex likelihood (root of multiplicity > 1, terrace point, saddle point, local minimum), we provide a formula for WU at such points, or a lower bound for it. The formula is related to the statistical curvature of the model, and it depends on the amount of Fisher information. We use three models as examples, including the Behrens–Fisher model, to see how a flat likelihood by itself can indicate a bad fit of the model. The results are related (dual) to classical results by Efron from 1978.

18.
The quantile–quantile plot is widely used to check normality. The plot depends on the plotting positions, and many commonly used plotting positions do not depend on the sample values. We propose an adaptive plotting position that depends on the relative distances of the two neighbouring sample values. The correlation coefficient obtained from the adaptive plotting position is used to test normality. The test using the adaptive plotting position is better than the Shapiro–Wilk W test for small samples, and it has larger power than tests based on Hazen's and Blom's plotting positions for symmetric alternatives with tails shorter than the normal and for skewed alternatives when n is 20 or larger. The Brown–Hettmansperger T* test is designed for detecting bad tail behaviour, so it does not have power for symmetric alternatives with tails shorter than the normal, but it is generally better than the other tests when β₂ is greater than 3.25.
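The skeleton of any such test is the correlation between the sample order statistics and normal quantiles at chosen plotting positions. The sketch below uses Blom's fixed positions as the default; the article's adaptive, data-dependent positions are not reproduced, and critical values for the correlation would have to come from simulation, as in Filliben-type tests.

```python
# Correlation-coefficient normality test skeleton: correlate sorted data
# with normal quantiles at plotting positions (Blom's, by default).
import numpy as np
from scipy.stats import norm

def qq_correlation(x, plotting_position=None):
    x = np.sort(np.asarray(x, float))
    n = x.size
    i = np.arange(1, n + 1)
    p = (i - 0.375) / (n + 0.25) if plotting_position is None else plotting_position
    return np.corrcoef(x, norm.ppf(p))[0, 1]

rng = np.random.default_rng(8)
print(f"normal sample : r = {qq_correlation(rng.normal(size=50)):.4f}")
print(f"skewed sample : r = {qq_correlation(rng.exponential(size=50)):.4f}")
```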

19.
In this paper, we introduce a new estimator of the entropy of a continuous random variable. We compare the proposed estimator with the existing estimators of Vasicek [A test for normality based on sample entropy, J. Roy. Statist. Soc. Ser. B 38 (1976), pp. 54–59], van Es [Estimating functionals related to a density by a class of statistics based on spacings, Scand. J. Statist. 19 (1992), pp. 61–72], Correa [A new estimator of entropy, Commun. Statist. Theory and Methods 24 (1995), pp. 2439–2449] and Wieczorkowski and Grzegorzewski [Entropy estimators improvements and comparisons, Commun. Statist. Simulation and Computation 28 (1999), pp. 541–567]. We then introduce a new test for normality. By simulation, the powers of the proposed test under various alternatives are compared with the normality tests proposed by Vasicek (1976) and Esteban et al. [Monte Carlo comparison of four normality tests using different entropy estimates, Commun. Statist. Simulation and Computation 30(4) (2001), pp. 761–785].
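The baseline all of these build on is Vasicek's spacing-based estimator, H(m, n) = (1/n) Σᵢ log[(n/(2m)) (x₍ᵢ₊ₘ₎ − x₍ᵢ₋ₘ₎)], with order-statistic indices clamped to [1, n] at the edges. A sketch of that baseline (the article's new estimator is not reproduced):

```python
# Vasicek's spacing-based entropy estimator with edge clamping.
import numpy as np

def vasicek_entropy(x, m):
    x = np.sort(np.asarray(x, float))
    n = x.size
    upper = np.minimum(np.arange(n) + m, n - 1)     # clamp i + m to n
    lower = np.maximum(np.arange(n) - m, 0)         # clamp i - m to 1
    spacings = x[upper] - x[lower]
    return np.mean(np.log(n * spacings / (2 * m)))

rng = np.random.default_rng(2)
x = rng.normal(size=100)
print(f"estimate: {vasicek_entropy(x, m=3):.3f}  "
      f"(true N(0,1) entropy = {0.5 * np.log(2 * np.pi * np.e):.3f})")
```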

20.
For location–scale families, we consider a random distance between the sample order statistics and the quasi sample order statistics derived from the null distribution as a measure of discrepancy. The conditional qth quantile and the conditional expectation of the random discrepancy given the sample are chosen as test statistics. Simulation results on power against various alternatives are presented under the normal and exponential hypotheses for moderate sample sizes. The proposed tests, especially the qth quantile tests with a small or large q, are shown to be more powerful than other prominent goodness-of-fit tests in most cases.
