Similar literature
Found 20 similar articles (search time: 31 ms)
1.
The Kruskal–Wallis test is a popular nonparametric test for comparing k independent samples. In this article we propose a new algorithm to compute the exact null distribution of the Kruskal–Wallis test. The exact null distribution is needed to compare several approximation methods. The 5% cut-off points of the exact null distribution that StatXact cannot produce are obtained as by-products. We also investigate graphically why the exact and approximate distributions differ, and hope that this will be a useful tutorial tool for teaching the Kruskal–Wallis test in undergraduate courses.
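To make the idea of an exact null distribution concrete, the following is a minimal sketch that enumerates every assignment of the ranks 1..N to k groups and tabulates the Kruskal–Wallis statistic H. It is a brute-force illustration assuming no ties and very small samples; it is not the algorithm proposed in the article, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def kw_statistic(groups, n_total):
    """Kruskal-Wallis H for groups of ranks (assumes no ties)."""
    s = sum(Fraction(sum(g) ** 2, len(g)) for g in groups)
    return Fraction(12, n_total * (n_total + 1)) * s - 3 * (n_total + 1)


def exact_kw_null(sizes):
    """Exact null distribution of H by full enumeration (illustrative only)."""
    n_total = sum(sizes)
    counts = Counter()

    def recurse(remaining, idx, chosen):
        # The last group simply receives whatever ranks are left over.
        if idx == len(sizes) - 1:
            counts[kw_statistic(chosen + [remaining], n_total)] += 1
            return
        for combo in combinations(remaining, sizes[idx]):
            rest = tuple(r for r in remaining if r not in combo)
            recurse(rest, idx + 1, chosen + [combo])

    recurse(tuple(range(1, n_total + 1)), 0, [])
    total = sum(counts.values())
    return {h: Fraction(c, total) for h, c in sorted(counts.items())}


def exact_p_value(dist, h_observed):
    """Upper-tail probability P(H >= h_observed) under the null."""
    return sum(p for h, p in dist.items() if h >= h_observed)


dist = exact_kw_null((2, 2, 2))
```

The number of assignments grows multinomially with the sample sizes, which is why enumeration of this kind is only feasible for toy cases and why dedicated algorithms are of interest.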

2.
We explore criteria that data must meet in order for the Kruskal–Wallis test to reject the null hypothesis by computing the number of unique ranked datasets in the balanced case where each of the m alternatives has n observations. We show that the Kruskal–Wallis test tends to be conservative in rejecting the null hypothesis, and we offer a correction that improves its performance. We then compute the number of possible datasets producing unique rank-sums. The most commonly occurring data lead to an uncommonly small set of possible rank-sums. We extend prior findings about row- and column-ordered data structures.

3.
New statistical procedures are introduced to analyse typical microRNA expression data sets. For each separate microRNA expression, the null hypothesis to be tested is that there is no difference between the distributions of the expression in different groups. The test statistics are then constructed with certain types of alternatives in mind. To avoid strong (parametric) distributional assumptions, the alternatives are formulated using probabilities of different orders of pairs or triples of observations coming from different groups, and the test statistics are then constructed using the corresponding several‐sample U‐statistics, which are natural estimates of these probabilities. Classical several‐sample rank test statistics, such as the Kruskal–Wallis and Jonckheere–Terpstra tests, are special cases in our approach. Also, as the number of variables (microRNAs) is huge, we confront a serious simultaneous testing problem. Different approaches to controlling the family‐wise error rate or the false discovery rate are briefly discussed, and it is shown how the Chen–Stein theorem can be used to show that the family‐wise error rate can be controlled for cluster‐dependent microRNAs under weak assumptions. The theory is illustrated with an analysis of real data, a microRNA expression data set on Finnish (aggressive and non‐aggressive) prostate cancer patients and their controls.

4.
Lachenbruch (1976, 2001) introduced two‐part tests for comparison of two means in zero‐inflated continuous data. We extend this approach to compare k independent distributions (by comparing their means, either overall or through the departure from equal proportions of zeros and equal means of nonzero values) by introducing two tests: a two‐part Wald test and a two‐part likelihood ratio test. If the continuous part of the distributions is lognormal, then the two proposed test statistics are asymptotically chi‐square distributed with $2(k-1)$ degrees of freedom. A simulation study was conducted to compare the performance of the proposed tests with several well‐known tests such as ANOVA, Welch (1951), Brown & Forsythe (1974), Kruskal–Wallis, and the one‐part Wald test proposed by Tu & Zhou (1999). Results indicate that the proposed tests keep the nominal type I error and consistently have the best power among all tests compared. An application to rainfall data is provided as an example. The Canadian Journal of Statistics 39: 690–702; 2011. © 2011 Statistical Society of Canada

5.
The purpose of this note is to criticize Nguyen (1985) for his account of the literature on the generalization of Fisher's exact test and to point out parallels between existing algorithms and the algorithm proposed by Nguyen. Subsequently we will briefly raise some questions on the methodology proposed by Nguyen.

Nguyen (1985) suggests that all literature on exact testing prior to Nguyen & Sampson (1985) is based on the “more probable” relation or Exact Probability Test (EPT) as a test statistic. This is not correct. Yates (1934 - Pearson's X²), Lewontin & Felsenstein (1965 - X²), Agresti & Wackerly (1977 - X², Kendall's tau, Kruskal & Goodman's gamma), Klotz (1966 - Wilcoxon), Klotz & Teng (1977 - Kruskal & Wallis' H), Larntz (1978 - X², log-likelihood-ratio statistic G², Freeman & Tukey statistic), and several others have investigated exact tests with statistics other than the EPT. In fact, Bennett & Nakamura (1963) are incorrectly cited, as they investigated both X² and G² rather than the EPT. Also, Freeman & Halton (1951) are incorrectly cited, for they generalized Fisher's exact test to p×q tables and not 2×q tables as stated. And they are even predated by Yates (1934), who extended the test to 2×3 tables.

6.

We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion which requires a single cycle (or few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. Under simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool “Zodiac.”


7.
We use the empirical likelihood ratio approach introduced by Owen (Biometrika 75 (1988), 237–249) to test for or against a set of inequality constraints when the parameters are defined by estimating functions. Our objective in this paper is to show that, under fairly general conditions, the limiting distributions of the empirical likelihood ratio test statistics are of chi-bar square type (as in the parametric case) and to give the expression of the weighting values. The results obtained here are similar to those in El Barmi and Dykstra (1995), where a full distributional model is assumed. This work also presents an extension of the results in Qin and Lawless (1995).

8.
A novel distribution-free k-sample test of differences in location shifts based on the analysis of kernel density functional estimation is introduced and studied. The proposed test parallels one-way analysis of variance and the Kruskal–Wallis (KW) test aiming at testing locations of unknown distributions. In contrast to the rank (score)-transformed non-parametric approach, such as the KW test, the proposed F-test uses the measurement responses along with well-known kernel density estimation (KDE) to estimate the locations and construct the test statistic. A practical optimal bandwidth selection procedure is also provided. Our simulation studies and real data example indicate that the proposed analysis of kernel density functional estimate (ANDFE) test is superior to existing competitors for fat-tailed or heavy-tailed distributions when the k groups differ mainly in location rather than shape, especially with unbalanced data. ANDFE is also highly recommended when it is unclear whether test groups differ mainly in shape or location. The Canadian Journal of Statistics 48: 167–186; 2020 © 2019 Statistical Society of Canada

9.
Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the non-parametric multivariate Kruskal–Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete case analyses.

10.
Liu and Singh (1993, 2006) introduced a depth‐based d‐variate extension of the nonparametric two sample scale test of Siegel and Tukey (1960). Liu and Singh (2006) generalized this depth‐based test for scale homogeneity of k ≥ 2 multivariate populations. Motivated by the work of Gastwirth (1965), we propose k sample percentile modifications of Liu and Singh's proposals. The test statistic is shown to be asymptotically normal when k = 2, and compares favorably with Liu and Singh (2006) if the underlying distributions are either symmetric with light tails or asymmetric. In the case of skewed distributions considered in this paper the power of the proposed tests can attain twice the power of the Liu‐Singh test for d ≥ 1. Finally, in the k‐sample case, it is shown that the asymptotic distribution of the proposed percentile modified Kruskal‐Wallis type test is χ² with k − 1 degrees of freedom. Power properties of this k‐sample test are similar to those for the proposed two sample one. The Canadian Journal of Statistics 39: 356–369; 2011 © 2011 Statistical Society of Canada

11.
We consider the classic problem of interval estimation of a proportion p based on binomial sampling. The ‘exact’ Clopper–Pearson confidence interval for p is known to be unnecessarily conservative. We propose coverage adjustments of the Clopper–Pearson interval that incorporate prior or posterior beliefs into the interval. Using heatmap‐type plots for comparing confidence intervals, we show that the coverage‐adjusted intervals have satisfactory coverage and shorter expected lengths than competing intervals found in the literature.
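For reference, the standard (unadjusted) Clopper–Pearson interval can be sketched as below. This is only the classical interval that the abstract takes as its starting point, not the coverage-adjusted version the authors propose; the bounds are found by bisection on the binomial CDF, and the function names and tolerance are illustrative assumptions.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05, tol=1e-12):
    """Classical Clopper-Pearson interval for a binomial proportion."""

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) = target.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha/2, i.e. cdf(x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


interval = clopper_pearson(3, 10)
```

Because the tail probabilities are each held at or below alpha/2 for every p, the actual coverage is at least 1 − alpha and often noticeably more, which is the conservatism the abstract refers to.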

12.
This paper considers the problem of testing the randomness of Gaussian and non-Gaussian time series. A general class of parametric portmanteau statistics, which includes the Box–Pierce and the Ljung–Box statistics, is introduced. Using the exact first and second moments of the sample autocorrelations when the observations are i.i.d. normal with unknown mean, the exact expected value of any portmanteau statistic is obtained for this case. Two new portmanteau statistics, which exploit the exact moments of the sample autocorrelations, are studied. For the nonparametric case, a rank portmanteau statistic is introduced. The latter has the same distribution for any series of exchangeable random variables and uses the exact moments of the rank autocorrelations. We show that its asymptotic distribution is chi-square. Simulation results indicate that the new portmanteau statistics are better approximated by the chi-square asymptotic distribution than the Ljung–Box statistics. Several analytical results presented in the paper were derived by using a symbolic manipulation program.
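As a point of reference for the two classical statistics named above, here is a minimal sketch of the Box–Pierce statistic Q = n Σ r_k² and the Ljung–Box statistic Q* = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1..m. It does not implement the new statistics proposed in the paper; the function names are hypothetical.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k (mean-corrected)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom


def portmanteau(x, m):
    """Return (Box-Pierce, Ljung-Box) statistics using lags 1..m."""
    n = len(x)
    r = [autocorr(x, k) for k in range(1, m + 1)]
    box_pierce = n * sum(rk**2 for rk in r)
    ljung_box = n * (n + 2) * sum(
        rk**2 / (n - k) for k, rk in enumerate(r, start=1)
    )
    return box_pierce, ljung_box


bp, lb = portmanteau([1, 2, 3, 4, 5], 1)
```

Under the null of randomness both statistics are usually referred to a chi-square distribution with m degrees of freedom; the Ljung–Box weights (n + 2)/(n − k) are a finite-sample correction to the Box–Pierce form.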

13.
In this paper, we construct a new ranked set sampling protocol that maximizes the Pitman asymptotic efficiency of the signed rank test. The new sampling design is a function of the set size and independent order statistics. If the set size is odd and the underlying distribution is symmetric and unimodal, then the new sampling protocol quantifies only the middle observation. On the other hand, if the set size is even, the new sampling design quantifies the two middle observations. This data collection procedure for use in the signed rank test outperforms the data collection procedure in the standard ranked set sample. We show that the exact null distribution of the signed rank statistic WRSS+ based on a data set generated by the new ranked set sample design for odd set sizes is the same as the null distribution of the simple random sample signed rank statistic WSRS+ based on the same number of measured observations. For even set sizes, the exact null distribution of WRSS+ is simulated.

14.
Without the exchangeability assumption, permutation tests for comparing two population means do not provide exact control of the probability of making a Type I error. Another drawback of permutation tests is that they cannot be used to test hypotheses about one population. In this paper, we propose a new type of permutation test for testing the difference between two population means: the split sample permutation t-tests. We show that the split sample permutation t-tests do not require the exchangeability assumption, are asymptotically exact and can be easily extended to testing hypotheses about one population. Extensive simulations were carried out to evaluate the performance of two specific split sample permutation t-tests: the split in the middle permutation t-test and the split in the end permutation t-test. The simulation results show that the split in the middle permutation t-test has comparable performance to the permutation test if the population distributions are symmetric and satisfy the exchangeability assumption. Otherwise, the split in the end permutation t-test has significantly more accurate control of the level of significance than the split in the middle permutation t-test and other existing permutation tests.
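The classical two-sample permutation test that the abstract uses as its baseline can be sketched as follows. This is the ordinary test, which relies on exchangeability, and not the split sample permutation t-tests proposed in the paper. The sketch enumerates every relabelling exactly, so it assumes small samples; the function name is hypothetical.

```python
from itertools import combinations


def permutation_test(a, b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    hits = count = 0
    # Every way of choosing which pooled observations are labelled "a".
    for idx in combinations(range(len(pooled)), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = abs(s_a / n_a - (total - s_a) / n_b)
        # Small tolerance guards against floating-point ties.
        hits += diff >= observed - 1e-12
        count += 1
    return hits / count


p_value = permutation_test([1, 2, 3], [4, 5, 6])
```

For larger samples the full enumeration is usually replaced by a random sample of relabellings, at the cost of a Monte Carlo error in the p-value.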

15.
We develop and show applications of two new test statistics for deciding if one ARIMA model provides significantly better h-step-ahead forecasts than another, as measured by the difference of approximations to their asymptotic mean square forecast errors. The two statistics differ in the variance estimates used for normalization. Both variance estimates are consistent even when the models considered are incorrect. Our main variance estimate is further distinguished by accounting for parameter estimation, while the simpler variance estimate treats parameters as fixed. Their broad consistency properties offer improvements to what are known as tests of Diebold and Mariano (1995) type, which are tests that treat parameters as fixed and use variance estimates that are generally not consistent in our context. We show how these statistics can be calculated for any pair of ARIMA models with the same differencing operator.

16.
The classical adjustments for the inadequacy of the asymptotic distribution of Pearson's X² statistic, when some cells are sparse or the cell expectations are small, use continuity corrections and exact moments; the recent approach is to use computer based ‘exact inference’. In this paper we observe that the original exact test due to Freeman and Halton (Biometrika 38 (1951), 141–149) and its computer implementation are theoretically unsound. Furthermore, the corrected algorithmic version for the exact p-value in StatXact is practically useful in very few cases, and the results of its present version which includes Monte Carlo estimates can be highly variable. We then derive asymptotic expansions for the moments of the null distribution of Pearson's X², introduce a new method of correcting for discreteness and finite range of Pearson's X² as an alternative to the classical continuity correction, and use them to construct new and improved approximations for the null distribution. We also offer diagnostic criteria applicable to the tables for selecting an appropriate approximation. The exact methods and the competing approximations are studied and compared using thirteen test cases from the literature. It is concluded that the accuracy of the appropriate approximation is comparable with the truly exact method whenever it is available. The use of approximations is therefore preferable if the truly exact computer intensive solutions are unavailable or infeasible.

17.
A new characterization of the Pareto distribution is proposed, and new goodness-of-fit tests based on it are constructed. The test statistics are functionals of U-empirical processes. The first of these statistics is of integral type and is similar to the classical statistic \(\omega_n^1\). The second is a Kolmogorov type statistic. We show that the kernels of our statistics are non-degenerate. The limiting distribution and large deviation asymptotics of the new statistics under the null hypothesis are described. Their local Bahadur efficiency for parametric alternatives is calculated. This type of efficiency is the most appropriate for our problem since the Kolmogorov type statistic is not asymptotically normal, and the Pitman approach is not applicable to this statistic. For the second statistic we evaluate the critical values by using Monte Carlo methods. Conditions for local optimality of the new statistics in the sense of Bahadur are also discussed, and examples of such special alternatives are given. For small sample sizes we compare the power of these tests with some common goodness-of-fit tests.

18.
This study considers the exact hypothesis test for the shape parameter of a new two-parameter distribution with a bathtub-shaped or increasing failure rate function under type II progressive censoring with random removals, where the number of units removed at each failure time follows a binomial or a uniform distribution. Several test statistics are proposed and one numerical example is provided to illustrate the proposed hypothesis test for the shape parameter. Finally, a simulation study is performed to compare the power performance of all proposed test statistics. We conclude that the test statistic w₁ is more attractive than the others, as it performs better in most cases under the criterion of maximum power.

19.
Multinomial goodness-of-fit tests arise in diverse settings. The long history of the problem has spawned a multitude of asymptotic tests. If the sample size relative to the number of categories is small, the accuracy of these tests is compromised. In that case, an exact test is a prudent option. But such tests are computationally intensive and need efficient algorithms. This paper gives a conceptual overview and empirical comparisons of two avenues, namely the network and fast Fourier transform (FFT) algorithms, for an exact goodness-of-fit test on a multinomial. We show that a recursive execution of a polynomial product forms the basis of both these approaches. Specific details for implementing the network method and techniques for enhancing the efficiency of the FFT algorithm are given. Our empirical comparisons show that for exact analysis with the chi-square and likelihood ratio statistics, the network-cum-polynomial multiplication algorithm is the more efficient and accurate of the two.

20.
We present a new algorithm for computing the exact null distribution of the Spearman rank correlation statistic ρ, which also works in the case of ties. The algorithm is based on symmetries in the representation of the probability generating function as a permanent with monomial entries. We present new critical values for sample sizes 19⩽n⩽22. Finally, we show how to derive the exact null distribution of Page's L statistic from the null distribution of ρ.
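To illustrate what the exact null distribution of ρ looks like, the sketch below enumerates all n! permutations for a small n without ties, using ρ = 1 − 6 Σ d² / (n(n² − 1)). This brute-force approach is an assumption made for illustration and is unrelated to the permanent-based algorithm in the paper; it requires n ≥ 2 and becomes infeasible well before the sample sizes 19 to 22 mentioned above. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def exact_spearman_null(n):
    """Exact null distribution of Spearman's rho for n >= 2, no ties."""
    counts = Counter()
    base = range(1, n + 1)
    for perm in permutations(base):
        # Sum of squared rank differences against the identity ordering.
        d2 = sum((i - p) ** 2 for i, p in zip(base, perm))
        rho = 1 - Fraction(6 * d2, n * (n * n - 1))
        counts[rho] += 1
    total = sum(counts.values())
    return {rho: Fraction(c, total) for rho, c in sorted(counts.items())}


dist3 = exact_spearman_null(3)
```

For n = 3 the six permutations give ρ values of −1, −1/2, 1/2 and 1, so the distribution is symmetric about zero, as expected under independence.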
