Similar Articles — 20 results found
1.
A method is given of choosing k-way partitions (where 2 ≤ k ≤ number of categories of the predictor variable) in classification or decision tree analyses. The method, like that proposed by Kass, chooses the best partition on the basis of statistical significance and uses the Bonferroni inequality to calculate the significance. Unlike Kass's algorithm, the algorithm does not favour simple partitions (low values of k), nor does it discriminate against free-type (no restriction on order of values) predictor variables with many categories. A method of adjusting the significance for the number of predictor variables and of using multiple comparisons to put an upper bound on the significance is given. Monte Carlo tests show that the algorithm gives slightly conservative tests of significance for both small and large samples and does not favour one type of predictor variable over another. The algorithm is incorporated in a PC software program, Knowledgeseeker, which is briefly described.
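The Bonferroni step can be made concrete with a minimal sketch. This is not the paper's exact algorithm: it assumes a free-type predictor, for which a Kass-style multiplier is the number of ways to merge c categories into k groups (a Stirling number of the second kind); the function names are illustrative.

```python
from math import comb, factorial

def stirling2(c, k):
    # number of ways to partition c categories into k non-empty groups
    # (for an ordered/monotonic predictor the multiplier is C(c-1, k-1) instead)
    return sum((-1) ** j * comb(k, j) * (k - j) ** c for j in range(k + 1)) // factorial(k)

def bonferroni_adjusted_p(p_raw, c, k):
    # multiply the raw chi-squared p-value by the number of candidate partitions
    return min(1.0, stirling2(c, k) * p_raw)

print(bonferroni_adjusted_p(0.001, c=6, k=3))  # S(6,3) = 90 candidates -> adjusted p = 0.09
```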

2.
Summary. In high-throughput genomic work, a very large number d of hypotheses are tested based on n ≪ d data samples. The large number of tests necessitates an adjustment for false discoveries in which a true null hypothesis was rejected. The expected number of false discoveries is easy to obtain. Dependence between the hypothesis tests greatly affects the variance of the number of false discoveries. Assuming that the tests are independent gives an inadequate variance formula. The paper presents a variance formula that takes account of the correlations between test statistics. That formula involves O(d²) correlations, and so a naïve implementation has cost O(nd²). A method based on sampling pairs of tests allows the variance to be approximated at a cost that is independent of d.
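The pair-sampling device can be illustrated with a short sketch. It assumes one-sided z-tests at level alpha and test statistics whose pairwise correlations equal the row correlations of the data matrix X; the paper's formula and estimator are more general.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def var_false_discoveries(X, alpha=0.05, n_pairs=2000, seed=None):
    # estimate Var(V), V = number of false discoveries among d tests,
    # by Monte Carlo sampling of test pairs instead of all O(d^2) correlations
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    z = norm.ppf(1 - alpha)
    pair_term = 0.0
    for _ in range(n_pairs):
        i, j = rng.choice(d, size=2, replace=False)
        rho = np.corrcoef(X[i], X[j])[0, 1]
        # P(Z_i > z, Z_j > z) = P(Z_i < -z, Z_j < -z) by central symmetry
        both = multivariate_normal.cdf([-z, -z], mean=[0, 0],
                                       cov=[[1, rho], [rho, 1]])
        pair_term += both - alpha ** 2
    pair_term *= d * (d - 1) / n_pairs        # scale the sampled average back up
    return d * alpha * (1 - alpha) + pair_term
```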

3.
When thousands of tests are performed simultaneously to detect differentially expressed genes in microarray analysis, the number of Type I errors can be immense if a multiplicity adjustment is not made. However, due to the large scale, traditional adjustment methods require very stringent significance levels for individual tests, which yield low power for detecting alterations. In this work, we describe how two omnibus tests can be used in conjunction with a gene filtration process to circumvent difficulties due to the large scale of testing. These two omnibus tests, the D-test and the modified likelihood ratio test (MLRT), can be used to investigate whether a collection of P-values has arisen from the Uniform(0,1) distribution or whether the Uniform(0,1) distribution contaminated by another Beta distribution is more appropriate. In the former case, attention can be directed to a smaller part of the genome; in the latter event, parameter estimates for the contamination model provide a frame of reference for multiple comparisons. Unlike the likelihood ratio test (LRT), both the D-test and MLRT enjoy simple limiting distributions under the null hypothesis of no contamination, so critical values can be obtained from standard tables. Simulation studies demonstrate that the D-test and MLRT are superior to the AIC, BIC, and Kolmogorov-Smirnov test. A case study illustrates omnibus testing and filtration.
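The contamination model itself is easy to fit by maximum likelihood, as in this hedged sketch: p-values are modeled as pi0·Uniform(0,1) + (1-pi0)·Beta(a, b). The D-test and MLRT statistics are the paper's; the starting values and bounds below are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

def fit_contamination(pvals):
    def negloglik(theta):
        pi0, a, b = theta
        dens = pi0 + (1 - pi0) * beta.pdf(pvals, a, b)   # mixture density on (0,1)
        return -np.sum(np.log(np.clip(dens, 1e-300, None)))
    res = minimize(negloglik, x0=[0.9, 0.5, 2.0],
                   bounds=[(1e-4, 1 - 1e-4), (1e-3, 1.0), (1.0, 50.0)])
    return res.x  # (pi0, a, b)

rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=900), rng.beta(0.3, 4.0, size=100)])
print(fit_contamination(pvals))   # pi0 estimate should be near 0.9
```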

4.
5.
Tests for unit roots in panel data have become very popular. Two attractive features of panel data unit root tests are the increased power compared to time-series tests, and the often well-behaved limiting distributions of the tests. In this paper we apply Monte Carlo simulations to investigate how well the normal approximation works for a heterogeneous panel data unit root test when there are only a few cross sections in the sample. We find that the normal approximation, which should be valid for large numbers of cross-sectional units, works well, at conventional significance levels, even when the number of cross sections is as small as two. This finding is valuable for the applied researcher since critical values will be easy to obtain and p-values will be readily available.
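A rough Monte Carlo design in the spirit of the paper: simulate random walks for N cross sections, average the individual Dickey-Fuller t-statistics, standardize empirically, and compare tail frequencies with the normal approximation. The moments and lag handling in the paper differ from this bare-bones sketch.

```python
import numpy as np

def df_t(y):
    # Dickey-Fuller t-statistic from the regression dy_t = rho * y_{t-1} + e_t
    dy, ylag = np.diff(y), y[:-1]
    rho = (ylag @ dy) / (ylag @ ylag)
    resid = dy - rho * ylag
    s2 = resid @ resid / (len(dy) - 1)
    return rho / np.sqrt(s2 / (ylag @ ylag))

rng = np.random.default_rng(1)
N, T, reps = 2, 100, 5000          # only two cross sections, as in the paper's finding
tbars = np.array([np.mean([df_t(np.cumsum(rng.standard_normal(T)))
                           for _ in range(N)]) for _ in range(reps)])
z = (tbars - tbars.mean()) / tbars.std()
print("P(z < -1.645) ~", np.mean(z < -1.645))   # compare with the nominal 0.05
```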

6.
Many parametric statistical inference procedures in finite samples depend crucially on the underlying normal distribution assumption. Dozens of normality tests are available in the literature to test the hypothesis of normality. The availability of so many normality tests has generated a large number of simulation studies aimed at finding a best test, but none has arrived at a definitive answer, since everything depends critically on the alternative distributions, which cannot be specified. A new framework, based on the stringency concept, is devised to evaluate the performance of the existing normality tests. Mixtures of t-distributions are used to generate the alternative space. LR-tests, based on the Neyman–Pearson lemma, are computed to construct a power envelope for calculating the stringencies of the selected normality tests. In this evaluation, the Anderson–Darling (AD) statistic turns out to be the best normality test.
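The stringency calculation reduces to a maximum power shortfall, as this minimal sketch shows. All numbers below are hypothetical placeholders; in the paper both the envelope and the tests' powers come from simulation against the t-mixture alternatives.

```python
import numpy as np

def stringency(power, envelope):
    # a test's stringency is its maximum shortfall from the NP power envelope;
    # the most stringent (best) test minimizes this maximum
    return np.max(envelope - power)

envelope = np.array([0.31, 0.52, 0.74, 0.90])        # illustrative NP LR powers
powers = {"AD": np.array([0.29, 0.49, 0.71, 0.88]),  # hypothetical simulated powers
          "JB": np.array([0.25, 0.41, 0.66, 0.85])}
best = min(powers, key=lambda t: stringency(powers[t], envelope))
print({t: round(stringency(p, envelope), 3) for t, p in powers.items()}, "->", best)
```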

7.
The effect of a single variable data point, x, on the usual test statistics for traditional hypothesis tests for means is analyzed. It is shown that an outlier may have a profound and unexpected effect on the test statistic. Although it might appear that an outlier would tend to lend support to the alternative hypothesis, it may in fact detract from the significance of the test. In one-population tests and analysis of variance (ANOVA), the value of x that maximizes the significance of the test statistic is given. This value does not have to be unusually large or small; in fact, it often falls within the range of the other sample points. In the general one-population case, the limiting value of the test statistic is shown to be +1. In the case involving more than one population, it is shown that the limiting value of the test statistic is a function only of the numbers of members in the samples and not their relative values. Special cases are identified in which the test statistic is shown to have unique characteristics depending on the characteristics of the data.
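The one-population claim is easy to verify numerically: append one extra observation x to a sample and trace the one-sample t statistic as a function of x. The sample below is illustrative; note that t tends to +1 as x grows without bound.

```python
import numpy as np
from scipy.stats import ttest_1samp

sample = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
for x in [0.0, 1.1, 3.0, 1e6]:
    t, _ = ttest_1samp(np.append(sample, x), popmean=0.0)
    print(f"x = {x:>9}: t = {t:.3f}")   # t approaches the limit +1 for huge x
```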

8.
Bartholomew's statistics for testing homogeneity of normal means with ordered alternatives have null distributions which are mixtures of chi-squared or beta distributions according as the variances are known or not. If the sample sizes are not equal, the mixing coefficients can be difficult to compute. For a simple order and a simple tree ordering, approximations to the significance levels of these tests have been developed which are based on patterns in the weight sets. However, for a moderate or large number of means, these approximations can be tedious to implement. Employing the same approach that was used in the development of these approximations, two-moment chi-squared and beta approximations are derived for these significance levels. Approximations are also developed for the testing situation in which the order restriction is the null hypothesis. Numerical studies show that in each of the cases the two-moment approximation is quite satisfactory for most practical purposes.
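The generic device behind a two-moment chi-squared approximation is moment matching: fit a scaled chi-squared a·χ²_b to the null mean μ and variance v of the statistic. The sketch below shows only this device; the paper's derivation of μ and v for the mixture nulls is the substantive part.

```python
from scipy.stats import chi2

def two_moment_critical(mu, v, alpha=0.05):
    # match E[a*chi2_b] = a*b = mu and Var = 2*a^2*b = v, then invert the tail
    a, b = v / (2 * mu), 2 * mu ** 2 / v
    return a * chi2.ppf(1 - alpha, df=b)

print(two_moment_critical(mu=3.0, v=8.0))   # approximate 5% critical value
```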

9.
A common problem faced in social experiments is that of designing a sampling strategy before it is known whether various control groups can be pooled. The classical sample choice is between a large, supposedly unpoolable sample and a smaller, supposedly poolable one. This article suggests a compromise strategy between these two extremes based on preliminary tests of significance that allow one to embed judgments about the likelihood of pooling into a classical sample design problem.
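A stylized sketch of the compromise, not the article's procedure: run a preliminary test of poolability on pilot control data and size the main sample accordingly. The pretest level and both sample sizes below are placeholders.

```python
import numpy as np
from scipy.stats import f_oneway

def choose_design(control_groups, alpha_pre=0.25, n_pooled=400, n_separate=800):
    # pool (smaller sample) if a preliminary ANOVA does not reject equality
    _, p = f_oneway(*control_groups)
    return n_pooled if p > alpha_pre else n_separate

rng = np.random.default_rng(0)
pilot = [rng.normal(0, 1, 30), rng.normal(0, 1, 30)]
print(choose_design(pilot))
```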

10.
A new test of the proportional hazards assumption in the Cox model is proposed. The idea is based on Neyman's smooth tests. The Cox model with proportional hazards (i.e. time-constant covariate effects) is embedded in a model with a smoothly time-varying covariate effect that is expressed as a combination of some basis functions (e.g., Legendre polynomials, cosines). The smooth test is then the score test for the significance of these artificial covariates. Furthermore, we apply a modification of Schwarz's selection rule to choose the dimension of the smooth model (the number of basis functions); the score test is then used in the selected model. In a simulation study, we compare the proposed tests with standard tests based on the score process.
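Only the embedding step is sketched here: the constant effect β is inflated to β(t) = β₀ + Σⱼ θⱼφⱼ(t) with Legendre polynomials φⱼ on rescaled time, and the artificial covariates z·φⱼ(t) enter the Cox partial likelihood, where the score test for θ = 0 is carried out (that part is the paper's).

```python
import numpy as np
from numpy.polynomial import legendre

def smooth_basis(times, dim):
    # Legendre polynomials evaluated on event times mapped to [-1, 1]
    u = 2 * (times - times.min()) / (times.max() - times.min()) - 1
    return np.column_stack([legendre.Legendre.basis(j)(u) for j in range(1, dim + 1)])

t = np.linspace(0.1, 5.0, 50)
Phi = smooth_basis(t, dim=3)   # columns multiply the covariate of interest
print(Phi.shape)               # (50, 3) artificial time-varying covariates
```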

11.
Surface temperature is a major indicator of climate change. To test for the presence of an upward trend in surface temperature (global warming), sophisticated statistical methods are typically used which depend on implausible and/or unverifiable assumptions, in particular on the availability of a very large number of measurements. In this paper, the validity of these methods is challenged. It is argued that the available series are simply not long enough to justify the use of methods based on asymptotic arguments, because only a small fraction of the information contained in the data can be used to distinguish between a trend and natural variability. A simple frequency-domain test is therefore proposed for the case when all but a very small number of frequencies may be corrupted by transitory fluctuations. Simulations confirm its robustness against short-term autocorrelation. When applied to a global surface-temperature series, significance can be achieved with far fewer frequencies than required by conventional tests.
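A hedged sketch of a frequency-domain trend check, not the paper's exact statistic: the lowest cosine coefficient of the series absorbs a monotone trend, and it is compared with the variability carried by a handful of higher frequencies.

```python
import numpy as np

def low_frequency_trend_stat(y, m=5):
    n = len(y)
    t = np.arange(n)
    # discrete cosine (DCT-II) coefficients at the m lowest non-zero frequencies
    coeffs = [np.sqrt(2 / n) * np.sum(y * np.cos(np.pi * k * (t + 0.5) / n))
              for k in range(1, m + 1)]
    return coeffs[0] ** 2 / np.mean(np.square(coeffs[1:]))  # trend vs. noise frequencies

rng = np.random.default_rng(0)
y = 0.01 * np.arange(200) + rng.standard_normal(200)
print(low_frequency_trend_stat(y))   # large values suggest a trend
```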

12.
The idea of measuring the departure of data by a plot of observed observations against their expectations has been exploited in this paper to develop tests for exponentiality. The tests cover the two-parameter exponential distribution with complete samples and the one-parameter exponential distribution. Large-sample distributions of the test statistics are derived, critical points have been computed for different levels of significance, and applications of the tests are discussed for three data sets.
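The plotting idea can be illustrated directly: compare ordered observations with the expected standard exponential order statistics E[X₍ᵢ₎] = Σⱼ₌ₙ₋ᵢ₊₁ⁿ 1/j; near-perfect correlation supports exponentiality. The paper's actual statistics and critical points differ from this correlation sketch.

```python
import numpy as np

def expo_qq_corr(x):
    x = np.sort(x)
    n = len(x)
    # expected standard exponential order statistics: E[X_(i)] = sum_{j=n-i+1}^{n} 1/j
    expected = np.array([np.sum(1.0 / np.arange(n - i, n + 1)) for i in range(n)])
    return np.corrcoef(x, expected)[0, 1]

rng = np.random.default_rng(0)
print(expo_qq_corr(rng.exponential(2.0, size=100)))   # close to 1
print(expo_qq_corr(rng.normal(5.0, 1.0, size=100)))   # noticeably smaller
```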

13.
Statistical tests of significance are carried out on the feedback shift register pseudo-random number generator employed on the BBC microcomputer. The tests are based on the practicalities of using a microcomputer in simulations for statistical education. The results indicate that the generator is not universally acceptable in this role.
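One test of the kind applied to such a generator is a chi-squared goodness-of-fit test for uniformity on binned output. The sketch below runs it on numpy's generator purely for illustration; the paper's battery targets the BBC micro's feedback-shift-register generator.

```python
import numpy as np
from scipy.stats import chisquare

u = np.random.default_rng(42).random(10_000)
observed, _ = np.histogram(u, bins=20, range=(0.0, 1.0))
stat, p = chisquare(observed)          # H0: all 20 bins equally likely
print(f"chi2 = {stat:.1f}, p = {p:.3f}")
```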

14.
Asymptotic tests for multivariate repeated measures are derived under non-normality and an unspecified dependence structure. Notwithstanding their broader scope of application, the methods are particularly useful when a random vector of a large number of repeated measurements is collected from each subject but the number of subjects per treatment group is limited. In some experimental situations, replicating the experiment a large number of times could be expensive or infeasible. Although taking a large number of repeated measurements can be relatively cheap, within-subject dependence means that the number of parameters involved grows quickly. Under mild conditions on the persistence of the dependence, we derive asymptotic multivariate tests for the three testing problems in repeated measures analysis. The simulation results provide evidence in favour of the accuracy of the approximations to the null distributions.

15.
"The basis of statistical tests of significance of association between fluoride level in drinking water and cancer death rates is discussed. Reference is made to two reported studies in each of which cancer death rates of a number of [U.S.] cities were used. It is argued that between city variation should be taken into account when performing tests of significance. In one of the two studies this was done informally; in the other between city variation was ignored."  相似文献   

16.
The purpose of our study is to propose a procedure for determining the sample size at each stage of repeated group significance tests intended to compare the efficacy of two treatments when the response variable is normal. It is necessary to devise a procedure for reducing the maximum sample size, because large sample sizes are often required in group sequential tests. In order to reduce the sample size at each stage, we construct repeated confidence boundaries which enable us to find which of the two treatments is the more effective at an early stage. We then use recursive formulae of numerical integration to determine the sample size at the intermediate stages. We compare our procedure with Pocock's in terms of maximum sample size and average sample size in simulations.
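One ingredient of such designs is easy to sketch by simulation: a constant (Pocock-type) critical value c for K equally spaced interim looks chosen so that the overall type I error is alpha. The paper instead uses recursive numerical integration and repeated confidence boundaries, so this is only a reference point.

```python
import numpy as np

def pocock_c(K=4, alpha=0.05, reps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    # z-statistics at looks 1..K with equal information increments: z_k = S_k / sqrt(k)
    z = np.cumsum(rng.standard_normal((reps, K)), axis=1) / np.sqrt(np.arange(1, K + 1))
    return np.quantile(np.abs(z).max(axis=1), 1 - alpha)

print(pocock_c())   # about 2.36 for K = 4, alpha = 0.05 (two-sided)
```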

17.
Exact k-sample permutation tests for binary data for three commonly encountered hypothesis tests are presented. The tests are derived both under the population and randomization models. The generating function for the number of cases in the null distribution is obtained, and the asymptotic distributions of the test statistics are derived. Actual significance levels are computed for the asymptotic test versions. Random sampling of the null distribution is suggested as a superior alternative to the asymptotics, and an efficient computer technique for implementing the random sampling is described. Finally, some numerical examples are presented and sample size guidelines are given for computer implementation of the exact tests.
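The random-sampling alternative can be sketched in a few lines: estimate the permutation p-value for a k-sample binary comparison by Monte Carlo shuffling of group labels rather than full enumeration. The between-group sum of squares is one convenient statistic; the paper's three tests use their own.

```python
import numpy as np

def perm_test_binary(groups, n_perm=10_000, seed=None):
    # Monte Carlo permutation p-value for equality of k binomial proportions
    rng = np.random.default_rng(seed)
    data = np.concatenate(groups)
    cuts = np.cumsum([len(g) for g in groups])[:-1]
    def between_ss(x):
        grand = x.mean()
        return sum(len(p) * (p.mean() - grand) ** 2 for p in np.split(x, cuts))
    obs = between_ss(data)
    exceed = sum(between_ss(rng.permutation(data)) >= obs for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)       # add-one correction keeps p > 0

print(perm_test_binary([np.array([1]*8 + [0]*2), np.array([1]*3 + [0]*7)], seed=1))
```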

18.
A researcher is often confronted with the difficult and subjective task of determining which of m models best fits a set of observed data. A general robust statistical procedure for model selection is examined which uses discriminant analysis on significance levels resulting from various tests of hypotheses concerning the models. The use of Monte Carlo simulation to obtain the significance levels associated with the tests is presented. The technique is illustrated by application to four band recovery models useful in wildlife studies. Error rates due to misclassification are also reported.
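The core of the procedure can be sketched schematically: simulate data from each candidate model, record the vector of significance levels produced by the hypothesis tests, train a discriminant rule on those vectors, and classify the observed data's vector. Here sklearn's linear discriminant analysis stands in for whatever discriminant rule the paper uses.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def select_model(simulated_pvals, model_labels, observed_pvals):
    # rows of simulated_pvals: p-value vectors from Monte Carlo runs of each model
    lda = LinearDiscriminantAnalysis().fit(simulated_pvals, model_labels)
    return lda.predict(np.asarray(observed_pvals).reshape(1, -1))[0]
```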

19.
A versatile procedure is described comprising an application of statistical techniques to the analysis of the large, multi‐dimensional data arrays produced by electroencephalographic (EEG) measurements of human brain function. Previous analytical methods have been unable to identify objectively the precise times at which statistically significant experimental effects occur, owing to the large number of variables (electrodes) and small number of subjects, or have been restricted to two‐treatment experimental designs. Many time‐points are sampled in each experimental trial, making adjustment for multiple comparisons mandatory. Given the typically large number of comparisons and the clear dependence structure among time‐points, simple Bonferroni‐type adjustments are far too conservative. A three‐step approach is proposed: (i) summing univariate statistics across variables; (ii) using permutation tests for treatment effects at each time‐point; and (iii) adjusting for multiple comparisons using permutation distributions to control family‐wise error across the whole set of time‐points. Our approach provides an exact test of the individual hypotheses while asymptotically controlling family‐wise error in the strong sense, and can provide tests of interaction and main effects in factorial designs. An application to two experimental data sets from EEG studies is described, but the approach has application to the analysis of spatio‐temporal multivariate data gathered in many other contexts.
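A condensed sketch of the three steps for a two-group comparison: (i) sum squared t statistics across electrodes at each time-point, (ii) recompute under random relabelling of subjects, and (iii) adjust with the permutation distribution of the maximum over time-points. The array layout and the choice of summed t² are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import ttest_ind

def maxT_adjusted_p(a, b, n_perm=1000, seed=None):
    # a, b: (subjects, electrodes, timepoints) arrays for the two groups
    rng = np.random.default_rng(seed)
    def summed_stat(x, y):
        t, _ = ttest_ind(x, y, axis=0)           # per electrode and time-point
        return np.sum(t ** 2, axis=0)            # sum over electrodes -> (timepoints,)
    obs = summed_stat(a, b)
    pooled, na = np.concatenate([a, b]), len(a)
    maxes = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)           # relabel subjects
        maxes[i] = summed_stat(perm[:na], perm[na:]).max()
    # family-wise adjusted p-value per time-point via the max distribution
    return (1 + (maxes[None, :] >= obs[:, None]).sum(axis=1)) / (n_perm + 1)
```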

20.
We prove the following conjecture of Narayana: there are no nontrivial dominance refinements of the Smirnov two-sample test if and only if the two sample sizes are relatively prime. We also count the number of natural significance levels of the Smirnov two-sample test in terms of the sample sizes and relate this to the Narayana conjecture. In particular, Smirnov tests with relatively prime sample sizes turn out to have many more natural significance levels than do Smirnov tests whose sample sizes are not relatively prime (for example, equal sample sizes).
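For small samples the natural significance levels can be counted by brute force: enumerate all arrangements of m X's and n Y's, compute the Smirnov statistic D for each, and collect the distinct tail probabilities P(D ≥ d). This sketch only illustrates the phenomenon the paper proves in general.

```python
from itertools import combinations
from math import comb

def natural_levels(m, n):
    # exact permutation distribution of D = max |F_m - G_n| over all C(m+n, m) arrangements
    ds = []
    for xs in combinations(range(m + n), m):
        sx, fx, fy, d = set(xs), 0, 0, 0.0
        for i in range(m + n):
            if i in sx: fx += 1
            else: fy += 1
            d = max(d, abs(fx / m - fy / n))
        ds.append(round(d, 12))
    total = comb(m + n, m)
    return {d: sum(x >= d for x in ds) / total for d in sorted(set(ds))}

print(len(natural_levels(4, 5)), "levels (coprime) vs",
      len(natural_levels(4, 4)), "levels (equal sizes)")
```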

