Similar Literature
20 similar articles found.
1.
The asymptotic distributions of many classical test statistics are normal. The resulting approximations are often accurate for commonly used significance levels such as 0.05 or 0.01. In genome-wide association studies (GWAS), however, the significance level can be as low as 1×10−7, and the accuracy of the p-values can be challenging. We study the accuracy of these small p-values using two-term Edgeworth expansions for three commonly used test statistics in GWAS. These tests have nuisance parameters that are not defined under the null hypothesis but remain estimable. We derive results for this general form of test statistic using Edgeworth expansions, and find that the commonly used score test, the maximin efficiency robust test and the chi-squared test are second-order accurate in the presence of the nuisance parameter, justifying the use of the p-values obtained from these tests in genome-wide association studies.
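As a minimal illustration of the underlying idea (not the paper's two-term derivation, and without nuisance parameters), a one-term Edgeworth expansion adds the leading skewness correction to the normal approximation for a standardized sample mean; the function names below are hypothetical:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edgeworth_cdf(x, skew, n):
    """One-term Edgeworth approximation to P(sqrt(n)*(Xbar - mu)/sigma <= x).

    Adds the leading O(1/sqrt(n)) skewness correction to the CLT limit;
    a two-term expansion, as studied in the abstract, would add O(1/n)
    terms involving the kurtosis as well.
    """
    correction = phi(x) * skew * (x * x - 1.0) / (6.0 * math.sqrt(n))
    return Phi(x) - correction

# With zero skewness the correction vanishes and the normal CDF is recovered.
p_normal = edgeworth_cdf(1.96, 0.0, 50)
p_skewed = edgeworth_cdf(1.96, 1.5, 50)
```

For a right-skewed population the corrected upper-tail probability exceeds the naive normal one, which is exactly why tiny GWAS p-values computed from the normal limit can be off.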

2.
The quantile–quantile plot is widely used to check normality. The plot depends on the plotting positions, and many commonly used plotting positions do not depend on the sample values. We propose an adaptive plotting position that depends on the relative distances to the two neighbouring sample values. The correlation coefficient obtained from the adaptive plotting position is used to test normality. The test using the adaptive plotting position is better than the Shapiro–Wilk W test for small samples, and has larger power than tests based on Hazen's and Blom's plotting positions for symmetric alternatives with shorter tails than the normal and for skewed alternatives when n is 20 or larger. The Brown–Hettmansperger T* test is designed to detect bad tail behaviour, so it has no power against symmetric alternatives with shorter tails than the normal, but it is generally better than the other tests when β2 is greater than 3.25.
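A correlation-coefficient normality test of this kind is easy to sketch. The paper's adaptive plotting position is not reproduced here; as a simple stand-in, the sketch below uses Blom's fixed positions (i − 3/8)/(n + 1/4), which do not depend on the sample values:

```python
import numpy as np
from scipy.stats import norm

def qq_correlation(sample):
    """Correlation between the order statistics and the normal quantiles
    evaluated at Blom's plotting positions (i - 3/8)/(n + 1/4).

    Values near 1 are consistent with normality; the test rejects for
    small correlations. The paper's adaptive position would replace the
    fixed vector p below with one depending on neighbouring sample values.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    q = norm.ppf(p)
    return np.corrcoef(x, q)[0, 1]

rng = np.random.default_rng(0)
r_normal = qq_correlation(rng.normal(size=50))    # close to 1 under normality
r_uniform = qq_correlation(rng.uniform(size=50))  # a short-tailed alternative
```

In practice the null distribution of the correlation (and hence the critical value) is obtained by simulation for each n.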

3.
Quetelet's data on Scottish chest girths are analyzed with eight normality tests. In contrast to Quetelet's conclusion that the data are fit well by what is now known as the normal distribution, six of the eight normality tests provide strong evidence that the chest circumferences are not normally distributed. Using corrected chest circumferences from Stigler, the χ2 test no longer provides strong evidence against normality, but five commonly used normality tests do. The D'Agostino–Pearson K2 and Jarque–Bera tests, based only on skewness and kurtosis, find that both Quetelet's original data and the Stigler-corrected data are consistent with the hypothesis of normality. The main reason most normality tests produce low p-values, indicating that Quetelet's data are not normally distributed, is that the chest circumferences were reported in whole inches, and rounding a large number of observations can produce many tied values that strongly affect most normality tests. Users should be cautious about applying standard normality tests when the data have ties, are rounded, and the ratio of the standard deviation to the rounding interval is small.
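The tie mechanism is easy to demonstrate on synthetic data (the location, scale, and sample size below are hypothetical, not Quetelet's actual figures): rounding truly normal measurements to whole inches collapses them onto a handful of tied values, which a standard normality test may then flag.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)
raw = rng.normal(loc=39.8, scale=2.0, size=500)  # hypothetical continuous girths
rounded = np.round(raw)  # reported in whole inches: sd / rounding interval = 2

# Rounding collapses the sample onto a few tied values.
n_unique_raw = len(np.unique(raw))
n_unique_rounded = len(np.unique(rounded))

# Shapiro-Wilk p-values before and after rounding; with many ties the test
# can reject even though the underlying measurements are truly normal.
stat_raw, p_raw = shapiro(raw)
stat_rounded, p_rounded = shapiro(rounded)
```

Running this with a smaller standard-deviation-to-rounding-interval ratio makes the effect progressively more severe, which is the abstract's cautionary point.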

4.
The use of a goodness-of-fit test based on the Anderson–Darling (AD) statistic is discussed, with reference to the composite hypothesis that a sample of observations comes from a generalized Rayleigh distribution whose parameters are unspecified. Monte Carlo simulation studies were performed to calculate the critical values for the AD test. These critical values are then used to test whether a set of observations follows a generalized Rayleigh distribution when the scale and shape parameters are unspecified and must be estimated from the sample. A functional relationship between the critical values of AD and the shape parameter (α), sample size (n) and significance level (γ) is also examined. A power study is performed with the hypothesized generalized Rayleigh against alternative distributions.
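The Monte Carlo procedure can be sketched as follows. This is a simplified version that treats the generalized Rayleigh parameters as known when computing the probability-integral transform, whereas the paper estimates them from each sample (which changes the null distribution of AD); the function names are mine:

```python
import numpy as np

def ad_statistic(u):
    """Anderson-Darling statistic from probability-integral transforms u."""
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))

def gr_sample(alpha, lam, size, rng):
    """Generalized Rayleigh variates by inverting F(x) = (1 - exp(-lam*x^2))**alpha."""
    u = rng.uniform(size=size)
    return np.sqrt(-np.log1p(-u ** (1.0 / alpha)) / lam)

def mc_critical_value(alpha, lam, n, gamma=0.05, reps=2000, seed=1):
    """Simulate the null distribution of AD and return its (1 - gamma) quantile."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        x = gr_sample(alpha, lam, n, rng)
        u = (1.0 - np.exp(-lam * x ** 2)) ** alpha  # null CDF at the data
        stats[r] = ad_statistic(u)
    return np.quantile(stats, 1.0 - gamma)

cv = mc_critical_value(alpha=2.0, lam=1.0, n=30)
```

With known parameters the transforms are exactly uniform, so the simulated 5% critical value should sit near the classical fully specified AD value of about 2.49; with estimated parameters the critical values become smaller and depend on α and n, which is the relationship the paper tabulates.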

5.
The Birnbaum–Saunders distribution was developed to describe fatigue failure lifetimes; however, the distribution has been shown to be applicable to a variety of situations that frequently occur in the engineering sciences. In general, the distribution can be used in situations that involve stochastic wear-out failure. The distribution does not have an exponential family structure, and it is often necessary to use simulation methods to study the properties of statistical inference procedures for this distribution. Two random number generators for the Birnbaum–Saunders distribution have appeared in the literature. The purpose of this article is to present and compare these two random number generators to determine which is more efficient. It is shown that one of these generators is a special case of the other and is simpler and more efficient to use.
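One standard Birnbaum–Saunders generator (shown only as an illustration; the abstract does not identify which two generators it compares) transforms a standard normal draw Z through T = β[αZ/2 + √((αZ/2)² + 1)]²:

```python
import numpy as np

def rbirnsaun(alpha, beta, size, rng):
    """Draw Birnbaum-Saunders(alpha, beta) variates from standard normals.

    Uses the monotone transform T = beta * (a + sqrt(a^2 + 1))**2 with
    a = alpha * Z / 2, which maps Z ~ N(0, 1) to a positive BS lifetime.
    """
    a = alpha * rng.standard_normal(size) / 2.0
    return beta * (a + np.sqrt(a * a + 1.0)) ** 2

rng = np.random.default_rng(7)
t = rbirnsaun(alpha=0.5, beta=2.0, size=100_000, rng=rng)
sample_median = np.median(t)  # the BS median is exactly beta
```

Because the transform is monotone and Z = 0 maps to T = β, the sample median is a quick sanity check on the generator.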

6.
The Kolassa method implemented in the nQuery Advisor software has been widely used for approximating the power of the Wilcoxon–Mann–Whitney (WMW) test for ordered categorical data, in which an Edgeworth approximation is used to estimate the power of an unconditional test based on the WMW U statistic. When the sample size is small or when the sizes of the two groups are unequal, Kolassa's method may yield a quite poor approximation to the power of the conditional WMW test that is commonly implemented in statistical packages. Two modifications of Kolassa's formula are proposed and assessed by simulation studies.
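A brute-force Monte Carlo power estimate is the usual benchmark against which analytic approximations such as Kolassa's are judged. The sketch below uses continuous normal data with a location shift rather than the abstract's ordered categorical setting, and the parameter values are arbitrary:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def wmw_power(n1, n2, shift, reps=500, alpha=0.05, seed=3):
    """Monte Carlo power of the two-sided WMW test for a location shift.

    Each replicate draws two independent samples, runs the WMW test, and
    counts a rejection when p < alpha; the rejection rate estimates power.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=n1)
        y = rng.normal(loc=shift, size=n2)
        _, p = mannwhitneyu(x, y, alternative="two-sided")
        if p < alpha:
            rejections += 1
    return rejections / reps

power = wmw_power(n1=20, n2=10, shift=1.0)  # unequal group sizes, as in the abstract
```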

7.
A common method for estimating the time-domain parameters of an autoregressive process is to use the Yule–Walker equations. Tapering has been shown intuitively, and proven theoretically, to reduce the bias of the periodogram in the frequency domain, but the intuition for the similar bias reduction in the time-domain estimates has been lacking. We explain why tapering reduces the bias in the Yule–Walker estimates by showing that they are equivalent to the solution of a weighted least-squares problem. This leads to the derivation of an optimal taper which behaves similarly to commonly used tapers.
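A tapered Yule–Walker estimator can be sketched directly: taper the data, form the sample autocovariances of the tapered series (normalized by Σh²), and solve the usual Toeplitz system. The Hann taper below is a common stand-in, not the paper's optimal taper, and the function name is mine:

```python
import numpy as np

def tapered_yule_walker(x, order, taper=None):
    """Yule-Walker AR coefficient estimates from (optionally tapered) data.

    Autocovariances are computed from h_t * x_t and normalized by sum(h^2),
    then the Toeplitz Yule-Walker system R a = r is solved. With taper=None
    this reduces to the ordinary (untapered) Yule-Walker estimator.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    h = np.ones(n) if taper is None else np.asarray(taper, dtype=float)
    y = h * x
    denom = np.sum(h * h)
    acov = np.array([np.sum(y[: n - k] * y[k:]) / denom for k in range(order + 1)])
    R = np.array([[acov[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, acov[1 : order + 1])

# Simulate an AR(1) with phi = 0.9 and compare untapered vs tapered estimates.
rng = np.random.default_rng(11)
n = 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
hann = 0.5 * (1.0 - np.cos(2.0 * np.pi * (np.arange(n) + 0.5) / n))
phi_plain = tapered_yule_walker(x, order=1)[0]
phi_taper = tapered_yule_walker(x, order=1, taper=hann)[0]
```

The bias reduction from tapering is most visible for strongly correlated processes and shorter series; with n this large both estimates land close to 0.9.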

8.
The Breusch–Godfrey LM test is one of the most popular tests for autocorrelation. However, it has been shown that the LM test may be erroneous when there are heteroskedastic errors in a regression model. Recently, remedies have been proposed by Godfrey and Tremayne [9] and Shim et al. [21]. This paper suggests three wild-bootstrapped variance-ratio (WB-VR) tests for autocorrelation in the presence of heteroskedasticity. We show through a Monte Carlo simulation that our WB-VR tests have better small-sample properties and are robust to the structure of the heteroskedasticity.
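The wild-bootstrap idea can be sketched generically (this is not the paper's variance-ratio statistic; the statistic and function names below are illustrative): multiplying residuals by i.i.d. Rademacher signs preserves any heteroskedastic variance pattern while destroying serial correlation, so the resampled statistics trace out a null distribution valid under heteroskedasticity.

```python
import numpy as np

def wild_bootstrap_pvalue(resid, stat_fn, reps=999, seed=5):
    """Wild-bootstrap p-value for an autocorrelation statistic of residuals.

    Each replicate multiplies the residuals elementwise by random +/-1
    signs (Rademacher weights), recomputes the statistic, and the p-value
    is the fraction of replicates at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(resid)
    count = 0
    for _ in range(reps):
        signs = rng.choice([-1.0, 1.0], size=resid.size)
        if abs(stat_fn(signs * resid)) >= abs(observed):
            count += 1
    return (count + 1) / (reps + 1)

def lag1_autocorr(e):
    """First-order sample autocorrelation, a simple stand-in statistic."""
    e = e - e.mean()
    return np.sum(e[:-1] * e[1:]) / np.sum(e * e)

# Heteroskedastic but serially uncorrelated residuals: under the null the
# wild-bootstrap test should not systematically reject.
rng = np.random.default_rng(8)
e = rng.standard_normal(200) * np.linspace(0.5, 2.0, 200)
p = wild_bootstrap_pvalue(e, lag1_autocorr)
```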

9.
In biomedical and epidemiological studies, gene–environment (G–E) interactions have been shown to contribute importantly to the etiology and progression of many complex diseases. Most existing approaches for identifying G–E interactions are limited by a lack of robustness against outliers/contamination in the response and predictor spaces. In this study, we develop a novel robust G–E identification approach using the trimmed regression technique under joint modelling. A robust data-driven criterion and stability selection are adopted to determine the trimmed subset, which is free from both vertical outliers and leverage points. An effective penalization approach is developed to identify important G–E interactions, respecting the 'main effects, interactions' hierarchical structure. Extensive simulations demonstrate the better performance of the proposed approach compared to multiple alternatives. Interesting findings with superior prediction accuracy and stability are observed in the analysis of The Cancer Genome Atlas data on cutaneous melanoma and breast invasive carcinoma.

10.
Conditional mean independence (CMI) is one of the most widely used assumptions in the treatment effect literature to achieve model identification. We propose a Kolmogorov–Smirnov-type statistic to test CMI under a specific symmetry condition. We also propose a bootstrap procedure to obtain the p-values and critical values that are required to carry out the test. Results from a simulation study suggest that our test can work very well even in small to moderately sized samples. As an empirical illustration, we apply our test to a dataset that has been used in the literature to estimate the return on college education in China, to check whether the assumption of CMI is supported by the dataset and to show the plausibility of the extra symmetry condition that is necessary for this new test.

11.
The robust principal components analysis (RPCA) introduced by Campbell (Applied Statistics 1980, 29, 231–237) provides, in addition to robust versions of the usual output of a principal components analysis, weights for the contribution of each point to the robust estimation of each component. Low weights may thus be used to indicate outliers. The present simulation study provides critical values for testing the kth smallest weight in the RPCA of a sample of n p-dimensional vectors, under the null hypothesis of a multivariate normal distribution. The cases p = 2(2)10, 15, 20 for n = 20, 30, 40, 50, 75, 100, subject to n ≥ p/2, are examined, with k ≤ √n.

12.
Rejoinder

In this article, several formulae are provided for approximating the critical values of tests on the actual values of the process capability indices CPL, CPU, and Cpk. These formulae are based on different approximations of the percentiles of the noncentral t distribution, and their performance is evaluated by comparing the values they give with the exact critical values for several significance levels, test values, and sample sizes. The results show that some of the presented techniques are valuable tools in situations where the exact critical values of the tests are not available, since they approximate the exact values readily and rather accurately.

13.
In this paper, we examine the issue of detecting explosive behavior in economic and financial time series when an explosive episode is both ongoing at the end of the sample and of finite length. We propose a testing strategy based on a subsampling method in which a suitable test statistic is calculated on a finite number of end-of-sample observations, with a critical value obtained using subsample test statistics calculated on the remaining observations. This approach also has the practical advantage that, by virtue of how the critical values are obtained, it can deliver tests which are robust to, among other things, conditional heteroskedasticity and serial correlation in the driving shocks. We also explore modifications of the raw statistics to account for unconditional heteroskedasticity using studentization and a White-type correction. We evaluate the finite-sample size and power properties of our proposed procedures and find that they offer promising levels of power, suggesting the possibility of earlier detection of end-of-sample bubble episodes compared to existing procedures.

14.
The Fisher exact test has been unjustly dismissed by some as 'only conditional', whereas it is unconditionally the uniformly most powerful test among all unbiased tests (tests of size α whose power never falls below the nominal significance level α). The problem with this truly optimal test is that it requires randomization at the critical value(s) to be of size α. Obviously, in practice, one does not want to conclude that 'with probability x we have a statistically significant result'. Usually, the hypothesis is rejected only if the test statistic's outcome is more extreme than the critical value, reducing the actual size considerably.

The randomized unconditional Fisher exact test is constructed (using Neyman-structure arguments) by deriving a conditional randomized test that randomizes at critical values c(t) with probabilities γ(t), both of which depend on the total number of successes T (the complete sufficient statistic for the nuisance parameter, the common success probability) that is conditioned upon.

In this paper, the Fisher exact test is approximated by deriving nonrandomized conditional tests whose critical region includes the critical value only if γ(t) > γ0, for a fixed threshold value γ0, such that the size of the unconditional modified test is, for all values of the nuisance parameter (the common success probability), smaller than but as close as possible to α. It will be seen that this greatly improves the size of the test compared with the conservative nonrandomized Fisher exact test.

Comparisons of size, power, and p-values with the (virtual) randomized Fisher exact test, the conservative nonrandomized Fisher exact test, Pearson's chi-square test, the more competitive mid-p value, McDonald's modification, and Boschloo's modification are performed under the assumption of two binomial samples.
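The conditional building block behind all of these variants is the hypergeometric distribution of one cell of the 2×2 table given both margins. A minimal sketch of the one-sided Fisher exact p-value and the mid-p value mentioned above (the table is hypothetical):

```python
from scipy.stats import hypergeom, fisher_exact

def fisher_one_sided(a, b, c, d):
    """One-sided (greater) Fisher exact p-value for the 2x2 table
    [[a, b], [c, d]]: P(X >= a) where X is hypergeometric with
    population a+b+c+d, a+b success states, and a+c draws.
    """
    total = a + b + c + d
    return hypergeom.sf(a - 1, total, a + b, a + c)

# Sanity check against scipy's implementation on a hypothetical table.
table = [[8, 2], [3, 9]]
p_manual = fisher_one_sided(8, 2, 3, 9)
_, p_scipy = fisher_exact(table, alternative="greater")

# The mid-p value counts only half the probability of the observed table,
# making the test less conservative than the standard Fisher exact test.
p_mid = p_manual - 0.5 * hypergeom.pmf(8, 8 + 2 + 3 + 9, 8 + 2, 8 + 3)
```

The randomized and γ0-thresholded tests in the abstract refine this same conditional distribution rather than replace it.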

15.
This article addresses the problem of repeat detection used in the comparison of significant repeats in sequences. The case of self-overlapping leftmost repeats in large sequences generated by a homogeneous stationary Markov chain has not been treated in the literature. In this work, we are interested in approximating the distribution of the number of self-overlapping leftmost long-enough repeats in a homogeneous stationary Markov chain. Using the Chen–Stein method, we show that this distribution can be approximated by a Poisson distribution. Moreover, we show that the approximation can be extended to the case where the sequences are generated by an m-order Markov chain.

16.
A number of tests have been proposed for assessing the location-scale assumption that is often invoked by practitioners. Existing approaches include Kolmogorov–Smirnov and Cramer–von Mises statistics that each involve measures of divergence between unknown joint distribution functions and products of marginal distributions. In practice, the unknown distribution functions embedded in these statistics are typically approximated using nonsmooth empirical distribution functions (EDFs). In a recent article, Li, Li, and Racine establish the benefits of smoothing the EDF for inference, though their theoretical results are limited to the case where the covariates are observed and the distributions unobserved, while in the current setting some covariates and their distributions are unobserved (i.e., the test relies on population error terms from a location-scale model), which necessarily involves a separate theoretical approach. We demonstrate how replacing the nonsmooth distributions of unobservables with their kernel-smoothed sample counterparts can lead to substantial power improvements, and we extend existing approaches to the smooth multivariate and mixed continuous and discrete data setting in the presence of unobservables. Theoretical underpinnings are provided, Monte Carlo simulations are undertaken to assess finite-sample performance, and illustrative applications are provided.

17.
In order to discriminate between two probability distributions, extensions of Kullback–Leibler (KL) information have been proposed in the literature. In recent years, an extension called cumulative Kullback–Leibler (CKL) information, which is closely related to equilibrium distributions, has been considered. In this paper, we propose an adjusted version of CKL based on equilibrium distributions. Some properties of the proposed measure of divergence are investigated, and a test of exponentiality based on the adjusted measure is proposed. The empirical power of the presented test is calculated and compared with some existing standard tests of exponentiality. The results show that our proposed test has better performance than some of the existing tests for several important alternative distributions.

18.
The EWMA control chart is used to detect small shifts in a process. It has been shown that, for certain values of the smoothing parameter, the EWMA chart for the mean is robust to non-normality. In this article, we examine the effect of non-normality on EWMA charts for dispersion. It is shown that an EWMA chart for dispersion can be made robust to non-normality provided the non-normality is not extreme.
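The EWMA recursion itself is the same whether the chart monitors the mean or a dispersion statistic; only the input series changes. A minimal sketch with the standard time-varying control-limit width (the specific dispersion statistic fed in would follow the article, which is not reproduced here):

```python
import numpy as np

def ewma_chart(x, lam=0.2, z0=0.0):
    """EWMA statistics z_t = lam * x_t + (1 - lam) * z_{t-1}, z_0 given.

    Applied to raw observations this monitors the mean; applied to a
    dispersion statistic (e.g. |x_t - target| or log s_t^2) the same
    recursion yields an EWMA chart for dispersion.
    """
    z = np.empty(len(x))
    prev = z0
    for t, xt in enumerate(x):
        prev = lam * xt + (1.0 - lam) * prev
        z[t] = prev
    return z

def ewma_limits(lam, sigma, L, t):
    """Half-width of the EWMA control limits at time t:
    L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)**(2 t))).
    """
    t = np.asarray(t, dtype=float)
    return L * sigma * np.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2.0 * t)))

z = ewma_chart([1.0, 0.0, 0.0], lam=0.2)  # hand-checkable toy input
```

The limits widen toward their asymptote L·σ·√(λ/(2−λ)); smaller λ gives heavier smoothing, which is what drives the robustness to non-normality discussed in the abstract.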

19.
In this paper, a goodness-of-fit test is proposed for the Rayleigh distribution. This test is based on the Kullback–Leibler discrimination methodology proposed by Song [2002, Goodness of fit tests based on Kullback–Leibler discrimination, IEEE Trans. Inf. Theory 48(5), pp. 1103–1117]. The critical values and powers for some alternatives are obtained by simulation. The proposed test is compared with other tests, namely Kolmogorov–Smirnov, Kuiper, Cramer–von Mises, Watson and Anderson–Darling. The use of the proposed test is illustrated with a real example.

20.
The trend test is often used for the analysis of 2×K ordered categorical data, in which K pre-specified increasing scores are used. There have been discussions of how to assign these scores and of the impact of different scores on the outcomes. The scores are often assigned based on the data-generating model; when this model is unknown, the trend test is not robust. We discuss the weighted average of trend tests over all scientifically plausible choices of scores or models. This approach is more computationally efficient than the commonly used robust test MAX when K is large. Our discussion applies to any ordered 2×K table, but the simulations and applications to real data focus on case-control genetic association studies. Although no single test is optimal for all choices of scores, our numerical results show that some score-averaging tests can achieve the performance of MAX.
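The score-dependent trend test being averaged is the Cochran–Armitage statistic. A minimal sketch for one choice of scores (using the unconditional variance without the N/(N−1) finite-population factor; the table is hypothetical):

```python
import math

def trend_test_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for a 2xK case-control table.

    Z = sum_k s_k (r_k - n_k R/N) / sqrt(p(1-p)[sum n_k s_k^2 - (sum n_k s_k)^2 / N]),
    where r_k cases and n_k subjects fall in column k, R = sum r_k,
    N = sum n_k, and p = R/N. Z is approximately N(0, 1) under no trend.
    """
    n = [r + c for r, c in zip(cases, controls)]
    N = sum(n)
    R = sum(cases)
    p = R / N
    sn = sum(s * nk for s, nk in zip(scores, n))
    u = sum(s * r for s, r in zip(scores, cases)) - p * sn
    var = p * (1 - p) * (sum(nk * s * s for s, nk in zip(scores, n)) - sn * sn / N)
    return u / math.sqrt(var)

# Hypothetical 2x3 case-control table with linear scores 0, 1, 2: the
# case proportion rises across columns, so Z should be large and positive.
z = trend_test_z(cases=[10, 20, 30], controls=[30, 20, 10], scores=[0, 1, 2])
```

A score-averaging test in the spirit of the abstract would evaluate Z (or Z²) over a family of plausible score vectors and combine the results, instead of taking the maximum as MAX does.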


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号