Similar documents
20 similar documents found (search time: 31 ms)
1.
We consider the problem of comparing step-down and step-up multiple test procedures for testing n hypotheses when independent p-values or independent test statistics are available. The defining critical values of these procedures for independent test statistics are asymptotically equal, which yields a theoretical argument for the numerical observation that the step-up procedure is typically more powerful than the step-down procedure. The main aim of this paper is to quantify the differences between the critical values more precisely. As a by-product we also obtain more information about the gain from considering two subsequent steps of these procedures. Moreover, we investigate how liberal the step-up procedure becomes when the step-up critical values are replaced by their step-down counterparts or by more refined approximate values. The results for independent p-values are the basis for obtaining corresponding results when independent real-valued test statistics are at hand. It turns out that the differences between step-down and step-up critical values, as well as the differences between subsequent steps, tend to zero for many distributions, except for heavy-tailed distributions. The Cauchy distribution yields an example where the critical values of both procedures are nearly linearly increasing in n.
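As a hedged illustration of the step-down/step-up contrast (not the exact procedures analyzed above), the Holm step-down and Hochberg step-up procedures share the critical values α/(n − k) but traverse the ordered p-values in opposite directions, and the step-up version rejects at least as many hypotheses:

```python
def holm_stepdown(pvals, alpha=0.05):
    """Holm's step-down procedure: walk from the smallest p-value upward,
    rejecting while p_(k) <= alpha/(n - k); stop at the first failure."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = set()
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (n - k):
            rejected.add(i)
        else:
            break
    return rejected

def hochberg_stepup(pvals, alpha=0.05):
    """Hochberg's step-up procedure: walk from the largest p-value downward
    with the same critical values; the first success rejects it and every
    smaller p-value."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = set()
    for k in range(n - 1, -1, -1):
        if pvals[order[k]] <= alpha / (n - k):
            rejected.update(order[:k + 1])
            break
    return rejected
```

On p-values (0.02, 0.03, 0.04) at α = 0.05, Holm rejects nothing (0.02 > 0.05/3) while Hochberg rejects all three (0.04 ≤ 0.05), illustrating why the step-up procedure tends to be more powerful.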

2.
Tests that combine p-values, such as Fisher's product test, are popular for testing the global null hypothesis H0 that each of n component null hypotheses, H1,…,Hn, is true versus the alternative that at least one of H1,…,Hn is false, since they are more powerful than classical multiple tests such as the Bonferroni test and the Simes test. Recent modifications of Fisher's product test, popular in the analysis of large-scale genetic studies, include the truncated product method (TPM) of Zaykin et al. (2002), the rank truncated product (RTP) test of Dudbridge and Koeleman (2003) and, more recently, a permutation-based test, the adaptive rank truncated product (ARTP) method of Yu et al. (2009). The TPM and RTP methods require the user to specify a truncation point. The ARTP method improves the performance of the RTP method by optimizing the selection of the truncation point over a set of pre-specified candidate points. In this paper we extend the ARTP by proposing to use all possible truncation points {1,…,n} as the candidate truncation points. Furthermore, we derive the theoretical probability distribution of the test statistic under the global null hypothesis H0. Simulations are conducted to compare the performance of the proposed test with the Bonferroni test, the Simes test, the RTP test, and Fisher's product test. The simulation results show that the proposed test has higher power than the Bonferroni test and the Simes test, as well as the RTP method. It is also significantly more powerful than Fisher's product test when the number of truly false hypotheses is small relative to the total number of hypotheses, and has comparable power to Fisher's product test otherwise.
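A minimal sketch of the baseline combiner referred to above (Fisher's product test, not the ARTP extension): under H0 the statistic X = −2 Σ ln p_i follows a chi-square distribution with 2n degrees of freedom, whose survival function has a closed form for even degrees of freedom:

```python
import math

def fisher_combined_pvalue(pvals):
    """Fisher's product test: X = -2 * sum(ln p_i) ~ chi-square with 2n df
    under the global null. For even df = 2n the chi-square survival
    function is exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!."""
    n = len(pvals)
    x = -2.0 * sum(math.log(p) for p in pvals)
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(n):
        total += term
        term *= half / (k + 1)   # next Poisson-series term (x/2)^k / k!
    return math.exp(-half) * total
```

For example, combining two p-values of 0.5 gives a global p-value of about 0.597, reflecting the extra degrees of freedom of the combined statistic.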

3.
For a fixed point θ0 and a positive value c0, this paper studies the problem of testing the hypotheses H0: |θ − θ0| ≤ c0 against H1: |θ − θ0| > c0 for the normal mean parameter θ using the empirical Bayes approach. With the accumulated past data, a monotone empirical Bayes test is constructed by mimicking the behavior of a monotone Bayes test. Such an empirical Bayes test is shown to be asymptotically optimal, and its regret converges to zero at a rate (ln n)^2.5/n, where n is the number of past data available when the current testing problem is considered. A simulation study is also given, and the results show that the proposed empirical Bayes procedure has good performance for small to moderately large sample sizes. Our proposed method can be applied to testing closeness to a control or to testing the therapeutic equivalence of one standard treatment compared to another in clinical trials.

4.
A p-value is developed for testing the equivalence of the variances of a bivariate normal distribution. The unknown correlation coefficient is a nuisance parameter in the problem. If the correlation is known, the proposed p-value provides an exact test. For large samples, the p-value can be computed by replacing the unknown correlation by the sample correlation, and the resulting test is quite satisfactory. For small samples, it is proposed to compute the p-value by replacing the unknown correlation by a scalar multiple of the sample correlation. However, a single scalar is not satisfactory, and it is proposed to use different scalars depending on the magnitude of the sample correlation coefficient. In order to implement this approach, tables are obtained providing sub-intervals for the sample correlation coefficient, and the scalars to be used if the sample correlation coefficient belongs to a particular sub-interval. Once such tables are available, the proposed p-value is quite easy to compute since it has an explicit analytic expression. Numerical results on the type I error probability and power are reported on the performance of such a test, and the proposed p-value test is also compared to another test based on a rejection region. The results are illustrated with two examples: an example dealing with the comparability of two measuring devices, and an example dealing with the assessment of bioequivalence.
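The explicit p-value of the abstract above is not reproduced here; as a hedged sketch of a related classical device (the Pitman–Morgan reduction), testing Var(X) = Var(Y) in a bivariate normal is equivalent to testing zero correlation between the sums X + Y and differences X − Y:

```python
import math

def pitman_morgan_stat(x, y):
    """Pitman-Morgan reduction: in a bivariate normal, Var(X) = Var(Y)
    iff corr(X + Y, X - Y) = 0. Returns (r, t) where
    t = r * sqrt(n - 2) / sqrt(1 - r^2) is referred to a t_{n-2} law."""
    n = len(x)
    s = [a + b for a, b in zip(x, y)]   # sums
    d = [a - b for a, b in zip(x, y)]   # differences
    ms, md = sum(s) / n, sum(d) / n
    cov = sum((a - ms) * (b - md) for a, b in zip(s, d))
    vs = sum((a - ms) ** 2 for a in s)
    vd = sum((b - md) ** 2 for b in d)
    r = cov / math.sqrt(vs * vd)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t
```

The sign of r tracks which variance is larger: cov(X + Y, X − Y) equals SS(X) − SS(Y), so a negative r indicates Var(Y) > Var(X).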

5.
We consider a partially linear model with a diverging number of groups of parameters in the parametric component. Variable selection and estimation of the regression coefficients are achieved simultaneously by using a suitable penalty function on the covariates in the parametric component. An MM-type algorithm for estimating parameters without inverting a high-dimensional matrix is proposed. The consistency and sparsity of the penalized least-squares estimators of the regression coefficients are discussed in the setting where some nonzero regression coefficients have very small values. It is found that √(pn/n)-consistency and sparsity of the penalized least-squares estimators cannot both be achieved when the number of nonzero regression coefficients with very small values is unknown, where pn and n, respectively, denote the number of regression coefficients and the sample size. The finite-sample behavior of the penalized least-squares estimators and the performance of the proposed algorithm are studied through simulations and a real data example.

6.
The analysis of survival endpoints subject to right-censoring is an important research area in statistics, particularly among econometricians and biostatisticians. The two most popular semiparametric models are the proportional hazards model and the accelerated failure time (AFT) model. Rank-based estimation in the AFT model is computationally challenging due to the optimization of a non-smooth loss function. Previous work has shown that rank-based estimators may be written as solutions to linear programming (LP) problems. However, the size of the LP problem is O(n² + p) subject to n² linear constraints, where n denotes the sample size and p denotes the dimension of the parameters. As n and/or p increases, the feasibility of such a solution in practice becomes questionable. Among data mining and statistical learning enthusiasts, there is interest in extending ordinary regression coefficient estimators for low dimensions into high-dimensional data mining tools through regularization. Applying this recipe to rank-based coefficient estimators leads to formidable optimization problems, which may be avoided through smooth approximations to non-smooth functions. We review smooth approximations and quasi-Newton methods for rank-based estimation in AFT models. The computational cost of our method is substantially smaller than that of the corresponding LP problem, and the method applies to small- and large-scale problems alike. The algorithm described here allows one to couple rank-based estimation for censored data with virtually any regularization and is exemplified through four case studies.

7.
In what follows, we introduce two Bayesian models for feature selection in high-dimensional data, specifically designed for the purpose of classification. We use two approaches to the problem: one which discards the components that have “almost constant” values (Model 1) and another which retains the components for which variation between the groups is larger than that within the groups (Model 2). We assume that p ≫ n, i.e. the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. We show that particular cases of the above two models recover familiar variance- or ANOVA-based component selection. When there are only two classes and the features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008) and can be viewed as a natural generalization of FAIR to the case of L > 2 classes. The performance of the methodology is studied via simulations and using a biological dataset of animal communication signals comprising 43 groups of electric signals recorded from tropical South American electric knife fishes.
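A hypothetical sketch of the between-group versus within-group variation idea behind Model 2, in the spirit of classical one-way ANOVA F-statistic screening rather than the Bayesian formulation of the abstract:

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a single feature observed in L groups.
    `groups` is a list of lists of values, one inner list per class.
    Large F means between-group variation dominates within-group variation,
    flagging the feature as useful for classification."""
    all_vals = [v for g in groups for v in g]
    n, L = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (L - 1)) / (ss_within / (n - L))
```

Screening the p features by this ratio and keeping the largest values is the variance/ANOVA-based selection that the abstract says its models recover as special cases.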

8.
Linear models with a growing number of parameters have been widely used in modern statistics. An important problem for this kind of model is variable selection. Bayesian approaches, which provide a stochastic search for informative variables, have gained popularity. In this paper, we study the asymptotic properties of Bayesian model selection when the model dimension p grows with the sample size n. We consider pn and provide sufficient conditions under which: (1) with large probability, the posterior probability of the true model (from which samples are drawn) uniformly dominates the posterior probability of any incorrect model; and (2) the posterior probability of the true model converges to one in probability. Both (1) and (2) guarantee that the true model will be selected under a Bayesian framework. We also demonstrate several situations where (1) holds but (2) fails, which illustrates the difference between these two properties. Finally, we generalize our results to include g-priors, and provide simulation examples to illustrate the main results.

9.
Complete sets of orthogonal F-squares of order n = s^p, where s is a prime or prime power and p is a positive integer, have been constructed by Hedayat, Raghavarao, and Seiden (1975). Federer (1977) constructed complete sets of orthogonal F-squares of order n = 4t, where t is a positive integer. We give a general procedure for constructing orthogonal F-squares of order n from an orthogonal array (n, k, s, 2) and an OL(s, t) set, where n is not necessarily a prime or prime power. In particular, we show how to construct sets of orthogonal F-squares of order n = 2s^p, where s is a prime or prime power and p is a positive integer. These sets are shown to be nearly complete and approach complete sets as s and/or p becomes large. We also show how to construct orthogonal arrays by these methods. In addition, the best upper bound on the number t of orthogonal F(n, λ1), F(n, λ2), …, F(n, λt) squares is given.
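As a small sketch of the orthogonality condition itself (not the construction procedure of the abstract): two n × n squares are orthogonal when superimposing them yields every ordered pair of symbols at most once, so all n² pairs are distinct:

```python
def are_orthogonal(A, B):
    """Two n x n squares are orthogonal iff the n^2 superimposed
    ordered pairs (A[i][j], B[i][j]) are all distinct."""
    n = len(A)
    pairs = {(A[i][j], B[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n
```

For instance, the two classical 3 × 3 Latin squares built from addition and subtraction mod 3 pass this check, while a square is never orthogonal to itself (only the diagonal pairs occur).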

10.
For the multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative hypothesis requires complex analytic approximations and, more importantly, these distributional approximations are feasible only for moderate dimensions of the dependent variable, say p ≤ 20. On the other hand, assuming that the data dimension p as well as the number q of regression variables are fixed while the sample size n grows, several asymptotic approximations have been proposed in the literature for Wilks' Λ, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension p and a large sample size n. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null hypothesis, and simulations demonstrate that the corrected LRT has very satisfactory size and power, not only in the large p and large n context but also for moderately large data dimensions such as p = 30 or p = 50. As a by-product, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in multivariate analysis of variance which is valid for high-dimensional data.

11.
We deal with the problem of classifying a new observation vector into one of two known multivariate normal distributions when the dimension p and training sample size N are both large, with p < N. Modified linear discriminant analysis (MLDA) was suggested by Xu et al. [10]. The error rate of MLDA is smaller than that of LDA; however, if p and N are moderately large, the error rate of MLDA is close to that of LDA. These results are conditional, so we should investigate whether they hold unconditionally. In this paper, we give two types of asymptotic approximations of the expected probability of misclassification (EPMC) for MLDA as n → ∞ with p = O(n^δ), 0 < δ < 1. One of the two is the same as the asymptotic approximation for LDA, and the other is a corrected version of that approximation. Simulation reveals that the corrected version of the approximation has good accuracy for the case in which p and N are moderately large.

12.
The generalization of the Behrens–Fisher problem to comparing more than two means from nonhomogeneous populations has attracted the attention of statisticians for many decades. Several approaches offer different approximations to the distribution of the test statistic, and the statistical properties of these approximations remain an open question. Here, we present a brief overview of several approaches suggested in the literature and implemented in software, with a focus on investigating the accuracy of p values as well as their dependence on nuisance parameters and on the underlying assumption of normality. We illustrate the behavior of p values by simulation. The Satterthwaite–Fai–Cornelius test, the Kenward–Roger test, the simple ANOVA F test, the parametric bootstrap test, and the generalized F test are briefly discussed.
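A sketch of the simplest member of this family (the two-group case): the Satterthwaite approximation gives Welch's t statistic an effective degrees of freedom that depends on the two sample variances:

```python
def welch_satterthwaite(m1, v1, n1, m2, v2, n2):
    """Welch two-sample t statistic with Satterthwaite degrees of freedom,
    computed from sample means m, sample variances v, and sizes n.
    df = (v1/n1 + v2/n2)^2 / ((v1/n1)^2/(n1-1) + (v2/n2)^2/(n2-1))."""
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / (se1 + se2) ** 0.5
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

When the two variances and sample sizes are equal (v = 1, n = 10 in each group), df reduces to n1 + n2 − 2 = 18, matching the pooled t test; it shrinks toward min(n1, n2) − 1 as the variances diverge.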

13.
Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods, where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the “small n, large p” scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.

14.
This article examines some improperly stated but often used textbook probability problems. Moving from a probabilistic to a statistical setting provides insight into group testing (i.e., observing only whether one or more of a group responds and not the response of each individual). Exact methods are used to construct tables showing (i) that group testing n times to estimate p can be more efficient than n individual tests even for small n and large p, (ii) optimal grouping strategies for various (n, p) combinations, and (iii) the efficiencies and biases achieved.
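A hedged sketch of the estimation idea behind group testing (not the exact tables of the article): if pooled groups of size k each test negative with probability (1 − p)^k, the maximum likelihood estimate of p from the observed fraction of negative groups has a closed form:

```python
def group_testing_mle(neg_fraction, k):
    """MLE of the per-individual positive rate p when a pooled group of
    size k is negative iff all k members are negative:
    (1 - p)^k = neg_fraction  =>  p = 1 - neg_fraction ** (1/k)."""
    return 1.0 - neg_fraction ** (1.0 / k)
```

For example, if groups of size 5 are negative 32.768% of the time, the estimate is p = 1 − 0.32768^(1/5) = 0.2, since 0.8^5 = 0.32768.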

15.
Test statistics for sphericity and identity of the covariance matrix are presented for the case where the data are multivariate normal and the dimension, p, can exceed the sample size, n. Under certain mild conditions, mainly on the traces of the unknown covariance matrix, and using the asymptotic theory of U-statistics, the test statistics are shown to follow an approximate normal distribution for large p, also when p ≫ n. The accuracy of the statistics is shown through simulation results, particularly emphasizing the case when p can be much larger than n. A real data set is used to illustrate the application of the proposed test statistics.
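As a toy illustration in the same spirit (not the authors' U-statistic construction): trace-based sphericity measures exploit the fact that, by Cauchy–Schwarz on the eigenvalues, p·tr(Σ²)/tr(Σ)² equals 1 exactly when Σ is proportional to the identity and exceeds 1 otherwise:

```python
def sphericity_measure(S):
    """p * tr(S^2) / tr(S)^2 for a p x p covariance matrix S.
    Equals 1 iff all eigenvalues are equal (S proportional to identity),
    and exceeds 1 otherwise, by Cauchy-Schwarz on the eigenvalues."""
    p = len(S)
    tr = sum(S[i][i] for i in range(p))
    tr2 = sum(S[i][k] * S[k][i] for i in range(p) for k in range(p))
    return p * tr2 / tr ** 2
```

A spherical matrix such as 2·I gives exactly 1, while unequal variances push the measure above 1; high-dimensional tests of this kind replace the traces with estimators that stay consistent when p ≫ n.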

16.
Although the concept of sufficient dimension reduction has been around for a long time, studies in the literature have largely focused on properties of estimators of dimension-reduction subspaces in the classical “small p, large n” setting. Rather than the subspace, this paper considers directly the set of reduced predictors, which we believe are more relevant for subsequent analyses. A principled method is proposed for estimating a sparse reduction, based on a new, revised representation of an existing well-known method called sliced inverse regression. A fast and efficient algorithm is developed for computing the estimator. The asymptotic behavior of the new method is studied when the number of predictors, p, exceeds the sample size, n, providing a guide for choosing the number of sufficient dimension-reduction predictors. Numerical results, including a simulation study and a cancer-drug-sensitivity data analysis, are presented to examine the performance.

17.
Regression procedures are not only hindered by large p and small n, but can also suffer when outliers are present or the data-generating mechanisms are heavy tailed. Since penalized estimates like the least absolute shrinkage and selection operator (LASSO) are equipped to deal with large p, small n by encouraging sparsity, we combine a LASSO-type penalty with the absolute deviation loss function, instead of the standard least squares loss, to handle the presence of outliers and heavy tails. The model is cast in a Bayesian setting and a Gibbs sampler is derived to efficiently sample from the posterior distribution. We compare our method to existing methods in a simulation study as well as on a prostate cancer data set and a base deficit data set from trauma patients.
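Whatever loss a LASSO-type penalty is paired with, the core shrinkage device it induces is soft thresholding, the proximal map of the L1 penalty; a minimal sketch (not the Gibbs sampler of the abstract):

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0),
    the proximal map of the L1 penalty. Coefficients inside [-lam, lam]
    are set exactly to zero, which is how LASSO-type methods select variables."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

Coordinate-descent and many Bayesian shrinkage schemes apply this map (or a stochastic analogue) coefficient by coefficient, which is what produces exact zeros in the fitted model.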

18.
Let X = (x_ij) = (X_1, …, X_n)^T, i = 1, …, n, be an n × p random matrix having a multivariate symmetric distribution with parameters μ, Σ. The p-variate normal with mean μ and covariance matrix Σ is a member of this family. Let ρ1² be the squared multiple correlation coefficient between the first and the succeeding p1 components, and let ρ² = ρ1² + ρ2² be the squared multiple correlation coefficient between the first and the remaining p1 + p2 = p − 1 components of the p-variate normal vector. We consider here three testing problems for multivariate symmetric distributions: (A) to test ρ² = 0 against ρ² > 0; (B) to test ρ1² = 0 against ρ1² > 0; (C) to test ρ2² = 0 against ρ2² > 0. We show that for problem (A) the uniformly most powerful invariant (UMPI) and locally minimax test for the multivariate normal is UMPI and locally minimax as ρ² → 0 for multivariate symmetric distributions. For problem (B) the UMPI and locally minimax test is likewise UMPI and locally minimax for multivariate symmetric distributions. For problem (C) the locally best invariant (LBI) and locally minimax test for the multivariate normal is also LBI and locally minimax for multivariate symmetric distributions.

19.
We consider a Bayesian approach to the study of independence in a two-way contingency table which has been obtained from a two-stage cluster sampling design. If a procedure based on single-stage simple random sampling (rather than the appropriate cluster sampling) is used to test for independence, the p-value may be too small, resulting in a conclusion that the null hypothesis is false when it is, in fact, true. For many large complex surveys the Rao–Scott corrections to the standard chi-squared (or likelihood ratio) statistic provide appropriate inference. For smaller surveys, though, the Rao–Scott corrections may not be accurate, partly because the chi-squared test is inaccurate. In this paper, we use a hierarchical Bayesian model to convert the observed cluster samples to simple random samples. This provides surrogate samples which can be used to derive the distribution of the Bayes factor. We demonstrate the utility of our procedure using an example and also provide a simulation study which establishes our methodology as a viable alternative to the Rao–Scott approximations for relatively small two-stage cluster samples. We also show the additional insight gained by displaying the distribution of the Bayes factor rather than simply relying on a summary of the distribution.

20.
In high-dimensional data, one often seeks a few interesting low-dimensional projections which reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes will not work well when there are a small number of observations relative to the number of variables, known as a large p (dimension), small n (sample size) problem. This paper discusses the relationship between the sample size and dimensionality on classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.
