Similar documents
20 similar documents found (search time: 31 ms)
1.
We consider the problem of comparing step-down and step-up multiple test procedures for testing n hypotheses when independent p-values or independent test statistics are available. The defining critical values of these procedures for independent test statistics are asymptotically equal, which yields a theoretical argument for the numerical observation that the step-up procedure is typically more powerful than the step-down procedure. The main aim of this paper is to quantify the differences between the critical values more precisely. As a by-product we also obtain more information about the gain from considering two subsequent steps of these procedures. Moreover, we investigate how liberal the step-up procedure becomes when the step-up critical values are replaced by their step-down counterparts or by more refined approximate values. The results for independent p-values are the basis for obtaining corresponding results when independent real-valued test statistics are at hand. It turns out that the differences between step-down and step-up critical values, as well as the differences between subsequent steps, tend to zero for many distributions, except for heavy-tailed distributions. The Cauchy distribution yields an example where the critical values of both procedures are nearly linearly increasing in n.
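As a hedged illustration of the step-down/step-up contrast (not the exact procedures analyzed above), the Holm step-down and Hochberg step-up procedures share the critical values α/(n − k) but traverse the ordered p-values in opposite directions, and the step-up version rejects at least as many hypotheses:

```python
def holm_stepdown(pvals, alpha=0.05):
    """Holm's step-down procedure: walk from the smallest p-value upward,
    rejecting while p_(k) <= alpha/(n - k); stop at the first failure."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = set()
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (n - k):
            rejected.add(i)
        else:
            break
    return rejected

def hochberg_stepup(pvals, alpha=0.05):
    """Hochberg's step-up procedure: walk from the largest p-value downward
    with the same critical values; the first success rejects it and every
    smaller p-value."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = set()
    for k in range(n - 1, -1, -1):
        if pvals[order[k]] <= alpha / (n - k):
            rejected.update(order[:k + 1])
            break
    return rejected
```

On p-values (0.02, 0.03, 0.04) at α = 0.05, Holm rejects nothing (0.02 > 0.05/3) while Hochberg rejects all three (0.04 ≤ 0.05), illustrating why the step-up procedure tends to be more powerful.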

2.
Tests that combine p-values, such as Fisher's product test, are popular for testing the global null hypothesis H0 that each of n component null hypotheses, H1,…,Hn, is true versus the alternative that at least one of H1,…,Hn is false, since they are more powerful than classical multiple tests such as the Bonferroni test and the Simes test. Recent modifications of Fisher's product test, popular in the analysis of large-scale genetic studies, include the truncated product method (TPM) of Zaykin et al. (2002), the rank truncated product (RTP) test of Dudbridge and Koeleman (2003) and, more recently, a permutation-based test, the adaptive rank truncated product (ARTP) method of Yu et al. (2009). The TPM and RTP methods require the user to specify a truncation point. The ARTP method improves the performance of the RTP method by optimizing the selection of the truncation point over a set of pre-specified candidate points. In this paper we extend the ARTP by proposing to use all possible truncation points {1,…,n} as the candidate truncation points. Furthermore, we derive the theoretical probability distribution of the test statistic under the global null hypothesis H0. Simulations are conducted to compare the performance of the proposed test with the Bonferroni test, the Simes test, the RTP test, and Fisher's product test. The simulation results show that the proposed test has higher power than the Bonferroni test and the Simes test, as well as the RTP method. It is also significantly more powerful than Fisher's product test when the number of truly false hypotheses is small relative to the total number of hypotheses, and has comparable power to Fisher's product test otherwise.
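A minimal sketch of the baseline combiner referred to above (Fisher's product test, not the ARTP extension): under H0 the statistic X = −2 Σ ln p_i follows a chi-square distribution with 2n degrees of freedom, whose survival function has a closed form for even degrees of freedom:

```python
import math

def fisher_combined_pvalue(pvals):
    """Fisher's product test: X = -2 * sum(ln p_i) ~ chi-square with 2n df
    under the global null. For even df = 2n the chi-square survival
    function is exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!."""
    n = len(pvals)
    x = -2.0 * sum(math.log(p) for p in pvals)
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(n):
        total += term
        term *= half / (k + 1)   # next Poisson-series term (x/2)^k / k!
    return math.exp(-half) * total
```

For example, combining two p-values of 0.5 gives a global p-value of about 0.597, reflecting the extra degrees of freedom of the combined statistic.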

3.
For a fixed point θ0 and a positive value c0, this paper studies the problem of testing the hypotheses H0: |θ − θ0| ≤ c0 against H1: |θ − θ0| > c0 for the normal mean parameter θ using the empirical Bayes approach. With the accumulated past data, a monotone empirical Bayes test is constructed by mimicking the behavior of a monotone Bayes test. Such an empirical Bayes test is shown to be asymptotically optimal, and its regret converges to zero at a rate (ln n)^2.5/n, where n is the number of past data available when the current testing problem is considered. A simulation study is also given, and the results show that the proposed empirical Bayes procedure has good performance for small to moderately large sample sizes. Our proposed method can be applied to testing closeness to a control or to testing the therapeutic equivalence of one standard treatment compared to another in clinical trials.

4.
A p-value is developed for testing the equivalence of the variances of a bivariate normal distribution. The unknown correlation coefficient is a nuisance parameter in the problem. If the correlation is known, the proposed p-value provides an exact test. For large samples, the p-value can be computed by replacing the unknown correlation by the sample correlation, and the resulting test is quite satisfactory. For small samples, it is proposed to compute the p-value by replacing the unknown correlation by a scalar multiple of the sample correlation. However, a single scalar is not satisfactory, and it is proposed to use different scalars depending on the magnitude of the sample correlation coefficient. In order to implement this approach, tables are obtained providing sub-intervals for the sample correlation coefficient, and the scalars to be used if the sample correlation coefficient belongs to a particular sub-interval. Once such tables are available, the proposed p-value is quite easy to compute since it has an explicit analytic expression. Numerical results on the type I error probability and power are reported on the performance of such a test, and the proposed p-value test is also compared to another test based on a rejection region. The results are illustrated with two examples: an example dealing with the comparability of two measuring devices, and an example dealing with the assessment of bioequivalence.
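The explicit p-value of the abstract above is not reproduced here; as a hedged sketch of a related classical device (the Pitman–Morgan reduction), testing Var(X) = Var(Y) in a bivariate normal is equivalent to testing zero correlation between the sums X + Y and differences X − Y:

```python
import math

def pitman_morgan_stat(x, y):
    """Pitman-Morgan reduction: in a bivariate normal, Var(X) = Var(Y)
    iff corr(X + Y, X - Y) = 0. Returns (r, t) where
    t = r * sqrt(n - 2) / sqrt(1 - r^2) is referred to a t_{n-2} law."""
    n = len(x)
    s = [a + b for a, b in zip(x, y)]   # sums
    d = [a - b for a, b in zip(x, y)]   # differences
    ms, md = sum(s) / n, sum(d) / n
    cov = sum((a - ms) * (b - md) for a, b in zip(s, d))
    vs = sum((a - ms) ** 2 for a in s)
    vd = sum((b - md) ** 2 for b in d)
    r = cov / math.sqrt(vs * vd)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t
```

The sign of r tracks which variance is larger: cov(X + Y, X − Y) equals SS(X) − SS(Y), so a negative r indicates Var(Y) > Var(X).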

5.
We consider a partially linear model with a diverging number of groups of parameters in the parametric component. Variable selection and estimation of the regression coefficients are achieved simultaneously by using a suitable penalty function on the covariates in the parametric component. An MM-type algorithm for estimating parameters without inverting a high-dimensional matrix is proposed. The consistency and sparsity of the penalized least-squares estimators of the regression coefficients are discussed in the setting where some nonzero regression coefficients have very small values. It is found that √(pn/n)-consistency and sparsity of the penalized least-squares estimators cannot both be achieved when the number of nonzero regression coefficients with very small values is unknown, where pn and n, respectively, denote the number of regression coefficients and the sample size. The finite-sample behavior of the penalized least-squares estimators and the performance of the proposed algorithm are studied through simulations and a real data example.

6.
The analysis of survival endpoints subject to right-censoring is an important research area in statistics, particularly among econometricians and biostatisticians. The two most popular semiparametric models are the proportional hazards model and the accelerated failure time (AFT) model. Rank-based estimation in the AFT model is computationally challenging due to the optimization of a non-smooth loss function. Previous work has shown that rank-based estimators may be written as solutions to linear programming (LP) problems. However, the size of the LP problem is O(n² + p) subject to n² linear constraints, where n denotes the sample size and p denotes the dimension of the parameters. As n and/or p increases, the feasibility of such a solution in practice becomes questionable. Among data mining and statistical learning enthusiasts, there is interest in extending ordinary regression coefficient estimators for low dimensions into high-dimensional data mining tools through regularization. Applying this recipe to rank-based coefficient estimators leads to formidable optimization problems, which may be avoided through smooth approximations to non-smooth functions. We review smooth approximations and quasi-Newton methods for rank-based estimation in AFT models. The computational cost of our method is substantially smaller than that of the corresponding LP problem, and the method applies to small- and large-scale problems alike. The algorithm described here allows one to couple rank-based estimation for censored data with virtually any regularization and is exemplified through four case studies.

7.
In what follows, we introduce two Bayesian models for feature selection in high-dimensional data, specifically designed for the purpose of classification. We use two approaches to the problem: one which discards the components that have “almost constant” values (Model 1) and another which retains the components for which variation between the groups is larger than that within the groups (Model 2). We assume that p ≫ n, i.e. the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. We show that particular cases of the above two models recover familiar variance- or ANOVA-based component selection. When there are only two classes and the features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008) and can be viewed as a natural generalization of FAIR to the case of L > 2 classes. The performance of the methodology is studied via simulations and using a biological dataset of animal communication signals comprising 43 groups of electric signals recorded from tropical South American electric knife fishes.
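A hypothetical sketch of the between-group versus within-group variation idea behind Model 2, in the spirit of classical one-way ANOVA F-statistic screening rather than the Bayesian formulation of the abstract:

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a single feature observed in L groups.
    `groups` is a list of lists of values, one inner list per class.
    Large F means between-group variation dominates within-group variation,
    flagging the feature as useful for classification."""
    all_vals = [v for g in groups for v in g]
    n, L = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (L - 1)) / (ss_within / (n - L))
```

Screening the p features by this ratio and keeping the largest values is the variance/ANOVA-based selection that the abstract says its models recover as special cases.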

8.
Linear models with a growing number of parameters have been widely used in modern statistics. An important problem for this kind of model is variable selection. Bayesian approaches, which provide a stochastic search for informative variables, have gained popularity. In this paper, we study the asymptotic properties of Bayesian model selection when the model dimension p grows with the sample size n. We consider pn and provide sufficient conditions under which: (1) with large probability, the posterior probability of the true model (from which samples are drawn) uniformly dominates the posterior probability of any incorrect model; and (2) the posterior probability of the true model converges to one in probability. Both (1) and (2) guarantee that the true model will be selected under a Bayesian framework. We also demonstrate several situations where (1) holds but (2) fails, which illustrates the difference between these two properties. Finally, we generalize our results to include g-priors, and provide simulation examples to illustrate the main results.

9.
Complete sets of orthogonal F-squares of order n = s^p, where s is a prime or prime power and p is a positive integer, have been constructed by Hedayat, Raghavarao, and Seiden (1975). Federer (1977) constructed complete sets of orthogonal F-squares of order n = 4t, where t is a positive integer. We give a general procedure for constructing orthogonal F-squares of order n from an orthogonal array (n, k, s, 2) and an OL(s, t) set, where n is not necessarily a prime or prime power. In particular, we show how to construct sets of orthogonal F-squares of order n = 2s^p, where s is a prime or prime power and p is a positive integer. These sets are shown to be nearly complete and approach complete sets as s and/or p becomes large. We also show how to construct orthogonal arrays by these methods. In addition, the best upper bound on the number t of orthogonal F(n, λ1), F(n, λ2), …, F(n, λt) squares is given.
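As a small sketch of the orthogonality condition itself (not the construction procedure of the abstract): two n × n squares are orthogonal when superimposing them yields every ordered pair of symbols at most once, so all n² pairs are distinct:

```python
def are_orthogonal(A, B):
    """Two n x n squares are orthogonal iff the n^2 superimposed
    ordered pairs (A[i][j], B[i][j]) are all distinct."""
    n = len(A)
    pairs = {(A[i][j], B[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n
```

For instance, the two classical 3 × 3 Latin squares built from addition and subtraction mod 3 pass this check, while a square is never orthogonal to itself (only the diagonal pairs occur).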

10.
For the multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative hypothesis requires complex analytic approximations and, more importantly, these distributional approximations are feasible only for moderate dimensions of the dependent variable, say p ≤ 20. On the other hand, assuming that the data dimension p as well as the number q of regression variables are fixed while the sample size n grows, several asymptotic approximations have been proposed in the literature for Wilks' Λ, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension p and a large sample size n. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null hypothesis, and simulations demonstrate that the corrected LRT has very satisfactory size and power, not only in the large p and large n context but also for moderately large data dimensions such as p = 30 or p = 50. As a by-product, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in multivariate analysis of variance which is valid for high-dimensional data.

11.
We deal with the problem of classifying a new observation vector into one of two known multivariate normal distributions when the dimension p and training sample size N are both large, with p < N. Modified linear discriminant analysis (MLDA) was suggested by Xu et al. [10]. The error rate of MLDA is smaller than that of LDA; however, if p and N are moderately large, the error rate of MLDA is close to that of LDA. These results are conditional, so we should investigate whether they hold unconditionally. In this paper, we give two types of asymptotic approximations of the expected probability of misclassification (EPMC) for MLDA as n → ∞ with p = O(n^δ), 0 < δ < 1. One of the two is the same as the asymptotic approximation for LDA, and the other is a corrected version of that approximation. Simulation reveals that the corrected version of the approximation has good accuracy for the case in which p and N are moderately large.

12.
The generalization of the Behrens–Fisher problem to comparing more than two means from nonhomogeneous populations has attracted the attention of statisticians for many decades. Several approaches offer different approximations to the distribution of the test statistic, and the statistical properties of these approximations remain an open question. Here, we present a brief overview of several approaches suggested in the literature and implemented in software, with a focus on investigating the accuracy of p values as well as their dependence on nuisance parameters and on the underlying assumption of normality. We illustrate the behavior of p values by simulation. The Satterthwaite–Fai–Cornelius test, the Kenward–Roger test, the simple ANOVA F test, the parametric bootstrap test, and the generalized F test are briefly discussed.
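A sketch of the simplest member of this family (the two-group case): the Satterthwaite approximation gives Welch's t statistic an effective degrees of freedom that depends on the two sample variances:

```python
def welch_satterthwaite(m1, v1, n1, m2, v2, n2):
    """Welch two-sample t statistic with Satterthwaite degrees of freedom,
    computed from sample means m, sample variances v, and sizes n.
    df = (v1/n1 + v2/n2)^2 / ((v1/n1)^2/(n1-1) + (v2/n2)^2/(n2-1))."""
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / (se1 + se2) ** 0.5
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

When the two variances and sample sizes are equal (v = 1, n = 10 in each group), df reduces to n1 + n2 − 2 = 18, matching the pooled t test; it shrinks toward min(n1, n2) − 1 as the variances diverge.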

13.
Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods, where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the “small n, large p” scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.

14.
This article examines some improperly stated but often used textbook probability problems. Moving from a probabilistic to a statistical setting provides insight into group testing (i.e., observing only whether one or more of a group responds and not the response of each individual). Exact methods are used to construct tables showing (i) that group testing n times to estimate p can be more efficient than n individual tests even for small n and large p, (ii) optimal grouping strategies for various (n, p) combinations, and (iii) the efficiencies and biases achieved.
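A hedged sketch of the estimation idea behind group testing (not the exact tables of the article): if pooled groups of size k each test negative with probability (1 − p)^k, the maximum likelihood estimate of p from the observed fraction of negative groups has a closed form:

```python
def group_testing_mle(neg_fraction, k):
    """MLE of the per-individual positive rate p when a pooled group of
    size k is negative iff all k members are negative:
    (1 - p)^k = neg_fraction  =>  p = 1 - neg_fraction ** (1/k)."""
    return 1.0 - neg_fraction ** (1.0 / k)
```

For example, if groups of size 5 are negative 32.768% of the time, the estimate is p = 1 − 0.32768^(1/5) = 0.2, since 0.8^5 = 0.32768.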

15.
Test statistics for sphericity and identity of the covariance matrix are presented for the case where the data are multivariate normal and the dimension, p, can exceed the sample size, n. Under certain mild conditions, mainly on the traces of the unknown covariance matrix, and using the asymptotic theory of U-statistics, the test statistics are shown to follow an approximate normal distribution for large p, also when p ≫ n. The accuracy of the statistics is shown through simulation results, particularly emphasizing the case when p can be much larger than n. A real data set is used to illustrate the application of the proposed test statistics.
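As a toy illustration in the same spirit (not the authors' U-statistic construction): trace-based sphericity measures exploit the fact that, by Cauchy–Schwarz on the eigenvalues, p·tr(Σ²)/tr(Σ)² equals 1 exactly when Σ is proportional to the identity and exceeds 1 otherwise:

```python
def sphericity_measure(S):
    """p * tr(S^2) / tr(S)^2 for a p x p covariance matrix S.
    Equals 1 iff all eigenvalues are equal (S proportional to identity),
    and exceeds 1 otherwise, by Cauchy-Schwarz on the eigenvalues."""
    p = len(S)
    tr = sum(S[i][i] for i in range(p))
    tr2 = sum(S[i][k] * S[k][i] for i in range(p) for k in range(p))
    return p * tr2 / tr ** 2
```

A spherical matrix such as 2·I gives exactly 1, while unequal variances push the measure above 1; high-dimensional tests of this kind replace the traces with estimators that stay consistent when p ≫ n.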

16.
Although the concept of sufficient dimension reduction has been around for a long time, studies in the literature have largely focused on properties of estimators of dimension-reduction subspaces in the classical “small p, large n” setting. Rather than the subspace, this paper considers directly the set of reduced predictors, which we believe are more relevant for subsequent analyses. A principled method is proposed for estimating a sparse reduction, based on a new, revised representation of an existing well-known method called sliced inverse regression. A fast and efficient algorithm is developed for computing the estimator. The asymptotic behavior of the new method is studied when the number of predictors, p, exceeds the sample size, n, providing a guide for choosing the number of sufficient dimension-reduction predictors. Numerical results, including a simulation study and a cancer-drug-sensitivity data analysis, are presented to examine the performance.

17.
Regression procedures are not only hindered by large p and small n, but can also suffer when outliers are present or the data-generating mechanisms are heavy tailed. Since penalized estimates like the least absolute shrinkage and selection operator (LASSO) are equipped to deal with large p, small n by encouraging sparsity, we combine a LASSO-type penalty with the absolute deviation loss function, instead of the standard least squares loss, to handle the presence of outliers and heavy tails. The model is cast in a Bayesian setting and a Gibbs sampler is derived to efficiently sample from the posterior distribution. We compare our method to existing methods in a simulation study as well as on a prostate cancer data set and a base deficit data set from trauma patients.
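Whatever loss a LASSO-type penalty is paired with, the core shrinkage device it induces is soft thresholding, the proximal map of the L1 penalty; a minimal sketch (not the Gibbs sampler of the abstract):

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0),
    the proximal map of the L1 penalty. Coefficients inside [-lam, lam]
    are set exactly to zero, which is how LASSO-type methods select variables."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

Coordinate-descent and many Bayesian shrinkage schemes apply this map (or a stochastic analogue) coefficient by coefficient, which is what produces exact zeros in the fitted model.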

18.
Let X = (x_ij) = (X_1, …, X_n)^T, i = 1, …, n, be an n × p random matrix having a multivariate symmetric distribution with parameters μ, Σ. The p-variate normal with mean μ and covariance matrix Σ is a member of this family. Let ρ1² be the squared multiple correlation coefficient between the first and the succeeding p1 components, and let ρ² = ρ1² + ρ2² be the squared multiple correlation coefficient between the first and the remaining p1 + p2 = p − 1 components of the p-variate normal vector. We consider here three testing problems for multivariate symmetric distributions: (A) to test ρ² = 0 against ρ² > 0; (B) to test ρ1² = 0 against ρ1² > 0; (C) to test ρ2² = 0 against ρ2² > 0. We show that for problem (A) the uniformly most powerful invariant (UMPI) and locally minimax test for the multivariate normal is UMPI and locally minimax as ρ² → 0 for multivariate symmetric distributions. For problem (B) the UMPI and locally minimax test is likewise UMPI and locally minimax for multivariate symmetric distributions. For problem (C) the locally best invariant (LBI) and locally minimax test for the multivariate normal is also LBI and locally minimax for multivariate symmetric distributions.

19.
We consider a Bayesian approach to the study of independence in a two-way contingency table which has been obtained from a two-stage cluster sampling design. If a procedure based on single-stage simple random sampling (rather than the appropriate cluster sampling) is used to test for independence, the p-value may be too small, resulting in a conclusion that the null hypothesis is false when it is, in fact, true. For many large complex surveys the Rao–Scott corrections to the standard chi-squared (or likelihood ratio) statistic provide appropriate inference. For smaller surveys, though, the Rao–Scott corrections may not be accurate, partly because the chi-squared test is inaccurate. In this paper, we use a hierarchical Bayesian model to convert the observed cluster samples to simple random samples. This provides surrogate samples which can be used to derive the distribution of the Bayes factor. We demonstrate the utility of our procedure using an example and also provide a simulation study which establishes our methodology as a viable alternative to the Rao–Scott approximations for relatively small two-stage cluster samples. We also show the additional insight gained by displaying the distribution of the Bayes factor rather than simply relying on a summary of the distribution.

20.
In high-dimensional data, one often seeks a few interesting low-dimensional projections which reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes will not work well when there are a small number of observations relative to the number of variables, known as a large p (dimension), small n (sample size) problem. This paper discusses the relationship between the sample size and dimensionality on classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.
