Similar Literature
20 similar documents found.
1.
In practical settings such as microarray data analysis, multiple hypotheses with dependence within, but not between, equal-sized blocks often need to be tested. We consider an adaptive BH procedure to test the hypotheses. Under the condition of positive regression dependence on a subset of the true null hypotheses, the proposed adaptive procedure is shown to control the false discovery rate. The proposed approach is compared with existing methods in simulations under block dependence and totally uniform pairwise dependence. It is observed that the proposed method performs better than the existing methods in several situations.
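The abstract does not spell out which estimator of the number of true nulls it plugs in, so the following is only a rough sketch of the adaptive BH idea, using Storey's λ-estimator (an assumption, not the authors' construction):

```python
# Sketch of an adaptive Benjamini-Hochberg (BH) step-up: first estimate the
# number of true nulls m0 (here with a simple lambda-estimator), then run
# the BH step-up with m replaced by the estimate.
import numpy as np

def storey_m0(pvals, lam=0.5):
    """Estimate the number of true null hypotheses from p-values above lam."""
    m = len(pvals)
    return min(m, np.sum(pvals > lam) / (1.0 - lam))

def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """Return a boolean rejection mask for the adaptive BH step-up."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    m0_hat = max(storey_m0(pvals, lam), 1.0)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # BH step-up thresholds with the estimated m0 plugged in
    thresholds = (np.arange(1, m + 1) / m0_hat) * alpha
    below = sorted_p <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest index passing its threshold
        reject[order[: k + 1]] = True
    return reject
```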

2.
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 are motivated by the assumption that the test statistics are independent, which is often not true in reality. Simulations indicate that, in the presence of dependence among test statistics, most existing estimators can perform poorly, mainly due to their increased variability. In this paper, we propose several data-driven methods for estimating π0 that incorporate the distribution pattern of the observed p-values as a practical way to address potential dependence among test statistics. Specifically, we use a linear fit over the whole range [0, 1] to give a data-driven estimate of the proportion of true-null p-values in (λ, 1], instead of using the expected proportion 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve overall performance.
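For reference, the baseline λ-estimator that such methods start from (the abstract does not give the proposed linear-fit variant, so only the standard form is shown) is

$$\hat{\pi}_0(\lambda) \;=\; \frac{\#\{i : p_i > \lambda\}}{m\,(1-\lambda)},$$

where m is the number of tests; under independence, the expected proportion of true-null p-values falling in (λ, 1] is 1 − λ, which is the quantity the paper replaces with a data-driven fit.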

3.
Most current false discovery rate (FDR) procedures for microarray experiments assume restrictive dependence structures and are therefore less reliable. An FDR-controlling procedure based on a Poisson distributional approximation is presented under suitable dependence structures. Unlike other procedures, the distribution of the false null hypotheses is estimated by kernel density estimation, allowing for dependence structures among the genes. Furthermore, we develop an FDR framework that minimizes the false non-discovery rate (FNR) subject to a constraint on the controlled level of the FDR. The performance of the proposed FDR procedure is compared with that of other existing FDR-controlling procedures, with an application to a simulated microarray study.

4.
Summary.  The use of a fixed rejection region for multiple hypothesis testing has been shown to outperform standard fixed error rate approaches when applied to control of the false discovery rate. In this work it is demonstrated that, if the original step-up procedure of Benjamini and Hochberg is modified to exercise adaptive control of the false discovery rate, its performance is virtually identical to that of the fixed rejection region approach. In addition, the dependence of both methods on the proportion of true null hypotheses is explored, with a focus on the difficulties that are involved in the estimation of this quantity.
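For context, the original Benjamini–Hochberg step-up at level α orders the p-values p_{(1)} ≤ … ≤ p_{(m)} and rejects the hypotheses with the k smallest p-values, where

$$ k \;=\; \max\Bigl\{\, i : p_{(i)} \le \frac{i}{m}\,\alpha \,\Bigr\} $$

(no rejections if the set is empty); the adaptive modification discussed here replaces m with an estimate of the number of true null hypotheses.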

5.
Many exploratory studies such as microarray experiments require the simultaneous comparison of hundreds or thousands of genes. In many microarray experiments, most genes are not expected to be differentially expressed. Under such a setting, a procedure designed to control the false discovery rate (FDR) aims to identify as many potentially differentially expressed genes as possible. The usual FDR-controlling procedure is constructed based on the number of hypotheses. However, it can become very conservative when some of the alternative hypotheses are expected to be true. The power of a controlling procedure can be improved if the number of true null hypotheses (m0), rather than the total number of hypotheses, is incorporated in the procedure [Y. Benjamini and Y. Hochberg, On the adaptive control of the false discovery rate in multiple testing with independent statistics, J. Educ. Behav. Stat. 25 (2000), pp. 60–83]. Nevertheless, m0 is unknown and has to be estimated. The objective of this article is to evaluate some existing estimators of m0 and discuss the feasibility of incorporating these estimators into FDR-controlling procedures under various experimental settings. The simulation results can help investigators choose an appropriate procedure to meet the requirements of their study.

6.
Summary.  We consider the problem of estimating the proportion of true null hypotheses, π0, in a multiple-hypothesis set-up. The tests are based on observed p-values. We first review published estimators based on the estimator that was suggested by Schweder and Spjøtvoll. Then we derive new estimators based on nonparametric maximum likelihood estimation of the p-value density, restricting to decreasing and convex decreasing densities. The estimators of π0 are all derived under the assumption of independent test statistics. Their performance under dependence is investigated in a simulation study. We find that the estimators are relatively robust with respect to the assumption of independence and work well also for test statistics with moderate dependence.

7.
This article considers multiple hypothesis testing under control of the generalized familywise error rate (k-FWER), the probability of at least k false rejections. We first assume that the p-values corresponding to the true null hypotheses are independent and propose an adaptive generalized Bonferroni procedure with k-FWER control based on estimating the number of true null hypotheses. We then assume that the p-values are dependent, satisfying block dependence, and propose an adaptive procedure with k-FWER control. Extensive simulations compare the performance of the adaptive procedures with different estimators.
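For orientation (this is the non-adaptive baseline, not the authors' exact construction), the generalized Bonferroni procedure of Lehmann and Romano rejects every hypothesis with

$$ p_i \;\le\; \frac{k\,\alpha}{m}, $$

which controls the k-FWER at level α; the adaptive variants studied here replace m by an estimate of the number of true null hypotheses.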

8.
In multiple hypothesis testing, an important problem is estimating the proportion of true null hypotheses. Existing methods are mainly based on the p-values of the individual tests. In this paper, we propose two new estimators of this proportion. One is a natural extension of the commonly used p-value-based methods, and the other is based on a mixture distribution. Simulations show that the first method is comparable with existing methods and performs better in some cases. The mixture-based method yields accurate estimates even when the variance of the data is large or the difference between the null and alternative hypotheses is very small.
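A common way to make the mixture idea concrete (an assumption here; the abstract does not specify its model) is the two-groups representation of the p-value density,

$$ f(p) \;=\; \pi_0 \cdot 1 \;+\; (1-\pi_0)\, f_1(p), \qquad 0 \le p \le 1, $$

where f_1 is the density of p-values under the alternative; an estimate of the true null proportion π0 then falls out of fitting this mixture to the observed p-values.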

9.
Summary.  Estimation of the number or proportion of true null hypotheses in multiple-testing problems has become an interesting area of research. The first important work in this field was performed by Schweder and Spjøtvoll. Among others, they proposed to use plug-in estimates for the proportion of true null hypotheses in multiple-test procedures to improve the power. We investigate the problem of controlling the familywise error rate FWER when such estimators are used as plug-in estimators in single-step or step-down multiple-test procedures. First we investigate the case of independent p-values under the null hypotheses and show that a suitable choice of plug-in estimates leads to control of FWER in single-step procedures. We also investigate the power and study the asymptotic behaviour of the number of false rejections. Although step-down procedures are more difficult to handle, we briefly consider a possible solution to this problem. Anyhow, plug-in step-down procedures are not recommended here. For dependent p-values we derive a condition for asymptotic control of FWER and provide some simulations with respect to FWER and power for various models and hypotheses.

10.
The idea of modifying, and potentially improving, classical multiple testing methods that control the familywise error rate (FWER) via an estimate of the unknown number of true null hypotheses has been around for a long time, but the question of whether such adaptive methods ultimately maintain strong control of the FWER went without a formal answer until Finner and Gontscharuk (2009) and Guo (2009) offered some answers. A class of adaptive Bonferroni and Šidák methods larger than those considered in those papers is introduced, with FWER control now proved under a weaker distributional setup. Numerical results show that there are versions of the adaptive Bonferroni and Šidák methods that can perform better under certain positive dependence situations than those previously considered. A different adaptive Holm method and its step-up analog, referred to as an adaptive Hochberg method, are also introduced, and their FWER control is proved asymptotically, as in those papers. These adaptive Holm and Hochberg methods are numerically seen to often outperform the previously considered adaptive Holm method.
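For reference, the single-step cutoffs being adapted are the Bonferroni and Šidák thresholds; in their adaptive versions the number of tests m is replaced by an estimate m̂0 of the number of true nulls (the specific estimators are not given in the abstract):

$$ p_i \le \frac{\alpha}{\hat m_0} \quad \text{(adaptive Bonferroni)}, \qquad p_i \le 1-(1-\alpha)^{1/\hat m_0} \quad \text{(adaptive Šidák)}. $$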

11.
Many exploratory experiments such as DNA microarray or brain imaging studies require simultaneous comparison of hundreds or thousands of hypotheses. Under such a setting, using the false discovery rate (FDR) as the overall Type I error rate is recommended (Benjamini and Hochberg in J. R. Stat. Soc. B 57:289–300, 1995). Many FDR-controlling procedures have been proposed. However, when evaluating the performance of FDR-controlling procedures, researchers often focus on the ability of procedures to control the FDR and to achieve high power. Meanwhile, with multiple hypotheses one may also commit false non-discoveries or fail to declare true non-significance. In addition, various experimental parameters, such as the number of hypotheses, the proportion of true null hypotheses among all hypotheses, the sample size, and the correlation structure, may affect the performance of FDR-controlling procedures. The purpose of this paper is to illustrate the performance of some existing FDR-controlling procedures in terms of four indices: the FDR, the false non-discovery rate, the sensitivity, and the specificity. Analytical results for these indices are derived for the FDR-controlling procedures. Simulations are also performed to evaluate the performance of the controlling procedures in terms of these indices under various experimental parameters. The results can serve as guidance for practitioners in choosing an appropriate FDR-controlling procedure.
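In terms of the usual outcome counts, with m hypotheses of which m0 are true nulls and m1 = m − m0 are false nulls, R rejections, V false rejections, and T false non-rejections, the four indices have the standard formalization (consistent with, though not quoted from, the paper; it assumes m0, m1 > 0):

$$ \mathrm{FDR}=E\!\left[\frac{V}{\max(R,1)}\right],\quad \mathrm{FNR}=E\!\left[\frac{T}{\max(m-R,1)}\right],\quad \text{sensitivity}=E\!\left[\frac{R-V}{m_1}\right],\quad \text{specificity}=E\!\left[\frac{m_0-V}{m_0}\right]. $$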

12.
This article describes an algorithm for identifying outliers in multivariate data based on the asymptotic theory of location estimation, as typically described for the trimmed likelihood estimator and in particular for the minimum covariance determinant (MCD) estimator. The strategy is to choose a subset of the data that minimizes an appropriate measure of the asymptotic variance of the multivariate location estimator. Observations not belonging to this subset are considered potential outliers that should be trimmed. For α less than about 0.5, the trimming proportion is taken to be the α > 0 at which the smallest of any local minima of this asymptotic-variance measure occurs. If no minima occur for any α > 0, the data set is considered outlier-free.
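The α-selection rule above is specific to the paper, but the basic MCD-based flagging step it builds on can be sketched as follows (a rough illustration with a fixed support fraction and the usual chi-square cutoff, not the authors' asymptotic-variance criterion):

```python
# Illustrative MCD-based outlier flagging: fit a minimum covariance
# determinant estimator for a fixed support fraction and flag points with
# large robust Mahalanobis distances.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:10] += 6.0                                   # plant a few gross outliers

mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)
d2 = mcd.mahalanobis(X)                         # squared robust distances
cutoff = chi2.ppf(0.975, df=X.shape[1])         # usual chi-square threshold
outliers = np.where(d2 > cutoff)[0]
print(f"flagged {len(outliers)} potential outliers")
```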

13.
The likelihood ratio (LR) measures the relative support the forensic data give to two hypotheses. Several levels of uncertainty arise if frequentist methods are chosen for its assessment: the assumed population model only approximates the true one, and its parameters are estimated from a limited database. Moreover, it may be wise to discard part of the data, especially data only indirectly related to the hypotheses. Different reductions define different LRs. It is therefore more sensible to talk about 'a' LR instead of 'the' LR, and the error involved in the estimation should be quantified. In light of these points, two frequentist methods are proposed for the 'rare type match problem', that is, when a match between the perpetrator's and the suspect's DNA profiles, never observed before in the reference database, is to be evaluated.

14.
The aim of this paper is to propose methods for detecting change in the coefficients of a multinomial logistic regression model for categorical time series, in an offline setting. The alternative to the null hypothesis of stationarity can be either that the null is simply not true or that there is a temporary change in the sequence. We use the efficient score vector of the partial likelihood function. This has several advantages. First, the alternative value of the parameter does not have to be estimated; hence, the procedure has a simple structure with only one parameter estimation, using all available observations. This is in contrast with generalized likelihood ratio-based change-point tests. The efficient score vector is used in various ways. As a vector, its components correspond to the components of the multinomial logistic regression model's parameter vector. Using its quadratic form, a test can be defined for the presence of a change in any or all parameters. If there are too many parameters, one can test for any subset while treating the rest as nuisance parameters. Our motivating example is a DNA sequence with four categories, and our test result shows that in the published data the distribution of the four categories is not stationary.

15.
Staudte, R.G., Zhang, J. Lifetime Data Analysis, 1997, 3(4): 383–398
The p-value evidence for an alternative to a null hypothesis regarding the mean lifetime can be unreliable if based on asymptotic approximations when there is only a small sample of right-censored exponential data. However, a guarded weight of evidence for the alternative can always be obtained without approximation, no matter how small the sample, and has some other advantages over p-values. Weights of evidence are defined as estimators of 0 when the null hypothesis is true and 1 when the alternative is true, and they are judged on the basis of the ensuing risks, where risk is the mean squared error of estimation. The evidence is guarded in that a preassigned bound is placed on the risk under the hypothesis. Practical suggestions are given for choosing the bound and for interpreting the magnitude of the weight of evidence. Acceptability profiles are obtained by inversion of a family of guarded weights of evidence for two-sided alternatives to point hypotheses, just as confidence intervals are obtained from tests; these profiles are arguably more informative than confidence intervals, and are easily determined for any level and any sample size, however small. They can help in understanding the effects of different amounts of censoring. They are found for several small data sets, including a sample of size 12 for post-operative cancer patients. Both singly Type I and Type II censored examples are included. An examination of the risk functions of these guarded weights of evidence suggests that if the censoring time is of the same magnitude as the mean lifetime, or larger, then the risks in using a guarded weight of evidence based on a likelihood ratio are not much larger than they would be if the parameter were known.
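In symbols (simply restating the abstract's description), a weight of evidence W for the alternative H1 versus the hypothesis H0 is judged by its mean-squared-error risk, and it is guarded when the risk under the hypothesis is capped by a preassigned bound β:

$$ R_\theta(W) \;=\; E_\theta\bigl[(W - \mathbf{1}\{\theta \in H_1\})^2\bigr], \qquad \sup_{\theta \in H_0} R_\theta(W) \;\le\; \beta. $$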

16.
Summary.  In the empirical literature on assortative matching using linked employer–employee data, unobserved worker quality appears to be negatively correlated with unobserved firm quality. We show that this can be caused by standard estimation error. We develop formulae showing that the estimated correlation is biased downwards if there is true positive assortative matching and when any conditioning covariates are uncorrelated with the firm and worker fixed effects. We show that this bias is bigger the fewer movers there are in the data, the so-called 'limited mobility bias'. This result applies to any two-way (or higher) error components model that is estimated by fixed effects methods. We apply these bias corrections to a large German linked employer–employee data set. We find that, although the biases can be considerable, they are not sufficiently large to remove the negative correlation entirely.
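The canonical two-way error-components model in this literature (the abstract does not restate it, so this is the standard form rather than the paper's exact specification) writes log wages as

$$ y_{it} \;=\; x_{it}'\beta \;+\; \theta_i \;+\; \psi_{J(i,t)} \;+\; \varepsilon_{it}, $$

where θ_i is the worker fixed effect, ψ_{J(i,t)} is the fixed effect of the firm employing worker i at time t, and the quantity of interest is the correlation between the estimated θ̂_i and ψ̂_{J(i,t)}.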

17.
The positive false discovery rate (pFDR) is the average proportion of false rejections given that the overall number of rejections is greater than zero. Assuming that the proportion of true null hypotheses, the proportion of false positives, and the proportion of true positives all converge pointwise, the pFDR converges to a continuous limit uniformly over all significance levels. We show that the uniform convergence still holds under the weaker assumption that the proportion of true positives converges in L1.
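In Storey's notation, with V false rejections out of R total rejections at a given significance threshold, the quantity being studied is

$$ \mathrm{pFDR} \;=\; E\!\left[\frac{V}{R}\;\middle|\; R>0\right]. $$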

18.
Rank tests are known to be robust to outliers and to violations of distributional assumptions. Two major issues besetting microarray data are violation of the normality assumption and contamination by outliers. In this article, we formulate normal-theory simultaneous tests and their aligned rank transform (ART) analogs for detecting differentially expressed genes. These tests are based on the least-squares estimates of the effects when the data follow a linear model. Application of the two methods is then demonstrated on a real data set. To compare the performance of the aligned rank transform method with the corresponding normal-theory method, data were simulated according to the characteristics of a real gene expression data set. These simulated data are then used to compare the two methods with respect to their sensitivity to the distributional assumption and to outliers in controlling the family-wise Type I error rate, power, and false discovery rate. It is demonstrated that the ART generally possesses the robustness-of-validity property even for microarray data with a small number of replications. Although these methods can be applied to more general designs, in this article the simulation study is carried out for a dye-swap design, since this design is widely used in cDNA microarray experiments.

19.
Abstract

Inferential methods based on ranks provide robust and powerful alternative methodology for testing and estimation. In this article, two objectives are pursued. First, we develop a general method for simultaneous confidence intervals based on the rank estimates of the parameters of a general linear model and derive the asymptotic distribution of the pivotal quantity. Second, we extend the method to high-dimensional data, such as gene expression data, for which the usual large-sample approximation does not apply. It is common in practice to use the asymptotic distribution to make inference for small samples. The empirical investigation in this article shows that, for methods based on rank estimates, this approach does not produce viable inference and should be avoided. A method based on the bootstrap is outlined and is shown to provide a reliable and accurate way of constructing simultaneous confidence intervals based on rank estimates. In particular, it is shown that the commonly applied normal or t-approximations are not satisfactory, particularly for large-scale inference. Methods based on ranks are uniquely suited to the analysis of microarray gene expression data, since such data often involve large-scale inference based on small samples, contain many outliers, and violate the normality assumption. A real microarray data set is analyzed using the rank-estimate simultaneous confidence intervals. Viability of the proposed method is assessed through a Monte Carlo simulation study under varied assumptions.

20.
A common approach to analysing clinical trials with multiple outcomes is to control the probability, for the trial as a whole, of making at least one incorrect positive finding under any configuration of true and false null hypotheses. Popular approaches are Bonferroni corrections or structured approaches such as closed-test procedures. As is well known, such strategies, which control the family-wise error rate, typically reduce the type I error for some or all of the tests of the various null hypotheses to below the nominal level. In consequence, there is generally a loss of power for individual tests. What is less well appreciated, perhaps, is that depending on the approach and circumstances, the test-wise loss of power does not necessarily lead to a family-wise loss of power. In fact, it may be possible to increase the overall power of a trial by carrying out tests on multiple outcomes without increasing the probability of making at least one type I error when all null hypotheses are true. We examine two types of problems to illustrate this. Unstructured testing problems arise typically (but not exclusively) when many outcomes are being measured. We consider the case of more than two hypotheses when a Bonferroni approach is applied, and for illustration we assume compound symmetry for the correlation of all variables. Using the device of a latent variable, it is easy to show that power is not reduced as the number of variables tested increases, provided that the common correlation coefficient is not too high (say, less than 0.75). We then consider structured testing problems, where multiplicity problems arising from the comparison of more than two treatments, as opposed to more than one measurement, are typical. We conduct a numerical study and conclude again that power is not reduced as the number of tested variables increases.
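The latent-variable argument for the unstructured case can be illustrated with a small simulation (the correlation, effect size, and numbers of outcomes below are illustrative assumptions, not values from the paper): each of m equicorrelated outcomes is tested one-sided at the Bonferroni level α/m, and the probability of at least one rejection is recorded.

```python
# Rough simulation of disjunctive power under compound symmetry: m outcomes
# with common correlation rho are each tested one-sided at level alpha/m.
import numpy as np
from scipy.stats import norm

def disjunctive_power(m, rho=0.3, delta=2.5, alpha=0.05, nsim=20_000, seed=1):
    rng = np.random.default_rng(seed)
    # compound-symmetric normal test statistics via a shared latent factor
    latent = rng.standard_normal((nsim, 1))
    noise = rng.standard_normal((nsim, m))
    z = delta + np.sqrt(rho) * latent + np.sqrt(1 - rho) * noise
    crit = norm.ppf(1 - alpha / m)          # one-sided Bonferroni cutoff
    return np.mean((z > crit).any(axis=1))  # P(at least one rejection)

for m in (1, 2, 5, 10, 20):
    print(m, round(disjunctive_power(m), 3))
```

For moderate ρ, the estimated disjunctive power does not fall as m grows, in line with the claim above.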
