Summary. The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two-state hidden Markov model. We propose oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate (FNR) subject to a constraint on the false discovery rate (FDR). It is shown that the performance of a multiple-testing procedure can be substantially improved by adaptively exploiting the dependence structure among hypotheses, and hence conventional FDR procedures that ignore this structural information are inefficient. Both the theoretical properties and the numerical performance of the proposed procedures are investigated. It is shown that the proposed procedures control the FDR at the desired level, enjoy certain optimality properties and are especially powerful in identifying clustered non-null cases. The new procedure is applied to an influenza-like illness surveillance study for detecting the timing of epidemic periods.

The multiple hypothesis testing literature has grown rapidly in recent years, with particular attention to the control of the false discovery rate (FDR) based on p-values. While these are not the only methods to deal with multiplicity, inference with small samples and large sets of hypotheses depends on the specific choice of the p-value used to control the FDR in the presence of nuisance parameters. In this paper we propose to use the partial posterior predictive p-value [Bayarri, M.J., Berger, J.O., 2000. p-values for composite null models. J. Amer. Statist. Assoc. 95, 1127–1142] that overcomes this difficulty. This choice is motivated by theoretical considerations and examples. Finally, an application to a controlled microarray experiment is presented.

High-throughput data analyses are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. False discovery rate (FDR) has been considered a proper type I error rate to control for discovery-based high-throughput data analysis. Various multiple testing procedures have been proposed to control the FDR. The power and stability properties of some commonly used multiple testing procedures have not been extensively investigated yet, however. Simulation studies were conducted to compare power and stability properties of five widely used multiple testing procedures at different proportions of true discoveries for various sample sizes for both independent and dependent test statistics. Storey's two linear step-up procedures showed the best performance among all tested procedures considering FDR control, power, and variance of true discoveries. Leukaemia and ovarian cancer microarray studies were used to illustrate the power and stability characteristics of these five multiple testing procedures with FDR control.

Traditional multiple hypothesis testing procedures fix an error rate and determine the corresponding rejection region. In 2002 Storey proposed a fixed rejection region procedure and showed numerically that it can gain more power than the fixed error rate procedure of Benjamini and Hochberg while controlling the same false discovery rate (FDR). In this paper it is proved that when the number of alternatives is small compared to the total number of hypotheses, Storey's method can be less powerful than that of Benjamini and Hochberg. Moreover, the two procedures are compared by setting them to produce the same FDR. The difference in power between Storey's procedure and that of Benjamini and Hochberg is near zero when the distance between the null and alternative distributions is large, but Benjamini and Hochberg's procedure becomes more powerful as the distance decreases. It is shown that modifying the Benjamini and Hochberg procedure to incorporate an estimate of the proportion of true null hypotheses as proposed by Black gives a procedure with superior power.

Multi-arm trials are an efficient way of simultaneously testing several experimental treatments against a shared control group. As well as reducing the sample size required compared to running each trial separately, they have important administrative and logistical advantages. There has been debate over whether multi-arm trials should correct for the fact that multiple null hypotheses are tested within the same experiment. Previous opinions have ranged from no correction being required to a stringent correction (controlling the probability of making at least one type I error) being needed, with regulators arguing the latter for confirmatory settings. In this article, we propose that controlling the false-discovery rate (FDR) is a suitable compromise, with an appealing interpretation in multi-arm clinical trials. We investigate the properties of the different correction methods in terms of the positive and negative predictive value (respectively how confident we are that a recommended treatment is effective and that a non-recommended treatment is ineffective). The number of arms and the proportion of treatments that are truly effective are varied. Controlling the FDR provides good properties. It retains the high positive predictive value of FWER correction in situations where a low proportion of treatments is effective. It also has a good negative predictive value in situations where a high proportion of treatments is effective. In a multi-arm trial testing distinct treatment arms, we recommend that sponsors and trialists consider use of the FDR.

Summary. We investigate the operating characteristics of the Benjamini–Hochberg false discovery rate procedure for multiple testing. This is a distribution-free method that controls the expected fraction of falsely rejected null hypotheses among those rejected. The paper provides a framework for understanding more about this procedure. We first study the asymptotic properties of the 'deciding point' D that determines the critical p-value. From this, we obtain explicit asymptotic expressions for a particular risk function. We introduce the dual notion of false non-rejections and we consider a risk function that combines the false discovery rate and false non-rejections. We also consider the optimal procedure with respect to a measure of conditional risk.
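For readers unfamiliar with the procedure discussed in the abstract above, the following is a minimal sketch of the standard Benjamini–Hochberg step-up rule. It is not taken from the paper. The function name `benjamini_hochberg` and the returned `deciding_index` are illustrative names chosen here, and the non-strict inequality is one common convention for the threshold comparison.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up rule (illustrative sketch, not the paper's code).

    Returns (deciding_index, rejected), where deciding_index is the largest
    rank k such that the k-th smallest p-value is <= k * alpha / m
    (0 if there is no such rank), and rejected[i] tells whether
    hypothesis i is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    deciding_index = 0  # 0 means that nothing is rejected
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            deciding_index = rank  # keep the largest rank that passes

    # Reject every hypothesis whose p-value rank is at most the deciding index.
    rejected = [False] * m
    for i in order[:deciding_index]:
        rejected[i] = True
    return deciding_index, rejected
```

Because the rule is step-up, a large rank that passes its threshold carries all smaller p-values with it, even those that individually miss their own thresholds.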

Measurement error and misclassification arise commonly in various data collection processes. It is well-known that ignoring these features in the data analysis usually leads to biased inference. With the generalized linear model setting, Yi et al. [Functional and structural methods with mixed measurement error and misclassification in covariates. J Am Stat Assoc. 2015;110:681–696] developed inference methods to adjust for the effects of measurement error in continuous covariates and misclassification in discrete covariates simultaneously for the scenario where validation data are available. The augmented simulation-extrapolation (SIMEX) approach they developed generalizes the usual SIMEX method, which is only applicable to continuous error-prone covariates. To implement this method, we develop an R package, augSIMEX, for public use. Simulation studies are conducted to illustrate the use of the algorithm. This package is available at CRAN.

We are concerned with the problem of m simultaneous statistical test problems with composite null hypotheses. Usually, marginal p-values are computed under least favorable parameter configurations (LFCs), thus being over-conservative under non-LFCs. Our proposed randomized p-value leads to a tighter exhaustion of the marginal (local) significance level. In turn, it is stochastically larger than the LFC-based p-value under alternatives. While these distributional properties are typically nonsensical for m = 1, the exhaustion of the local significance level is extremely helpful for cases with m > 1 in connection with data-adaptive multiple tests, as we will demonstrate by considering multiple one-sided tests for Gaussian means.

Summary. To help to design vaccines for acquired immune deficiency syndrome that protect broadly against many genetic variants of the human immunodeficiency virus, the mutation rates at 118 positions in HIV amino-acid sequences of subtype C versus those of subtype B were compared. The false discovery rate (FDR) multiple-comparisons procedure can be used to determine statistical significance. When the test statistics have discrete distributions, the FDR procedure can be made more powerful by a simple modification. The paper develops a modified FDR procedure for discrete data and applies it to the human immunodeficiency virus data. The new procedure detects 15 positions with significantly different mutation rates compared with 11 that are detected by the original FDR method. Simulations delineate conditions under which the modified FDR procedure confers large gains in power over the original technique. In general FDR adjustment methods can be improved for discrete data by incorporating the modification proposed.

In this paper we consider the impact of both missing data and measurement errors on a longitudinal analysis of participation in higher education in Australia. We develop a general method for handling both discrete and continuous measurement errors that also allows for the incorporation of missing values and random effects in both binary and continuous response multilevel models. Measurement errors are allowed to be mutually dependent and their distribution may depend on further covariates. We show that our methodology works via two simple simulation studies. We then consider the impact of our measurement error assumptions on the analysis of the real data set.

Many exploratory studies such as microarray experiments require the simultaneous comparison of hundreds or thousands of genes. In many microarray experiments, most genes are not expected to be differentially expressed. Under such a setting, a procedure that is designed to control the false discovery rate (FDR) is aimed at identifying as many potential differentially expressed genes as possible. The usual FDR controlling procedure is constructed based on the number of hypotheses. However, it can become very conservative when some of the alternative hypotheses are expected to be true. The power of a controlling procedure can be improved if the number of true null hypotheses (m0) instead of the number of hypotheses is incorporated in the procedure [Y. Benjamini and Y. Hochberg, On the adaptive control of the false discovery rate in multiple testing with independent statistics, J. Edu. Behav. Statist. 25 (2000), pp. 60–83]. Nevertheless, m0 is unknown and has to be estimated. The objective of this article is to evaluate some existing estimators of m0 and to discuss the feasibility of incorporating these estimators into FDR controlling procedures under various experimental settings. The results of simulations can help the investigator to choose an appropriate procedure to meet the requirement of the study.
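To illustrate the idea of replacing the number of hypotheses by an estimate of the number of true nulls, here is a minimal sketch. It assumes one particular estimator, the tail-count estimator commonly attributed to Storey, with a tuning parameter `lam`. The abstract does not say which estimators the article evaluates, so this choice, the function names `estimate_m0` and `adaptive_bh`, and the clipping of the estimate to the range [1, m] are all assumptions made here for illustration.

```python
def estimate_m0(p_values, lam=0.5):
    """Tail-count estimate of the number of true nulls (assumed estimator).

    Under a true null, p-values are roughly uniform, so about
    m0 * (1 - lam) of them should exceed lam.
    """
    m = len(p_values)
    count = sum(1 for p in p_values if p > lam)
    # Clip to [1, m] so the estimate is usable as a divisor (a choice made here).
    return min(m, max(1.0, count / (1.0 - lam)))


def adaptive_bh(p_values, alpha=0.05, lam=0.5):
    """Step-up rule with the estimated m0 in place of m in the thresholds.

    Returns (number_rejected, m0_hat).
    """
    m0_hat = estimate_m0(p_values, lam)
    number_rejected = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * alpha / m0_hat:
            number_rejected = rank  # keep the largest rank that passes
    return number_rejected, m0_hat
```

When the estimate is smaller than the number of hypotheses, every threshold is larger than in the unadjusted procedure, which is the source of the power gain the abstract describes.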

Empirical Bayes estimates of the local false discovery rate can reflect uncertainty about the estimated prior by supplementing their Bayesian posterior probabilities with confidence levels as posterior probabilities. This use of coherent fiducial inference with hierarchical models generates set estimators that propagate uncertainty to varying degrees. Some of the set estimates approach estimates from plug-in empirical Bayes methods for high numbers of comparisons and can come close to the usual confidence sets given a sufficiently low number of comparisons.

In this article, we propose a new class of distributions based on the concept of exponentiated generalization, modified to provide greater flexibility. Our proposed distribution accommodates various shapes of hazard rate, including the bathtub shape. The exponential distribution has been taken as the baseline distribution. Various statistical properties of the proposed distribution have been studied. We have used the method of maximum likelihood for estimation of the parameters of the proposed model. Last, we have analyzed four real datasets to illustrate the flexibility of the model in comparison to eight existing well-known distributions.

Summary. Given a large number of test statistics, a small proportion of which represent departures from the relevant null hypothesis, a simple rule is given for choosing those statistics that are indicative of departure. It is based on fitting by moments a mixture model to the set of test statistics and then deriving an estimated likelihood ratio. Simulation suggests that the procedure has good properties when the departure from an overall null hypothesis is not too small.

The aim of this paper is to propose a survival credit risk model that jointly accommodates three types of time-to-default found in bank loan portfolios. It leads to a new framework that extends the standard cure rate model introduced by Berkson and Gage (1952; Survival curve for cancer patients following treatment, Journal of the American Statistical Association 47 (259): 501–15) regarding the accommodation of zero-inflations. In other words, we propose a new survival model that takes into account three different types of individuals which have so far not been jointly accounted for: (i) an individual with an event at the starting time (zero time); (ii) an individual who is non-susceptible to the event; or (iii) an individual who is susceptible to the event. Considering this, the zero-inflated Weibull non-default rate regression models, which include a multinomial logistic link for the three classes, are presented using an application for credit scoring data. Parameters are estimated by maximum likelihood, and Monte Carlo simulations are carried out to assess the finite-sample performance of the estimation procedure.

Recent approaches to the statistical analysis of adverse event (AE) data in clinical trials have proposed the use of groupings of related AEs, such as by system organ class (SOC). These methods have opened up the possibility of scanning large numbers of AEs while controlling for multiple comparisons, making the comparative performance of the different methods in terms of AE detection and error rates of interest to investigators. We apply two Bayesian models and two procedures for controlling the false discovery rate (FDR), which use groupings of AEs, to real clinical trial safety data. We find that while the Bayesian models are appropriate for the full data set, the error controlling methods only give similar results to the Bayesian methods when low incidence AEs are removed. A simulation study is used to compare the relative performances of the methods. We investigate the differences between the methods over full trial data sets, and over data sets with low incidence AEs and SOCs removed. We find that while the removal of low incidence AEs increases the power of the error controlling procedures, the estimated power of the Bayesian methods remains relatively constant over all data sizes. Automatic removal of low-incidence AEs however does have an effect on the error rates of all the methods, and a clinically guided approach to their removal is needed. Overall we found that the Bayesian approaches are particularly useful for scanning the large amounts of AE data gathered.

Historically, the cure rate model has been used for modeling time-to-event data within which a significant proportion of patients are assumed to be cured of illnesses, including breast cancer, non-Hodgkin lymphoma, leukemia, prostate cancer, melanoma, and head and neck cancer. Perhaps the most popular type of cure rate model is the mixture model introduced by Berkson and Gage [1]. In this model, it is assumed that a certain proportion of the patients are cured, in the sense that they do not present the event of interest during a long period of time and can be found to be immune to the cause of failure under study. In this paper, we propose a general hazard model which accommodates comprehensive families of cure rate models as particular cases, including the model proposed by Berkson and Gage. The maximum-likelihood-estimation procedure is discussed. A simulation study analyzes the coverage probabilities of the asymptotic confidence intervals for the parameters. A real data set on children exposed to HIV by vertical transmission illustrates the methodology.
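As a reminder of what the mixture cure rate model mentioned above looks like, its usual textbook form is sketched below. The symbols are assumptions introduced here and do not come from the abstract: $\pi$ denotes the cured proportion and $S_0(t)$ the survival function of the patients who remain susceptible.

```latex
% Mixture cure rate model, standard textbook form.
% Assumed notation: \pi = cured proportion, S_0 = survival function of the susceptibles.
S_{\mathrm{pop}}(t) = \pi + (1 - \pi)\, S_0(t), \qquad 0 \le \pi \le 1,
\qquad \lim_{t \to \infty} S_{\mathrm{pop}}(t) = \pi \quad \text{when } S_0(t) \to 0 .
```

The population survival curve therefore levels off at the cured proportion rather than at zero, which is the feature that distinguishes cure rate models from standard survival models.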

With special reference to the family of skew-normal distributions, we consider the geometric curvature of a probability density function as a means to define and identify rare or catastrophic events, a phenomenon common in the study of financial instruments. Further, we study the statistical curvature properties of this family of distributions and discuss the sample size issue, in order to assess to what extent the linear and likelihood-based inference of the exponential family of distributions is applicable to the skew-normal family.

Piecewise-deterministic Markov processes form a general class of non-diffusion stochastic models that involve both deterministic trajectories and random jumps at random times. In this paper, we state a new characterization of the jump rate of such a process with discrete transitions. We deduce from this result a nonparametric technique for estimating this feature of interest. We state the uniform convergence in probability of the estimator. The methodology is illustrated on a numerical example.

