Similar Documents
20 similar documents were retrieved for this query.
1.
A Bayesian mixture model for differential gene expression
We propose model-based inference for differential gene expression, using a nonparametric Bayesian probability model for the distribution of gene intensities under various conditions. The probability model is a mixture of normal distributions. The resulting inference is similar to a popular empirical Bayes approach that is used for the same inference problem. The use of fully model-based inference mitigates some of the necessary limitations of the empirical Bayes method. We argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture-of-normals models. The proposed approach is motivated by a microarray experiment that was carried out to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples. Additionally, we carried out a small simulation study to verify the proposed methods. In the motivating case studies we show how the nonparametric Bayes approach facilitates the evaluation of posterior expected false discovery rates. We also show how inference can proceed even in the absence of a null sample of known non-differentially expressed scores. This highlights the difference from alternative empirical Bayes approaches that are based on plug-in estimates.
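As a concrete illustration of how posterior expected false discovery rates can be evaluated from such a model, the sketch below assumes only that the mixture model yields, for each gene, a posterior probability of differential expression; the function names are hypothetical and the inputs are simulated, not taken from the paper.

```python
# Hedged sketch: posterior expected FDR of a thresholding rule computed from
# posterior probabilities of differential expression (toy inputs).
import numpy as np

def posterior_expected_fdr(post_prob, cutoff):
    """Posterior expected FDR of the rule 'flag gene i if post_prob[i] >= cutoff'."""
    flagged = post_prob >= cutoff
    if flagged.sum() == 0:
        return 0.0
    # Each flagged gene contributes its posterior probability of being a false discovery.
    return float((1.0 - post_prob[flagged]).mean())

def cutoff_for_target_fdr(post_prob, target=0.05):
    """Smallest cutoff whose posterior expected FDR falls below the target."""
    for c in np.sort(np.unique(post_prob)):
        if posterior_expected_fdr(post_prob, c) <= target:
            return float(c)
    return 1.0

rng = np.random.default_rng(0)
probs = rng.beta(0.3, 0.3, size=5000)          # toy posterior probabilities
c = cutoff_for_target_fdr(probs, target=0.10)
print(c, posterior_expected_fdr(probs, c))
```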

2.
Identifying differentially expressed genes is a basic objective in microarray experiments. Many statistical methods for detecting differentially expressed genes in multiple-slide experiments have been proposed. However, sometimes with limited experimental resources, only a single cDNA array or two oligonucleotide arrays can be made, or only an insufficient number of replicated arrays can be conducted. Many current statistical models cannot be used because of the non-availability of replicated data. Simply using fold changes is also unreliable and inefficient [Chen et al. 1997. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics 2, 364–374; Newton et al. 2001. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8, 37–52; Pan et al. 2002. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol. 3, research0022.1–0022.10]. We propose a new method. If the log-transformed ratios for the expressed genes as well as the unexpressed genes have equal variance, we use a Hadamard matrix to construct a t-test from single-array data. Essentially, we test whether each doubtful gene is significantly differentially expressed compared with the unexpressed genes. We form new random variables corresponding to the rows of a Hadamard matrix using algebraic sums of gene expressions. A one-sample t-test is constructed and a p-value is calculated for each doubtful gene based on these random variables. Using any multiple-testing method, adjusted p-values can be obtained from the original p-values and the significance of doubtful genes can be determined. When the variance of the expressed genes differs from that of the unexpressed genes, we construct a z-statistic based on the result of applying the Hadamard matrix and find a confidence interval for retaining the null hypothesis. Using this interval, we determine the differentially expressed genes. The method is also useful for multiple microarrays, especially when sufficient replicated data are not available for a traditional t-test. We apply our methodology to the ApoAI data. The results appear promising: they not only confirm the previously known differentially expressed genes but also indicate additional genes as differentially expressed.
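The sketch below gives one plausible reading of the Hadamard-matrix construction from a single array; it is not the authors' code, and the choice of reference genes, the scaling of the contrasts, and the degrees of freedom are all illustrative assumptions.

```python
# Hedged sketch: Hadamard contrasts of presumed-unexpressed (reference) log-ratios
# act as pseudo-replicates of the noise; each doubtful gene's log-ratio is then
# compared against that noise scale with a one-sample t-type test.
import numpy as np
from scipy.linalg import hadamard
from scipy.stats import t as t_dist

def hadamard_t_pvalues(doubtful, reference):
    """doubtful: log-ratios of genes to test; reference: log-ratios of genes
    presumed non-differentially expressed (toy assumption)."""
    m = 2 ** int(np.floor(np.log2(len(reference))))   # Hadamard order must be a power of 2
    x = np.asarray(reference[:m], dtype=float)
    H = hadamard(m)
    # Rows 2..m are orthogonal +/-1 contrasts with mean zero under the null.
    contrasts = H[1:] @ x / np.sqrt(m)
    s = contrasts.std(ddof=1)
    tstat = np.asarray(doubtful, dtype=float) / s
    # Two-sided p-values; m - 2 degrees of freedom is an illustrative choice.
    return 2.0 * t_dist.sf(np.abs(tstat), df=m - 2)

rng = np.random.default_rng(1)
ref = rng.normal(0.0, 0.4, size=64)            # unexpressed genes
genes = np.array([0.1, 1.5, -2.0, 0.05])       # doubtful genes
print(hadamard_t_pvalues(genes, ref))
```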

3.
The main purpose of this paper is first to introduce a new family of empirical test statistics for testing a simple null hypothesis when the vector of parameters of interest is defined through a specific set of unbiased estimating functions. This family of test statistics is based on a distance between two probability vectors, with the first probability vector obtained by maximizing the empirical likelihood (EL) on the vector of parameters, and the second vector defined from the fixed vector of parameters under the simple null hypothesis. The distance considered for this purpose is the phi-divergence measure. The asymptotic distribution is then derived for this family of test statistics. The proposed methodology is illustrated through the well-known data of Newcomb's measurements of the passage time for light. A simulation study is carried out to compare its performance with that of the EL ratio test when confidence intervals are constructed based on the respective statistics for small sample sizes. The results suggest that the ‘empirical modified likelihood ratio test statistic’ provides a competitive alternative to the EL ratio test statistic, and is also more robust than the EL ratio test statistic in the presence of contamination in the data. Finally, we propose empirical phi-divergence test statistics for testing a composite null hypothesis and present some asymptotic as well as simulation results for evaluating the performance of these test procedures.
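The building block of such statistics is a phi-divergence between the EL-weighted probability vector and the uniform weights implied by the null. The sketch below uses the Cressie–Read family as an example of phi; the chi-squared-type scaling 2n/φ″(1) is stated as an assumption, not as the paper's exact definition.

```python
# Hedged sketch: Cressie-Read phi-divergence between two probability vectors.
import numpy as np

def cressie_read_divergence(p, q, lam=2.0 / 3.0):
    """D_phi(p, q) = sum_i q_i * phi(p_i / q_i) with the Cressie-Read phi_lambda."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    r = p / q
    if abs(lam) < 1e-12:                       # lambda -> 0: Kullback-Leibler
        return float(np.sum(p * np.log(r)))
    if abs(lam + 1.0) < 1e-12:                 # lambda -> -1: reverse KL
        return float(np.sum(q * np.log(1.0 / r)))
    return float(np.sum(q * (r ** (lam + 1.0) - r - lam * (r - 1.0))) / (lam * (lam + 1.0)))

n = 50
p_hat = np.random.default_rng(2).dirichlet(np.ones(n))    # e.g. EL weights
p_null = np.full(n, 1.0 / n)                               # uniform weights under the null
stat = 2.0 * n * cressie_read_divergence(p_hat, p_null)    # phi''(1) = 1 for this family
print(stat)
```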

4.
In this paper, we propose a general kth correlation coefficient between the density function and distribution function of a continuous variable as a measure of symmetry and asymmetry. We first propose a root-n moment-based estimator of the kth correlation coefficient and present its asymptotic results. Next, we consider statistical inference of the kth correlation coefficient by using the empirical likelihood (EL) method. The EL statistic is shown to be asymptotically a standard chi-squared distribution. Last, we propose a residual-based estimator of the kth correlation coefficient for a parametric regression model to test whether the density function of the true model error is symmetric or not. We present the asymptotic results of the residual-based kth correlation coefficient estimator and also construct its EL-based confidence intervals. Simulation studies are conducted to examine the performance of the proposed estimators, and we also use our proposed estimators to analyze the air quality dataset.

5.
We propose methods for detecting structural changes in time series with discrete-valued observations. The detector statistics come in familiar L2-type formulations incorporating the empirical probability generating function. Special emphasis is given to the popular models of integer autoregression and Poisson autoregression. For both models, we mainly study structural changes due to a change in distribution, but we also comment on the classical problem of parameter change. The asymptotic properties of the proposed test statistics are studied under the null hypothesis as well as under alternatives. A Monte Carlo power study of bootstrap versions of the new methods is also included, along with a real data example.
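A schematic version of such an L2-type detector is sketched below: the empirical probability generating functions before and after each candidate break are compared on a grid of arguments in (0, 1). The weighting and scaling are illustrative choices, not the papers' exact statistics.

```python
# Schematic sketch of an L2-type change-point detector built on the empirical
# probability generating function (PGF) of a count time series.
import numpy as np

def empirical_pgf(x, u):
    """Empirical PGF: average of u**x over the sample, evaluated on a grid u."""
    return np.mean(u[None, :] ** np.asarray(x)[:, None], axis=0)

def pgf_change_statistic(x, n_grid=100):
    x = np.asarray(x)
    n = len(x)
    u = np.linspace(0.0, 1.0, n_grid, endpoint=False) + 0.5 / n_grid   # grid on (0, 1)
    stats = []
    for k in range(5, n - 5):                  # avoid degenerate edge splits
        d = empirical_pgf(x[:k], u) - empirical_pgf(x[k:], u)
        weight = (k * (n - k)) / n ** 2        # CUSUM-type weighting (illustrative)
        stats.append(weight * np.mean(d ** 2))
    return n * max(stats)                      # max-type detector over candidate breaks

rng = np.random.default_rng(3)
x = np.concatenate([rng.poisson(2.0, 150), rng.poisson(4.0, 150)])  # break at t = 150
print(pgf_change_statistic(x))
```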

6.
We consider a recurrent event wherein the inter-event times are independent and identically distributed with a common absolutely continuous distribution function F. In this article, interest is in the problem of testing the null hypothesis that F belongs to some parametric family where the q-dimensional parameter is unknown. We propose a general chi-squared test in which the cell boundaries are data dependent. An estimator of the parameter obtained by minimizing a quadratic form resulting from a properly scaled vector of differences between observed and expected frequencies is used to construct the test. This estimator is known as the minimum chi-square estimator. Large-sample properties of the proposed test statistic are established using empirical process tools. A simulation study is conducted to assess the performance of the test under parameter misspecification, and our procedures are applied to air conditioning system failures from a fleet of Boeing 720 jet planes.

7.
In microarray experiments, accurate estimation of the gene variance is a key step in the identification of differentially expressed genes. Variance models range from the overly stringent homoscedastic assumption to the overparameterized model that assumes a specific variance for each gene. Between these two extremes there is some room for intermediate models. We propose a method that identifies clusters of genes with equal variance. We use a mixture model on the gene variance distribution. A test statistic for ranking and detecting differentially expressed genes is proposed. The method is illustrated with publicly available complementary deoxyribonucleic acid microarray experiments, an unpublished data set and further simulation studies.
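To illustrate the idea of pooling variances within clusters of genes, the sketch below fits a Gaussian mixture to the log gene-wise variances and plugs the cluster-pooled variance into a moderated t-type statistic. sklearn's GaussianMixture stands in for the paper's variance-mixture model, so this is an approximation of the idea rather than the authors' method.

```python
# Hedged sketch: cluster gene-wise variances on the log scale, then replace each
# gene's variance by its cluster's pooled value in a moderated t-type statistic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
n_genes, n_rep = 2000, 4
true_sd = rng.choice([0.2, 0.6, 1.5], size=n_genes)            # three variance groups
data = rng.normal(0.0, true_sd[:, None], size=(n_genes, n_rep))

gene_var = data.var(axis=1, ddof=1)
log_var = np.log(gene_var)[:, None]
gm = GaussianMixture(n_components=3, random_state=0).fit(log_var)
labels = gm.predict(log_var)

# Pool the variance within each cluster and use it in place of the per-gene variance.
pooled_var = np.array([gene_var[labels == k].mean() for k in range(3)])[labels]
t_moderated = data.mean(axis=1) / np.sqrt(pooled_var / n_rep)
print(t_moderated[:5])
```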

8.
In this paper, we apply empirical likelihood to two-sample problems with growing high dimensionality. Our results are demonstrated by constructing confidence regions for the difference of the means of two p-dimensional samples and for the difference between the coefficients of a two-sample p-dimensional linear model. We show that the empirical likelihood based approach has an efficiency property: as p → ∞ for high-dimensional data, the limiting distribution of the EL ratio statistic for the difference of the means of the two samples and for the difference between the coefficients of the two-sample linear model is asymptotically normal. Furthermore, empirical likelihood (EL) gives an efficient estimator of the regression coefficients in linear models and can be as efficient as a parametric approach. The performance of the proposed method is illustrated via numerical simulations.

9.
We consider a nonparametric autoregression model under conditional heteroscedasticity with the aim of testing whether the innovation distribution changes over time. To this end, we develop an asymptotic expansion for the sequential empirical process of nonparametrically estimated innovations (residuals). We suggest a Kolmogorov–Smirnov statistic based on the difference of the estimated innovation distributions built from the first ⌊ns⌋ and the last n − ⌊ns⌋ residuals, respectively (0 ≤ s ≤ 1). Weak convergence of the underlying stochastic process to a Gaussian process is proved under the null hypothesis of no change point. The result implies that the test is asymptotically distribution-free. Consistency against fixed alternatives is shown. The small-sample performance of the proposed test is investigated in a simulation study, and the test is applied to a data example.
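The sketch below illustrates the Kolmogorov–Smirnov-type comparison of the two residual empirical distribution functions. The residuals are taken as given, whereas the paper obtains them from a nonparametrically estimated heteroscedastic autoregression; the CUSUM-type weighting is a common normalization chosen here for illustration.

```python
# Illustrative sketch: KS-type change-point statistic comparing the empirical
# distributions of the first k and the last n-k residuals over candidate splits.
import numpy as np

def ks_change_statistic(residuals):
    e = np.asarray(residuals, float)
    n = len(e)
    grid = np.sort(e)                          # evaluate ECDFs at the pooled residuals
    best = 0.0
    for k in range(10, n - 10):                # candidate break fractions s = k / n
        f1 = np.searchsorted(np.sort(e[:k]), grid, side="right") / k
        f2 = np.searchsorted(np.sort(e[k:]), grid, side="right") / (n - k)
        # CUSUM-type weighting k(n-k)/n^{3/2}, a common normalization choice.
        best = max(best, (k * (n - k)) / n ** 1.5 * np.max(np.abs(f1 - f2)))
    return best

rng = np.random.default_rng(5)
res = np.concatenate([rng.normal(0, 1, 200), rng.standard_t(3, 200)])  # distribution change
print(ks_change_statistic(res))
```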

10.
In this paper, we propose a nonparametric method based on the jackknife empirical likelihood ratio to test the equality of two variances. The test statistic is shown to asymptotically follow a χ2 distribution with one degree of freedom. Simulations are conducted to compare the type I error and power with those of Levene's test and the F test under different distributional settings. The proposed method is applied to a real data set to illustrate the testing procedure.
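A minimal sketch of the generic jackknife empirical likelihood recipe for this problem is given below: jackknife pseudo-values of the variance difference are fed into a standard one-sample EL ratio, which is referred to a chi-squared(1) limit. The estimating function and numerical details are assumptions and may differ from the paper's construction.

```python
# Hedged sketch: jackknife empirical likelihood (JEL) test for equal variances.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def var_diff(x, y):
    return np.var(x, ddof=1) - np.var(y, ddof=1)

def jackknife_pseudo_values(x, y):
    n = len(x) + len(y)
    full = var_diff(x, y)
    loo = [var_diff(np.delete(x, i), y) for i in range(len(x))] + \
          [var_diff(x, np.delete(y, j)) for j in range(len(y))]
    return n * full - (n - 1) * np.asarray(loo)

def el_log_ratio(z):
    """-2 log EL ratio for H0: E[z] = 0 (requires 0 inside the range of z)."""
    if z.min() >= 0 or z.max() <= 0:
        return np.inf
    g = lambda lam: np.sum(z / (1.0 + lam * z))        # decreasing in lam
    lo = -(1.0 - 1e-8) / z.max()
    hi = -(1.0 - 1e-8) / z.min()
    lam = brentq(g, lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(6)
x, y = rng.normal(0, 1.0, 60), rng.normal(0, 1.5, 80)
stat = el_log_ratio(jackknife_pseudo_values(x, y))
print(stat, chi2.sf(stat, df=1))               # p-value from the chi-squared(1) limit
```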

11.
In this paper, we develop an info-metric framework for testing hypotheses about structural instability in nonlinear, dynamic models estimated from the information in population moment conditions. Our methods are designed to distinguish between three states of the world: (i) the model is structurally stable in the sense that the population moment condition holds at the same parameter value throughout the sample; (ii) the model parameters change at some point in the sample but otherwise the model is correctly specified; and (iii) the model exhibits more general forms of instability than a single shift in the parameters. An advantage of the info-metric approach is that the null hypotheses concerned are formulated in terms of distances between various choices of probability measures constrained to satisfy (i) and (ii), and the empirical measure of the sample. Under the alternative hypotheses considered, the model is assumed to exhibit structural instability at a single point in the sample, referred to as the break point; our analysis allows for the break point to be either fixed a priori or treated as occurring at some unknown point within a certain fraction of the sample. We propose various test statistics that can be thought of as sample analogs of the distances described above, and derive their limiting distributions under the appropriate null hypothesis. The limiting distributions of our statistics are nonstandard but coincide with various distributions that arise in the literature on structural instability testing within the Generalized Method of Moments framework. A small simulation study illustrates the finite-sample performance of our test statistics.

12.
This paper discusses the problem of fitting a distribution function to the marginal distribution of a long memory moving average process. Because of the uniform reduction principle, unlike in the i.i.d. setup, classical tests based on the empirical process are relatively easy to implement. More importantly, we discuss fitting the marginal distribution of the error process in location, scale, location–scale and linear regression models. An interesting observation is that in the location model, the location–scale model, or more generally in linear regression models with a non-zero intercept parameter, the null weak limit of the first-order difference between the residual empirical process and the null model is degenerate at zero, and hence it cannot be used to fit an error distribution in these models in large samples. This finding is in sharp contrast to a recent claim of Chan and Ling (2008) that the null weak limit of such a process is a continuous Gaussian process. This note also proposes some tests based on the second-order difference for the location case. Another finding is that residual empirical process tests in the scale problem are robust against not knowing the scale parameter.

13.
Empirical Bayes is a versatile approach to “learn from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes–full Bayes ridge regression approach for estimation of the posterior predictive interval.
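As a small, self-contained example of “formal” empirical Bayes in the linear model setting mentioned above, the sketch below estimates the prior and noise variances by maximizing the marginal likelihood and uses their ratio as a ridge penalty; this is the textbook EB-ridge recipe, shown to illustrate the idea rather than the review's specific estimators.

```python
# Hedged sketch: empirical Bayes ridge regression via marginal likelihood maximization.
import numpy as np
from scipy.optimize import minimize

def neg_marginal_loglik(log_params, X, y):
    sigma2, tau2 = np.exp(log_params)
    n = len(y)
    C = tau2 * X @ X.T + sigma2 * np.eye(n)    # marginal covariance of y under beta ~ N(0, tau2 I)
    sign, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + y @ np.linalg.solve(C, y) + n * np.log(2 * np.pi))

rng = np.random.default_rng(7)
n, p = 60, 200
X = rng.normal(size=(n, p))
beta = np.concatenate([rng.normal(0, 0.5, 20), np.zeros(p - 20)])
y = X @ beta + rng.normal(0, 1.0, n)

res = minimize(neg_marginal_loglik, x0=np.log([1.0, 0.1]), args=(X, y), method="Nelder-Mead")
sigma2_hat, tau2_hat = np.exp(res.x)
lam = sigma2_hat / tau2_hat                    # empirical Bayes ridge penalty
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)   # posterior mean of beta
print(lam, np.round(beta_ridge[:5], 3))
```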

14.
Yi Wan & Min Deng, Statistics, 2013, 47(6): 1379–1394
In this paper, we investigate the problem of testing for the equality of two distributions. We employ a two-sample jackknife empirical likelihood (JEL) approach to construct a test statistic whose limiting distribution is a chi-squared distribution with one degree of freedom, regardless of the (fixed) data dimension. A variety of synthetic data experiments demonstrate that our JEL test statistic performs very well, with a clean asymptotic distribution under the null hypothesis. Furthermore, we apply the test procedure to a real dataset and obtain competitive results.

15.
A common feature of compound Poisson and Katz distributions is that both families may be viewed as generalizations of the Poisson law. In this paper, we present a unified approach to testing the fit to any distribution belonging to either of these families. The test involves the probability generating function, and it is shown to be consistent under general alternatives. The asymptotic null distribution of the test statistic is obtained, and an effective bootstrap procedure is employed in order to investigate the performance of the proposed test with real and simulated data. Comparisons with classical methods based on the empirical distribution function are also included.

16.
In this article, we propose a new class of semiparametric instrumental variable models with partially varying coefficients, in which the structural function has a partially linear form and the impact of endogenous structural variables can vary over different levels of some exogenous variables. We propose a three-step estimation procedure to estimate both functional and constant coefficients. The consistency and asymptotic normality of these proposed estimators are established. Moreover, a generalized F-test is developed to test whether the functional coefficients are of particular parametric forms with some underlying economic intuitions, and furthermore, the limiting distribution of the proposed generalized F-test statistic under the null hypothesis is established. Finally, we illustrate the finite sample performance of our approach with simulations and two real data examples in economics.

17.
Pao-sheng Shen, Statistics, 2015, 49(3): 602–613
For the regression parameter β in the Cox model, there have been several estimates based on different types of approximated likelihood. For right-censored data, Ren and Zhou [Full likelihood inferences in the Cox model: an empirical approach. Ann Inst Statist Math. 2011;63:1005–1018] derive the full likelihood function for (β, F0), where F0 is the baseline distribution function in the Cox model. In this article, we extend their results to left-truncated and right-censored data with discrete covariates. Using the empirical likelihood parameterization, we obtain the full-profile likelihood function for β when covariates are discrete. Simulation results indicate that the maximum likelihood estimator outperforms Cox's partial likelihood estimator in finite samples.

18.
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated by the independence assumption for the test statistics, which is often not true in reality. Simulations indicate that most existing estimators can perform poorly in the presence of dependence among the test statistics, mainly due to the increased variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate of the proportion of true-null p-values in (λ, 1] over the whole range [0, 1], instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.
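A hedged sketch of the kind of estimator described above: Storey-type estimates π0(λ) are computed on a grid of λ values, a straight line is fitted over the whole range, and the fitted value at λ = 1 is taken as the estimate. This mimics the flavour of the proposal; the paper's exact data-driven fit may differ.

```python
# Hedged sketch: estimate pi0 by a linear fit to Storey-type pi0(lambda) values.
import numpy as np

def pi0_linear_fit(pvals, grid=None):
    p = np.asarray(pvals)
    if grid is None:
        grid = np.arange(0.05, 0.96, 0.05)
    # Storey's estimate at each lambda: fraction of p-values above lambda, rescaled.
    pi0_lam = np.array([np.mean(p > lam) / (1.0 - lam) for lam in grid])
    slope, intercept = np.polyfit(grid, pi0_lam, deg=1)
    return float(np.clip(slope * 1.0 + intercept, 0.0, 1.0))   # extrapolate to lambda = 1

rng = np.random.default_rng(8)
n, pi0_true = 5000, 0.8
null_p = rng.uniform(size=int(n * pi0_true))
alt_p = rng.beta(0.3, 4.0, size=n - len(null_p))               # p-values under alternatives
print(pi0_linear_fit(np.concatenate([null_p, alt_p])))          # should be near 0.8
```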

19.
There is an increasing amount of literature focused on Bayesian computational methods for addressing problems with intractable likelihoods. One approach is a set of algorithms known as Approximate Bayesian Computation (ABC) methods. One of the problems with these algorithms is that their performance depends on the appropriate choice of summary statistics, distance measure and tolerance level. To circumvent this problem, an alternative method based on the empirical likelihood has been introduced. This method can be easily implemented when a set of constraints, related to the moments of the distribution, is specified. However, the choice of the constraints is sometimes challenging. To overcome this difficulty, we propose an alternative method based on a bootstrap likelihood approach. The method is easy to implement and in some cases is actually faster than the other approaches considered. We illustrate the performance of our algorithm with examples from population genetics, time series and stochastic differential equations. We also test the method on a real dataset.

20.
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. In this paper, we propose a flexible rank-based nonparametric procedure for gene selection from microarray data. We propose a statistic for testing whether the area under the receiver operating characteristic curve (AUC) for each gene is equal to 0.5, allowing a different variance for each gene. The contribution of this “single gene” statistic is the studentization of the empirical AUC, which takes into account the variance associated with each gene in the experiment. DeLong et al. proposed a nonparametric procedure for calculating a consistent variance estimator of the AUC. We use their variance estimation technique to obtain a test statistic, and we focus on the primary step in the gene selection process, namely the ranking of genes with respect to a statistical measure of differential expression. Two real datasets are analyzed to illustrate the methods, and a simulation study is carried out to assess the relative performance of different statistical gene ranking measures. The work shows how to use the variance information to produce a list of significant targets and to assess differential gene expression under two conditions. The proposed method does not involve complicated formulas and does not require advanced programming skills. We conclude that the proposed methods offer useful analytical tools for identifying differentially expressed genes for further biological and clinical analysis.
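The sketch below shows a studentized-AUC statistic of the kind described: the empirical AUC for one gene is standardized with DeLong's nonparametric variance estimator and compared against the null value 0.5. The two-sided normal p-value and the variable names are assumptions of this toy version.

```python
# Hedged sketch: studentized empirical AUC for one gene using DeLong's variance.
import numpy as np
from scipy.stats import norm

def studentized_auc(x, y):
    """x: expression values in group 0, y: expression values in group 1 (one gene)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Pairwise kernel psi(x_i, y_j) = 1, 0.5, 0 for y_j >, =, < x_i.
    psi = (y[None, :] > x[:, None]) + 0.5 * (y[None, :] == x[:, None])
    auc = psi.mean()
    v10 = psi.mean(axis=1)                     # structural components over the x sample
    v01 = psi.mean(axis=0)                     # structural components over the y sample
    var = v10.var(ddof=1) / len(x) + v01.var(ddof=1) / len(y)   # DeLong variance estimate
    z = (auc - 0.5) / np.sqrt(var)
    return auc, z, 2.0 * norm.sf(abs(z))

rng = np.random.default_rng(9)
g0, g1 = rng.normal(0.0, 1.0, 30), rng.normal(0.8, 1.0, 25)
print(studentized_auc(g0, g1))
```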

