Similar articles
20 similar articles found (search time: 31 ms)
1.
Summary. In microarray experiments, accurate estimation of the gene variance is a key step in the identification of differentially expressed genes. Variance models range from the overly stringent homoscedastic assumption to the overparameterized model that assumes a specific variance for each gene. Between these two extremes there is room for intermediate models. We propose a method that identifies clusters of genes with equal variance. We use a mixture model on the gene variance distribution. A test statistic for ranking and detecting differentially expressed genes is proposed. The method is illustrated with publicly available complementary deoxyribonucleic acid microarray experiments, an unpublished data set and further simulation studies.

2.
The authors present a new nonparametric approach to test for interaction in two‐way layouts. Based on the concept of composite linear rank statistics, they combine the correlated row and column ranking information to construct the test statistic. They determine the limiting distributions of the proposed test statistic under the null hypothesis and Pitman alternatives. They also propose consistent estimators for the limiting covariance matrices associated with the test. They illustrate the application of their test in practical settings using a microarray data set.

3.
Many methods based on ranked set sampling (RSS) assume perfect ranking of the samples. Here, by using the data measured by a balanced RSS scheme, we propose a nonparametric test for the assumption of perfect ranking. The test statistic that we use formally corresponds to the Jonckheere-Terpstra-type test statistic. We show formal relations of the proposed test for perfect ranking to other methods proposed recently in the literature. Through an empirical power study, we demonstrate that the proposed method performs favorably compared to many of its competitors.
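For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of the textbook Jonckheere-Terpstra statistic. It is not the authors' test: the abstract only says their statistic "formally corresponds" to this type. The function name `jonckheere_terpstra` and the sample data are hypothetical, and no null distribution or p-value is computed.

```python
from itertools import combinations


def jonckheere_terpstra(groups):
    """Textbook Jonckheere-Terpstra statistic.

    `groups` is a list of samples given in their hypothesized increasing
    order. For every pair of groups (earlier, later), count the pairs
    (x, y) with x < y; ties contribute one half.
    """
    total = 0.0
    # combinations() keeps the input order, so `lower` always precedes `upper`.
    for lower, upper in combinations(groups, 2):
        for x in lower:
            for y in upper:
                if x < y:
                    total += 1.0
                elif x == y:
                    total += 0.5
    return total


# Hypothetical balanced RSS data: one list per judgment rank class
# (assumed layout, chosen only for illustration).
rank_classes = [[1.2, 2.0], [2.5, 3.1], [3.0, 4.4]]
jt = jonckheere_terpstra(rank_classes)
```

Under perfect ranking, measurements in higher judgment classes tend to be larger, so the statistic would be expected to sit near its maximum (here 12, the total number of cross-group pairs).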

4.
Identifying differentially expressed genes is a basic objective in microarray experiments. Many statistical methods for detecting differentially expressed genes in multiple-slide experiments have been proposed. However, with limited experimental resources, sometimes only a single cDNA array or two oligonucleotide arrays can be made, or too few replicated arrays can be run. Many current statistical models cannot be used because replicated data are not available. Simply using fold changes is also unreliable and inefficient [Chen et al. 1997. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics 2, 364–374; Newton et al. 2001. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8, 37–52; Pan et al. 2002. How many replicates of arrays are required to detect gene expression changes in microarray experiments? a mixture model approach. Genome Biol. 3, research0022.1-0022.10]. We propose a new method. If the log-transformed ratios for the expressed genes as well as unexpressed genes have equal variance, we use a Hadamard matrix to construct a t-test from single-array data. Basically, we test whether each doubtful gene has significantly differential expression compared to the unexpressed genes. We form some new random variables corresponding to the rows of a Hadamard matrix using the algebraic sum of gene expressions. A one-sample t-test is constructed and the p-value is calculated for each doubtful gene based on these random variables. By using any method for multiple testing, adjusted p-values can be obtained from the original p-values and the significance of doubtful genes can be determined.
When the variance of expressed genes differs from the variance of unexpressed genes, we construct a z-statistic based on the result from application of the Hadamard matrix and find the confidence interval to retain the null hypothesis. Using the interval, we determine differentially expressed genes. This method is also useful for multiple microarrays, especially when sufficient replicated data are not available for a traditional t-test. We apply our methodology to ApoAI data. The results appear to be promising. They not only confirm the previously known differentially expressed genes, but also indicate that more genes are differentially expressed.

5.
Subset selection procedures based on ranks have been investigated by a number of authors previously. Their methods are based on ranking the samples from all the populations jointly. However, as was pointed out by Rizvi and Woodworth (1970), the procedures they proposed cannot control the probability of a correct selection over the entire parameter space. In this paper, we propose a subset selection procedure based on pairwise rather than joint ranking of the samples. It is shown that this procedure controls the probability of a correct selection over the entire parameter space. It is also shown that the Pitman efficiency of this nonparametric procedure relative to the multivariate t procedure of Gupta (1956, 1965) is the same as the Pitman efficiency of the Mann-Whitney-Wilcoxon test relative to the t-test.

6.
A Bayesian mixture model for differential gene expression (total citations: 3; self-citations: 0; citations by others: 3)
Summary. We propose model-based inference for differential gene expression, using a nonparametric Bayesian probability model for the distribution of gene intensities under various conditions. The probability model is a mixture of normal distributions. The resulting inference is similar to a popular empirical Bayes approach that is used for the same inference problem. The use of fully model-based inference mitigates some of the necessary limitations of the empirical Bayes method. We argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture-of-normal models. The approach proposed is motivated by a microarray experiment that was carried out to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples. Additionally, we carried out a small simulation study to verify the methods proposed. In the motivating case-studies we show how the nonparametric Bayes approach facilitates the evaluation of posterior expected false discovery rates. We also show how inference can proceed even in the absence of a null sample of known non-differentially expressed scores. This highlights the difference from alternative empirical Bayes approaches that are based on plug-in estimates.

7.
In this article, empirical likelihood inferences for semiparametric varying-coefficient partially linear models with longitudinal data are investigated. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence of the longitudinal data. By using residual-adjustment, an empirical likelihood ratio function for the nonparametric component is constructed, and a nonparametric version of Wilks' phenomenon is proved. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation study is undertaken to assess the finite sample performance of the proposed confidence regions.

8.
Fan J, Feng Y, Niu YS. Annals of Statistics, 2010, 38(5): 2723-2750
Estimation of genewise variance arises from two important applications in microarray data analysis: selecting significantly differentially expressed genes and validation tests for normalization of microarray data. We approach the problem by introducing a two-way nonparametric model, which is an extension of the famous Neyman-Scott model and is applicable beyond microarray data. The problem itself poses interesting challenges because the number of nuisance parameters is proportional to the sample size and it is not obvious how the variance function can be estimated when measurements are correlated. In such a high-dimensional nonparametric problem, we propose two novel nonparametric estimators for the genewise variance function and semiparametric estimators for measurement correlation, via solving a system of nonlinear equations. Their asymptotic normality is established. The finite sample property is demonstrated by simulation studies. The estimators also improve the power of the tests for detecting statistically differentially expressed genes. The methodology is illustrated by the data from the MicroArray Quality Control (MAQC) project.

9.
This paper deals with the nonparametric estimation of the mean and variance functions of univariate time series data. We propose a nonparametric dimension reduction technique for both mean and variance functions of time series. This method does not require any model specification and instead we seek directions in both the mean and variance functions such that the conditional distribution of the current observation given the vector of past observations is the same as that of the current observation given a few linear combinations of the past observations without loss of inferential information. The directions of the mean and variance functions are estimated by maximizing the Kullback–Leibler distance function. The consistency of the proposed estimators is established. A computational procedure is introduced to detect lags of the conditional mean and variance functions in practice. Numerical examples and simulation studies are performed to illustrate and evaluate the performance of the proposed estimators.

10.
In this paper, we study multi-class differential gene expression detection for microarray data. We propose a likelihood-based approach to estimating an empirical null distribution to incorporate gene interactions and provide a more accurate false-positive control than the commonly used permutation or theoretical null distribution-based approach. We propose to rank important genes by p-values or local false discovery rate based on the estimated empirical null distribution. Through simulations and application to lung transplant microarray data, we illustrate the competitive performance of the proposed method.
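The local false discovery rate mentioned in this abstract can be illustrated with a small sketch. It uses the usual two-group definition, lfdr(z) = pi0 * f0(z) / f(z), with a normal empirical null. Everything concrete here is an assumption made for illustration: the normal alternative density, the parameter values, the gene names, and the function names `normal_pdf` and `local_fdr`. In the paper the empirical null is estimated by a likelihood-based procedure that this sketch does not reproduce.

```python
import math


def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, null_mean, null_sd, alt_mean, alt_sd):
    """Two-group local false discovery rate: pi0 * f0(z) / f(z).

    Assumes (for illustration only) that both the null and the
    alternative scores are normally distributed.
    """
    f0 = normal_pdf(z, null_mean, null_sd)
    f1 = normal_pdf(z, alt_mean, alt_sd)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture


# Hypothetical z-scores and hypothetical empirical-null parameters.
scores = {"geneA": 0.3, "geneB": 2.9, "geneC": 1.0}
ranked = sorted(
    scores,
    key=lambda g: local_fdr(scores[g], 0.9, 0.1, 1.1, 3.0, 1.0),
)
```

Genes with the smallest local false discovery rate come first in `ranked`, which mirrors the ranking idea described in the abstract.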

11.
One of the major unresolved problems in the area of nonparametric statistics is the need for satisfactory rank-based test procedures for non-additive models in the two-way layout, especially when there is only one observation on each combination of the levels of the experimental factors. In this paper we consider an arbitrary non-additive model for the two-way layout with n levels of each factor. We utilize both alignment and ranking of the data together with basic properties of Latin squares to develop rank tests for interaction (non-additivity). Our technique involves first aligning within one of the main effects, ranking within the other main effects (columns and rows) and then adding the resulting ranks within “interaction bands” corresponding to orthogonal partitions of the interaction for the model, as denoted by the letters of an n × n Latin square. A Friedman-type statistic is then computed on the resulting sums. This is repeated for each of (n − 1) mutually orthogonal Latin squares (thus accounting for all the interaction degrees of freedom). The resulting (n − 1) Friedman-type statistics are finally combined to obtain an overall test statistic. The necessary null distribution tables for applying the proposed test for non-additivity are presented and we discuss the results of a Monte Carlo simulation study of the relative powers of this new procedure and other (parametric and nonparametric) procedures designed to detect interaction in a two-way layout with one observation per cell.

12.
Qunfang Xu. Statistics, 2017, 51(6): 1280-1303
In this paper, semiparametric modelling for longitudinal data with an unstructured error process is considered. We propose a partially linear additive regression model for longitudinal data in which within-subject variances and covariances of the error process are described by unknown univariate and bivariate functions, respectively. We provide an estimating approach in which polynomial splines are used to approximate the additive nonparametric components and the within-subject variance and covariance functions are estimated nonparametrically. Both the asymptotic normality of the resulting parametric component estimators and optimal convergence rate of the resulting nonparametric component estimators are established. In addition, we develop a variable selection procedure to identify significant parametric and nonparametric components simultaneously. We show that the proposed SCAD penalty-based estimators of non-zero components have an oracle property. Some simulation studies are conducted to examine the finite-sample performance of the proposed estimation and variable selection procedures. A real data set is also analysed to demonstrate the usefulness of the proposed method.

13.
Summary. A typical microarray experiment attempts to ascertain which genes display differential expression in different samples. We model the data by using a two-component mixture model and develop an empirical Bayesian thresholding procedure, which was originally introduced for thresholding wavelet coefficients, as an alternative to the existing methods for determining differential expression across thousands of genes. The method is built on sound theoretical properties and has easy computer implementation in the R statistical package. Furthermore, we consider improvements to the standard empirical Bayesian procedure when replication is present, to increase the robustness and reliability of the method. We provide an introduction to microarrays for those who are unfamiliar with the field and the proposed procedure is demonstrated with applications to two-channel complementary DNA microarray experiments.

14.
Microarray studies are now common for human, agricultural plant and animal studies. False discovery rate (FDR) is widely used in the analysis of large-scale microarray data to account for problems associated with multiple testing. A well-designed microarray study should have adequate statistical power to detect the differentially expressed (DE) genes, while keeping the FDR acceptably low. In this paper, we used a mixture model of expression responses involving DE genes and non-DE genes to analyse theoretical FDR and power for simple scenarios where it is assumed that each gene has equal error variance and the gene effects are independent. A simulation study was used to evaluate the empirical FDR and power for more complex scenarios with unequal error variance and gene dependence. Based on this approach, we present a general guide for sample size requirement at the experimental design stage for prospective microarray studies. This paper presents an approach to explicitly connect the sample size with FDR and power. While the methods have been developed in the context of one-sample microarray studies, they are readily applicable to two-sample studies, and could be adapted to multiple-sample studies.
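The link between FDR and power in a DE / non-DE mixture can be made concrete with a short sketch. The abstract does not state a formula, so the expression below is an assumption: the standard approximation that takes FDR as the ratio of expected false positives to expected total positives, with independent tests sharing a common per-gene significance level and power. The function name `theoretical_fdr` and the numbers are hypothetical.

```python
def theoretical_fdr(pi0, alpha, power):
    """Approximate FDR under a two-component mixture of genes.

    pi0   : proportion of non-DE genes (assumed known here)
    alpha : per-gene type I error rate
    power : per-gene probability of detecting a DE gene

    Assumption: FDR is approximated by
    E[false positives] / E[all positives], with independent tests.
    """
    false_positives = pi0 * alpha
    true_positives = (1.0 - pi0) * power
    expected_positives = false_positives + true_positives
    if expected_positives == 0.0:
        raise ValueError("no genes are expected to be declared significant")
    return false_positives / expected_positives


# Hypothetical design: 90% non-DE genes, alpha = 0.05, power = 0.8.
fdr = theoretical_fdr(0.9, 0.05, 0.8)
```

With these illustrative inputs the approximate FDR is 0.36, which shows why a conventional per-gene significance level can still give a high FDR when most genes are not differentially expressed. Since power increases with sample size, this relation is one way sample size, power and FDR can be tied together.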

15.
In this article, we consider detection and estimation of change points in nonparametric hazard rate models. Wavelet methods are utilized to develop a testing procedure for change-point detection. The asymptotic properties of the test statistic are explored. When there exist change points in the hazard function, we also propose estimators for the number, the locations, and the jump sizes of the change points. The asymptotic properties of these estimators are systematically derived. Some simulation examples are conducted to assess the finite sample performance of the proposed approach and to make comparisons with some existing methods. A real data analysis is provided to illustrate the new approach.

16.
The varying-coefficient model is an important nonparametric statistical model since it allows appreciable flexibility in the structure of the fitted model. For ultra-high dimensional heterogeneous data it is necessary to examine how the effects of covariates vary with exposure variables at different quantile levels of interest. In this paper, we extend the marginal screening methods to examine and select variables by ranking a measure of nonparametric marginal contributions of each covariate given the exposure variable. Spline approximations are employed to model marginal effects and select the set of active variables in a quantile-adaptive framework. This ensures the sure screening property in the quantile-adaptive varying-coefficient model. Numerical studies demonstrate that the proposed procedure works well for heteroscedastic data.

17.
18.
Kai B, Li R, Zou H. Annals of Statistics, 2011, 39(1): 305-332
The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of the proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.

19.
ABSTRACT

In this article we present a new solution to test for effects in unreplicated two-level factorial designs. The proposed test statistic, in case the error components are normally distributed, follows an F random variable, though our attention is on its nonparametric permutation version. The proposed procedure does not require any transformation of data such as residualization and it is exact for each effect and distribution-free. Our main aim is to discuss a permutation solution conditional on the original vector of responses. We give two versions of the same nonparametric testing procedure in order to control both the individual error rate and the experiment-wise error rate. A power comparison with Loughin and Noble's test is provided in the case of an unreplicated 2^4 full factorial design.

20.
We propose a nonparametric test for diagnosis of the proportionality assumption between hazard functions based on a functional equation. Because the censoring distribution is involved, we consider the test procedure in an asymptotic manner and obtain the asymptotic normality of the proposed test statistic. Then we discuss the rationale for using the functional equation for the initial effect model. Finally we compare our test with others using an example.
