Similar literature
 Found 20 similar documents; search time 31 ms
1.
A closed-form analytic expression for the Bell-Doksum statistic is developed. The use of the generalized hypergeometric function to evaluate the expression is demonstrated, and a table of critical values for 2 ≤ n ≤ 25 is presented.

2.
Abstract. DNA array technology is an important tool for genomic research because it can simultaneously measure the expression levels of a great number of genes or gene fragments under different experimental conditions. An important task in gene expression data analysis is to identify clusters of genes that present similar expression levels. We propose a new procedure for estimating the mixture model for clustering of gene expression data. The proposed method is a posterior split-merge-birth MCMC procedure which does not require the number of components to be specified, since it is estimated jointly with the component parameters. The splitting strategy is based on the data and on the posterior distribution from the previously allocated observations. This procedure defines a quick split proposal, in contrast to other split procedures, which require substantial computational effort. The performance of the method is verified using real and simulated datasets.

3.
An expression is derived for the maximum length of the interval estimator of the correlation coefficient, ρ, under bivariate normal assumptions. Prespecifying this minimum attainable precision and the confidence level yields an expression for the required sample size. An approximate expression for the sample size is proposed and is shown numerically to be as good as or better than that based on Fisher's Z transformation.

4.
Massively Parallel Signature Sequencing (MPSS) is a high-throughput counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression (SAGE) and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA-type model for these count data using a zero-inflated Poisson (ZIP) distribution, in contrast to existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using non-parametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries.

5.
A closed-form expression is presented for the probability integral of the Pearson Type IV distribution, and a corresponding method of evaluation is given. This analysis addresses a long-standing gap in the theory of the Pearson system of distributions. In addition, a simple derivation is given of an expression for the normalizing constant in the Type IV integral.

6.
The present paper is concerned with the various results developed for the theory of successive sampling. The different correlation patterns and the different sampling procedures assumed in the theory are described, and an attempt is made to derive results for the general correlation pattern. Accordingly, an expression for the best (minimum variance linear unbiased) estimator is presented for the general correlation pattern where the sampling procedure adopted is a restricted one. When the special correlation pattern assumed in the derivation of results breaks down on a particular occasion, its effect on the estimator employed for that occasion is examined. Furthermore, an expression is derived for the variance of the estimator that is best under a special correlation pattern when that correlation pattern does not hold. The expression so obtained is comparable with that furnished by Patterson (1950).

7.
The development of new technologies to measure gene expression has been calling for statistical methods to integrate findings across multiple-platform studies. A common goal of microarray analysis is to identify genes with differential expression between two conditions, such as treatment versus control. Here, we introduce a hierarchical Bayesian meta-analysis model to pool gene expression studies from different microarray platforms: spotted DNA arrays and short oligonucleotide arrays. The studies have different array design layouts, each with multiple sources of data replication, including repeated experiments, slides and probes. Our model produces the gene-specific posterior probability of differential expression, which is the basis for inference. In simulations combining two and five independent studies, our meta-analysis model outperformed separate analyses for three commonly used comparison measures; it also showed improved receiver operating characteristic curves. When combining spotted DNA and CombiMatrix short oligonucleotide array studies of Geobacter sulfurreducens, our meta-analysis model discovered more genes for fixed thresholds of posterior probability of differential expression and Bayesian false discovery than individual study analyses. We also examine an alternative model and compare models using the deviance information criterion.

8.
This paper describes the derivation of the analytical expression for the integrated squared density partial derivative (ISDPD) in a multivariate normal mixture model. The analytical expression of the ISDPD is derived for arbitrary dimensions with partial derivative orders up to four. Although the value of the ISDPD can be obtained by common numerical integration in mathematical software (such as Maple, Mathematica, Matlab, etc.), this approach is limited by computation time when the dimension or the number of mixture components of the multivariate normal mixture model is large. Moreover, numerical comparison indicates that our proposed analytical expression is far faster than the numerical integration performed by Maple. With this analytical expression, the ISDPD can be calculated quickly without numerical integration.

9.
Summary. Genome-wide measurement of gene expression is a promising approach to the identification of subclasses of cancer that are currently not differentiable, but potentially biologically heterogeneous. This type of molecular classification gives hope for highly individualized and more effective prognosis and treatment of cancer. Statistically, the analysis of gene expression data from unclassified tumours is a complex hypothesis-generating activity, involving data exploration, modelling and expert elicitation. We propose a modelling framework that can be used to inform and organize the development of exploratory tools for classification. Our framework uses latent categories to provide both a statistical definition of differential expression and a precise, experiment-independent, definition of a molecular profile. It also generates natural similarity measures for traditional clustering and gives probabilistic statements about the assignment of tumours to molecular profiles.

10.
A closed-form expression for the distribution of a test statistic for comparing the spectral densities of stationary processes is given. This test statistic was introduced by Coates and Diggle (1986) for the unreplicated case and was extended to the case of replicated observations by Potscher and Reschenhofer (1988). A simple method for computing approximate critical values for large numbers of replications is also provided. As a by-product, an explicit expression is obtained for the distribution function of the range of independent variates, each distributed as the logarithm of an F-variate, i.e., up to a factor of 2, each following Fisher's z-distribution.

11.
The maximum likelihood procedure for estimating the parameters of a model has several attractive properties, including the existence of a covariance matrix that yields asymptotic covariances; for a sample size N, the asymptotics are in general of order 1/N. Here we give an asymptotic expression for the skewness of the distribution of the maximum likelihood estimator of a parameter; this is of order 1/n^2, and this expression is new. Applications relate to the parameters of (i) the Poisson, binomial, and normal densities, (ii) the gamma density, and (iii) the beta density. Other applications are being considered. At one phase of the study, the expression for the asymptotic skewness turned out to be unusually complicated, involving the asymptotic expressions for variance and bias. When these were identified, a much simpler compact expression appeared, which we now describe. The work is a much improved treatment of the subject described in Shenton and Bowman (Maximum Likelihood Estimation in Small Samples, Griffin, 1977).

12.
In this paper, the exact explicit expression for the distribution function of the trivariate extreme vector is derived. The advantage of this expression is its facility when one decides to pass to the limit as the sample size increases. The limit forms as well as the conditions by which this limit splits into the product of the limit marginals are obtained. Moreover, some useful recurrence relations are derived.

13.
Some results concerning expressions for moments and L-moments of continuous distributions are given. These include: some decompositions of variance and covariance closely related to decompositions recently given by Yatracos; a similar expression for the population third central moment and a simple proof thereof for nonnegative random variables; an alternative proof of a general expression for L-moments due to Hosking, and some straightforward consequences for inequalities concerning L-moments. Simplicity is a key feature of all results considered in this paper.

14.
The Cook statistic is developed for detecting outliers in two situations in which outliers are likely to occur in multi-response experiments. In the first situation, more than one outlying observation vector is considered. Each of these vectors is obtained on the assumption that a particular observation from each of the responses is an outlier. A general expression of the Cook statistic for detecting any t such outlying observation vectors is obtained, and some particular cases are then considered. In the second situation, observations from all the responses need not be outliers. Here, too, a general expression of the Cook statistic is obtained for detecting any t observations from each of any k responses as outliers. In both cases the Cook statistic is applied to real experimental data.

15.
Gene regulatory networks are collections of genes that interact with one another and with other substances in the cell. By measuring gene expression over time using high-throughput technologies, it may be possible to reverse engineer, or infer, the structure of the gene network involved in a particular cellular process. These gene expression data typically have a high dimensionality and a limited number of biological replicates and time points. Due to these issues and the complexity of biological systems, the problem of reverse engineering networks from gene expression data demands a specialized suite of statistical tools and methodologies. We propose a non-standard adaptation of a simulation-based approach known as Approximate Bayesian Computing based on Markov chain Monte Carlo sampling. This approach is particularly well suited for the inference of gene regulatory networks from longitudinal data. The performance of this approach is investigated via simulations and using longitudinal expression data from a genetic repair system in Escherichia coli.

16.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 3211-3232
The analysis of microarray data is a widespread functional genomics approach that allows for the monitoring of the expression of thousands of genes at once. The analysis of the great amount of data generated in a microarray experiment requires powerful statistical techniques. One of the first tasks of the analysis of microarray data is to cluster data into biologically meaningful groups according to their expression patterns. In this article, we discuss classical as well as recent clustering techniques for microarray data. We pay particular attention to both theoretical and practical issues and give some general indications that might be useful to practitioners.

17.
A closed form expression is obtained for the hazard function for three stochastic two-stage carcinogenesis models, when the normal cell growth is assumed to be piecewise linear.

18.
One of the fundamental issues in analyzing microarray data is to determine which genes are expressed and which ones are not for a given group of subjects. In datasets where many genes are expressed and many are not expressed (i.e., underexpressed), a bimodal distribution for the gene expression levels often results, where one mode of the distribution represents the expressed genes and the other mode represents the underexpressed genes. To model this bimodality, we propose a new class of mixture models that utilize a random threshold value for accommodating bimodality in the gene expression distribution. Theoretical properties of the proposed model are carefully examined. We use this new model to examine the problem of differential gene expression between two groups of subjects, develop prior distributions, and derive a new criterion for determining which genes are differentially expressed between the two groups. Prior elicitation is carried out using empirical Bayes methodology in order to estimate the threshold value as well as elicit the hyperparameters for the two component mixture model. The new gene selection criterion is demonstrated via several simulations to have excellent false positive rate and false negative rate properties. A gastric cancer dataset is used to motivate and illustrate the proposed methodology.

19.
In this article, we obtain an exact expression for the distribution of the time to failure of a discrete-time cold standby repairable system under the classical assumptions that both the working time and the repair time of components are geometric. Our method is based on an alternative representation of the lifetime as a waiting-time random variable on a binary sequence, together with combinatorial arguments. Such an exact expression for the time-to-failure distribution is new in the literature. Furthermore, we obtain the probability generating function and the first two moments of the lifetime random variable.

20.
This paper presents a new Bayesian clustering approach, based on an infinite mixture model and specifically designed for time-course microarray data. The problem is to group together genes which have “similar” expression profiles, given a set of noisy measurements of their expression levels over a specific time interval. In order to capture the temporal variations of each curve, a non-parametric regression approach is used. Each expression profile is expanded over a set of basis functions, and the sets of coefficients of each curve are subsequently modeled through a Bayesian infinite mixture of Gaussian distributions. The task of finding clusters of genes with similar expression profiles is thereby reduced to the problem of grouping together genes whose coefficients are sampled from the same distribution in the mixture. A Dirichlet process prior is naturally employed in such models, since it allows one to deal automatically with the uncertainty about the number of clusters. Posterior inference is carried out by a split-and-merge MCMC sampling scheme which integrates out the parameters of the component distributions and updates only the latent vector of cluster membership. The final configuration is obtained via the maximum a posteriori estimator. The performance of the method is studied using synthetic and real microarray data and is compared with the performance of competing techniques.
