
1.

Detecting homogeneous segments in DNA sequences by using hidden Markov models (total citations: 2; self-citations: 0; citations by others: 2)

R. J. Boys, D. A. Henderson & D. J. Wilkinson 《Journal of the Royal Statistical Society. Series C, Applied statistics》2000,49(2):269-285

In recent years there has been a rapid growth in the amount of DNA being sequenced and in its availability through genetic databases. Statistical techniques which identify structure within these sequences can be of considerable assistance to molecular biologists, particularly when they incorporate the discrete nature of changes caused by evolutionary processes. This paper focuses on the detection of homogeneous segments within heterogeneous DNA sequences. In particular, we study an intron from the chimpanzee α-fetoprotein gene; this protein plays an important role in the embryonic development of mammals. We present a Bayesian solution to this segmentation problem using a hidden Markov model implemented by Markov chain Monte Carlo methods. We consider the important practical problem of specifying informative prior knowledge about sequences of this type. Two Gibbs sampling algorithms are contrasted and the sensitivity of the analysis to the prior specification is investigated. Model selection and possible ways to overcome the label switching problem are also addressed. Our analysis of intron 7 identifies three distinct homogeneous segment types, two of which occur in more than one region, and one of which is reversible.

2.

Estimating the proportion of true null hypotheses, with application to DNA microarray data

Mette Langaas, Bo Henry Lindqvist, Egil Ferkingstad 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2005,67(4):555-572

Summary. We consider the problem of estimating the proportion of true null hypotheses, π₀, in a multiple-hypothesis set-up. The tests are based on observed p-values. We first review published estimators based on the estimator that was suggested by Schweder and Spjøtvoll. Then we derive new estimators based on nonparametric maximum likelihood estimation of the p-value density, restricting to decreasing and convex decreasing densities. The estimators of π₀ are all derived under the assumption of independent test statistics. Their performance under dependence is investigated in a simulation study. We find that the estimators are relatively robust with respect to the assumption of independence and work well also for test statistics with moderate dependence.
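
As a rough illustration of the kind of estimator reviewed in this abstract, the sketch below computes a Schweder-Spjøtvoll-type estimate of π₀: the count of p-values above a cut-off λ, divided by m(1 − λ). The function name `pi0_estimate`, the default λ = 0.5 and the simulated p-values are assumptions made for this example; they are not taken from the paper, and the paper's new nonparametric maximum likelihood estimators are not shown.

```python
import random


def pi0_estimate(p_values, lam=0.5):
    """Estimate the proportion of true nulls as #{p > lam} / (m * (1 - lam)).

    Under a true null the p-value is uniform on [0, 1], so the region above
    lam is assumed to contain (almost) only null p-values.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    above = sum(1 for p in p_values if p > lam)
    # Cap at 1 because the raw ratio can exceed 1 by chance.
    return min(1.0, above / (m * (1.0 - lam)))


# Hypothetical simulated data: 800 true nulls (uniform p-values) and
# 200 alternatives whose p-values are concentrated near zero.
rng = random.Random(0)
p_null = [rng.random() for _ in range(800)]
p_alt = [rng.betavariate(0.2, 5.0) for _ in range(200)]
estimate = pi0_estimate(p_null + p_alt)
print(round(estimate, 3))
```

With these simulated inputs the estimate should land near the true proportion 0.8; the choice of λ trades bias against variance.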

3.

Clustering objects on subsets of attributes (with discussion)

Jerome H. Friedman, Jacqueline J. Meulman 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2004,66(4):815-849

Summary. A new procedure is proposed for clustering attribute value data. When used in conjunction with conventional distance-based clustering algorithms this procedure encourages those algorithms to detect automatically subgroups of objects that preferentially cluster on subsets of the attribute variables rather than on all of them simultaneously. The relevant attribute subsets for each individual cluster can be different and partially (or completely) overlap with those of other clusters. Enhancements for increasing sensitivity for detecting especially low cardinality groups clustering on a small subset of variables are discussed. Applications in different domains, including gene expression arrays, are presented.

4.

A New Approach to the Estimation of Inter-Variable Correlation

Marc Sobel, Bud Mishra 《Communications in Statistics - Theory and Methods》2013,42(15):2315-2330

The use of different measures of similarity between observed vectors for the purposes of classifying or clustering them has been expanding dramatically in recent years. One result of this expansion has been the use of many new similarity measures, designed for the purpose of satisfying various criteria. A noteworthy application involves estimating the relationships between genes using microarray experimental data. We consider the class of ‘correlation-type’ similarity measures. The use of these new measures of similarity suggests that the whole problem needs to be formulated in statistical terms to clarify their relative benefits. Pursuant to this need, we define, for each given observed vector, a baseline representing the ‘true’ value common to each of the component observations. These ‘true’ values are taken to be parameters. We define the ‘true correlation’ between each two observed vectors as the average (over the distribution of the observations for given baseline parameters) of Pearson's correlation with sample means replaced by the corresponding baseline parameters. Estimators of this true correlation are assessed using their mean squared error (MSE). Proper Bayes estimators of this true correlation, being based on the predictive posterior distribution of the data, are both difficult to calculate/analyze and highly non-robust. By contrast, empirical Bayes estimators are: (i) close to their Bayesian counterparts; (ii) easy to analyze; and (iii) strongly robust. For these reasons, we employ empirical Bayes estimators of correlation in place of their Bayesian counterparts. We show how to construct two different kinds of simultaneous Bayes correlation estimators: the first assumes no a priori correlation between baseline parameters; the second assumes a common unknown correlation between them. Estimators of the latter type frequently have significantly smaller MSE than those of the former type which, in turn, frequently have significantly smaller MSE than their Pearson estimator counterparts. For purposes of illustrating our results, we examine the problem of inferring the relationships between gene expression level vectors, in the context of observing microarray experimental data.
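
To make the phrase "Pearson's correlation with sample means replaced by the corresponding baseline parameters" concrete, here is a minimal sketch of that statistic for a single pair of vectors. The function name `baseline_correlation` and the toy numbers are assumptions for illustration only. The abstract's "true correlation" is the expectation of this quantity over the data distribution, and the Bayes and empirical Bayes estimators it describes are not reproduced here.

```python
import math


def baseline_correlation(x, y, base_x, base_y):
    """Pearson-type correlation centred at given baselines instead of sample means."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    dx = [xi - base_x for xi in x]
    dy = [yi - base_y for yi in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("a vector coincides with its baseline everywhere")
    return cross / norm


# Toy expression vectors (hypothetical values).
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
# With the baselines set to the sample means this reduces to ordinary Pearson correlation.
print(baseline_correlation(x, y, 2.0, 2.0))
# A different baseline gives a different similarity value for the same data.
print(baseline_correlation(x, y, 0.0, 0.0))
```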

5.

Cloning and bioinformatics analysis of the phospholipase B gene from Pseudomonas fluorescens BIT-18

Jiang Fangyan (姜芳燕), Huang Shen (黄申), Li Chun (李春) 《Journal of Qiongzhou University》2013,(5):51-58

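The full-length gene encoding Pf-PLB was successfully amplified by PCR from the total DNA of Pseudomonas fluorescens BIT-18 and then sequenced. Bioinformatics analysis showed that the Pf-PLB gene is 1272 bp long and encodes 423 amino acids, with a theoretical molecular weight of 45.8 kDa, an isoelectric point of 5.53, and a signal peptide of 23 amino acids at the N-terminus. Phylogenetic tree and sequence analysis indicated that pfplb is a member of a new gene family of PLBs. Structural modelling with the modelling software Modeller indicated that Pf-PLB is a β-barrel protein composed of 14 strands. This study lays a foundation for further high-level expression of Pf-PLB and for research on its degumming mechanism.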

6.

An evaluation of common methods for dichotomization of continuous variables to discriminate disease status

S. L. Prince Nelson, V. Ramakrishnan, P. J. Nietert, D. L. Kamen, P. S. Ramos, B. J. Wolf 《Communications in Statistics - Theory and Methods》2017,46(21):10823-10834

Dichotomization of continuous variables to discriminate a dichotomous outcome is often useful in statistical applications. If a true threshold for a continuous variable exists, the challenge is identifying it. This paper examines common methods for dichotomization to identify which ones recover a true threshold. We provide mathematical and numeric proofs demonstrating that maximizing the odds ratio, Youden’s statistic, Gini Index, chi-square statistic, relative risk and kappa statistic all theoretically recover a true threshold. A simulation study evaluating the ability of these statistics to recover a threshold when sampling from a population indicates that maximizing the chi-square statistic and Gini Index has the smallest bias and variability when the probability of being larger than the threshold is small, while maximizing Kappa or Youden’s statistic is best when this probability is larger. Maximizing the odds ratio is the most variable and biased of the methods.
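
One of the criteria named in this abstract, Youden's statistic (J = sensitivity + specificity − 1), can be maximized over candidate cut-points as in the minimal sketch below. The function name `youden_threshold`, the rule "value > cut-point means predicted diseased" and the toy data are assumptions for this example, not details from the paper.

```python
def youden_threshold(values, labels):
    """Return (cutpoint, J) maximizing Youden's J for the rule value > cutpoint.

    labels are 1 for diseased and 0 for non-diseased; ties in J are resolved
    in favour of the smallest cut-point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if v > cut and y == 1)
        true_neg = sum(1 for v, y in zip(values, labels) if v <= cut and y == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical biomarker values and disease labels.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = youden_threshold(values, labels)
print(cut, j)  # prints "3 0.75"
```

The exhaustive scan is quadratic in the sample size, which is acceptable for a sketch; the other criteria in the abstract could be swapped in by replacing the line that computes `j`.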

7.

Some Fundamental Properties of a Multivariate von Mises Distribution

Kanti V. Mardia 《Communications in Statistics - Theory and Methods》2014,43(6):1132-1144

In application areas like bioinformatics, multivariate distributions on angles are encountered which show significant clustering. One approach to statistical modeling of such situations is to use mixtures of unimodal distributions. In the literature (Mardia et al., 2012), the multivariate von Mises distribution, also known as the multivariate sine distribution, has been suggested for components of such models, but work in the area has been hampered by the fact that no good criteria for the von Mises distribution to be unimodal were available. In this article we study the question about when a multivariate von Mises distribution is unimodal. We give sufficient criteria for this to be the case and show examples of distributions with multiple modes when these criteria are violated. In addition, we propose a method to generate samples from the von Mises distribution in the case of high concentration.

8.

Selection of colon cancer feature genes based on a multi-gene combination selection model

Ma Chao (马超) 《Statistics & Information Forum》2012,27(6):78-82

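Most irrelevant genes are first filtered out using each gene's Bhattacharyya distance, and a statistical method for building a multi-gene combination selection model is then proposed on an exploratory basis. Eight possible sets of colon cancer feature genes are selected from the candidate feature genes, and the results of discriminant analysis demonstrate the feasibility of the method.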
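
The filtering step can be sketched as follows, under the assumption (not stated in the abstract) that each gene's expression in the two classes is treated as univariate normal, for which the Bhattacharyya distance has a closed form. The function names, the gene labels and the expression values are hypothetical, and the multi-gene combination model itself is not shown.

```python
import math
from statistics import mean, variance


def bhattacharyya_distance(x, y):
    """Bhattacharyya distance between two samples, assuming univariate normals."""
    m1, m2 = mean(x), mean(y)
    v1, v2 = variance(x), variance(y)  # sample variances; both must be positive
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def top_genes(expression, keep):
    """Rank genes by distance between tumour and normal samples; return the top `keep` names."""
    scored = {
        gene: bhattacharyya_distance(tumour, normal)
        for gene, (tumour, normal) in expression.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:keep]


# Hypothetical expression levels: gene -> (tumour samples, normal samples).
expression = {
    "gene_a": ([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]),  # well separated
    "gene_b": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),     # identical classes
    "gene_c": ([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]),     # mild shift
}
selected = top_genes(expression, keep=2)
print(selected)  # prints "['gene_a', 'gene_c']"
```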