1.

A note on robustness of D-optimal block designs for two-colour microarray experiments

R.A. Bailey Katharina Schiffl Ralf-Dieter Hilgers 《Journal of statistical planning and inference》2013

Two-colour microarray experiments form an important tool in gene expression analysis. Due to the high risk of missing observations in microarray experiments, it is fundamental to concentrate not only on optimal designs but also on designs which are robust against missing observations. As an extension of Latif et al. (2009), we define the optimal breakdown number for a collection of designs to describe the robustness, and we calculate the breakdown number for various D-optimal block designs. We show that, for certain values of the numbers of treatments and arrays, the designs which are D-optimal have the highest breakdown number. Our calculations use methods from graph theory.
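The graph-theoretic view mentioned in this abstract can be illustrated with a small sketch. It assumes, and the abstract does not state this, that each array is an edge joining the two treatments it compares and that the breakdown number is the smallest number of lost arrays that disconnects the resulting graph. The function names and the two example designs are hypothetical, and the brute-force search is only practical for small designs.

```python
from itertools import combinations


def is_connected(n_treatments, arrays):
    """Return True if every treatment is reachable from treatment 0."""
    adjacency = {t: set() for t in range(n_treatments)}
    for a, b in arrays:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == n_treatments


def breakdown_number(n_treatments, arrays):
    """Smallest number of lost arrays that disconnects the design (assumed definition)."""
    if not is_connected(n_treatments, arrays):
        return 0  # the design is already disconnected
    for k in range(1, len(arrays) + 1):
        for lost in combinations(range(len(arrays)), k):
            kept = [arr for i, arr in enumerate(arrays) if i not in lost]
            if not is_connected(n_treatments, kept):
                return k
    return None  # only reached for a single treatment, which cannot be disconnected


# Hypothetical designs on 4 treatments with 4 arrays each.
loop_design = [(0, 1), (1, 2), (2, 3), (3, 0)]   # treatments arranged in a cycle
star_design = [(0, 1), (0, 1), (0, 2), (0, 3)]   # reference design, one pair replicated

print(breakdown_number(4, loop_design))
print(breakdown_number(4, star_design))
```

Under these assumptions the loop design survives the loss of any single array, while the star-like design can be disconnected by losing one array.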

2.

Geometric consistency of principal component scores for high-dimensional mixture models and its application

Kazuyoshi Yata Makoto Aoshima 《Scandinavian Journal of Statistics》2020,47(3):899-921

In this article, we consider clustering based on principal component analysis (PCA) for high-dimensional mixture models. We present theoretical reasons why PCA is effective for clustering high-dimensional data. First, we derive a geometric representation of high-dimension, low-sample-size (HDLSS) data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We develop ideas of the geometric representation and provide geometric consistency properties for multiclass mixture models. We show that PCA can cluster HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering using gene expression datasets.

3.

Evaluating surrogate variables for improving microarray multiple testing inference

Lunceford JK Chen G Hu PH Mehrotra DV 《Pharmaceutical statistics》2011,10(4):302-310

The use of surrogate variables has been proposed as a means to capture, for a given observed set of data, sources driving the dependency structure among high-dimensional sets of features and remove the effects of those sources and their potential negative impact on simultaneous inference. In this article we illustrate the potential effects of latent variables on testing dependence and the resulting impact on multiple inference, we briefly review the method of surrogate variable analysis proposed by Leek and Storey (PNAS 2008; 105:18718-18723), and assess that method via simulations intended to mimic the complexity of feature dependence observed in real-world microarray data. The method is also assessed via application to a recent Merck microarray data set. Both simulation and case study results indicate that surrogate variable analysis can offer a viable strategy for tackling the multiple testing dependence problem when the features follow a potentially complex correlation structure, yielding improvements in the variability of false positive rates and increases in power.

4.

Clustering microarray data using model-based double K-means

Francesca Martella Maurizio Vichi 《Journal of applied statistics》2012,39(9):1853-1869

Microarray technology allows the expression levels of thousands of genes to be measured simultaneously. The dimension and complexity of gene expression data obtained by microarrays create challenging data analysis and management problems, ranging from the analysis of images produced by microarray experiments to the biological interpretation of results. Therefore, statistical and computational approaches are beginning to assume a substantial position within molecular biology. We consider the problem of simultaneously clustering genes and tissue samples (in general, conditions) of a microarray data set. This can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. The need to find a subset of genes and tissue samples defining a homogeneous block has led to the application of double clustering techniques to gene expression data. Here, we focus on an extension of standard K-means that simultaneously clusters the observations and features of a data matrix, namely double K-means, introduced by Vichi (2000). We introduce this model in a probabilistic framework and discuss the advantages of using this approach. We also develop a coordinate ascent algorithm and test its performance via simulation studies and a real data set. Finally, we validate the results obtained on the real data set by building resampling confidence intervals for block centroids.
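To make the idea of clustering rows and columns at the same time concrete, here is a minimal sketch of a least-squares double K-means. It is not the probabilistic model or the coordinate ascent algorithm of the paper: it simply alternates between recomputing block means and reassigning rows and columns to the nearest block profile. The function names, the toy matrix and the starting partitions are all hypothetical.

```python
def block_means(X, rows, cols, K, Q, previous=None):
    """Mean of X within each (row cluster, column cluster) block."""
    sums = [[0.0] * Q for _ in range(K)]
    counts = [[0] * Q for _ in range(K)]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            sums[rows[i]][cols[j]] += value
            counts[rows[i]][cols[j]] += 1
    means = []
    for k in range(K):
        means.append([])
        for q in range(Q):
            if counts[k][q]:
                means[k].append(sums[k][q] / counts[k][q])
            else:
                # Empty block: keep the previous centroid if one exists.
                means[k].append(previous[k][q] if previous else 0.0)
    return means


def double_kmeans(X, rows, cols, K, Q, max_iter=100):
    """Alternate row and column reassignment until the partitions stop changing."""
    rows, cols = list(rows), list(cols)
    centroids = None
    for _ in range(max_iter):
        old = (rows[:], cols[:])
        centroids = block_means(X, rows, cols, K, Q, centroids)
        # Assign each row to the row cluster with the smallest squared error.
        rows = [min(range(K),
                    key=lambda k: sum((x - centroids[k][cols[j]]) ** 2
                                      for j, x in enumerate(row)))
                for row in X]
        centroids = block_means(X, rows, cols, K, Q, centroids)
        # Assign each column to the column cluster with the smallest squared error.
        cols = [min(range(Q),
                    key=lambda q: sum((X[i][j] - centroids[rows[i]][q]) ** 2
                                      for i in range(len(X))))
                for j in range(len(X[0]))]
        if (rows, cols) == old:
            break
    return rows, cols, block_means(X, rows, cols, K, Q, centroids)


# Toy "genes x samples" matrix with two planted row groups and two column groups.
X = [[0, 0, 10, 10],
     [0, 0, 10, 10],
     [5, 5, 20, 20],
     [5, 5, 20, 20]]

# Deliberately imperfect starting partitions.
row_labels, col_labels, centroids = double_kmeans(X, [0, 0, 0, 1], [0, 0, 0, 1], K=2, Q=2)
print(row_labels, col_labels, centroids)
```

Like ordinary K-means, this alternating scheme depends on the starting partitions and can stall in a poor local solution, so several starts would normally be tried.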

5.

Research progress in DNA methylation detection methods

孙贝娜 (Sun Beina) 《Journal of Fujian Agriculture and Forestry University (Philosophy and Social Sciences Edition)》2009,(4)

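DNA methylation is an important epigenetic modification and a major component of epigenetics [1]. It is involved in processes such as animal embryonic development, genomic imprinting and X-chromosome inactivation, and it plays an important role in the regulation of gene expression; aberrant methylation can lead to tumour formation. Over the past 20 years, DNA methylation has gradually become a new research focus. As methylation research has deepened, a wide variety of detection methods have been developed to meet the needs of different types of study. This article reviews the CpG island methylation detection methods in common use, with emphasis on several microarray-based methods for the high-throughput study of DNA methylation.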

6.

The Impact of Measurement Error on Principal Component Analysis

Kristoffer Herland Hellton Magne Thoresen 《Scandinavian Journal of Statistics》2014,41(4):1051-1063

We investigate the effect of measurement error on principal component analysis in the high‐dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly in terms of increased variability and not bias. In practice, the error‐induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute to a large variability in component loadings, relative to the loading values, such that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.

7.

The Use of Treatment Concurrences to Assess Robustness of Binary Block Designs Against the Loss of Whole Blocks

J. D. Godolphin E. J. Godolphin 《Australian & New Zealand Journal of Statistics》2015,57(2):225-239

Criteria are proposed for assessing the robustness of a binary block design against the loss of whole blocks, based on summing entries of selected upper non‐principal sections of the concurrence matrix. These criteria improve on the minimal concurrence concept that has been used previously and provide new conditions for measuring the robustness status of a design. The robustness properties of two‐associate partially balanced designs are considered and it is shown that two categories of group divisible designs are maximally robust. These results expand a classic result in the literature, obtained by Ghosh, which established maximal robustness for the class of balanced block designs.

8.

Two sample test for high-dimensional partially paired data

Seokho Lee Insuk Sohn Sin-Ho Jung Cheol-Keun Park 《Journal of applied statistics》2015,42(9):1946-1961

9.

Development and validation of biomarker classifiers for treatment selection

Richard Simon 《Journal of statistical planning and inference》2008

Many syndromes traditionally viewed as individual diseases are heterogeneous in molecular pathogenesis and treatment responsiveness. This often leads to the conduct of large clinical trials to identify small average treatment benefits for heterogeneous groups of patients. Drugs that demonstrate effectiveness in such trials may subsequently be used broadly, resulting in ineffective treatment of many patients. New genomic and proteomic technologies provide powerful tools for the selection of patients likely to benefit from a therapeutic without unacceptable adverse events. In spite of the large literature on developing predictive biomarkers, there is considerable confusion about the development and validation of biomarker-based diagnostic classifiers for treatment selection. In this paper we attempt to clarify some of these issues and to provide guidance on the design of clinical trials for evaluating the clinical utility and robustness of pharmacogenomic classifiers.

10.

Hierarchical Bayesian meta-analysis models for cross-platform microarray studies

E. M. Conlon B. L. Postier B. A. Methé K. P. Nevin D. R. Lovley 《Journal of applied statistics》2009,36(10):1067-1085

The development of new technologies to measure gene expression has been calling for statistical methods to integrate findings across multiple-platform studies. A common goal of microarray analysis is to identify genes with differential expression between two conditions, such as treatment versus control. Here, we introduce a hierarchical Bayesian meta-analysis model to pool gene expression studies from different microarray platforms: spotted DNA arrays and short oligonucleotide arrays. The studies have different array design layouts, each with multiple sources of data replication, including repeated experiments, slides and probes. Our model produces the gene-specific posterior probability of differential expression, which is the basis for inference. In simulations combining two and five independent studies, our meta-analysis model outperformed separate analyses for three commonly used comparison measures; it also showed improved receiver operating characteristic curves. When combining spotted DNA and CombiMatrix short oligonucleotide array studies of Geobacter sulfurreducens, our meta-analysis model discovered more genes for fixed thresholds of posterior probability of differential expression and Bayesian false discovery than individual study analyses. We also examine an alternative model and compare models using the deviance information criterion.