Clustering Binary Oligonucleotide Fingerprint Vectors for DNA Clone Classification Analysis |
| |
Authors: | Zhipeng Cai, Maysam Heydari, Guohui Lin (email: ghlin@cs.ualberta.ca) |
| |
Affiliation: | (1) Bioinformatics Research Group, Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada |
| |
Abstract: | We considered the problem of clustering binarized oligonucleotide fingerprints. Oligonucleotide fingerprinting is a powerful DNA array based method to characterize cDNA and rRNA libraries and has many applications, including gene expression profiling and DNA clone classification. DNA clone classification is the main application for the problem considered in this paper. Most of the existing approaches for clustering use normalized real intensity values and thus do not treat positive and negative hybridization signals equally. This is demonstrated in a series of recent publications where a discrete approach, typically useful in the classification of microbial rRNA clones, has been proposed. In the discrete approach, hybridization intensities are normalized and thresholds are set such that a value of 1 represents hybridization, a value of 0 represents no hybridization, and an N represents unknown, which is also called a missing value. A combinatorial optimization problem is then formulated that attempts to cluster the fingerprints and resolve the missing values simultaneously. It has been observed that missing values cause much difficulty in clustering analysis and that most clustering methods are very sensitive to them. In this paper, we returned partway to the traditional clustering problem, which admits no missing values, but with the revised goal of stabilizing the number of clusters while maintaining the clustering quality. We adopted the binarizing scheme used in the discrete approach, as it has been shown to be typically useful for clone classification. We formulated this problem as another combinatorial optimization problem. The computational complexity of this new clustering problem and its relationships to the discrete approach and the traditional clustering problem were studied. We designed an exact algorithm for the new clustering problem, an A* search algorithm that finds a minimum number of clusters. 
The experimental results on two commonly tested real datasets demonstrated that the A* search algorithm runs fast and performs better than some popular hierarchical clustering methods, in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes. Supported by NSERC and CFI. Supported by NSERC. Supported partially by NSERC, CFI, and NNSF Grant 60373012. |
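The three-symbol encoding described for the discrete approach can be illustrated with a minimal sketch. The function name `binarize`, the two-threshold rule, and the threshold values below are assumptions made for illustration only; the abstract does not state how the thresholds are chosen, and the clustering problem studied in the paper takes fingerprints with no N entries.

```python
from typing import List


def binarize(intensities: List[float], low: float, high: float) -> str:
    """Map normalized hybridization intensities to a fingerprint string.

    Hypothetical rule (not taken from the paper):
      value >= high      -> '1' (hybridization)
      value <= low       -> '0' (no hybridization)
      low < value < high -> 'N' (unknown / missing value)
    """
    symbols = []
    for value in intensities:
        if value >= high:
            symbols.append("1")
        elif value <= low:
            symbols.append("0")
        else:
            symbols.append("N")
    return "".join(symbols)


# One clone measured against four probes; the thresholds are illustrative.
fingerprint = binarize([0.92, 0.08, 0.47, 0.71], low=0.3, high=0.6)
print(fingerprint)  # prints 10N1
```

Setting `low` equal to `high` in this sketch leaves no ambiguous band, so every fingerprint is a pure 0/1 vector with no N entries.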
| |
Keywords: | DNA array; oligonucleotide fingerprinting; DNA clone classification; clustering; combinatorial optimization; A* search; evaluation function |
This article has been indexed by SpringerLink and other databases. |
|