Similar Documents (20 results found)
1.
An image that is mapped into a bit stream suitable for communication over or storage in a digital medium is said to have been compressed. Using tree-structured vector quantizers (TSVQs) is an approach to image compression in which clustering algorithms are combined with ideas from tree-structured classification to provide code books that can be searched quickly and simply. The overall goal is to optimize the quality of the compressed image subject to a constraint on the communication or storage capacity, i.e. on the allowed bit rate. General goals of image compression and vector quantization are summarized in this paper. There is discussion of methods for code book design, particularly the generalized Lloyd algorithm for clustering, and methods for splitting and pruning that have been extended from the design of classification trees to TSVQs. The resulting codes, called pruned TSVQs, are of variable rate, and yield lower distortion than fixed-rate, full-search vector quantizers for a given average bit rate. They have simple encoders and a natural successive approximation (progressive) property. Applications of pruned TSVQs are discussed, particularly compressing computerized tomography images. In this work, the key issue is not merely the subjective attractiveness of the compressed image but rather whether the diagnostic accuracy is adversely affected by compression. In recent work, TSVQs have been combined with other types of image processing, including segmentation and enhancement. The relationship between vector quantizer performance and the size of the training sequence used to design the code and other asymptotic properties of the codes are discussed.
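As a concrete illustration of the codebook-design step described above, here is a minimal sketch of the generalized Lloyd (LBG) iteration for a full-search vector quantizer; the training blocks, codebook size, and iteration count are illustrative assumptions, not the authors' setup.

    import numpy as np

    def lloyd_codebook(train, k=16, iters=50, seed=0):
        """One common form of the generalized Lloyd algorithm (a sketch)."""
        rng = np.random.default_rng(seed)
        # Initialize the codebook with k distinct training vectors.
        codebook = train[rng.choice(len(train), k, replace=False)].astype(float)
        for _ in range(iters):
            # Nearest-neighbour condition: assign each vector to its closest codeword.
            d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            assign = d.argmin(axis=1)
            # Centroid condition: move each codeword to the mean of its cell.
            for j in range(k):
                cell = train[assign == j]
                if len(cell):
                    codebook[j] = cell.mean(axis=0)
        return codebook

    # Example: a 16-word codebook for 2x2 image blocks flattened to 4-vectors.
    blocks = np.random.default_rng(1).normal(size=(1000, 4))
    cb = lloyd_codebook(blocks)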

2.
3.
Finite memory sources and variable-length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree-based approaches. This can lead to a substantially higher rate of data compression, and such non-hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.
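The lumping of transition probabilities can be pictured with a small frequentist stand-in for the paper's Bayesian inference: estimate per-context transition distributions and cluster similar rows so that contexts in one cluster share parameters. The DNA alphabet, smoothing, and k-means step below are illustrative assumptions.

    import numpy as np
    from itertools import product
    from sklearn.cluster import KMeans

    def transition_rows(seq, alphabet="ACGT", order=2):
        """Laplace-smoothed transition distributions, one row per context."""
        contexts = ["".join(p) for p in product(alphabet, repeat=order)]
        counts = {c: np.ones(len(alphabet)) for c in contexts}
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][alphabet.index(seq[i])] += 1
        return contexts, np.array([counts[c] / counts[c].sum() for c in contexts])

    contexts, rows = transition_rows("ACGTACGGTTACGATCGATCGGGTACACGT")
    # Contexts whose rows fall in one cluster would share a single
    # transition distribution in the lumped (sparse) model.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(rows)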

4.
We propose an intuitive and computationally simple algorithm for clustering probability density functions (pdfs). A data-driven learning mechanism is incorporated in the algorithm in order to determine suitable widths for the clusters. The clustering results show that the proposed algorithm is able to group the pdfs automatically and provide the optimal cluster number without any a priori information. The performance study also shows that the proposed algorithm is more efficient than existing ones. In addition, the clustering can serve as an intermediate compression tool in content-based multimedia retrieval; we apply the proposed algorithm to categorize a subset of the COREL image database, and the results indicate that it performs well in colour image categorization.
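Since the abstract does not spell out the algorithm, the following is only a generic sketch of clustering pdfs by a distance between densities evaluated on a common grid; the Gaussian densities, L1 metric, and linkage threshold are all illustrative assumptions.

    import numpy as np
    from scipy.stats import norm
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    x = np.linspace(-6, 6, 400)
    pdfs = np.array([norm.pdf(x, loc=m, scale=s)
                     for m, s in [(-2, 0.5), (-1.9, 0.6), (2, 1.0), (2.2, 0.9)]])
    # Approximate L1 distance between densities via a Riemann sum.
    d = pdist(pdfs, metric="cityblock") * (x[1] - x[0])
    labels = fcluster(linkage(d, method="average"), t=0.5, criterion="distance")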

5.
This paper focuses on unsupervised curve classification in the context of nuclear industry. At the Commissariat à l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the ‘true’ ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem.
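A minimal sketch of the clustering step: project each output curve onto a low-dimensional orthonormal basis and run k-means on the coefficients. The cosine basis, synthetic curves, and cluster count are illustrative assumptions, not the CATHARE setup.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 256)
    # Two synthetic families of time-evolution curves plus noise.
    curves = np.array([np.exp(-3 * t) + 0.05 * rng.normal(size=t.size) for _ in range(20)] +
                      [np.sin(3 * t) + 0.05 * rng.normal(size=t.size) for _ in range(20)])
    # Orthonormal cosine basis on [0, 1]; keep the first d coefficients.
    d = 8
    basis = np.vstack([np.ones_like(t)] +
                      [np.sqrt(2) * np.cos(np.pi * k * t) for k in range(1, d)])
    coeffs = curves @ basis.T * (t[1] - t[0])   # projection coefficients (Riemann sum)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coeffs)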

6.
This is the first of a projected series of papers dealing with computational experimentation in mathematical programming. This paper provides early results of a test case using four discrete linear L1 approximation codes. Variables influencing code behavior are identified and measures of performance are specified. More importantly, an experimental design is developed for assessing code performance and is illustrated using the variable “problem size”.

7.
Combining the adaptive reweighting of AdaBoost with the unpruned, random-variable-split tree base models of random forests, this paper proposes an adaptive random forest algorithm. Experiments show that when the training set is large and the Bayes error is small, the adaptive reweighting takes effect, so that the adaptive random forest algorithm outperforms the standard random forest algorithm.
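One way to realize this combination with off-the-shelf tools is AdaBoost's adaptive reweighting driving unpruned trees that split on random variable subsets; the scikit-learn pipeline and dataset below are illustrative assumptions, not the paper's exact algorithm.

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    # Unpruned trees with random-variable splits, as in random forests.
    base = DecisionTreeClassifier(max_depth=None, max_features="sqrt")
    # AdaBoost supplies the adaptive reweighting of training cases.
    model = AdaBoostClassifier(base, n_estimators=100, random_state=0)
    print(cross_val_score(model, X, y, cv=5).mean())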

8.
Algebraic-geometric codes have a t-error-correcting pair which corrects errors up to half the designed minimum distance. A generalization of the Roos bound is given from cyclic to linear codes. An MDS code of minimum distance 5 has a 2-error-correcting pair if and only if it is an extended-generalized-Reed–Solomon code.

9.
Computer models simulating a physical process are used in many areas of science. Due to the complex nature of these codes it is often necessary to approximate the code, which is typically done using a Gaussian process. In many situations the number of code runs available to build the Gaussian process approximation is limited. When the initial design is small or the underlying response surface is complicated this can lead to poor approximations of the code output. In order to improve the fit of the model, sequential design strategies must be employed. In this paper we introduce two simple distance based metrics that can be used to augment an initial design in a batch sequential manner. In addition we propose a sequential updating strategy to an orthogonal array based Latin hypercube sample. We show via various real and simulated examples that the distance metrics and the extension of the orthogonal array based Latin hypercubes work well in practice.
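A minimal sketch of one such distance-based augmentation: greedily add, from a candidate pool, the point that maximizes its minimum distance to the current design (a maximin criterion). The pool, batch size, and Euclidean metric are illustrative assumptions, not the paper's exact metrics.

    import numpy as np
    from scipy.spatial.distance import cdist

    def augment(design, candidates, batch=5):
        design = design.copy()
        for _ in range(batch):
            # Distance from every candidate to its nearest design point.
            dmin = cdist(candidates, design).min(axis=1)
            design = np.vstack([design, candidates[dmin.argmax()]])
        return design

    rng = np.random.default_rng(0)
    initial = rng.random((10, 2))               # small initial design in [0, 1]^2
    augmented = augment(initial, rng.random((500, 2)))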

10.
The binary linear codes generated by incidence matrices of the 80 Steiner triple systems on 15 points (STS(15)) are studied. The 80 codes of length 35 spanned by incidence vectors of the points are all non-isomorphic. In contrast, a pair of codes of length 15 generated by blocks are isomorphic if and only if the corresponding incidence matrices have the same rank over GF(2). The weight distribution, the automorphism groups of the codes, and the distribution of the Steiner triple systems within the codes are computed. There are 54 codes of length 35 that contain several non-isomorphic STS(15)'s, and any such code is generated by an STS(15) of largest 2-rank.

11.
An upper bound is presented for the average Hamming distance (avd) of binary codes by using the average Hamming weight (avw) of the codes. The upper bound is attained by a large class of codes. Hence, for this class of codes, a relation between the avd and the avw of a code is established. When the results are applied to the avd between two codes, similar results are obtained.
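To make the two quantities concrete, the snippet below computes the avw and the avd of a toy binary code, averaging the distance over unordered pairs of distinct codewords (one common convention; the paper's exact definitions may differ). The example code is an illustrative assumption.

    import numpy as np
    from itertools import combinations

    code = np.array([[0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]])
    avw = code.sum(axis=1).mean()                       # average Hamming weight
    avd = np.mean([(code[i] != code[j]).sum()           # average Hamming distance
                   for i, j in combinations(range(len(code)), 2)])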

12.
We describe a simple procedure for constructing Steiner 2-designs with the parameters of the designs of points and lines of a finite projective or affine geometry of dimension m⩾3; the codes of many of the designs constructed in this way will contain the code of the relevant finite-geometry design (a Reed-Muller or generalized Reed-Muller code). The designs can be extended to 3-designs provided that planes in the finite-geometry design extend.

13.
The support vector machine (SVM) has been successfully applied to various classification areas with great flexibility and a high level of classification accuracy. However, the SVM is not suitable for the classification of large or imbalanced datasets because of significant computational problems and a classification bias toward the dominant class. The SVM combined with the k-means clustering (KM-SVM) is a fast algorithm developed to accelerate both the training and the prediction of SVM classifiers by using the cluster centers obtained from the k-means clustering. In the KM-SVM algorithm, however, the penalty of misclassification is treated equally for each cluster center even though the contributions of different cluster centers to the classification can be different. In order to improve classification accuracy, we propose the WKM-SVM algorithm which imposes different penalties for the misclassification of cluster centers by using the number of data points within each cluster as a weight. As an extension of the WKM-SVM, the recovery process based on WKM-SVM is suggested to incorporate the information near the optimal boundary. Furthermore, the proposed WKM-SVM can be successfully applied to imbalanced datasets with an appropriate weighting strategy. Experiments show the effectiveness of our proposed methods.
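A minimal sketch of the WKM-SVM idea: fit k-means per class, train the SVM on the cluster centres, and weight each centre's misclassification penalty by its cluster size. The dataset and parameters are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=5000, random_state=0)
    centres, weights, labels = [], [], []
    for cls in np.unique(y):
        km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X[y == cls])
        centres.append(km.cluster_centers_)
        weights.append(np.bincount(km.labels_, minlength=20))  # points per cluster
        labels.append(np.full(20, cls))
    # Larger clusters incur a larger penalty when misclassified.
    svm = SVC(kernel="rbf").fit(np.vstack(centres), np.concatenate(labels),
                                sample_weight=np.concatenate(weights).astype(float))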

14.
Many spatial data such as those in climatology or environmental monitoring are collected over irregular geographical locations. Furthermore, it is common to have multivariate observations at each location. We propose a method of segmentation of a region of interest based on such data that can be carried out in two steps: (1) clustering or classification of irregularly sampled points and (2) segmentation of the region based on the classified points.

We develop a spatially constrained clustering algorithm for segmentation of the sample points by incorporating a geographical constraint into the standard clustering methods. Both hierarchical and nonhierarchical methods are considered. The latter is a modification of the seeded region growing method known in image analysis. Both algorithms work on a suitable neighbourhood structure, which can, for example, be defined by the Delaunay triangulation of the sample points. The number of clusters is estimated by testing the significance of successive changes in the within-cluster sum-of-squares relative to a null permutation distribution. The methodology is validated on simulated data and used in the construction of a climatology map of Ireland based on daily rainfall records from 1294 stations over a period of 37 years.
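A minimal sketch of the constrained-clustering idea: take the neighbourhood structure from the Delaunay triangulation of the sample points and pass it as a connectivity constraint to hierarchical (Ward) clustering, so that only geographically adjacent points can merge. The synthetic stations and fixed cluster count are illustrative assumptions; the paper's permutation test for the number of clusters is not shown.

    import numpy as np
    from scipy.spatial import Delaunay
    from scipy.sparse import lil_matrix
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.default_rng(0)
    xy = rng.random((200, 2))                   # irregular station locations
    feats = rng.normal(size=(200, 3))           # multivariate observations
    feats[:, 0] += 3 * (xy[:, 0] > 0.5)         # a spatial regime to recover

    tri = Delaunay(xy)
    conn = lil_matrix((len(xy), len(xy)), dtype=int)
    for s in tri.simplices:                     # connect Delaunay neighbours
        for i in s:
            for j in s:
                if i != j:
                    conn[i, j] = 1
    labels = AgglomerativeClustering(n_clusters=2, connectivity=conn.tocsr(),
                                     linkage="ward").fit_predict(feats)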

15.
A vast collection of reusable mathematical and statistical software is now available for use by scientists and engineers in their modeling efforts. This software represents a significant source of mathematical expertise, created and maintained at considerable expense. Unfortunately, the collection is so heterogeneous that it is a tedious and error-prone task simply to determine what software is available to solve a given problem. In mathematical problem solving environments of the future such questions will be fielded by expert software advisory systems. One way for such systems to systematically associate available software with the problems they solve is to use a problem classification system. In this paper we describe a detailed tree-structured problem-oriented classification system appropriate for such use.

16.
Methods are developed to test the performance of software for calculating the standard normal distribution function. The accuracy and implementation details of the tests are described. Results are presented on a selection of available codes, showing a wide variation in performance. At least one published code is shown to have severe defects.
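A minimal sketch of such a test: compare candidate implementations of the standard normal distribution function against a high-precision reference and report the worst relative error over a grid, including the far left tail where naive formulations lose all accuracy. The two candidate implementations here are illustrative assumptions, not the codes examined in the paper.

    import math
    from mpmath import mp, erfc as mp_erfc

    mp.dps = 50                                 # 50-digit reference precision

    def ref(x):                                 # high-precision Phi(x)
        return 0.5 * mp_erfc(-mp.mpf(x) / mp.sqrt(2))

    def phi_erf(x):                             # naive: cancels badly for x << 0
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def phi_erfc(x):                            # better behaved in the left tail
        return 0.5 * math.erfc(-x / math.sqrt(2.0))

    grid = [i / 10 for i in range(-370, 81)]
    for name, f in [("erf-based", phi_erf), ("erfc-based", phi_erfc)]:
        worst = max(abs(f(x) - ref(x)) / ref(x) for x in grid)
        print(name, float(worst))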

17.
The performance of computationally inexpensive model selection criteria in the context of tree-structured subgroup analysis is investigated. It is shown through simulation that no single model selection criterion exhibits uniformly superior performance over a wide range of scenarios. Therefore, a two-stage approach for model selection is proposed and shown to perform satisfactorily. An applied example of subgroup analysis is presented. Problems associated with tree-structured subgroup analysis are discussed and practical solutions are suggested.

18.
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.

19.
All affine resolvable designs with parameters of the design of the hyperplanes in ternary affine 3-space are enumerated. This enumeration implies the classification (up to equivalence), of all optimal equidistant ternary codes of length 13 and distance 9, as well as all complete orthogonal arrays of strength 2 with 3 symbols, 13 constraints and index 3. Up to isomorphism, there are exactly 68 such designs. The automorphism groups and the rank of the incidence matrices over GF(3) are computed. There are six designs with point-transitive automorphism groups, and one design with trivial group. The affine geometry design is the unique design with lowest 3-rank, and the only design with 2-transitive automorphism group.

20.
It is often desirable to use Gray codes with properties different from those of the standard binary reflected code. Previously, only short codes (32- to 256-element) could be systematically generated or found with a given set of properties. This paper describes a technique by which long codes (65000-element or longer) as well as short codes can be systematically generated with desired properties. The technique is described and demonstrated by generating codes of various lengths with the desired property of equal column change counts. Several examples of the use of the technique for generating codes with other desired properties are outlined.
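For contrast with the codes the paper constructs, the sketch below generates the standard binary reflected Gray code and tallies its column change counts, which are highly unequal; balancing them is exactly the property the technique targets. The word length is an illustrative choice.

    def gray(n):
        """Standard binary reflected Gray code on n bits."""
        return [i ^ (i >> 1) for i in range(2 ** n)]

    n = 4
    seq = gray(n)
    changes = [0] * n
    for a, b in zip(seq, seq[1:] + seq[:1]):    # cyclic code: include wrap-around
        changes[(a ^ b).bit_length() - 1] += 1
    print(changes)                              # [8, 4, 2, 2] for n = 4: far from equal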
