Similar Documents (20 results found)
1.
An image that is mapped into a bit stream suitable for communication over or storage in a digital medium is said to have been compressed. Using tree-structured vector quantizers (TSVQs) is an approach to image compression in which clustering algorithms are combined with ideas from tree-structured classification to provide code books that can be searched quickly and simply. The overall goal is to optimize the quality of the compressed image subject to a constraint on the communication or storage capacity, i.e. on the allowed bit rate. General goals of image compression and vector quantization are summarized in this paper. There is discussion of methods for code book design, particularly the generalized Lloyd algorithm for clustering, and methods for splitting and pruning that have been extended from the design of classification trees to TSVQs. The resulting codes, called pruned TSVQs, are of variable rate, and yield lower distortion than fixed-rate, full-search vector quantizers for a given average bit rate. They have simple encoders and a natural successive approximation (progressive) property. Applications of pruned TSVQs are discussed, particularly compressing computerized tomography images. In this work, the key issue is not merely the subjective attractiveness of the compressed image but rather whether the diagnostic accuracy is adversely affected by compression. In recent work, TSVQs have been combined with other types of image processing, including segmentation and enhancement. The relationship between vector quantizer performance and the size of the training sequence used to design the code and other asymptotic properties of the codes are discussed.
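As a concrete illustration of the codebook-design step described above, here is a minimal sketch of the generalized Lloyd (LBG) iteration for a full-search vector quantizer; the training blocks, codebook size, and iteration count are illustrative assumptions, not the authors' setup.

    import numpy as np

    def lloyd_codebook(train, k=16, iters=50, seed=0):
        """One common form of the generalized Lloyd algorithm (a sketch)."""
        rng = np.random.default_rng(seed)
        # Initialize the codebook with k distinct training vectors.
        codebook = train[rng.choice(len(train), k, replace=False)].astype(float)
        for _ in range(iters):
            # Nearest-neighbour condition: assign each vector to its closest codeword.
            d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            assign = d.argmin(axis=1)
            # Centroid condition: move each codeword to the mean of its cell.
            for j in range(k):
                cell = train[assign == j]
                if len(cell):
                    codebook[j] = cell.mean(axis=0)
        return codebook

    # Example: a 16-word codebook for 2x2 image blocks flattened to 4-vectors.
    blocks = np.random.default_rng(1).normal(size=(1000, 4))
    cb = lloyd_codebook(blocks)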

2.
3.
Finite memory sources and variable-length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree-based approaches. This can lead to a substantially higher rate of data compression, and such non-hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.
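The lumping of transition probabilities can be pictured with a small frequentist stand-in for the paper's Bayesian inference: estimate per-context transition distributions and cluster similar rows so that contexts in one cluster share parameters. The DNA alphabet, smoothing, and k-means step below are illustrative assumptions.

    import numpy as np
    from itertools import product
    from sklearn.cluster import KMeans

    def transition_rows(seq, alphabet="ACGT", order=2):
        """Laplace-smoothed transition distributions, one row per context."""
        contexts = ["".join(p) for p in product(alphabet, repeat=order)]
        counts = {c: np.ones(len(alphabet)) for c in contexts}
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][alphabet.index(seq[i])] += 1
        return contexts, np.array([counts[c] / counts[c].sum() for c in contexts])

    contexts, rows = transition_rows("ACGTACGGTTACGATCGATCGGGTACACGT")
    # Contexts whose rows fall in one cluster would share a single
    # transition distribution in the lumped (sparse) model.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(rows)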

4.
We propose an intuitive and computationally simple algorithm for clustering probability density functions (pdfs). A data-driven learning mechanism is incorporated in the algorithm in order to determine suitable widths for the clusters. The clustering results show that the proposed algorithm is able to group the pdfs automatically and provide the optimal cluster number without any a priori information. The performance study also shows that the proposed algorithm is more efficient than existing ones. In addition, the clustering can serve as an intermediate compression tool in content-based multimedia retrieval; we apply the proposed algorithm to categorize a subset of the COREL image database, and the results indicate that it performs well in colour image categorization.
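Since the abstract does not spell out the algorithm, the following is only a generic sketch of clustering pdfs by a distance between densities evaluated on a common grid; the Gaussian densities, L1 metric, and linkage threshold are all illustrative assumptions.

    import numpy as np
    from scipy.stats import norm
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    x = np.linspace(-6, 6, 400)
    pdfs = np.array([norm.pdf(x, loc=m, scale=s)
                     for m, s in [(-2, 0.5), (-1.9, 0.6), (2, 1.0), (2.2, 0.9)]])
    # Approximate L1 distance between densities via a Riemann sum.
    d = pdist(pdfs, metric="cityblock") * (x[1] - x[0])
    labels = fcluster(linkage(d, method="average"), t=0.5, criterion="distance")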

5.
This paper focuses on unsupervised curve classification in the context of nuclear industry. At the Commissariat à l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the ‘true’ ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem.
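A minimal sketch of the clustering step: project each output curve onto a low-dimensional orthonormal basis and run k-means on the coefficients. The cosine basis, synthetic curves, and cluster count are illustrative assumptions, not the CATHARE setup.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 256)
    # Two synthetic families of time-evolution curves plus noise.
    curves = np.array([np.exp(-3 * t) + 0.05 * rng.normal(size=t.size) for _ in range(20)] +
                      [np.sin(3 * t) + 0.05 * rng.normal(size=t.size) for _ in range(20)])
    # Orthonormal cosine basis on [0, 1]; keep the first d coefficients.
    d = 8
    basis = np.vstack([np.ones_like(t)] +
                      [np.sqrt(2) * np.cos(np.pi * k * t) for k in range(1, d)])
    coeffs = curves @ basis.T * (t[1] - t[0])   # projection coefficients (Riemann sum)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coeffs)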

6.
This is the first of a projected series of papers dealing with computational experimentation in mathematical programming. This paper provides early results of a test case using four discrete linear L1 approximation codes. Variables influencing code behavior are identified and measures of performance are specified. More importantly, an experimental design is developed for assessing code performance and is illustrated using the variable “problem size”.

7.
Combining the adaptive reweighting of AdaBoost with the unpruned, random-variable-split tree base models of random forests, this paper proposes an adaptive random forest algorithm. Experiments show that when the training set is large and the Bayes error is small, the adaptive reweighting takes effect, so that the adaptive random forest algorithm outperforms the standard random forest algorithm.
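One way to realize this combination with off-the-shelf tools is AdaBoost's adaptive reweighting driving unpruned trees that split on random variable subsets; the scikit-learn pipeline and dataset below are illustrative assumptions, not the paper's exact algorithm.

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    # Unpruned trees with random-variable splits, as in random forests.
    base = DecisionTreeClassifier(max_depth=None, max_features="sqrt")
    # AdaBoost supplies the adaptive reweighting of training cases.
    model = AdaBoostClassifier(base, n_estimators=100, random_state=0)
    print(cross_val_score(model, X, y, cv=5).mean())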

8.
Algebraic-geometric codes have a t-error-correcting pair which corrects errors up to half the designed minimum distance. A generalization of the Roos bound is given from cyclic to linear codes. An MDS code of minimum distance 5 has a 2-error-correcting pair if and only if it is an extended-generalized-Reed–Solomon code.

9.
Computer models simulating a physical process are used in many areas of science. Due to the complex nature of these codes it is often necessary to approximate the code, which is typically done using a Gaussian process. In many situations the number of code runs available to build the Gaussian process approximation is limited. When the initial design is small or the underlying response surface is complicated this can lead to poor approximations of the code output. In order to improve the fit of the model, sequential design strategies must be employed. In this paper we introduce two simple distance based metrics that can be used to augment an initial design in a batch sequential manner. In addition we propose a sequential updating strategy to an orthogonal array based Latin hypercube sample. We show via various real and simulated examples that the distance metrics and the extension of the orthogonal array based Latin hypercubes work well in practice.
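A minimal sketch of one such distance-based augmentation: greedily add, from a candidate pool, the point that maximizes its minimum distance to the current design (a maximin criterion). The pool, batch size, and Euclidean metric are illustrative assumptions, not the paper's exact metrics.

    import numpy as np
    from scipy.spatial.distance import cdist

    def augment(design, candidates, batch=5):
        design = design.copy()
        for _ in range(batch):
            # Distance from every candidate to its nearest design point.
            dmin = cdist(candidates, design).min(axis=1)
            design = np.vstack([design, candidates[dmin.argmax()]])
        return design

    rng = np.random.default_rng(0)
    initial = rng.random((10, 2))               # small initial design in [0, 1]^2
    augmented = augment(initial, rng.random((500, 2)))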

10.
The binary linear codes generated by incidence matrices of the 80 Steiner triple systems on 15 points (STS(15)) are studied. The 80 codes of length 35 spanned by incidence vectors of the points are all non-isomorphic. In contrast, a pair of codes of length 15 generated by blocks are isomorphic if and only if the corresponding incidence matrices have the same rank over GF(2). The weight distribution, the automorphism groups of the codes, and the distribution of the Steiner triple systems within the codes are computed. There are 54 codes of length 35 that contain several non-isomorphic STS(15)'s, and any such code is generated by an STS(15) of largest 2-rank.

11.
An upper bound is presented for the average Hamming distance (avd) of binary codes by using the average Hamming weight (avw) of the codes. The upper bound is attained by a large class of codes. Hence, for this class of codes, a relation between the avd and the avw of a code is established. When the results are applied to the avd between two codes, similar results are obtained.
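To make the two quantities concrete, the snippet below computes the avw and the avd of a toy binary code, averaging the distance over unordered pairs of distinct codewords (one common convention; the paper's exact definitions may differ). The example code is an illustrative assumption.

    import numpy as np
    from itertools import combinations

    code = np.array([[0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]])
    avw = code.sum(axis=1).mean()                       # average Hamming weight
    avd = np.mean([(code[i] != code[j]).sum()           # average Hamming distance
                   for i, j in combinations(range(len(code)), 2)])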

12.
We describe a simple procedure for constructing Steiner 2-designs with the parameters of the designs of points and lines of a finite projective or affine geometry of dimension m⩾3; the codes of many of the designs constructed in this way will contain the code of the relevant finite-geometry design (a Reed-Muller or generalized Reed-Muller code). The designs can be extended to 3-designs provided that planes in the finite-geometry design extend.

13.
The support vector machine (SVM) has been successfully applied to various classification areas with great flexibility and a high level of classification accuracy. However, the SVM is not suitable for the classification of large or imbalanced datasets because of significant computational problems and a classification bias toward the dominant class. The SVM combined with the k-means clustering (KM-SVM) is a fast algorithm developed to accelerate both the training and the prediction of SVM classifiers by using the cluster centers obtained from the k-means clustering. In the KM-SVM algorithm, however, the penalty of misclassification is treated equally for each cluster center even though the contributions of different cluster centers to the classification can be different. In order to improve classification accuracy, we propose the WKM-SVM algorithm which imposes different penalties for the misclassification of cluster centers by using the number of data points within each cluster as a weight. As an extension of the WKM-SVM, the recovery process based on WKM-SVM is suggested to incorporate the information near the optimal boundary. Furthermore, the proposed WKM-SVM can be successfully applied to imbalanced datasets with an appropriate weighting strategy. Experiments show the effectiveness of our proposed methods.
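A minimal sketch of the WKM-SVM idea: fit k-means per class, train the SVM on the cluster centres, and weight each centre's misclassification penalty by its cluster size. The dataset and parameters are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=5000, random_state=0)
    centres, weights, labels = [], [], []
    for cls in np.unique(y):
        km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X[y == cls])
        centres.append(km.cluster_centers_)
        weights.append(np.bincount(km.labels_, minlength=20))  # points per cluster
        labels.append(np.full(20, cls))
    # Larger clusters incur a larger penalty when misclassified.
    svm = SVC(kernel="rbf").fit(np.vstack(centres), np.concatenate(labels),
                                sample_weight=np.concatenate(weights).astype(float))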

14.
Many spatial data such as those in climatology or environmental monitoring are collected over irregular geographical locations. Furthermore, it is common to have multivariate observations at each location. We propose a method of segmentation of a region of interest based on such data that can be carried out in two steps: (1) clustering or classification of irregularly sampled points and (2) segmentation of the region based on the classified points.

We develop a spatially constrained clustering algorithm for segmentation of the sample points by incorporating a geographical constraint into the standard clustering methods. Both hierarchical and nonhierarchical methods are considered. The latter is a modification of the seeded region growing method known in image analysis. Both algorithms work on a suitable neighbourhood structure, which can, for example, be defined by the Delaunay triangulation of the sample points. The number of clusters is estimated by testing the significance of successive changes in the within-cluster sum-of-squares relative to a null permutation distribution. The methodology is validated on simulated data and used in the construction of a climatology map of Ireland based on daily rainfall records from 1294 stations over a period of 37 years.
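A minimal sketch of the constrained-clustering idea: take the neighbourhood structure from the Delaunay triangulation of the sample points and pass it as a connectivity constraint to hierarchical (Ward) clustering, so that only geographically adjacent points can merge. The synthetic stations and fixed cluster count are illustrative assumptions; the paper's permutation test for the number of clusters is not shown.

    import numpy as np
    from scipy.spatial import Delaunay
    from scipy.sparse import lil_matrix
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.default_rng(0)
    xy = rng.random((200, 2))                   # irregular station locations
    feats = rng.normal(size=(200, 3))           # multivariate observations
    feats[:, 0] += 3 * (xy[:, 0] > 0.5)         # a spatial regime to recover

    tri = Delaunay(xy)
    conn = lil_matrix((len(xy), len(xy)), dtype=int)
    for s in tri.simplices:                     # connect Delaunay neighbours
        for i in s:
            for j in s:
                if i != j:
                    conn[i, j] = 1
    labels = AgglomerativeClustering(n_clusters=2, connectivity=conn.tocsr(),
                                     linkage="ward").fit_predict(feats)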

15.
A vast collection of reusable mathematical and statistical software is now available for use by scientists and engineers in their modeling efforts. This software represents a significant source of mathematical expertise, created and maintained at considerable expense. Unfortunately, the collection is so heterogeneous that it is a tedious and error-prone task simply to determine what software is available to solve a given problem. In mathematical problem solving environments of the future such questions will be fielded by expert software advisory systems. One way for such systems to systematically associate available software with the problems they solve is to use a problem classification system. In this paper we describe a detailed tree-structured problem-oriented classification system appropriate for such use.

16.
Methods are developed to test the performance of software for calculating the standard normal distribution function. The accuracy and implementation details of the tests are described. Results are presented on a selection of available codes, showing a wide variation in performance. At least one published code is shown to have severe defects.
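A minimal sketch of such a test: compare candidate implementations of the standard normal distribution function against a high-precision reference and report the worst relative error over a grid, including the far left tail where naive formulations lose all accuracy. The two candidate implementations here are illustrative assumptions, not the codes examined in the paper.

    import math
    from mpmath import mp, erfc as mp_erfc

    mp.dps = 50                                 # 50-digit reference precision

    def ref(x):                                 # high-precision Phi(x)
        return 0.5 * mp_erfc(-mp.mpf(x) / mp.sqrt(2))

    def phi_erf(x):                             # naive: cancels badly for x << 0
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def phi_erfc(x):                            # better behaved in the left tail
        return 0.5 * math.erfc(-x / math.sqrt(2.0))

    grid = [i / 10 for i in range(-370, 81)]
    for name, f in [("erf-based", phi_erf), ("erfc-based", phi_erfc)]:
        worst = max(abs(f(x) - ref(x)) / ref(x) for x in grid)
        print(name, float(worst))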

17.
The performance of computationally inexpensive model selection criteria in the context of tree-structured subgroup analysis is investigated. It is shown through simulation that no single model selection criterion exhibits uniformly superior performance over a wide range of scenarios. Therefore, a two-stage approach for model selection is proposed and shown to perform satisfactorily. An applied example of subgroup analysis is presented. Problems associated with tree-structured subgroup analysis are discussed and practical solutions are suggested.

18.
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.

19.
All affine resolvable designs with parameters of the design of the hyperplanes in ternary affine 3-space are enumerated. This enumeration implies the classification (up to equivalence), of all optimal equidistant ternary codes of length 13 and distance 9, as well as all complete orthogonal arrays of strength 2 with 3 symbols, 13 constraints and index 3. Up to isomorphism, there are exactly 68 such designs. The automorphism groups and the rank of the incidence matrices over GF(3) are computed. There are six designs with point-transitive automorphism groups, and one design with trivial group. The affine geometry design is the unique design with lowest 3-rank, and the only design with 2-transitive automorphism group.

20.
It is often desirable to use Gray codes with properties different from those of the standard binary reflected code. Previously, only short codes (32- to 256-element) could be systematically generated or found with a given set of properties. This paper describes a technique by which long codes (65000-element or longer) as well as short codes can be systematically generated with desired properties. The technique is described and demonstrated by generating codes of various lengths with the desired property of equal column change counts. Several examples of the use of the technique for generating codes with other desired properties are outlined.
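For contrast with the codes the paper constructs, the sketch below generates the standard binary reflected Gray code and tallies its column change counts, which are highly unequal; balancing them is exactly the property the technique targets. The word length is an illustrative choice.

    def gray(n):
        """Standard binary reflected Gray code on n bits."""
        return [i ^ (i >> 1) for i in range(2 ** n)]

    n = 4
    seq = gray(n)
    changes = [0] * n
    for a, b in zip(seq, seq[1:] + seq[:1]):    # cyclic code: include wrap-around
        changes[(a ^ b).bit_length() - 1] += 1
    print(changes)                              # [8, 4, 2, 2] for n = 4: far from equal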
