1.
In this paper, we describe some results of an ESPRIT project known as StatLog whose purpose is the comparison of classification algorithms. We give a brief summary of some of the algorithms in the project: discriminant analysis; nearest neighbours; decision trees; neural net methods; SMART; kernel methods and other Bayesian approaches. We focus on data sets derived from images, ranging from raw pixel data to features and summaries extracted from such data.

2.
We describe standard single-site Markov chain Monte Carlo methods (the Hastings and Metropolis algorithms, the Gibbs sampler and simulated annealing) for maximum a posteriori and marginal posterior mode image estimation. These methods can experience great difficulty in traversing the whole image space in a finite time when the target distribution is multi-modal. We present a survey of multiple-site update methods, including Swendsen and Wang's algorithm, coupled Markov chains and cascade algorithms designed to tackle the problem of moving between modes of the posterior image distribution. We compare the performance of some of these algorithms for sampling from degraded and non-degraded Ising models.
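To make the single-site machinery concrete, the following is a minimal sketch (not code from the survey itself) of one Gibbs sweep over a binary Ising image; the inverse temperature and lattice size are illustrative assumptions.

```python
import numpy as np

def gibbs_sweep(x, beta, rng):
    """One full sweep of single-site Gibbs updates on an Ising lattice.

    x    : 2-D array of spins in {-1, +1}
    beta : inverse temperature of the Ising prior
    """
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            # Sum of the four nearest neighbours (free boundary).
            s = 0
            if i > 0:     s += x[i - 1, j]
            if i < n - 1: s += x[i + 1, j]
            if j > 0:     s += x[i, j - 1]
            if j < m - 1: s += x[i, j + 1]
            # Full conditional: P(x_ij = +1 | rest) = 1 / (1 + exp(-2*beta*s)).
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(32, 32))
for _ in range(100):          # burn-in sweeps
    x = gibbs_sweep(x, beta=0.4, rng=rng)
```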

3.
Top coding of extreme values of variables like income is a common method of statistical disclosure control, but it creates problems for the data analyst. The paper proposes two alternatives to top coding for statistical disclosure control that are based on multiple imputation. We show in simulation studies that the multiple-imputation methods provide better inferences from the publicly released data than top coding, using straightforward multiple-imputation methods of analysis, while maintaining good statistical disclosure control properties. We illustrate the methods on data from the 1995 Chinese household income project.
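As a rough illustration of the imputation idea (not the paper's own model), the sketch below replaces top-coded incomes with draws from a Pareto tail fitted to the observed upper tail; the tail threshold, the Pareto model and all names are assumptions for the example.

```python
import numpy as np

def impute_topcoded(incomes, topcode, m=5, rng=None):
    """Create m completed data sets in which values at the top code are
    replaced by draws from a Pareto tail fitted below the top code.
    A sketch of the general idea only, not the paper's imputation model."""
    rng = rng or np.random.default_rng()
    below = incomes[incomes < topcode]
    u = np.quantile(below, 0.90)                   # tail threshold: an assumption
    tail = below[below >= u]
    alpha = tail.size / np.sum(np.log(tail / u))   # Hill / Pareto shape MLE
    hit = incomes >= topcode
    completed = []
    for _ in range(m):
        x = incomes.astype(float).copy()
        # Pareto draws conditioned to exceed the top code (Pareto tails are
        # self-similar, so the scale simply moves up to the top code).
        x[hit] = topcode * (rng.pareto(alpha, size=hit.sum()) + 1.0)
        completed.append(x)
    return completed
```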

4.
From an algorithmic perspective, this paper surveys in some detail the origin, evolution and frontier research of association rules, and on that basis identifies future research areas and development trends for the field. The paper first examines three classes of classical association rule algorithms in detail, and then summarizes extensions of association rule algorithms to complex data attributes, laying the groundwork for examining other algorithmic extensions and for introducing research on association rules in other disciplines.
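For readers unfamiliar with the classical algorithms such surveys cover, here is a minimal Apriori-style frequent-itemset search; the toy transactions and support threshold are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets whose support (fraction of transactions
    containing them) meets min_support.

    Classical level-wise search: candidates of size k are built by
    joining frequent itemsets of size k - 1, then pruned by counting.
    """
    n = len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items
             if sum(s <= t for t in transactions) / n >= min_support}
    frequent, k = {}, 2
    while level:
        for s in level:
            frequent[s] = sum(s <= t for t in transactions) / n
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        level = {a | b for a in level for b in level if len(a | b) == k}
        level = {s for s in level
                 if sum(s <= t for t in transactions) / n >= min_support}
        k += 1
    return frequent

baskets = [frozenset(t) for t in
           [{"milk", "bread"}, {"milk", "eggs"},
            {"milk", "bread", "eggs"}, {"bread"}]]
print(apriori(baskets, min_support=0.5))
```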

5.
In this paper, we discuss the class of generalized Birnbaum–Saunders distributions, a very flexible family suitable for modeling lifetime data, as it allows for different degrees of kurtosis and asymmetry as well as both unimodality and bimodality. We describe the theoretical developments on this model, including properties, transformations and related distributions, lifetime analysis, and shape analysis. We also discuss methods of inference based on uncensored and censored data, diagnostic methods, goodness-of-fit tests, and random number generation algorithms for the generalized Birnbaum–Saunders model. Finally, we present some illustrative examples and show that this distribution fits the data better than the classical Birnbaum–Saunders model.
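One simple generation algorithm of the kind surveyed uses the normal representation of the classical model: if Z ~ N(0, 1), then T = beta * (alpha*Z/2 + sqrt((alpha*Z/2)^2 + 1))^2 follows BS(alpha, beta); generalized versions substitute another symmetric law for Z. A minimal sketch (parameter values illustrative):

```python
import numpy as np

def rbs(n, alpha, beta, rng=None):
    """Draw n variates from the classical Birnbaum-Saunders BS(alpha, beta)
    via its normal representation. Generalized BS versions replace Z with
    a heavier- or lighter-tailed symmetric variable."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(n)
    w = alpha * z / 2.0
    return beta * (w + np.sqrt(w**2 + 1.0))**2

sample = rbs(10_000, alpha=0.5, beta=1.0)
print(sample.mean())   # should be near beta * (1 + alpha**2 / 2) = 1.125
```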

6.
Data augmentation is required for the implementation of many Markov chain Monte Carlo (MCMC) algorithms. The inclusion of augmented data can often lead to conditional distributions from well-known probability distributions for some of the parameters in the model. In such cases, collapsing (integrating out parameters) has been shown to improve the performance of MCMC algorithms. We show how integrating out the infection rate parameter in epidemic models leads to efficient MCMC algorithms for two very different epidemic scenarios: final outcome data from a multitype SIR epidemic, and longitudinal data from a spatial SI epidemic. The resulting MCMC algorithms give fresh insight into real-life epidemic data sets.

7.
Full likelihood-based inference for modern population genetics data presents methodological and computational challenges. The problem is of considerable practical importance and has attracted recent attention, with the development of algorithms based on importance sampling (IS) and Markov chain Monte Carlo (MCMC) sampling. Here we introduce a new IS algorithm. The optimal proposal distribution for these problems can be characterized, and we exploit a detailed analysis of genealogical processes to develop a practicable approximation to it. We compare the new method with existing algorithms on a variety of genetic examples. Our approach substantially outperforms existing IS algorithms, with efficiency typically improved by several orders of magnitude. The new method also compares favourably with existing MCMC methods in some problems, and less favourably in others, suggesting that both IS and MCMC methods have a continuing role to play in this area. We offer insights into the relative advantages of each approach, and we discuss diagnostics in the IS framework.
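A generic importance sampling estimator of the kind being tuned here, with the effective sample size as a basic diagnostic, can be sketched as follows; the proposal and target are placeholders, not the paper's genealogical constructions.

```python
import numpy as np

def importance_sample(log_target, log_proposal, sampler, n, rng):
    """Return the log of an importance sampling estimate plus the
    effective sample size (ESS) diagnostic.

    sampler(rng)     -> one draw x from the proposal q
    log_proposal(x)  -> log q(x)
    log_target(x)    -> log of the unnormalized target contribution
    """
    log_w = np.empty(n)
    for i in range(n):
        x = sampler(rng)
        log_w[i] = log_target(x) - log_proposal(x)
    # Log-sum-exp for numerical stability, then the average weight.
    m = log_w.max()
    log_est = m + np.log(np.mean(np.exp(log_w - m)))
    # ESS collapses toward 1 when the proposal is poor, which is the
    # kind of diagnostic the paper discusses in the IS framework.
    w = np.exp(log_w - m)
    ess = w.sum()**2 / (w**2).sum()
    return log_est, ess
```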

8.
The problem of computing the variance of a sample of N data points {x_i} may be difficult for certain data sets, particularly when N is large and the variance is small. We present a survey of possible algorithms and their round-off error bounds, including some new analysis for computations with shifted data. Experimental results confirm these bounds and illustrate the dangers of some algorithms. Specific recommendations are made as to which algorithm should be used in various contexts.
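For reference, the sketch below contrasts the hazardous textbook one-pass formula with Welford's stable update and a shifted two-pass computation of the kind the paper analyses; the test data are illustrative.

```python
import numpy as np

def variance_naive(x):
    """Textbook one-pass formula; suffers catastrophic cancellation
    when the variance is small relative to the mean."""
    n = len(x)
    return (np.sum(x**2) - np.sum(x)**2 / n) / (n - 1)

def variance_welford(x):
    """Welford's updating algorithm: one pass, numerically stable."""
    mean, m2 = 0.0, 0.0
    for k, xi in enumerate(x, start=1):
        delta = xi - mean
        mean += delta / k
        m2 += delta * (xi - mean)
    return m2 / (len(x) - 1)

def variance_shifted(x):
    """Two-pass computation on data shifted by a rough mean estimate,
    the kind of scheme analysed in the paper."""
    d = x - x[0]                     # any shift near the mean works
    return (np.sum(d**2) - np.sum(d)**2 / len(x)) / (len(x) - 1)

# Large mean, small variance: the naive formula loses nearly all digits.
x = 1e8 + np.random.default_rng(1).standard_normal(1_000)
print(variance_naive(x), variance_welford(x), variance_shifted(x))
```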

9.
The analysis of infectious disease data presents challenges arising from the dependence in the data and the fact that only part of the transmission process is observable. These difficulties are usually overcome by making simplifying assumptions. The paper explores the use of Markov chain Monte Carlo (MCMC) methods for the analysis of infectious disease data, with the hope that they will permit analyses to be made under more realistic assumptions. Two important kinds of data sets are considered, containing temporal and non-temporal information, from outbreaks of measles and influenza. Stochastic epidemic models are used to describe the processes that generate the data. MCMC methods are then employed to perform inference in a Bayesian context for the model parameters. The MCMC methods used include standard algorithms, such as the Metropolis–Hastings algorithm and the Gibbs sampler, as well as a new method that involves likelihood approximation. It is found that standard algorithms perform well in some situations but can exhibit serious convergence difficulties in others. The inferences that we obtain are in broad agreement with estimates obtained by other methods where they are available. However, we can also provide inferences for parameters which have not been reported in previous analyses.
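The standard algorithms referred to include random-walk Metropolis–Hastings, which in its generic form is only a few lines; this sketch assumes a user-supplied log-posterior and an illustrative Gaussian proposal, not the paper's epidemic models.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter, step, rng):
    """Random-walk Metropolis-Hastings: propose a Gaussian perturbation
    and accept with probability min(1, posterior ratio)."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        # Accept/reject on the log scale to avoid under/overflow.
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain

# Toy target: standard normal posterior (illustrative only).
rng = np.random.default_rng(0)
draws = metropolis_hastings(lambda th: -0.5 * np.sum(th**2),
                            theta0=[0.0], n_iter=5_000, step=1.0, rng=rng)
print(draws.mean(), draws.std())
```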

10.
We consider computational methods for evaluating and approximating multivariate chi-square probabilities in cases where the pertaining correlation matrix or blocks thereof have a low factorial representation. To this end, techniques from matrix factorization and probability theory are applied. We outline a variety of statistical applications of multivariate chi-square distributions and provide a system of MATLAB programs implementing the proposed algorithms. Computer simulations demonstrate the accuracy and the computational efficiency of our methods in comparison with Monte Carlo approximations, and a real data example from statistical genetics illustrates their usage in practice.
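The Monte Carlo baseline against which such methods are compared is straightforward to state for a one-factor correlation structure r_ij = a_i * a_j, under which each normal decomposes as a_i * W + sqrt(1 - a_i^2) * eps. A sketch (the factor loadings, degrees of freedom and threshold are illustrative; the paper's own algorithms exploit the factor structure analytically instead):

```python
import numpy as np

def mvchisq_prob_mc(a, nu, c, n_sim=50_000, rng=None):
    """Monte Carlo estimate of P(Q_1 <= c, ..., Q_m <= c), where
    Q_j = sum of nu squared standard normals and the underlying normals
    share the one-factor correlation corr(X_i, X_j) = a_i * a_j."""
    rng = rng or np.random.default_rng()
    a = np.asarray(a, dtype=float)
    m, hits = a.size, 0
    for _ in range(n_sim):
        w = rng.standard_normal(nu)                  # shared factor, one per dof
        eps = rng.standard_normal((m, nu))           # idiosyncratic parts
        x = a[:, None] * w + np.sqrt(1.0 - a[:, None]**2) * eps
        q = np.sum(x**2, axis=1)                     # m correlated chi-square(nu)
        hits += np.all(q <= c)
    return hits / n_sim

# c = 5.99 is roughly the 95th percentile of chi-square(2).
print(mvchisq_prob_mc(a=[0.5, 0.5, 0.5], nu=2, c=5.99))
```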

11.
We develop a flexible class of Metropolis–Hastings algorithms for drawing inferences about population histories and mutation rates from deoxyribonucleic acid (DNA) sequence data. Match probabilities for use in forensic identification are also obtained, which is particularly useful for mitochondrial DNA profiles. Our data augmentation approach, in which the ancestral DNA data are inferred at each node of the genealogical tree, simplifies likelihood calculations and permits a wide class of mutation models to be employed, so that many different types of DNA sequence data can be analysed within our framework. Moreover, simpler likelihood calculations imply greater freedom for generating tree proposals, so that algorithms with good mixing properties can be implemented. We incorporate the effects of demography by means of simple mechanisms for changes in population size and structure, and we estimate the corresponding demographic parameters, but we do not here allow for the effects of either recombination or selection. We illustrate our methods by application to four human DNA data sets, consisting of DNA sequences, short tandem repeat loci, single-nucleotide polymorphism sites and insertion sites. Two of the data sets are drawn from the male-specific Y-chromosome, one from maternally inherited mitochondrial DNA and one from the β-globin locus on chromosome 11.

12.
We propose two preprocessing algorithms suitable for climate time series. The first algorithm detects outliers based on an autoregressive cost update mechanism. The second is based on the wavelet transform, a method from pattern recognition. To benchmark the algorithms' performance, we compare them to existing methods on a synthetic data set. Finally, for illustration, the proposed methods are applied to a data set of high-frequency temperature measurements from Novi Sad, Serbia. The results show that both methods together form a powerful tool for signal preprocessing: in the case of solitary outliers the autoregressive cost update mechanism prevails, whereas the wavelet-based mechanism is the method of choice in the presence of multiple consecutive outliers.
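To convey the flavour of the autoregressive scheme, here is a deliberately simplified stand-in (not the paper's cost-update rule): fit an AR(1) coefficient, then flag points whose one-step prediction error is large on a robust scale.

```python
import numpy as np

def ar1_outliers(y, k=4.0):
    """Flag observations whose one-step AR(1) prediction error exceeds
    k robust standard deviations. A simplified stand-in for the paper's
    autoregressive cost-update mechanism."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    # Lag-1 autocorrelation as the AR(1) coefficient estimate.
    phi = np.dot(yc[1:], yc[:-1]) / np.dot(yc[:-1], yc[:-1])
    resid = yc[1:] - phi * yc[:-1]
    # Robust scale via the median absolute deviation.
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    flags = np.zeros(len(y), dtype=bool)
    flags[1:] = np.abs(resid) > k * sigma
    return flags

y = np.sin(np.linspace(0, 20, 200))
y[57] += 5.0                              # inject a solitary outlier
print(np.flatnonzero(ar1_outliers(y)))   # flags the spike (and often its successor)
```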

13.
Record data are commonly encountered in many fields such as sports, geography, finance, and reliability. In this article, we use the well-known Box–Muller transformation to develop an efficient method of simulating record data from the normal distribution. Another method based on exponential records is also discussed. Then, the performance of these algorithms is compared with some standard simulation methods.
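The Box–Muller transform and the naive scan-for-records baseline that efficient methods improve upon look as follows; this is a sketch of the ingredients, not the article's algorithm.

```python
import numpy as np

def box_muller(n, rng):
    """Generate n standard normal variates via the Box-Muller transform."""
    # Use 1 - U so the argument of log is strictly positive.
    u1 = 1.0 - rng.random((n + 1) // 2)
    u2 = rng.random((n + 1) // 2)
    r = np.sqrt(-2.0 * np.log(u1))
    z = np.concatenate([r * np.cos(2 * np.pi * u2),
                        r * np.sin(2 * np.pi * u2)])
    return z[:n]

def upper_records(x):
    """Extract the sequence of upper records by a single scan."""
    records, current = [], -np.inf
    for v in x:
        if v > current:
            records.append(v)
            current = v
    return np.array(records)

rng = np.random.default_rng(42)
print(upper_records(box_muller(10_000, rng)))
```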

14.
Inequality-restricted hypothesis testing methods, which include multivariate one-sided tests, are useful in practice, especially in multiple comparison problems. In practice, multivariate and longitudinal data often contain missing values, since it may be difficult to observe all values for each variable. However, although missing values are common for multivariate data, statistical methods for multivariate one-sided tests with missing values are quite limited. In this article, motivated by a dataset in a recent collaborative project, we develop two likelihood-based methods for multivariate one-sided tests with missing values, where the missing data patterns can be arbitrary and the missing data mechanisms may be non-ignorable. Although non-ignorable missing data are not testable based on observed data, statistical methods addressing this issue can be used for sensitivity analysis and might lead to more reliable results, since ignoring informative missingness may lead to biased analysis. We analyse the real dataset in detail under various possible missing data mechanisms and report interesting findings which were previously unavailable. We also derive some asymptotic results and evaluate our new tests using simulations.

15.
There is an increasing amount of literature focused on Bayesian computational methods for problems with intractable likelihoods. One approach is the set of algorithms known as approximate Bayesian computation (ABC) methods. A drawback of these algorithms is that their performance depends on the appropriate choice of summary statistics, distance measure and tolerance level. To circumvent this problem, an alternative method based on the empirical likelihood has been introduced. This method can be easily implemented when a set of constraints, related to the moments of the distribution, is specified. However, the choice of the constraints is sometimes challenging. To overcome this difficulty, we propose an alternative method based on a bootstrap likelihood approach. The method is easy to implement and in some cases is actually faster than the other approaches considered. We illustrate the performance of our algorithm with examples from population genetics, time series and stochastic differential equations. We also test the method on a real dataset.
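The basic ABC rejection sampler whose tuning choices (summary statistic, distance and tolerance) motivate the alternatives can be sketched as follows; the toy normal-mean model and all settings are illustrative.

```python
import numpy as np

def abc_rejection(observed_summary, prior_sampler, simulate, summary,
                  tol, n_draws, rng):
    """Basic ABC rejection: keep prior draws whose simulated summary
    lies within tol of the observed one. The summary, distance and
    tolerance are exactly the tuning choices the paper tries to avoid."""
    accepted = []
    while len(accepted) < n_draws:
        theta = prior_sampler(rng)
        s = summary(simulate(theta, rng))
        if np.abs(s - observed_summary) < tol:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer a normal mean from its sample mean.
rng = np.random.default_rng(0)
obs = 1.3                                    # observed sample mean (illustrative)
post = abc_rejection(
    observed_summary=obs,
    prior_sampler=lambda r: r.normal(0, 5),
    simulate=lambda th, r: r.normal(th, 1, size=50),
    summary=np.mean,
    tol=0.05, n_draws=500, rng=rng)
print(post.mean(), post.std())
```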

16.
Statistical learning is emerging as a promising field in which a number of algorithms from machine learning are interpreted as statistical methods and vice versa. Owing to its good practical performance, boosting is one of the most studied machine learning techniques. We propose algorithms for multivariate density estimation and classification, generated by using traditional kernel techniques as weak learners in boosting algorithms. Our algorithms take the form of multistep estimators whose first step is a standard kernel method. Some strategies for bandwidth selection are also discussed, with regard both to the standard kernel density classification problem and to our 'boosted' kernel methods. Extensive experiments, using real and simulated data, show an encouraging practical relevance of the findings. Standard kernel methods are often outperformed by the first boosting iterations across several bandwidth values. In addition, the practical effectiveness of our classification algorithm is confirmed by a comparative study on two real datasets, the competitors being tree-based classifiers, including AdaBoost with trees.

17.
In statistical practice, inferences on standardized regression coefficients are often required but are complicated by the fact that the coefficients are nonlinear functions of the parameters, and thus standard textbook results are simply wrong. Within the frequentist domain, asymptotic delta methods can be used to construct confidence intervals for the standardized coefficients with proper coverage probabilities. Alternatively, Bayesian methods solve similar and other inferential problems by simulating data from the posterior distribution of the coefficients. In this paper, we present Bayesian procedures that provide comprehensive solutions for inference on the standardized coefficients. Simple computing algorithms are developed to generate posterior samples with no autocorrelation, based on both noninformative improper and informative proper prior distributions. Simulation studies show that Bayesian credible intervals constructed by our approaches have comparable and even better statistical properties than their frequentist counterparts, particularly in the presence of collinearity. In addition, our approaches solve some meaningful inferential problems that are difficult if not impossible from the frequentist standpoint, including identifying joint rankings of multiple standardized coefficients and making optimal decisions concerning their sizes and comparisons. We illustrate applications of our approaches through examples and make sample R functions available for implementing our proposed methods.
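Once posterior draws of the raw coefficients are available from any sampler, the standardization itself is a one-liner, and credible intervals and joint rankings follow directly from the transformed draws. A sketch, with the helper name and the commented usage purely illustrative:

```python
import numpy as np

def standardized_draws(beta_draws, X, y):
    """Convert posterior draws of raw regression coefficients into draws
    of standardized coefficients: beta_j * sd(x_j) / sd(y).

    beta_draws : (n_draws, p) array of posterior samples (intercept excluded)
    X          : (n, p) design matrix; y : (n,) response
    """
    scale = X.std(axis=0, ddof=1) / y.std(ddof=1)
    return beta_draws * scale          # broadcasts over the draws

# Downstream summaries come straight from the transformed draws, e.g.:
# std = standardized_draws(draws, X, y)
# ci = np.percentile(std, [2.5, 97.5], axis=0)   # per-coefficient credible intervals
# p_outranks = np.mean(np.abs(std[:, 0]) > np.abs(std[:, 1]))
#                                   # posterior P(coef 1 exceeds coef 2 in size)
```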

18.

We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion, which requires a single cycle (or a few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. On simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool "Zodiac."

19.
In this paper, we propose maximum entropy in the mean methods for propensity score matching classification problems. We provide a new methodological approach and estimation algorithms to handle explicitly cases in which data are available (i) in interval form; (ii) with bounded measurement or observational errors; or (iii) both as intervals and with bounded errors. We show that entropy in the mean methods for these three cases generally outperform benchmark error-free approaches.

20.
Noting that several rule discovery algorithms in data mining can produce a large number of irrelevant or obvious rules from data, there has been substantial research in data mining addressing what makes rules truly 'interesting'. This has resulted in the development of a number of interestingness measures and algorithms that find all interesting rules from data. However, these approaches have the drawback that many of the discovered rules, while supposed to be interesting by definition, may actually (1) be obvious, in that they logically follow from other discovered rules, or (2) be expected, given some of the other discovered rules and some simple distributional assumptions. In this paper we argue that this is a paradox, since rules that are supposed to be interesting are, in reality, uninteresting for the above reason. We show that this paradox exists for various popular interestingness measures and present an abstract characterization of an approach to alleviate the paradox. We finally discuss existing work in data mining that addresses this issue and show how those approaches can be viewed with respect to the characterization presented here.
