Similar Documents
20 similar documents found (search time: 578 ms)
1.
Abstract.  Let π denote an intractable probability distribution that we would like to explore. Suppose that we have a positive recurrent, irreducible Markov chain that satisfies a minorization condition and has π as its invariant measure. We provide a method of using simulations from the Markov chain to construct a statistical estimate of π from which it is straightforward to sample. We show that this estimate is 'strongly consistent' in the sense that the total variation distance between the estimate and π converges to 0 almost surely as the number of simulations grows. Moreover, we use some recently developed asymptotic results to provide guidance as to how much simulation is necessary. Draws from the estimate can be used to approximate features of π or as intelligent starting values for the original Markov chain. We illustrate our methods with two examples.
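The paper's estimator is constructed from the minorization condition itself; as a loose illustration of the general idea only (turning MCMC output into an estimate of π that is easy to sample from), here is a minimal Python sketch in which a kernel density estimate stands in for the authors' construction, and the target, proposal scale, and burn-in are all arbitrary assumptions:

    import numpy as np
    from scipy.stats import gaussian_kde

    # Random-walk Metropolis chain targeting a standard normal (stand-in target).
    rng = np.random.default_rng(0)
    x, draws = 0.0, []
    for _ in range(20000):
        prop = x + rng.normal(0.0, 1.0)
        if np.log(rng.uniform()) < -0.5 * (prop**2 - x**2):
            x = prop
        draws.append(x)

    # Build a samplable estimate of the target from the chain output and draw
    # fresh starting values from it (the KDE is a stand-in for the paper's estimator).
    estimate = gaussian_kde(np.array(draws[2000:]))
    new_starts = estimate.resample(5)
    print(new_starts)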

2.
On Maximum Depth and Related Classifiers
Abstract.  Over the last couple of decades, data depth has emerged as a powerful exploratory and inferential tool for multivariate data analysis, with widespread applications. This paper investigates the possible use of different notions of data depth in non-parametric discriminant analysis. First, we consider the situation where the prior probabilities of the competing populations are all equal, and investigate classifiers that assign an observation to the population with respect to which it has maximum location depth. We then propose a different depth-based classification technique for unequal prior problems, which is also useful for equal prior cases, especially when the populations have different scatters and shapes. We use simulated data sets as well as some benchmark real examples to evaluate the performance of these depth-based classifiers. The large sample behaviour of the misclassification rates of these depth-based non-parametric classifiers has been derived under appropriate regularity conditions.
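A minimal sketch of the maximum-depth classification rule, using Mahalanobis depth as a simple stand-in for the location depths studied in the paper (the data and the depth notion here are illustrative assumptions):

    import numpy as np

    def mahalanobis_depth(x, sample):
        # Depth of x with respect to a sample: 1 / (1 + squared Mahalanobis distance).
        mu = sample.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
        diff = x - mu
        return 1.0 / (1.0 + diff @ cov_inv @ diff)

    def max_depth_classify(x, populations):
        # Assign x to the population with respect to which it has maximum depth.
        return int(np.argmax([mahalanobis_depth(x, pop) for pop in populations]))

    rng = np.random.default_rng(0)
    pop0 = rng.normal(0.0, 1.0, size=(200, 2))
    pop1 = rng.normal(2.0, 1.0, size=(200, 2))
    print(max_depth_classify(np.array([1.8, 2.1]), [pop0, pop1]))  # expect 1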

3.
Mode Jumping Proposals in MCMC
Markov chain Monte Carlo algorithms generate samples from a target distribution by simulating a Markov chain. Considerable flexibility exists in the specification of the transition matrix of the chain. In practice, however, most algorithms in use allow only small changes in the state vector in each iteration. This choice typically causes problems for multi-modal distributions, as moves between modes become rare and, in turn, convergence to the target distribution is slow. In this paper we consider continuous distributions on R^n and specify how optimization for local maxima of the target distribution can be incorporated in the specification of the Markov chain. Thereby, we obtain a chain with frequent jumps between modes. We demonstrate the effectiveness of the approach in three examples. The first considers a simple mixture of bivariate normal distributions, whereas the last two consider sampling from posterior distributions based on previously analysed data sets.
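A rough sketch of the idea, with a local optimizer supplying occasional jumps toward modes of a two-component normal mixture. Note that a valid sampler must also correct the acceptance probability for the asymmetric jump proposal, which the paper derives and this sketch deliberately omits; the target and all tuning constants are assumptions:

    import numpy as np
    from scipy.optimize import minimize

    def log_target(x):
        # Unnormalized log density: mixture of two well-separated bivariate normals.
        return np.logaddexp(-0.5 * np.sum((x - 4.0) ** 2),
                            -0.5 * np.sum((x + 4.0) ** 2))

    rng = np.random.default_rng(1)
    x, samples = np.zeros(2), []
    for it in range(5000):
        if it % 100 == 0:
            # Mode-jumping move: large perturbation, then local optimization.
            start = x + rng.normal(0.0, 6.0, size=2)
            mode = minimize(lambda z: -log_target(z), start).x
            prop = mode + rng.normal(0.0, 0.5, size=2)
        else:
            prop = x + rng.normal(0.0, 0.5, size=2)  # ordinary local random walk
        # Plain Metropolis ratio; the jump move needs a proposal correction in practice.
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x.copy())
    print(np.mean([s[0] > 0 for s in samples]))  # fraction of time in the positive mode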

4.
Summary.  We consider the application of Markov chain Monte Carlo (MCMC) estimation methods to random-effects models, and in particular the family of discrete time survival models. Survival models can be used in many situations in the medical and social sciences, and we illustrate their use through two examples that differ in both substantive area and data structure. A multilevel discrete time survival analysis involves expanding the data set so that the model can be cast as a standard multilevel binary response model. For such models it has been shown that MCMC methods have advantages in terms of reducing estimate bias. However, the data expansion results in very large data sets for which MCMC estimation is often slow and can produce chains that exhibit poor mixing. Any improvement in mixing will both speed up the methods and give more confidence in the estimates produced. The MCMC methodological literature is full of alternative algorithms designed to improve the mixing of chains, and we describe three reparameterization techniques that are easy to implement in available software. We consider two examples of multilevel survival analysis: incidence of mastitis in dairy cattle and contraceptive use dynamics in Indonesia. For each application we show where the reparameterization techniques can be used and assess their performance.
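The data expansion step itself is mechanical; a minimal sketch with hypothetical subject-level data, where each subject contributes one binary row per discrete time period at risk:

    import pandas as pd

    # Hypothetical subject-level survival data.
    df = pd.DataFrame({"id": [1, 2, 3],
                       "time": [2, 3, 1],    # discrete failure/censoring time
                       "event": [1, 0, 1]})  # 1 = event observed, 0 = censored

    rows = []
    for _, r in df.iterrows():
        for t in range(1, r["time"] + 1):
            # Response is 1 only in the period in which the event occurs.
            rows.append({"id": r["id"], "period": t,
                         "y": int(r["event"] == 1 and t == r["time"])})
    expanded = pd.DataFrame(rows)  # now fit any multilevel binary-response model
    print(expanded)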

5.
We propose an algorithmic framework for computing sparse components from rotated principal components. This methodology, called SIMPCA, is useful for replacing the unreliable practice of ignoring small coefficients of rotated components when interpreting them. The algorithm computes genuinely sparse components by projecting rotated principal components onto subsets of variables. The simplified components remain highly correlated with the corresponding principal components. Different simplification strategies yield different sparse solutions, which can be used to compare alternative interpretations of the principal components. We give examples of how effective simplified solutions can be achieved with SIMPCA on several publicly available data sets.
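A minimal sketch of the projection step, assuming the component and the chosen variable subset are already in hand; SIMPCA's strategies for choosing the subset (and its use of rotation) are more elaborate than this stand-in:

    import numpy as np

    def project_component(X, loading, subset):
        # Project a principal component onto a subset of variables: regress the
        # component scores on the chosen columns of X, giving a genuinely sparse
        # loading vector rather than one with small coefficients silently ignored.
        scores = X @ loading
        sparse = np.zeros_like(loading)
        coef, *_ = np.linalg.lstsq(X[:, subset], scores, rcond=None)
        sparse[subset] = coef
        return sparse

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 6))                          # assume centered columns
    loading = np.linalg.svd(X, full_matrices=False)[2][0]  # first PC loading
    subset = np.argsort(np.abs(loading))[-3:]              # keep 3 largest loadings
    print(project_component(X, loading, subset))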

6.
Mathematical morphology: A useful set of tools for image analysis
In this paper we give an overview of both classical and more modern morphological techniques, and demonstrate their utility through a range of practical examples. After discussing the fundamental morphological ideas, we show how the classic morphological opening and closing filters lead to measures of size via granulometries, and we briefly discuss their implementation. We also present an overview of morphological segmentation techniques, demonstrating the use of connected openings and thinnings. This then leads into the more recent set-theoretic notions underlying graph-based approaches to image analysis.
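A small sketch of openings, closings, and a crude granulometry using scipy.ndimage on a synthetic binary image; the image and the structuring-element sizes are arbitrary assumptions:

    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(3)
    img = rng.uniform(size=(64, 64)) > 0.7    # synthetic binary image

    struct = np.ones((3, 3), dtype=bool)      # structuring element
    opened = ndimage.binary_opening(img, structure=struct)
    closed = ndimage.binary_closing(img, structure=struct)

    # Crude granulometry: the foreground area surviving openings with structuring
    # elements of increasing size traces out a size distribution of the objects.
    granulometry = [int(ndimage.binary_opening(
        img, structure=np.ones((2 * s + 1, 2 * s + 1))).sum()) for s in range(1, 6)]
    print(granulometry)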

7.
Communications in Statistics: Theory and Methods, 2012, 41(13-14): 2321-2341
For the case where at least two sets have an odd number of variables, the exact distribution of the generalized Wilks Lambda statistic is not available in a manageable form adequate for manipulation. In this article, we develop a family of very accurate near-exact distributions for this statistic for the case where two or three sets have an odd number of variables. We first express the exact characteristic function of the logarithm of the statistic as the characteristic function of an infinite mixture of Generalized Integer Gamma distributions. Then, based on truncations of this exact characteristic function, we obtain a family of near-exact distributions which, by construction, match the first two exact moments. These near-exact distributions display asymptotic behaviour as the number of variables involved increases. The corresponding cumulative distribution functions are obtained in a concise and manageable form, relatively easy to implement computationally, allowing for the computation of virtually exact quantiles. We undertake a comparative study for small sample sizes, using two proximity measures based on the Berry-Esseen bounds, to assess the performance of the near-exact distributions for different numbers of sets of variables and different numbers of variables in each set.
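Schematically (this is a sketch of the structure described above, not the authors' exact notation), the exact characteristic function of W = log Lambda is an infinite mixture

    Phi_W(t) = sum_{k=0..inf} pi_k Phi_k(t),

where each Phi_k is the characteristic function of a Generalized Integer Gamma distribution. A near-exact distribution is then obtained by truncating at some K and replacing the discarded tail with a single correction term,

    Phi_W(t) ~ sum_{k=0..K} pi_k Phi_k(t) + (1 - sum_{k=0..K} pi_k) Phi*(t),

with Phi* chosen so that the first two moments of the resulting distribution match the exact ones.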

8.
Modern statistical applications involving large data sets have focused attention on statistical methodologies that are both computationally efficient and able to screen large numbers of candidate models. Here we consider computationally efficient variational Bayes approaches to inference in high-dimensional heteroscedastic linear regression, where both the mean and the variance are described in terms of linear functions of the predictors and where the number of predictors can be larger than the sample size. We derive a closed-form variational lower bound on the log marginal likelihood, useful for model selection, and propose a novel fast greedy search algorithm on the model space. The algorithm uses one-step optimization updates to the variational lower bound of the current model to screen large numbers of candidate predictor variables for inclusion or exclusion in a computationally thrifty way. We show that the suggested model search strategy is related to widely used orthogonal matching pursuit algorithms, but yields a framework for potentially extending these algorithms to more complex models. The methodology is applied in simulations and in two real examples involving prediction for food constituents using NIR technology and prediction of disease progression in diabetes.
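A generic sketch of greedy screening over inclusion/exclusion of predictors. The scoring function here is a penalized Gaussian fit standing in for the paper's variational lower bound (whose one-step updates make each evaluation cheap); the data and the toy score are assumptions:

    import numpy as np

    def greedy_search(score, p, max_steps=20):
        # Greedily toggle inclusion of single predictors while the score improves.
        gamma = np.zeros(p, dtype=bool)
        current = score(gamma)
        for _ in range(max_steps):
            best_gain, best_j = 0.0, None
            for j in range(p):
                trial = gamma.copy()
                trial[j] = not trial[j]          # toggle predictor j
                gain = score(trial) - current
                if gain > best_gain:
                    best_gain, best_j = gain, j
            if best_j is None:
                break                            # no single toggle improves the score
            gamma[best_j] = not gamma[best_j]
            current += best_gain
        return gamma

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 15))
    y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)

    def toy_score(gamma):
        # Stand-in criterion: Gaussian profile log-likelihood with a BIC-style penalty.
        if gamma.sum() == 0:
            rss = np.sum((y - y.mean()) ** 2)
        else:
            beta, *_ = np.linalg.lstsq(X[:, gamma], y, rcond=None)
            rss = np.sum((y - X[:, gamma] @ beta) ** 2)
        return -0.5 * len(y) * np.log(rss) - gamma.sum() * np.log(len(y))

    print(np.where(greedy_search(toy_score, 15))[0])  # expect predictors 0 and 3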

9.
This paper examines the joint statistical analysis of M independent data sets, the jth of which satisfies the model λ_j Y_j = X_j B + ε_j, where the λ_j are unknown and the ε_j are normally distributed with a known correlation structure. The maximum likelihood equations, their asymptotic covariance matrix, and the likelihood ratio test of the hypothesis that the λ_j are all equal are derived. These results are applied to two examples.

10.
We study the suitability of different modelling methods for the joint prediction of mean and variance from large data sets. We review approaches to the modelling of the conditional variance function that can handle a problem in which the conditional variance depends on about 10 explanatory variables and the training data set consists of 100,000 observations. We present a promising approach for neural network modelling of mean and dispersion. We compare the different approaches in predicting the mechanical properties of steel in two case data sets collected from the production line of a steel plate mill. We conclude with some recommendations concerning the modelling of conditional variance in large data sets.
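As a rough illustration of joint mean and dispersion modelling, here is a two-stage neural-network sketch (fit the mean, then model the log squared residuals). This is a common stand-in, not the specific architecture studied in the paper, and the simulated data are assumptions:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(4)
    X = rng.uniform(-2.0, 2.0, size=(5000, 3))
    y = X[:, 0] ** 2 + rng.normal(0.0, 0.2 + 0.5 * np.abs(X[:, 1]))

    # Stage 1: model the conditional mean.
    mean_net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
    resid2 = (y - mean_net.predict(X)) ** 2

    # Stage 2: model the conditional variance via the log squared residuals.
    var_net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)
    var_net.fit(X, np.log(resid2 + 1e-9))
    pred_sd = np.sqrt(np.exp(var_net.predict(X)))
    print(pred_sd[:5])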

11.
Second-order response surfaces are often fitted to the results of designed experiments, and the canonical form of such surfaces can greatly help both in interpreting the results and in deciding what action to take on the process under study. We describe a pastry-dough mixing process in which it is desired to simplify the canonical form, so that control of the process becomes more economical by being based on only two of the three factors. We give examples where such a simplification is possible with minimal loss of accuracy and where it can be seriously misleading, and we outline the features of the response surface that lead to these two situations. A method of improving the simplification by recalculating the constrained canonical axis is proposed. These methods ensure that the mixing process can be controlled using only two factors without seriously lowering the quality of the pastry.
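The canonical analysis behind such a reduction amounts to an eigendecomposition of the fitted quadratic form; a minimal sketch, with a hypothetical fitted surface y = b0 + x'b + x'Bx:

    import numpy as np

    # Hypothetical fitted second-order surface: y = b0 + x'b + x'Bx.
    b = np.array([1.0, -0.5, 0.2])
    B = np.array([[-2.0, 0.3, 0.1],
                  [0.3, -1.5, 0.0],
                  [0.1, 0.0, -0.05]])

    x_s = np.linalg.solve(-2.0 * B, b)  # stationary point: dy/dx = b + 2Bx = 0
    eigvals, axes = np.linalg.eigh(B)   # canonical curvatures and axes
    print("stationary point:", x_s)
    # An eigenvalue near zero flags a direction along which the surface is nearly
    # flat, suggesting control might be based on fewer factors.
    print("canonical eigenvalues:", eigvals)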

12.
An unbiased stochastic estimator of tr(I − A), where A is the influence matrix associated with the calculation of Laplacian smoothing splines, is described. The estimator is similar to one recently developed by Girard, but satisfies a minimum variance criterion and does not require the simulation of a standard normal variable. It uses instead simulations of the discrete random variable that takes the values +1 and −1, each with probability 1/2. Bounds on the variance of the estimator, similar to those established by Girard, are obtained using elementary methods. The estimator can be used to approximately minimize generalised cross-validation (GCV) when using discretized iterative methods for fitting Laplacian smoothing splines to very large data sets. Simulated examples show that the estimated trace values, using either the estimator presented here or the estimator of Girard, perform almost as well as the exact values when applied to the minimization of GCV, for n as small as a few hundred, where n is the number of data points.
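The estimator described is essentially a Rademacher-vector trace estimator; a minimal sketch (the test matrix is an arbitrary assumption):

    import numpy as np

    def trace_estimate(A, n_sims=200, rng=None):
        # Unbiased estimate of tr(I - A): E[u'(I - A)u] = tr(I - A) when the
        # entries of u independently take the values +1, -1 with probability 1/2.
        rng = rng or np.random.default_rng()
        n = A.shape[0]
        total = 0.0
        for _ in range(n_sims):
            u = rng.choice([-1.0, 1.0], size=n)
            total += u @ (u - A @ u)
        return total / n_sims

    rng = np.random.default_rng(5)
    A = 0.1 * rng.normal(size=(100, 100))
    print(trace_estimate(A, rng=rng), np.trace(np.eye(100) - A))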

13.
In this paper, we propose a new Bayesian inference approach for classification based on the traditional hinge loss used for classical support vector machines, which we call the Bayesian Additive Machine (BAM). Unlike existing approaches, the new model has a semiparametric discriminant function in which some feature effects are nonlinear and others are linear. This separation of features is achieved automatically during model fitting, without user pre-specification. Following the literature on sparse regression for high-dimensional models, we can also identify the irrelevant features. By introducing spike-and-slab priors through two sets of indicator variables, these multiple goals are achieved simultaneously and automatically, without any parameter tuning such as cross-validation. An efficient partially collapsed Markov chain Monte Carlo algorithm is developed for posterior exploration, based on a data augmentation scheme for the hinge loss. Our simulations and three real data examples demonstrate that the new approach is a strong competitor to several recently proposed methods for challenging high-dimensional classification problems.

14.
15.
Haojin Zhou, Statistics, 2013, 47(6): 1335-1343
In a statistical decision problem, if the model is invariant under a transformation group, it is desirable, or even compelling, to apply equivariance when choosing a decision rule. However, formal equivariance also requires an invariant loss function. In this paper, we give a necessary and sufficient condition for the existence of invariant loss functions and, when the condition is satisfied, characterize all invariant loss functions. Analogous results for the more general case, where the quantity of inferential interest also depends on the observed data, are presented. We also discuss connections between our results and the equivariance literature, and present some illustrative examples.

16.
When some states of a Markov chain are aggregated (or lumped) and the new process, with lumped states, inherits the Markov property, the original chain is said to be lumpable. We discuss the notion of lumpability for discrete hidden Markov models (DHMMs) and explain why, in general, testing this hypothesis leads to non-standard problems. Nevertheless, we present a case where lumpability in DHMMs is a regular problem of comparing nested models. Finally, we give simulation results assessing the performance of the proposed test, together with an application to two real data sets.
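For an ordinary, fully observed Markov chain, lumpability has a simple matrix characterization that a few lines of code can check; the hidden-Markov case studied in the paper is harder precisely because this direct check is unavailable. A minimal sketch, with an assumed transition matrix and partition:

    import numpy as np

    def is_lumpable(P, partition, tol=1e-10):
        # Kemeny-Snell condition: the chain is lumpable with respect to the
        # partition iff, for every pair of blocks, the probability of jumping
        # into the target block is the same from every state in the source block.
        for block in partition:
            for target in partition:
                probs = [P[i, target].sum() for i in block]
                if np.ptp(probs) > tol:
                    return False
        return True

    P = np.array([[0.5, 0.3, 0.2],
                  [0.5, 0.1, 0.4],
                  [0.2, 0.4, 0.4]])
    print(is_lumpable(P, [[0, 1], [2]]))  # False: states 0 and 1 disagree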

17.
Bayesian models for relative archaeological chronology building
For many years, archaeologists have postulated that the numbers of various artefact types found within excavated features should give insight into their relative dates of deposition, even when stratigraphic information is not present. A typical data set used in such studies can be reported as a cross-classification table (often called an abundance matrix or, equivalently, a contingency table) of excavated features against artefact types. Each entry of the table represents the number of a particular artefact type found in a particular archaeological feature. Methodologies for identifying temporal sequences on the basis of such data are commonly referred to as seriation techniques. Several different seriation procedures, both parametric and non-parametric, have been used in attempts to reconstruct relative chronological orders from such contingency tables. We develop some model-based approaches that might be used to aid relative archaeological chronology building. We use the recently developed Markov chain Monte Carlo method based on Langevin diffusions to fit some of the proposed models. Predictive Bayesian model choice techniques are then employed to ascertain which of the models we develop are most plausible. We analyse two data sets taken from the literature on archaeological seriation.
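For contrast with the model-based approach, one classical non-parametric seriation heuristic orders the features by their scores on the first correspondence-analysis axis; a minimal sketch on a hypothetical abundance matrix (this heuristic is not the paper's method):

    import numpy as np

    def ca_seriation(N):
        # Order rows (features) of an abundance matrix by the first
        # correspondence-analysis axis, a classical seriation heuristic.
        P = np.asarray(N, float) / N.sum()
        r, c = P.sum(axis=1), P.sum(axis=0)
        S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
        U = np.linalg.svd(S)[0]
        return np.argsort(U[:, 0] / np.sqrt(r))

    N = np.array([[10, 2, 0],   # hypothetical features x artefact types
                  [5, 6, 1],
                  [1, 7, 4],
                  [0, 2, 9]])
    print(ca_seriation(N))  # a candidate chronological order of the features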

18.
Many scientists believe that small experiments, guided by scientific intuition, are simpler and more efficient than design of experiments. This belief is strong and persists even in the face of data demonstrating that it is clearly wrong. In this paper, we present two powerful teaching examples illustrating the dangers of small experiments guided by scientific intuition. We describe two simple two-dimensional spaces. These spaces give rise to, and at the same time appear to generate supporting data for, scientific intuitions that are deeply flawed or wholly incorrect. We find these spaces useful in unfreezing scientific thinking and challenging misplaced confidence in scientific intuition.

19.
Dissemination of information derived from large contingency tables formed from confidential data is a major responsibility of statistical agencies. In this paper we present solutions to several computational and algorithmic problems that arise in the dissemination of cross-tabulations (marginal sub-tables) from a single underlying table. These include data structures that exploit sparsity to support efficient computation of marginals, algorithms such as iterative proportional fitting, and a generalized form of the shuttle algorithm that computes sharp bounds on (small, confidentiality-threatening) cells in the full table from arbitrary sets of released marginals. We give examples illustrating the techniques.
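In the simplest two-way case, the sharp bounds from one-way margins are the classical Fréchet bounds, which the shuttle algorithm generalizes to arbitrary sets of released marginals; a minimal sketch (the margins are assumptions):

    import numpy as np

    def frechet_bounds(row_totals, col_totals):
        # Sharp cell bounds for a two-way table given its one-way margins:
        # max(0, r_i + c_j - n) <= n_ij <= min(r_i, c_j).
        n = row_totals.sum()
        lower = np.maximum(0, np.add.outer(row_totals, col_totals) - n)
        upper = np.minimum.outer(row_totals, col_totals)
        return lower, upper

    lo, hi = frechet_bounds(np.array([20, 5]), np.array([18, 7]))
    print(lo)   # tight lower bounds on small cells can flag disclosure risk
    print(hi)   # when lower == upper, the cell is exactly determined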

20.
We investigate the asymptotic behavior of a nonparametric M-estimator of a regression function for stationary dependent processes, where the explanatory variables take values in an abstract functional space. Under some regularity conditions, we establish the weak and strong consistency of the estimator as well as its asymptotic normality. We also give two examples of functional processes that satisfy the mixing conditions assumed in this paper. Furthermore, a simulated example is presented to examine the finite sample performance of the proposed estimator.
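A minimal sketch of a kernel-weighted M-estimator at a functional covariate, using an L2 distance between discretized curves and a Huber psi-function solved by iterative reweighting; all tuning choices and the toy data are assumptions, not the paper's specification:

    import numpy as np

    def functional_m_estimate(X_curves, y, x0, h, delta=1.0, n_iter=25):
        # X_curves: (n, T) discretized functional covariates; x0: (T,) query curve.
        d = np.sqrt(((X_curves - x0) ** 2).mean(axis=1))  # L2 curve distances
        w = np.exp(-(d / h) ** 2)                         # kernel weights
        m = np.average(y, weights=w)                      # start at the weighted mean
        for _ in range(n_iter):                           # IRLS for the Huber loss
            r = y - m
            hub = np.where(np.abs(r) <= delta,
                           1.0, delta / np.maximum(np.abs(r), 1e-12))
            m = np.average(y, weights=w * hub)
        return m

    rng = np.random.default_rng(6)
    t = np.linspace(0.0, 1.0, 50)
    X_curves = rng.normal(size=(200, 1)) * np.sin(2 * np.pi * t)  # toy curves
    y = X_curves[:, :10].mean(axis=1) + 0.1 * rng.standard_t(df=2, size=200)
    print(functional_m_estimate(X_curves, y, X_curves[0], h=0.5))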
