Similar Documents
20 similar documents found (search time: 578 ms)
1.
Abstract.  Let π denote an intractable probability distribution that we would like to explore. Suppose that we have a positive recurrent, irreducible Markov chain that satisfies a minorization condition and has π as its invariant measure. We provide a method of using simulations from the Markov chain to construct a statistical estimate of π from which it is straightforward to sample. We show that this estimate is 'strongly consistent' in the sense that the total variation distance between the estimate and π converges to 0 almost surely as the number of simulations grows. Moreover, we use some recently developed asymptotic results to provide guidance as to how much simulation is necessary. Draws from the estimate can be used to approximate features of π or as intelligent starting values for the original Markov chain. We illustrate our methods with two examples.
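The paper's estimator is constructed from the minorization condition itself; as a loose illustration of the general idea only (turning MCMC output into an estimate of π that is easy to sample from), here is a minimal Python sketch in which a kernel density estimate stands in for the authors' construction, and the target, proposal scale, and burn-in are all arbitrary assumptions:

    import numpy as np
    from scipy.stats import gaussian_kde

    # Random-walk Metropolis chain targeting a standard normal (stand-in target).
    rng = np.random.default_rng(0)
    x, draws = 0.0, []
    for _ in range(20000):
        prop = x + rng.normal(0.0, 1.0)
        if np.log(rng.uniform()) < -0.5 * (prop**2 - x**2):
            x = prop
        draws.append(x)

    # Build a samplable estimate of the target from the chain output and draw
    # fresh starting values from it (the KDE is a stand-in for the paper's estimator).
    estimate = gaussian_kde(np.array(draws[2000:]))
    new_starts = estimate.resample(5)
    print(new_starts)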

2.
On Maximum Depth and Related Classifiers
Abstract.  Over the last couple of decades, data depth has emerged as a powerful exploratory and inferential tool for multivariate data analysis, with widespread applications. This paper investigates the possible use of different notions of data depth in non-parametric discriminant analysis. First, we consider the situation where the prior probabilities of the competing populations are all equal, and investigate classifiers that assign an observation to the population with respect to which it has maximum location depth. We then propose a different depth-based classification technique for unequal prior problems, which is also useful for equal prior cases, especially when the populations have different scatters and shapes. We use simulated data sets as well as some benchmark real examples to evaluate the performance of these depth-based classifiers. The large sample behaviour of the misclassification rates of these depth-based non-parametric classifiers has been derived under appropriate regularity conditions.
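A minimal sketch of the maximum-depth classification rule, using Mahalanobis depth as a simple stand-in for the location depths studied in the paper (the data and the depth notion here are illustrative assumptions):

    import numpy as np

    def mahalanobis_depth(x, sample):
        # Depth of x with respect to a sample: 1 / (1 + squared Mahalanobis distance).
        mu = sample.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
        diff = x - mu
        return 1.0 / (1.0 + diff @ cov_inv @ diff)

    def max_depth_classify(x, populations):
        # Assign x to the population with respect to which it has maximum depth.
        return int(np.argmax([mahalanobis_depth(x, pop) for pop in populations]))

    rng = np.random.default_rng(0)
    pop0 = rng.normal(0.0, 1.0, size=(200, 2))
    pop1 = rng.normal(2.0, 1.0, size=(200, 2))
    print(max_depth_classify(np.array([1.8, 2.1]), [pop0, pop1]))  # expect 1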

3.
Mode Jumping Proposals in MCMC
Markov chain Monte Carlo algorithms generate samples from a target distribution by simulating a Markov chain. Considerable flexibility exists in the specification of the transition matrix of the chain. In practice, however, most algorithms in use allow only small changes in the state vector in each iteration. This choice typically causes problems for multi-modal distributions, as moves between modes become rare and, in turn, convergence to the target distribution is slow. In this paper we consider continuous distributions on R^n and specify how optimization for local maxima of the target distribution can be incorporated in the specification of the Markov chain. Thereby, we obtain a chain with frequent jumps between modes. We demonstrate the effectiveness of the approach in three examples. The first considers a simple mixture of bivariate normal distributions, whereas the last two consider sampling from posterior distributions based on previously analysed data sets.
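A rough sketch of the idea, with a local optimizer supplying occasional jumps toward modes of a two-component normal mixture. Note that a valid sampler must also correct the acceptance probability for the asymmetric jump proposal, which the paper derives and this sketch deliberately omits; the target and all tuning constants are assumptions:

    import numpy as np
    from scipy.optimize import minimize

    def log_target(x):
        # Unnormalized log density: mixture of two well-separated bivariate normals.
        return np.logaddexp(-0.5 * np.sum((x - 4.0) ** 2),
                            -0.5 * np.sum((x + 4.0) ** 2))

    rng = np.random.default_rng(1)
    x, samples = np.zeros(2), []
    for it in range(5000):
        if it % 100 == 0:
            # Mode-jumping move: large perturbation, then local optimization.
            start = x + rng.normal(0.0, 6.0, size=2)
            mode = minimize(lambda z: -log_target(z), start).x
            prop = mode + rng.normal(0.0, 0.5, size=2)
        else:
            prop = x + rng.normal(0.0, 0.5, size=2)  # ordinary local random walk
        # Plain Metropolis ratio; the jump move needs a proposal correction in practice.
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x.copy())
    print(np.mean([s[0] > 0 for s in samples]))  # fraction of time in the positive mode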

4.
Summary.  We consider the application of Markov chain Monte Carlo (MCMC) estimation methods to random-effects models, and in particular the family of discrete time survival models. Survival models can be used in many situations in the medical and social sciences, and we illustrate their use through two examples that differ in both substantive area and data structure. A multilevel discrete time survival analysis involves expanding the data set so that the model can be cast as a standard multilevel binary response model. For such models it has been shown that MCMC methods have advantages in terms of reducing estimate bias. However, the data expansion results in very large data sets for which MCMC estimation is often slow and can produce chains that exhibit poor mixing. Any improvement in mixing will both speed up the methods and give more confidence in the estimates produced. The MCMC methodological literature is full of alternative algorithms designed to improve the mixing of chains, and we describe three reparameterization techniques that are easy to implement in available software. We consider two examples of multilevel survival analysis: incidence of mastitis in dairy cattle and contraceptive use dynamics in Indonesia. For each application we show where the reparameterization techniques can be used and assess their performance.
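The data expansion step itself is mechanical; a minimal sketch with hypothetical subject-level data, where each subject contributes one binary row per discrete time period at risk:

    import pandas as pd

    # Hypothetical subject-level survival data.
    df = pd.DataFrame({"id": [1, 2, 3],
                       "time": [2, 3, 1],    # discrete failure/censoring time
                       "event": [1, 0, 1]})  # 1 = event observed, 0 = censored

    rows = []
    for _, r in df.iterrows():
        for t in range(1, r["time"] + 1):
            # Response is 1 only in the period in which the event occurs.
            rows.append({"id": r["id"], "period": t,
                         "y": int(r["event"] == 1 and t == r["time"])})
    expanded = pd.DataFrame(rows)  # now fit any multilevel binary-response model
    print(expanded)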

5.
We propose an algorithmic framework for computing sparse components from rotated principal components. This methodology, called SIMPCA, is useful for replacing the unreliable practice of ignoring small coefficients of rotated components when interpreting them. The algorithm computes genuinely sparse components by projecting rotated principal components onto subsets of variables. The simplified components remain highly correlated with the corresponding principal components. Different simplification strategies yield different sparse solutions, which can be used to compare alternative interpretations of the principal components. We give examples of how effective simplified solutions can be achieved with SIMPCA on several publicly available data sets.
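A minimal sketch of the projection step, assuming the component and the chosen variable subset are already in hand; SIMPCA's strategies for choosing the subset (and its use of rotation) are more elaborate than this stand-in:

    import numpy as np

    def project_component(X, loading, subset):
        # Project a principal component onto a subset of variables: regress the
        # component scores on the chosen columns of X, giving a genuinely sparse
        # loading vector rather than one with small coefficients silently ignored.
        scores = X @ loading
        sparse = np.zeros_like(loading)
        coef, *_ = np.linalg.lstsq(X[:, subset], scores, rcond=None)
        sparse[subset] = coef
        return sparse

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 6))                          # assume centered columns
    loading = np.linalg.svd(X, full_matrices=False)[2][0]  # first PC loading
    subset = np.argsort(np.abs(loading))[-3:]              # keep 3 largest loadings
    print(project_component(X, loading, subset))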

6.
Mathematical morphology: A useful set of tools for image analysis
In this paper we give an overview of both classical and more modern morphological techniques, and demonstrate their utility through a range of practical examples. After discussing the fundamental morphological ideas, we show how the classic morphological opening and closing filters lead to measures of size via granulometries, and we briefly discuss their implementation. We also present an overview of morphological segmentation techniques, demonstrating the use of connected openings and thinnings. This then leads into the more recent set-theoretic notions underlying graph-based approaches to image analysis.
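A small sketch of openings, closings, and a crude granulometry using scipy.ndimage on a synthetic binary image; the image and the structuring-element sizes are arbitrary assumptions:

    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(3)
    img = rng.uniform(size=(64, 64)) > 0.7    # synthetic binary image

    struct = np.ones((3, 3), dtype=bool)      # structuring element
    opened = ndimage.binary_opening(img, structure=struct)
    closed = ndimage.binary_closing(img, structure=struct)

    # Crude granulometry: the foreground area surviving openings with structuring
    # elements of increasing size traces out a size distribution of the objects.
    granulometry = [int(ndimage.binary_opening(
        img, structure=np.ones((2 * s + 1, 2 * s + 1))).sum()) for s in range(1, 6)]
    print(granulometry)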

7.
Communications in Statistics: Theory and Methods, 2012, 41(13-14): 2321-2341
For the case where at least two sets have an odd number of variables, the exact distribution of the generalized Wilks Lambda statistic is not available in a manageable form adequate for manipulation. In this article, we develop a family of very accurate near-exact distributions for this statistic for the case where two or three sets have an odd number of variables. We first express the exact characteristic function of the logarithm of the statistic as the characteristic function of an infinite mixture of Generalized Integer Gamma distributions. Then, based on truncations of this exact characteristic function, we obtain a family of near-exact distributions which, by construction, match the first two exact moments. These near-exact distributions display asymptotic behaviour as the number of variables involved increases. The corresponding cumulative distribution functions are obtained in a concise and manageable form, relatively easy to implement computationally, allowing for the computation of virtually exact quantiles. We undertake a comparative study for small sample sizes, using two proximity measures based on the Berry-Esseen bounds, to assess the performance of the near-exact distributions for different numbers of sets of variables and different numbers of variables in each set.
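Schematically (this is a sketch of the structure described above, not the authors' exact notation), the exact characteristic function of W = log Lambda is an infinite mixture

    Phi_W(t) = sum_{k=0..inf} pi_k Phi_k(t),

where each Phi_k is the characteristic function of a Generalized Integer Gamma distribution. A near-exact distribution is then obtained by truncating at some K and replacing the discarded tail with a single correction term,

    Phi_W(t) ~ sum_{k=0..K} pi_k Phi_k(t) + (1 - sum_{k=0..K} pi_k) Phi*(t),

with Phi* chosen so that the first two moments of the resulting distribution match the exact ones.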

8.
Modern statistical applications involving large data sets have focused attention on statistical methodologies that are both computationally efficient and able to screen large numbers of candidate models. Here we consider computationally efficient variational Bayes approaches to inference in high-dimensional heteroscedastic linear regression, where both the mean and the variance are described in terms of linear functions of the predictors and where the number of predictors can be larger than the sample size. We derive a closed-form variational lower bound on the log marginal likelihood, useful for model selection, and propose a novel fast greedy search algorithm on the model space. The algorithm uses one-step optimization updates to the variational lower bound of the current model to screen large numbers of candidate predictor variables for inclusion or exclusion in a computationally thrifty way. We show that the suggested model search strategy is related to widely used orthogonal matching pursuit algorithms, but yields a framework for potentially extending these algorithms to more complex models. The methodology is applied in simulations and in two real examples involving prediction for food constituents using NIR technology and prediction of disease progression in diabetes.
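A generic sketch of greedy screening over inclusion/exclusion of predictors. The scoring function here is a penalized Gaussian fit standing in for the paper's variational lower bound (whose one-step updates make each evaluation cheap); the data and the toy score are assumptions:

    import numpy as np

    def greedy_search(score, p, max_steps=20):
        # Greedily toggle inclusion of single predictors while the score improves.
        gamma = np.zeros(p, dtype=bool)
        current = score(gamma)
        for _ in range(max_steps):
            best_gain, best_j = 0.0, None
            for j in range(p):
                trial = gamma.copy()
                trial[j] = not trial[j]          # toggle predictor j
                gain = score(trial) - current
                if gain > best_gain:
                    best_gain, best_j = gain, j
            if best_j is None:
                break                            # no single toggle improves the score
            gamma[best_j] = not gamma[best_j]
            current += best_gain
        return gamma

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 15))
    y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)

    def toy_score(gamma):
        # Stand-in criterion: Gaussian profile log-likelihood with a BIC-style penalty.
        if gamma.sum() == 0:
            rss = np.sum((y - y.mean()) ** 2)
        else:
            beta, *_ = np.linalg.lstsq(X[:, gamma], y, rcond=None)
            rss = np.sum((y - X[:, gamma] @ beta) ** 2)
        return -0.5 * len(y) * np.log(rss) - gamma.sum() * np.log(len(y))

    print(np.where(greedy_search(toy_score, 15))[0])  # expect predictors 0 and 3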

9.
This paper examines the joint statistical analysis of M independent data sets, the jth of which satisfies the model λ_j Y_j = X_j B + ε_j, where the λ_j are unknown and the ε_j are normally distributed with a known correlation structure. The maximum likelihood equations, their asymptotic covariance matrix, and the likelihood ratio test of the hypothesis that the λ_j are all equal are derived. These results are applied to two examples.

10.
We study the suitability of different modelling methods for the joint prediction of mean and variance from large data sets. We review approaches to the modelling of the conditional variance function that can handle a problem in which the conditional variance depends on about 10 explanatory variables and the training data set consists of 100,000 observations. We present a promising approach for neural network modelling of mean and dispersion. We compare the different approaches in predicting the mechanical properties of steel in two case data sets collected from the production line of a steel plate mill. We conclude with some recommendations concerning the modelling of conditional variance in large data sets.
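As a rough illustration of joint mean and dispersion modelling, here is a two-stage neural-network sketch (fit the mean, then model the log squared residuals). This is a common stand-in, not the specific architecture studied in the paper, and the simulated data are assumptions:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(4)
    X = rng.uniform(-2.0, 2.0, size=(5000, 3))
    y = X[:, 0] ** 2 + rng.normal(0.0, 0.2 + 0.5 * np.abs(X[:, 1]))

    # Stage 1: model the conditional mean.
    mean_net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
    resid2 = (y - mean_net.predict(X)) ** 2

    # Stage 2: model the conditional variance via the log squared residuals.
    var_net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)
    var_net.fit(X, np.log(resid2 + 1e-9))
    pred_sd = np.sqrt(np.exp(var_net.predict(X)))
    print(pred_sd[:5])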

11.
Second-order response surfaces are often fitted to the results of designed experiments, and the canonical form of such surfaces can greatly help both in interpreting the results and in deciding what action to take on the process under study. We describe a pastry-dough mixing process in which it is desired to simplify the canonical form, so that control of the process becomes more economical by being based on only two of the three factors. We give examples where such a simplification is possible with minimal loss of accuracy and where it can be seriously misleading, and we outline the features of the response surface that lead to these two situations. A method of improving the simplification by recalculating the constrained canonical axis is proposed. These methods ensure that the mixing process can be controlled using only two factors without seriously lowering the quality of the pastry.
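The canonical analysis behind such a reduction amounts to an eigendecomposition of the fitted quadratic form; a minimal sketch, with a hypothetical fitted surface y = b0 + x'b + x'Bx:

    import numpy as np

    # Hypothetical fitted second-order surface: y = b0 + x'b + x'Bx.
    b = np.array([1.0, -0.5, 0.2])
    B = np.array([[-2.0, 0.3, 0.1],
                  [0.3, -1.5, 0.0],
                  [0.1, 0.0, -0.05]])

    x_s = np.linalg.solve(-2.0 * B, b)  # stationary point: dy/dx = b + 2Bx = 0
    eigvals, axes = np.linalg.eigh(B)   # canonical curvatures and axes
    print("stationary point:", x_s)
    # An eigenvalue near zero flags a direction along which the surface is nearly
    # flat, suggesting control might be based on fewer factors.
    print("canonical eigenvalues:", eigvals)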

12.
An unbiased stochastic estimator of tr(I − A), where A is the influence matrix associated with the calculation of Laplacian smoothing splines, is described. The estimator is similar to one recently developed by Girard, but satisfies a minimum variance criterion and does not require the simulation of a standard normal variable. It uses instead simulations of the discrete random variable that takes the values +1 and −1, each with probability 1/2. Bounds on the variance of the estimator, similar to those established by Girard, are obtained using elementary methods. The estimator can be used to approximately minimize generalised cross-validation (GCV) when using discretized iterative methods for fitting Laplacian smoothing splines to very large data sets. Simulated examples show that the estimated trace values, using either the estimator presented here or the estimator of Girard, perform almost as well as the exact values when applied to the minimization of GCV, for n as small as a few hundred, where n is the number of data points.
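The estimator described is essentially a Rademacher-vector trace estimator; a minimal sketch (the test matrix is an arbitrary assumption):

    import numpy as np

    def trace_estimate(A, n_sims=200, rng=None):
        # Unbiased estimate of tr(I - A): E[u'(I - A)u] = tr(I - A) when the
        # entries of u independently take the values +1, -1 with probability 1/2.
        rng = rng or np.random.default_rng()
        n = A.shape[0]
        total = 0.0
        for _ in range(n_sims):
            u = rng.choice([-1.0, 1.0], size=n)
            total += u @ (u - A @ u)
        return total / n_sims

    rng = np.random.default_rng(5)
    A = 0.1 * rng.normal(size=(100, 100))
    print(trace_estimate(A, rng=rng), np.trace(np.eye(100) - A))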

13.
In this paper, we propose a new Bayesian inference approach for classification based on the traditional hinge loss used for classical support vector machines, which we call the Bayesian Additive Machine (BAM). Unlike existing approaches, the new model has a semiparametric discriminant function in which some feature effects are nonlinear and others are linear. This separation of features is achieved automatically during model fitting, without user pre-specification. Following the literature on sparse regression for high-dimensional models, we can also identify the irrelevant features. By introducing spike-and-slab priors through two sets of indicator variables, these multiple goals are achieved simultaneously and automatically, without any parameter tuning such as cross-validation. An efficient partially collapsed Markov chain Monte Carlo algorithm is developed for posterior exploration, based on a data augmentation scheme for the hinge loss. Our simulations and three real data examples demonstrate that the new approach is a strong competitor to several recently proposed methods for challenging high-dimensional classification problems.

14.
15.
Haojin Zhou, Statistics, 2013, 47(6): 1335-1343
In a statistical decision problem, if the model is invariant under a transformation group, it is desirable, or even compelling, to apply equivariance when choosing a decision rule. However, formal equivariance also requires an invariant loss function. In this paper, we give a necessary and sufficient condition for the existence of invariant loss functions and, when the condition is satisfied, characterize all invariant loss functions. Analogous results for the more general case, where the quantity of inferential interest also depends on the observed data, are presented. We also discuss connections between our results and the equivariance literature, and present some illustrative examples.

16.
When some states of a Markov chain are aggregated (or lumped) and the new process, with lumped states, inherits the Markov property, the original chain is said to be lumpable. We discuss the notion of lumpability for discrete hidden Markov models (DHMMs) and explain why, in general, testing this hypothesis leads to non-standard problems. Nevertheless, we present a case where lumpability in DHMMs is a regular problem of comparing nested models. Finally, we give simulation results assessing the performance of the proposed test, together with an application to two real data sets.
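For an ordinary, fully observed Markov chain, lumpability has a simple matrix characterization that a few lines of code can check; the hidden-Markov case studied in the paper is harder precisely because this direct check is unavailable. A minimal sketch, with an assumed transition matrix and partition:

    import numpy as np

    def is_lumpable(P, partition, tol=1e-10):
        # Kemeny-Snell condition: the chain is lumpable with respect to the
        # partition iff, for every pair of blocks, the probability of jumping
        # into the target block is the same from every state in the source block.
        for block in partition:
            for target in partition:
                probs = [P[i, target].sum() for i in block]
                if np.ptp(probs) > tol:
                    return False
        return True

    P = np.array([[0.5, 0.3, 0.2],
                  [0.5, 0.1, 0.4],
                  [0.2, 0.4, 0.4]])
    print(is_lumpable(P, [[0, 1], [2]]))  # False: states 0 and 1 disagree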

17.
Bayesian models for relative archaeological chronology building
For many years, archaeologists have postulated that the numbers of various artefact types found within excavated features should give insight into their relative dates of deposition, even when stratigraphic information is not present. A typical data set used in such studies can be reported as a cross-classification table (often called an abundance matrix or, equivalently, a contingency table) of excavated features against artefact types. Each entry of the table represents the number of a particular artefact type found in a particular archaeological feature. Methodologies for identifying temporal sequences on the basis of such data are commonly referred to as seriation techniques. Several different seriation procedures, both parametric and non-parametric, have been used in attempts to reconstruct relative chronological orders from such contingency tables. We develop some model-based approaches that might be used to aid relative archaeological chronology building. We use the recently developed Markov chain Monte Carlo method based on Langevin diffusions to fit some of the proposed models. Predictive Bayesian model choice techniques are then employed to ascertain which of the models we develop are most plausible. We analyse two data sets taken from the literature on archaeological seriation.
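For contrast with the model-based approach, one classical non-parametric seriation heuristic orders the features by their scores on the first correspondence-analysis axis; a minimal sketch on a hypothetical abundance matrix (this heuristic is not the paper's method):

    import numpy as np

    def ca_seriation(N):
        # Order rows (features) of an abundance matrix by the first
        # correspondence-analysis axis, a classical seriation heuristic.
        P = np.asarray(N, float) / N.sum()
        r, c = P.sum(axis=1), P.sum(axis=0)
        S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
        U = np.linalg.svd(S)[0]
        return np.argsort(U[:, 0] / np.sqrt(r))

    N = np.array([[10, 2, 0],   # hypothetical features x artefact types
                  [5, 6, 1],
                  [1, 7, 4],
                  [0, 2, 9]])
    print(ca_seriation(N))  # a candidate chronological order of the features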

18.
Many scientists believe that small experiments, guided by scientific intuition, are simpler and more efficient than design of experiments. This belief is strong and persists even in the face of data demonstrating that it is clearly wrong. In this paper, we present two powerful teaching examples illustrating the dangers of small experiments guided by scientific intuition. We describe two simple two-dimensional spaces. These spaces give rise to, and at the same time appear to generate supporting data for, scientific intuitions that are deeply flawed or wholly incorrect. We find these spaces useful in unfreezing scientific thinking and challenging misplaced confidence in scientific intuition.

19.
Dissemination of information derived from large contingency tables formed from confidential data is a major responsibility of statistical agencies. In this paper we present solutions to several computational and algorithmic problems that arise in the dissemination of cross-tabulations (marginal sub-tables) from a single underlying table. These include data structures that exploit sparsity to support efficient computation of marginals, algorithms such as iterative proportional fitting, and a generalized form of the shuttle algorithm that computes sharp bounds on (small, confidentiality-threatening) cells in the full table from arbitrary sets of released marginals. We give examples illustrating the techniques.
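In the simplest two-way case, the sharp bounds from one-way margins are the classical Fréchet bounds, which the shuttle algorithm generalizes to arbitrary sets of released marginals; a minimal sketch (the margins are assumptions):

    import numpy as np

    def frechet_bounds(row_totals, col_totals):
        # Sharp cell bounds for a two-way table given its one-way margins:
        # max(0, r_i + c_j - n) <= n_ij <= min(r_i, c_j).
        n = row_totals.sum()
        lower = np.maximum(0, np.add.outer(row_totals, col_totals) - n)
        upper = np.minimum.outer(row_totals, col_totals)
        return lower, upper

    lo, hi = frechet_bounds(np.array([20, 5]), np.array([18, 7]))
    print(lo)   # tight lower bounds on small cells can flag disclosure risk
    print(hi)   # when lower == upper, the cell is exactly determined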

20.
We investigate the asymptotic behavior of a nonparametric M-estimator of a regression function for stationary dependent processes, where the explanatory variables take values in an abstract functional space. Under some regularity conditions, we establish the weak and strong consistency of the estimator as well as its asymptotic normality. We also give two examples of functional processes that satisfy the mixing conditions assumed in this paper. Furthermore, a simulated example is presented to examine the finite sample performance of the proposed estimator.
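A minimal sketch of a kernel-weighted M-estimator at a functional covariate, using an L2 distance between discretized curves and a Huber psi-function solved by iterative reweighting; all tuning choices and the toy data are assumptions, not the paper's specification:

    import numpy as np

    def functional_m_estimate(X_curves, y, x0, h, delta=1.0, n_iter=25):
        # X_curves: (n, T) discretized functional covariates; x0: (T,) query curve.
        d = np.sqrt(((X_curves - x0) ** 2).mean(axis=1))  # L2 curve distances
        w = np.exp(-(d / h) ** 2)                         # kernel weights
        m = np.average(y, weights=w)                      # start at the weighted mean
        for _ in range(n_iter):                           # IRLS for the Huber loss
            r = y - m
            hub = np.where(np.abs(r) <= delta,
                           1.0, delta / np.maximum(np.abs(r), 1e-12))
            m = np.average(y, weights=w * hub)
        return m

    rng = np.random.default_rng(6)
    t = np.linspace(0.0, 1.0, 50)
    X_curves = rng.normal(size=(200, 1)) * np.sin(2 * np.pi * t)  # toy curves
    y = X_curves[:, :10].mean(axis=1) + 0.1 * rng.standard_t(df=2, size=200)
    print(functional_m_estimate(X_curves, y, X_curves[0], h=0.5))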
