Similar Articles (20 results)
1.
Kontkanen  P.  Myllymäki  P.  Silander  T.  Tirri  H.  Grünwald  P. 《Statistics and Computing》2000,10(1):39-54
In this paper we are interested in discrete prediction problems in a decision-theoretic setting, where the task is to compute the predictive distribution for a finite set of possible alternatives. This question is first addressed in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one possible approach for determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with Jeffreys' prior approaches the new stochastic complexity predictive distribution in the limit of increasing sample data. To compare the alternative approaches in practice, each of the predictive distributions discussed is instantiated in the Bayesian network model family case. In particular, to determine Jeffreys' prior for this model family, we show how to compute the (expected) Fisher information matrix for a fixed but arbitrary Bayesian network structure. In the empirical part of the paper the predictive distributions are compared using the simple tree-structured Naive Bayes model, chosen for computational reasons. Experiments with several public-domain classification datasets suggest that the evidence approach produces the most accurate predictions in the log-score sense. The evidence-based methods are also quite robust in the sense that they predict surprisingly well even when only a small fraction of the full training set is used.
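A one-parameter illustration of the contrast this abstract draws between the MAP plug-in and the evidence (marginal-likelihood) predictive: for a single Bernoulli parameter under a Beta(a, b) prior (Jeffreys' prior is a = b = 1/2), both predictives have closed forms. This is only a sketch of the general idea, not the paper's Bayesian-network computation; function names are ours.

```python
def map_predictive(k, n, a=0.5, b=0.5):
    """Plug-in predictive P(next=1): fix theta at its posterior mode.

    Posterior is Beta(k+a, n-k+b), whose mode is (k+a-1)/(n+a+b-2).
    Jeffreys' prior corresponds to a = b = 0.5.
    """
    return (k + a - 1.0) / (n + a + b - 2.0)


def evidence_predictive(k, n, a=0.5, b=0.5):
    """Evidence predictive P(next=1): integrate theta out analytically.

    Marginalizing over the Beta posterior gives (k+a)/(n+a+b).
    """
    return (k + a) / (n + a + b)
```

With no data at all, the evidence predictive under Jeffreys' prior sensibly returns 1/2, while the plug-in mode is undefined; this is one face of the robustness the abstract reports for evidence-based methods.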

2.
We design a probability distribution for ordinal data by modeling the process generating the data, which is assumed to rely only on order comparisons between categories. By contrast, most competitors either discard the order information or add non-existent distance information. The data-generating process is assumed, from optimality arguments, to be a stochastic binary search algorithm in a sorted table. The resulting distribution is natively governed by two meaningful parameters (position and precision) and has very appealing properties: decrease around the mode, shape tuning from uniformity to a Dirac, and identifiability. Moreover, it is easily estimated by an EM algorithm, since the path of the stochastic binary search can be treated as missing values. Using the classical latent class assumption, the univariate ordinal model is then straightforwardly extended to model-based clustering for multivariate ordinal data. Parameters of this mixture model are estimated by an AECM algorithm. Both simulated and real data sets illustrate the great potential of this model through its ability to parsimoniously identify particularly relevant clusters that were unsuspected by some traditional competitors.
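The data-generating process can be sketched as follows. This is a simplified reading of the stochastic binary search described above (one random breakpoint per step, a correct comparison with probability `pi`), not the authors' exact model; the parameter names `mu` and `pi` follow the abstract's position/precision description.

```python
import random


def stochastic_binary_search(m, mu, pi, rng=random):
    """One draw from a simplified stochastic binary search on {1, ..., m}.

    mu : target position (the mode); pi : probability that each order
    comparison is performed correctly. With pi = 1 the search always
    returns mu; with pi = 0 every step picks a side blindly, giving a
    diffuse distribution. Illustrative sketch only.
    """
    lo, hi = 1, m
    while lo < hi:
        mid = rng.randint(lo, hi - 1)      # random breakpoint in the table
        if rng.random() < pi:              # correct comparison toward mu
            if mu <= mid:
                hi = mid
            else:
                lo = mid + 1
        else:                              # blind step: random side
            if rng.random() < 0.5:
                hi = mid
            else:
                lo = mid + 1
    return lo
```

Treating the sequence of visited intervals as missing data is what makes the EM estimation mentioned in the abstract natural.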

3.
Estimation of finite mixture models when the mixing distribution support is unknown is an important problem. This article gives a new approach based on a marginal likelihood for the unknown support. Motivated by a Bayesian Dirichlet prior model, a computationally efficient stochastic approximation version of the marginal likelihood is proposed and large-sample theory is presented. By restricting the support to a finite grid, a simulated annealing method is employed to maximize the marginal likelihood and estimate the support. Real and simulated data examples show that this novel stochastic approximation and simulated annealing procedure compares favorably with existing methods.
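A minimal sketch of the grid-restricted simulated annealing step: here `objective` stands in for the stochastic-approximation marginal likelihood of the abstract, and the proposal flips one grid point in or out of the candidate support. The cooling schedule and move type are our illustrative choices, not the article's.

```python
import math
import random


def anneal_support(grid, objective, n_iter=2000, t0=1.0, rng=random):
    """Simulated annealing over subsets of a finite grid.

    objective(support) scores a candidate support set (a stand-in for
    the marginal likelihood); higher is better. Returns the best
    support found and its score.
    """
    current = {rng.choice(grid)}
    cur_val = objective(current)
    best, best_val = set(current), cur_val
    for it in range(1, n_iter + 1):
        temp = t0 / math.log(it + 1)       # logarithmic cooling
        cand = set(current)
        point = rng.choice(grid)           # flip one grid point in/out
        if point in cand and len(cand) > 1:
            cand.remove(point)
        else:
            cand.add(point)
        val = objective(cand)
        # accept improvements always, worse moves with Boltzmann probability
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / temp):
            current, cur_val = cand, val
            if cur_val > best_val:
                best, best_val = set(current), cur_val
    return best, best_val
```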

4.
The purpose of the paper is to explain how recent advances in Markov chain Monte Carlo integration can facilitate the routine Bayesian analysis of the linear model when the prior distribution is completely user-dependent. The method is based on a Metropolis-Hastings algorithm with a Student-t source distribution that can generate posterior moments as well as marginal posterior densities for model parameters. The method is illustrated with numerical examples where the combination of prior and likelihood information leads to multimodal posteriors due to prior-likelihood conflicts, and to cases where prior information can be summarized by symmetric stable Paretian distributions.
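The sampler described here is easy to sketch: an independence Metropolis-Hastings chain whose source (proposal) distribution is a scaled Student-t. The target below is a toy log-posterior; the degrees of freedom and scale are illustrative choices, not the paper's settings.

```python
import math
import random


def t_logpdf(x, df=4.0, scale=3.0):
    """Log density of a scaled Student-t, up to an additive constant."""
    return -0.5 * (df + 1.0) * math.log(1.0 + (x / scale) ** 2 / df)


def t_sample(df=4.0, scale=3.0, rng=random):
    """Draw from a scaled Student-t as normal / sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(int(df)))
    return scale * z / math.sqrt(chi2 / df)


def independence_mh(log_post, n, rng=random):
    """Independence Metropolis-Hastings with a Student-t source.

    log_post is the (unnormalized) log posterior; the additive
    constants in t_logpdf cancel in the acceptance ratio.
    """
    x = t_sample(rng=rng)
    out = []
    for _ in range(n):
        y = t_sample(rng=rng)
        log_a = (log_post(y) - log_post(x)) + (t_logpdf(x) - t_logpdf(y))
        if math.log(rng.random()) < log_a:
            x = y
        out.append(x)
    return out
```

The heavy tails of the Student-t source are what let an independence chain cover multimodal posteriors of the kind the abstract mentions.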

5.
A common assumption in fitting panel data models is normality of the stochastic subject effects. This can be extremely restrictive, obscuring many potential features of the true distributions. The objective of this article is to propose a modeling strategy, from a semi-parametric Bayesian perspective, to specify a flexible distribution for the random effects in dynamic panel data models. This is addressed here by assuming a Dirichlet process mixture model, which introduces a Dirichlet process prior for the random-effects distribution. We address the role of initial conditions in dynamic processes, emphasizing joint modeling of start-up and subsequent responses. We adopt Gibbs sampling techniques to approximate posterior estimates. These topics are illustrated by a simulation study and by testing hypothetical models in two empirical contexts drawn from economic studies. We use modified versions of information criteria to compare the fitted models.

6.
汪炜, 袁东任. 《统计研究》(Statistical Research), 2014, 31(4): 89-96
Voluntary disclosure is an important means by which management conveys firm value and mitigates information asymmetry, and a useful complement to mandatory financial reporting. The earnings information in financial reports not only constrains managerial behavior through contracting; earnings quality also reflects management credibility and affects the valuation role of voluntary disclosure, and both channels shape a firm's voluntary disclosure behavior. Taking the forward-looking information voluntarily disclosed by listed companies as its object, this paper analyzes how earnings quality affects voluntary disclosure and through what mechanism. The empirical results confirm that earnings quality plays both a contracting role and a certification role in voluntary disclosure. The contracting role shows up as earnings quality raising the level of voluntary disclosure by reducing agency costs; the certification role lies in earnings quality providing verifiability assurance for voluntarily disclosed information, strengthening the association between firm value and the level of voluntary disclosure.

7.
To address the problems that the traditional crossed-classification credibility model is computationally complex and cannot yield unbiased posterior estimates of the structural parameters when prior information on them is insufficient, this paper carries out an empirical analysis of the crossed-classification credibility model using MCMC simulation and GLMM methods, demonstrating the model's effectiveness. The results show that the MCMC approach can dynamically simulate the posterior distributions of the parameters and improve estimation precision, while the GLMM approach greatly simplifies computation and is convenient to apply, allowing model selection with graphical and other diagnostic tools and supporting an evaluation of the model's practical usefulness.

8.
A uniform shrinkage prior (USP) distribution on the unknown variance component of a random-effects model is known to produce good frequency properties. The USP has a parameter that determines the shape of its density function, but whether the USP maintains such good frequency properties regardless of the choice of this shape parameter has not been examined. We investigate which choice for the shape parameter of the USP produces Bayesian interval estimates of random effects that meet their nominal confidence levels better than several existing choices in the literature. Using univariate and multivariate Gaussian hierarchical models, we show that the USP achieves its best frequency properties when its shape parameter makes the USP behave similarly to an improper flat prior distribution on the unknown variance component.
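For concreteness, in the standard univariate parameterization the USP with shape parameter s2 places the density s2/(s2 + tau2)^2 on the variance component tau2, which is equivalent to a uniform prior on the shrinkage factor B = s2/(s2 + tau2). The sketch below assumes that parameterization; the paper's multivariate extensions are not shown.

```python
def usp_density(tau2, s2):
    """Uniform shrinkage prior density on a variance component tau2.

    s2 is the shape parameter; this density makes the shrinkage
    factor B = s2 / (s2 + tau2) uniform on (0, 1). Its CDF is
    tau2 / (s2 + tau2), which we use to check normalization.
    """
    return s2 / (s2 + tau2) ** 2


def shrinkage_factor(tau2, s2):
    """Shrinkage applied to a random effect given tau2 and s2."""
    return s2 / (s2 + tau2)
```

Note how a large s2 flattens the density over any bounded range of tau2, which is one way to read the abstract's finding that the best-performing USP behaves like an improper flat prior.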

9.
For models with random effects or missing data, the likelihood function is sometimes intractable analytically but amenable to Monte Carlo approximation. To get a good approximation, the parameter value that drives the simulations should be sufficiently close to the maximum likelihood estimate (MLE) which unfortunately is unknown. Introducing a working prior distribution, we express the likelihood function as a posterior expectation and approximate it using posterior simulations. If the sample size is large, the sample information is likely to outweigh the prior specification and the posterior simulations will be concentrated around the MLE automatically, leading to good approximation of the likelihood near the MLE. For smaller samples, we propose to use the current posterior as the next prior distribution to make the posterior simulations closer to the MLE and hence improve the likelihood approximation. By using the technique of data duplication, we can simulate from the sharpened posterior distribution without actually updating the prior distribution. The suggested method works well in several test cases. A more complex example involving censored spatial data is also discussed.
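The starting point, a likelihood that is intractable analytically but amenable to Monte Carlo approximation, can be illustrated on a toy random-effects model in which the random effect is integrated out by simulation. This shows only the plain Monte Carlo step; the working-prior and data-duplication devices of the abstract are not reproduced, and the model below is our own toy example.

```python
import math
import random


def mc_loglik(theta, y, n_draws=2000, rng=random):
    """Monte Carlo log-likelihood for a toy random-effects model.

    Model: y_i | b_i ~ N(theta + b_i, 1), with b_i ~ N(0, 1) integrated
    out by simulation. Normalizing constants that do not depend on
    theta are dropped, so values are comparable across theta only.
    """
    total = 0.0
    for yi in y:
        s = 0.0
        for _ in range(n_draws):
            b = rng.gauss(0.0, 1.0)        # simulate the random effect
            s += math.exp(-0.5 * (yi - theta - b) ** 2)
        total += math.log(s / n_draws)     # Monte Carlo marginal density
    return total
```

The Monte Carlo error of this estimate is smallest near the simulated parameter's high-density region, which is why the abstract drives the simulations from a posterior concentrated near the MLE.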

10.
This paper considers the problem of estimating a nonlinear statistical model subject to stochastic linear constraints among unknown parameters. These constraints represent prior information which originates from a previous estimation of the same model using an alternative database. One feature of this specification allows the design matrix of stochastic linear restrictions to be estimated. The mixed regression technique and the maximum likelihood approach are used to derive the estimator for both the model coefficients and the unknown elements of this design matrix. The proposed estimator, whose asymptotic properties are studied, contains as a special case the conventional mixed regression estimator based on a fixed design matrix. A new test of compatibility between prior and sample information is also introduced. The suggested estimator is tested empirically with both simulated and actual marketing data.

12.
Due to computational challenges and non-availability of conjugate prior distributions, Bayesian variable selection in quantile regression models is often a difficult task. In this paper, we address these two issues for quantile regression models. In particular, we develop an informative stochastic search variable selection (ISSVS) for quantile regression models that introduces an informative prior distribution. We adopt prior structures which incorporate historical data into the current data by quantifying them with a suitable prior distribution on the model parameters. This allows ISSVS to search more efficiently in the model space and choose the more likely models. In addition, a Gibbs sampler is derived to facilitate the computation of the posterior probabilities. A major advantage of ISSVS is that it avoids instability in the posterior estimates for the Gibbs sampler as well as convergence problems that may arise from choosing vague priors. Finally, the proposed methods are illustrated with both simulation and real data.

13.
The Bayesian CART (classification and regression tree) approach proposed by Chipman, George and McCulloch (1998) entails putting a prior distribution on the set of all CART models and then using stochastic search to select a model. The main thrust of this paper is to propose a new class of hierarchical priors which enhance the potential of this Bayesian approach. These priors indicate a preference for smooth local mean structure, resulting in tree models which shrink predictions from adjacent terminal nodes towards each other. Past methods for tree shrinkage have searched for trees without shrinking, and applied shrinkage to the identified tree only after the search. By using hierarchical priors in the stochastic search, the proposed method searches for shrunk trees that fit well and improves the tree through shrinkage of predictions.

14.
胡宗义, 李毅. 《统计研究》(Statistical Research), 2020, 37(4): 59-74
Exploiting the formal implementation of China's environmental information disclosure system in 2008 as an exogenous shock, this paper constructs a quasi-natural experiment and uses a difference-in-differences design on panel data for 285 Chinese cities over 2004-2017 to systematically evaluate the effect of environmental information disclosure on industrial pollutant emissions. Overcoming the measurement difficulties and endogeneity problems surrounding environmental information disclosure, it provides the first assessment of the disclosure's pollution-abatement effect and formally characterizes the underlying mechanism with a mathematical model. The study finds that environmental information disclosure significantly reduces industrial pollutant emissions, with an effect that is both lagged and persistent; the abatement effect increases with the severity of local pollution and the stringency of environmental regulation; and mechanism analysis shows that the effect is transmitted mainly through industrial-structure transformation and progress in abatement technology. Robustness is verified with parallel-trend, instrumental-variable, and placebo tests, among others. The findings enrich the empirical discussion of the relationship between environmental information disclosure and pollution control and offer useful policy implications for improving environmental governance in China.

15.
A novel framework is proposed for the estimation of multiple sinusoids from irregularly sampled time series. This spectral analysis problem is addressed as an under-determined inverse problem, where the spectrum is discretized on an arbitrarily thin frequency grid. As we focus on line spectra estimation, the solution must be sparse, i.e. the amplitude of the spectrum must be zero almost everywhere. Such prior information is taken into account within the Bayesian framework. Two models are used to account for the prior sparseness of the solution, namely a Laplace prior and a Bernoulli-Gaussian prior, associated with optimization and stochastic sampling algorithms, respectively. Such approaches are efficient alternatives to the usual sequential prewhitening methods, especially in the case of strong sampling aliases perturbing the Fourier spectrum. Both methods should be intensively tested on real data sets by physicists.
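Of the two priors, the Laplace one leads to an l1-penalised MAP optimization, which can be sketched with ISTA (iterative soft thresholding) on a small frequency grid of cosine/sine atoms. The grid, penalty, and iteration count below are illustrative choices; the Bernoulli-Gaussian stochastic-sampling route is not shown.

```python
import math


def soft(v, s):
    """Soft-threshold operator, the proximal map of the l1 penalty."""
    return math.copysign(max(abs(v) - s, 0.0), v)


def ista_line_spectrum(t, y, freqs, lam=0.1, n_iter=300):
    """MAP amplitudes under a Laplace (l1) prior, fitted by ISTA.

    t, y : irregular sample times and values; freqs : candidate grid.
    Uses one cos and one sin column per frequency; returns the power
    (cos^2 + sin^2 amplitude) at each grid frequency. Step size is a
    crude safe choice (1 / sum of squared column norms).
    """
    cols = [[math.cos(2 * math.pi * f * ti) for ti in t] for f in freqs] \
         + [[math.sin(2 * math.pi * f * ti) for ti in t] for f in freqs]
    p, n = len(cols), len(t)
    step = 1.0 / sum(sum(c * c for c in col) for col in cols)
    x = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(x[j] * cols[j][i] for j in range(p)) - y[i]
                 for i in range(n)]
        # gradient step on the least-squares term, then soft threshold
        x = [soft(x[j] - step * sum(cols[j][i] * resid[i] for i in range(n)),
                  step * lam) for j in range(p)]
    k = len(freqs)
    return [x[j] ** 2 + x[j + k] ** 2 for j in range(k)]
```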

16.
In this paper, we develop a variable selection framework with the spike-and-slab prior distribution via the hazard function of the Cox model. Specifically, we consider the transformation of the score and information functions for the partial likelihood function evaluated at the given data from the parameter space into the space generated by the logarithm of the hazard ratio. Thereby, we reduce the nonlinear complexity of the estimation equation for the Cox model and allow the utilization of a wider variety of stable variable selection methods. Then, we use a stochastic variable search Gibbs sampling approach via the spike-and-slab prior distribution to obtain the sparsity structure of the covariates associated with the survival outcome. Additionally, we conduct numerical simulations to evaluate the finite-sample performance of our proposed method. Finally, we apply this novel framework on lung adenocarcinoma data to find important genes associated with decreased survival in subjects with the disease.

17.
It is demonstrated how a suitably chosen prior for the frequency parameters can streamline the Bayesian analysis of categorical data with missing entries due to nonresponse or other causes. The two cases where the data follow the Multinomial or the Hypergeometric model are treated separately. In the first case it is adequate to restrict the prior (for the cell probabilities) to the class of Dirichlet distributions. In the case of the Hypergeometric model it is convenient to select a prior from the class of Dirichlet-Multinomial (DM) distributions. The DM distributions are studied in some detail.
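For the Multinomial case with a Dirichlet prior, the posterior-mean cell probabilities have the familiar closed form, sketched below for fully observed counts only; the paper's treatment of missing entries is not reproduced here.

```python
def dirichlet_predictive(counts, alpha):
    """Posterior-mean cell probabilities for Multinomial counts.

    Under a Dirichlet(alpha) prior, the posterior mean of cell j is
    (n_j + a_j) / (n + sum(a)). Fully observed counts only; this is
    an illustrative sketch, not the paper's missing-data machinery.
    """
    n = sum(counts)
    a = sum(alpha)
    return [(c + al) / (n + a) for c, al in zip(counts, alpha)]
```

Conjugacy is what streamlines the analysis: the posterior stays in the Dirichlet class, so the update is a one-line count adjustment.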

18.
Particle MCMC involves using a particle filter within an MCMC algorithm. For inference of a model which involves an unobserved stochastic process, the standard implementation uses the particle filter to propose new values for the stochastic process, and MCMC moves to propose new values for the parameters. We show how particle MCMC can be generalised beyond this. Our key idea is to introduce new latent variables. We then use the MCMC moves to update the latent variables, and the particle filter to propose new values for the parameters and stochastic process given the latent variables. A generic way of defining these latent variables is to model them as pseudo-observations of the parameters or of the stochastic process. By choosing the amount of information these latent variables have about the parameters and the stochastic process we can often improve the mixing of the particle MCMC algorithm by trading off the Monte Carlo error of the particle filter and the mixing of the MCMC moves. We show that using pseudo-observations within particle MCMC can improve its efficiency in certain scenarios: dealing with initialisation problems of the particle filter; speeding up the mixing of particle Gibbs when there is strong dependence between the parameters and the stochastic process; and enabling further MCMC steps to be used within the particle filter.
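The particle-filter-inside-MCMC setting rests on the filter's likelihood estimate, which a minimal bootstrap filter illustrates on a toy linear-Gaussian state-space model. This is the standard bootstrap filter only; the pseudo-observation construction proposed above is not reproduced, and the model is our own toy example.

```python
import math
import random


def bootstrap_pf(y, phi, sx=1.0, sy=1.0, n_part=500, rng=random):
    """Bootstrap particle filter log-likelihood estimate.

    Toy model: x_t = phi * x_{t-1} + N(0, sx^2), y_t = x_t + N(0, sy^2).
    Propagate particles through the state dynamics, weight by the
    observation density, and resample; the running average of the
    weights gives the likelihood estimate used inside particle MCMC.
    """
    parts = [rng.gauss(0.0, sx) for _ in range(n_part)]
    loglik = 0.0
    for yt in y:
        parts = [phi * x + rng.gauss(0.0, sx) for x in parts]
        w = [math.exp(-0.5 * ((yt - x) / sy) ** 2) for x in parts]
        loglik += math.log(sum(w) / n_part) \
                  - 0.5 * math.log(2 * math.pi * sy * sy)
        parts = rng.choices(parts, weights=w, k=n_part)   # multinomial resampling
    return loglik
```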

19.
This article proposes a stochastic version of the matching pursuit algorithm for Bayesian variable selection in linear regression. In the Bayesian formulation, the prior distribution of each regression coefficient is assumed to be a mixture of a point mass at 0 and a normal distribution with zero mean and a large variance. The proposed stochastic matching pursuit algorithm is designed for sampling from the posterior distribution of the coefficients for the purpose of variable selection. The proposed algorithm can be considered a modification of the componentwise Gibbs sampler. In the componentwise Gibbs sampler, the variables are visited by a random or a systematic scan. In the stochastic matching pursuit algorithm, the variables that better align with the current residual vector are given higher probabilities of being visited. The proposed algorithm combines the efficiency of the matching pursuit algorithm and the Bayesian formulation with well-defined prior distributions on coefficients. Several simulated examples of small n and large p are used to illustrate the algorithm. These examples show that the algorithm is efficient for screening and selecting variables.
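The residual-guided visiting rule can be sketched directly: score each predictor column by its alignment (cosine similarity) with the current residual and normalise the scores into a visiting distribution. Proportionality to the absolute correlation is our illustrative choice; the article's exact weighting may differ.

```python
import math


def visit_probabilities(cols, resid):
    """Visiting distribution over predictors for a residual-guided scan.

    cols : list of predictor columns; resid : current residual vector.
    Each column is scored by |cosine similarity| with the residual, so
    variables better aligned with the residual are visited more often,
    as in the stochastic matching pursuit idea sketched above.
    """
    scores = []
    for col in cols:
        num = abs(sum(c * r for c, r in zip(col, resid)))
        den = math.sqrt(sum(c * c for c in col)) \
            * math.sqrt(sum(r * r for r in resid))
        scores.append(num / den if den > 0 else 0.0)
    total = sum(scores)
    return [s / total for s in scores]
```

A plain componentwise Gibbs sampler would use a uniform visiting distribution instead; replacing it with the residual-guided one is the "matching pursuit" ingredient.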

20.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 2864-2878
We describe diverse stochastic inference problems whose solution essentially depends on the moment determinacy of some of the distributions involved. For a variety of stochastic models we ask questions such as "how can a distribution be identified from its moments?", "how asymmetric can a distribution with zero odd-order moments be?", and "is every mixture model identifiable?" For specific models we provide answers, motivating arguments, and illustrations. Some challenging open questions are outlined.
