Similar literature: 20 related articles found.
1.
We consider an adaptive importance sampling approach to estimating the marginal likelihood, a quantity that is fundamental in Bayesian model comparison and Bayesian model averaging. This approach is motivated by the difficulty of obtaining an accurate estimate through existing algorithms that use Markov chain Monte Carlo (MCMC) draws, where the draws are typically costly to obtain and highly correlated in high-dimensional settings. In contrast, we use the cross-entropy (CE) method, a versatile adaptive Monte Carlo algorithm originally developed for rare-event simulation. The main advantage of the importance sampling approach is that random samples can be obtained from a convenient density at little additional cost. Because we generate independent draws instead of correlated MCMC draws, much less additional simulation effort is needed should one wish to reduce the numerical standard error of the estimator. Moreover, the importance density derived via the CE method is grounded in information theory and is therefore optimal in a well-defined sense. We demonstrate the utility of the proposed approach with two empirical applications involving women's labor market participation and U.S. macroeconomic time series. In both applications, the proposed CE method compares favorably to existing estimators.
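The following is a minimal, self-contained sketch of the general idea rather than the authors' implementation: a Gaussian importance density is fitted by a few cross-entropy updates (weighted moment matching) and then used to estimate the marginal likelihood of a toy conjugate normal model, for which the exact value is available as a check. The normal-normal toy model, the number of CE iterations, and the sample sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: y_i ~ N(theta, 1), theta ~ N(0, 1).  The marginal likelihood
# has a closed form here, so we can check the estimator.
y = rng.normal(1.0, 1.0, size=50)
n = y.size

def log_prior(theta):
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

def log_lik(theta):
    return -0.5 * np.sum((y[None, :] - theta[:, None])**2, axis=1) - 0.5 * n * np.log(2 * np.pi)

def log_q(theta, mu, sigma):
    return -0.5 * ((theta - mu) / sigma)**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

# Cross-entropy updates: refit the Gaussian importance density by matching
# the importance-weighted moments of draws from the current proposal.
mu, sigma = 0.0, 2.0
for _ in range(5):
    theta = rng.normal(mu, sigma, size=2000)
    log_w = log_prior(theta) + log_lik(theta) - log_q(theta, mu, sigma)
    w = np.exp(log_w - log_w.max())
    mu = np.sum(w * theta) / np.sum(w)
    sigma = np.sqrt(np.sum(w * (theta - mu)**2) / np.sum(w))

# Final importance-sampling estimate of the marginal likelihood.
theta = rng.normal(mu, sigma, size=20000)
log_w = log_prior(theta) + log_lik(theta) - log_q(theta, mu, sigma)
est = np.exp(log_w).mean()

# Exact marginal likelihood for this conjugate toy model.
post_var = 1.0 / (1.0 + n)
exact = np.exp(-0.5 * np.sum(y**2) + 0.5 * post_var * np.sum(y)**2) * \
    np.sqrt(post_var) / (2 * np.pi)**(n / 2)
print(est, exact)
```

Because the final draws are independent, reducing the numerical standard error only requires increasing the number of importance draws, which is the efficiency argument made in the abstract.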

2.
Approximate Bayesian computation (ABC) methods permit approximate inference for intractable likelihoods when it is possible to simulate from the model. However, they perform poorly for high-dimensional data and in practice must usually be used in conjunction with dimension reduction methods, resulting in a loss of accuracy that is hard to quantify or control. We propose a new ABC method for high-dimensional data based on rare event methods, which we refer to as RE-ABC. This uses a latent variable representation of the model. For a given parameter value, we estimate the probability of the rare event that the latent variables correspond to data roughly consistent with the observations. This is performed using sequential Monte Carlo and slice sampling to systematically search the space of latent variables. In contrast, standard ABC can be viewed as using a more naive Monte Carlo estimate. We use our rare event probability estimator as a likelihood estimate within the pseudo-marginal Metropolis–Hastings algorithm for parameter inference. We provide asymptotics showing that RE-ABC has a lower computational cost for high-dimensional data than standard ABC methods. We also illustrate our approach empirically, on a Gaussian distribution and an application in infectious disease modelling.

3.
We present a versatile Monte Carlo method for estimating multidimensional integrals, with applications to rare-event probability estimation. The method fuses two distinct and popular Monte Carlo simulation methods, Markov chain Monte Carlo and importance sampling, into a single algorithm. We show that for some applied numerical examples the proposed Markov chain importance sampling algorithm performs better than methods based solely on importance sampling or MCMC.

4.
The smooth integration of counting and absolute deviation (SICA) penalty has been demonstrated, both theoretically and practically, to be effective in non-convex penalization for variable selection. However, solving the non-convex optimization problem associated with the SICA penalty when the number of variables exceeds the sample size remains challenging, owing to the singularity at the origin and the non-convexity of the SICA penalty function. In this paper, we develop an efficient and accurate alternating direction method of multipliers (ADMM) with continuation for solving the SICA-penalized least-squares problem in high dimensions. We establish the convergence of the proposed algorithm under mild regularity conditions and study the corresponding Karush–Kuhn–Tucker optimality condition. A high-dimensional Bayesian information criterion is developed to select the optimal tuning parameters. We conduct extensive simulation studies to evaluate the efficiency and accuracy of the proposed algorithm, and its practical usefulness is further illustrated with a high-dimensional microarray study.
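As a rough illustration of the kind of problem being solved, here is a hedged ADMM-with-continuation sketch for a SICA-type penalty. It is not the paper's algorithm: the non-convex proximal step is replaced by a local linear approximation (a reweighted soft-threshold), and the high-dimensional BIC tuning and KKT analysis are omitted. The data-generating setup and all tuning constants are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def sica_grad(t, lam, a):
    # Derivative of the SICA penalty lam * (a + 1) * |t| / (a + |t|).
    return lam * (a + 1) * a / (a + np.abs(t)) ** 2

def admm_sica(X, y, lam, a=1.0, rho=1.0, n_iter=200, z0=None):
    """ADMM sketch for SICA-type penalized least squares (illustration only).
    The non-convex proximal step is approximated by a local linear
    approximation, i.e. a reweighted soft-threshold."""
    n, p = X.shape
    z = np.zeros(p) if z0 is None else z0.copy()
    u = np.zeros(p)
    Xty = X.T @ y
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))
    for _ in range(n_iter):
        # beta-update: ridge-type linear solve via the cached Cholesky factor.
        beta = np.linalg.solve(L.T, np.linalg.solve(L, Xty + rho * (z - u)))
        # z-update: approximate proximal step for the SICA penalty.
        v = beta + u
        thr = sica_grad(z, lam, a) / rho
        z = np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)
        # dual update.
        u = u + beta - z
    return z

# Toy p > n example with a sparse true coefficient vector (assumed setup).
n, p = 100, 400
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:4] = [3.0, -2.0, 1.5, 1.0]
y = X @ beta_true + 0.5 * rng.normal(size=n)

# Continuation: warm-start along a decreasing sequence of tuning parameters.
z = None
for lam in [2.0, 1.0, 0.5, 0.2]:
    z = admm_sica(X, y, lam, z0=z)
print("selected variables:", np.flatnonzero(z))
```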

5.
In this paper, we present a Bayesian analysis of the Weibull proportional hazards (PH) model used in step-stress accelerated life testing. The key mathematical and graphical differences between the Weibull cumulative exposure (CE) model and the PH model are illustrated. Compared with the CE model, the PH model provides more flexibility in fitting step-stress testing data and has attractive mathematical properties that make it well suited to the Bayesian framework. A Markov chain Monte Carlo algorithm with an adaptive rejection sampling technique is used for posterior inference. We demonstrate the performance of this method on both simulated and real datasets.

6.
This paper addresses variable selection with the random forests algorithm in the presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task that becomes even more challenging when predictors are highly correlated. First, we provide a theoretical study of the permutation importance measure for an additive regression model, which allows us to describe how correlation between predictors affects the permutation importance. Our results motivate the use of the recursive feature elimination (RFE) algorithm for variable selection in this context; the algorithm recursively eliminates variables using the permutation importance measure as a ranking criterion. Next, various simulation experiments illustrate the ability of the RFE algorithm to select a small number of variables while retaining good prediction error. Finally, the selection algorithm is tested on the Landsat Satellite data from the UCI Machine Learning Repository.
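A minimal sketch of the RFE idea described above, using scikit-learn's random forest and permutation importance: at each step the forest is refitted on the remaining variables and the variable with the lowest permutation importance is dropped. The simulated data, the one-variable-at-a-time elimination, and the stopping rule are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data with a few informative predictors and many noise predictors
# (hypothetical setup, not the paper's simulation design).
X, y = make_regression(n_samples=300, n_features=30, n_informative=5,
                       noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

remaining = list(range(X.shape[1]))
history = []
while len(remaining) > 1:
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(X_train[:, remaining], y_train)
    history.append((len(remaining), rf.score(X_test[:, remaining], y_test)))
    # Rank the current variables by permutation importance and drop the worst.
    imp = permutation_importance(rf, X_test[:, remaining], y_test,
                                 n_repeats=10, random_state=0)
    worst = int(np.argmin(imp.importances_mean))
    del remaining[worst]

# Pick the smallest variable set whose test score is close to the best one.
best = max(s for _, s in history)
size = min(k for k, s in history if s >= best - 0.01)
print(history)
print("selected model size:", size)
```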

7.
In this article, an importance sampling (IS) method for the posterior expectation of a nonlinear function in a Bayesian vector autoregressive (VAR) model is developed. Most Bayesian inference problems involve evaluating the expectation of a function of interest, usually a nonlinear function of the model parameters, under the posterior distribution. Nonlinear functions in the Bayesian VAR setting are difficult to estimate and usually require numerical methods for their evaluation. A weighted IS estimator is used to evaluate the posterior expectation. With the cross-entropy (CE) approach, the IS density is chosen from a specified family of densities so that the CE distance, or Kullback–Leibler divergence, between the optimal IS density and the importance density is minimal. The performance of the proposed algorithm is assessed in iterated multistep forecasting of US macroeconomic time series.

8.
Simulated annealing—moving from a tractable distribution to a distribution of interest via a sequence of intermediate distributions—has traditionally been used as an inexact method of handling isolated modes in Markov chain samplers. Here, it is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler. The Markov chain aspect allows this method to perform acceptably even for high-dimensional problems, where finding good importance sampling distributions would otherwise be very difficult, while the use of importance weights ensures that the estimates found converge to the correct values as the number of annealing runs increases. This annealed importance sampling procedure resembles the second half of the previously-studied tempered transitions, and can be seen as a generalization of a recently-proposed variant of sequential importance sampling. It is also related to thermodynamic integration methods for estimating ratios of normalizing constants. Annealed importance sampling is most attractive when isolated modes are present, or when estimates of normalizing constants are required, but it may also be more generally useful, since its independent sampling allows one to bypass some of the problems of assessing convergence and autocorrelation in Markov chain samplers.
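Below is a one-dimensional sketch of annealed importance sampling under the usual geometric bridge between a tractable Gaussian and a bimodal unnormalized target, with one Metropolis update per temperature. The ladder length, proposal scale, and target are assumptions made for illustration; the estimated normalizing constant can be checked against the known answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: unnormalized two-component Gaussian mixture (isolated modes).
def log_f_target(x):
    return np.logaddexp(-0.5 * (x - 4.0)**2, -0.5 * (x + 4.0)**2)

# Tractable starting distribution: N(0, 5^2), unnormalized kernel below.
sigma0 = 5.0
def log_f0(x):
    return -0.5 * (x / sigma0)**2

betas = np.linspace(0.0, 1.0, 200)

def log_f_beta(x, b):
    # Geometric bridge between the start and target kernels.
    return (1.0 - b) * log_f0(x) + b * log_f_target(x)

def ais_run():
    x = rng.normal(0.0, sigma0)
    log_w = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Importance weight increment for one step along the ladder.
        log_w += log_f_beta(x, b) - log_f_beta(x, b_prev)
        # One Metropolis update leaving the current intermediate
        # distribution invariant.
        prop = x + rng.normal(0.0, 1.0)
        if np.log(rng.uniform()) < log_f_beta(prop, b) - log_f_beta(x, b):
            x = prop
    return log_w

log_ws = np.array([ais_run() for _ in range(500)])
ratio = np.exp(log_ws).mean()          # estimates Z_target / Z_0
print("estimated Z_target:", ratio * np.sqrt(2 * np.pi) * sigma0)
print("true Z_target:", 2 * np.sqrt(2 * np.pi))
```

Averaging the importance weights over independent annealing runs estimates the ratio of normalizing constants, which is the use case highlighted in the abstract.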

9.
Full likelihood-based inference for modern population genetics data presents methodological and computational challenges. The problem is of considerable practical importance and has attracted recent attention, with the development of algorithms based on importance sampling (IS) and Markov chain Monte Carlo (MCMC) sampling. Here we introduce a new IS algorithm. The optimal proposal distribution for these problems can be characterized, and we exploit a detailed analysis of genealogical processes to develop a practicable approximation to it. We compare the new method with existing algorithms on a variety of genetic examples. Our approach substantially outperforms existing IS algorithms, with efficiency typically improved by several orders of magnitude. The new method also compares favourably with existing MCMC methods in some problems, and less favourably in others, suggesting that both IS and MCMC methods have a continuing role to play in this area. We offer insights into the relative advantages of each approach, and we discuss diagnostics in the IS framework.

10.
The smooth integration of counting and absolute deviation (SICA) penalty has been demonstrated, both theoretically and practically, to be effective in nonconvex penalization for variable selection and parameter estimation. However, solving the nonconvex optimization problem associated with the SICA penalty in high-dimensional settings remains challenging, mainly because of the singularity at the origin and the nonconvexity of the SICA penalty function. In this paper, we develop a fast primal-dual active set (PDAS) algorithm with continuation for solving the nonconvex SICA-penalized least-squares problem in high dimensions. Upon introducing the dual variable, the PDAS algorithm iteratively identifies and updates the active set using both primal and dual information, and then solves a low-dimensional least-squares problem on the active set. When combined with a continuation strategy and a high-dimensional Bayesian information criterion (BIC) for selecting the tuning parameters, the proposed algorithm is very efficient and accurate. Extensive simulation studies and the analysis of a high-dimensional microarray gene expression dataset illustrate the performance of the proposed method.

11.
In this paper, we discuss a parsimonious approach to estimation of high-dimensional covariance matrices via the modified Cholesky decomposition with lasso. Two different methods are proposed: the equi-angular and equi-sparse methods. We use simulation to compare the performance of the proposed methods with others available in the literature, including the sample covariance matrix, the banding method, and the L1-penalized normal loglikelihood method. We then apply the proposed methods to a portfolio selection problem using 80 series of daily stock returns. To facilitate the use of lasso in high-dimensional time series analysis, we develop the dynamic weighted lasso (DWL) algorithm that extends the LARS-lasso algorithm. In particular, the proposed algorithm can efficiently update the lasso solution as new data become available. It can also add or remove explanatory variables. The entire solution path of the L1-penalized normal loglikelihood method is also constructed.
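To make the modified-Cholesky-with-lasso idea concrete, here is a hedged sketch: each variable is regressed on its predecessors with a lasso, the negated coefficients fill a unit lower-triangular matrix T, and the covariance estimate is recovered as T^{-1} D T^{-T}. The equi-angular and equi-sparse variants and the DWL algorithm from the paper are not reproduced; the AR(1)-type toy covariance and the fixed lasso penalty are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def modified_cholesky_lasso(X, alpha=0.05):
    """Sparse modified Cholesky covariance estimate (illustrative sketch):
    regress each variable on its predecessors with the lasso."""
    n, p = X.shape
    X = X - X.mean(axis=0)
    T = np.eye(p)                      # unit lower-triangular matrix
    d = np.empty(p)                    # innovation variances
    d[0] = X[:, 0].var(ddof=1)
    for j in range(1, p):
        fit = Lasso(alpha=alpha, fit_intercept=False).fit(X[:, :j], X[:, j])
        T[j, :j] = -fit.coef_
        resid = X[:, j] - X[:, :j] @ fit.coef_
        d[j] = resid.var(ddof=1)
    Tinv = np.linalg.inv(T)
    return Tinv @ np.diag(d) @ Tinv.T   # Sigma = T^{-1} D T^{-T}

# Toy check against a known AR(1)-type covariance (hypothetical example).
p, n = 20, 100
true_cov = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
sigma_hat = modified_cholesky_lasso(X)
print(np.linalg.norm(sigma_hat - true_cov) / np.linalg.norm(true_cov))
print(np.linalg.norm(np.cov(X, rowvar=False) - true_cov) / np.linalg.norm(true_cov))
```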

12.
Importance sampling and Markov chain Monte Carlo methods have long been used for exact inference in contingency tables; however, their performance is not always satisfactory. In this paper, we propose a stochastic approximation Monte Carlo importance sampling (SAMCIS) method for tackling this problem. SAMCIS combines adaptive Markov chain Monte Carlo and importance sampling, employing the stochastic approximation Monte Carlo algorithm (Liang et al., J. Am. Stat. Assoc., 102(477):305–320, 2007) to draw samples from an enlarged reference set with a known Markov basis. Compared to existing importance sampling and Markov chain Monte Carlo methods, SAMCIS has several advantages, such as fast convergence, ergodicity, and the ability to achieve a desired proportion of valid tables. The numerical results indicate that SAMCIS can outperform the existing importance sampling and Markov chain Monte Carlo methods: it produces much more accurate estimates in much shorter CPU time, especially for tables with high degrees of freedom.

13.
Recently, a new ensemble classification method named Canonical Forest (CF) was proposed by Chen et al. [Canonical forest. Comput Stat. 2014;29:849–867]. CF has been shown to give consistently good results on many data sets and to be comparable to other widely used classification ensemble methods. However, CF requires a feature reduction step before classifying high-dimensional data. Here, we extend CF to a high-dimensional classifier by incorporating a random feature subspace algorithm [Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell. 1998;20:832–844]. This extended algorithm is called HDCF (high-dimensional CF), as it is specifically designed for high-dimensional data. We conducted experiments on three data sets – gene imprinting, oestrogen, and leukaemia – to compare the performance of HDCF with several popular and successful classification methods for high-dimensional data sets, including Random Forest [Breiman L. Random forest. Mach Learn. 2001;45:5–32], CERP [Ahn H, et al. Classification by ensembles from random partitions of high-dimensional data. Comput Stat Data Anal. 2007;51:6166–6179], and support vector machines [Vapnik V. The nature of statistical learning theory. New York: Springer; 1995]. Besides classification accuracy, we also investigated the balance between sensitivity and specificity for all four classification methods.

14.
Markov chain Monte Carlo (MCMC) methods, including the Gibbs sampler and the Metropolis–Hastings algorithm, are widely used in Bayesian statistics for sampling from complicated, high-dimensional posterior distributions. A continuing source of uncertainty is how long such a sampler must be run in order to converge approximately to its target stationary distribution. A method has previously been developed to compute rigorous theoretical upper bounds on the number of iterations required to achieve a specified degree of convergence in total variation distance by verifying drift and minorization conditions. We propose the use of auxiliary simulations to estimate the numerical values needed to apply this bound. Our simulation method makes it possible to compute quantitative convergence bounds for models for which the requisite analytical computations would be prohibitively difficult or impossible. On the other hand, although our method appears to perform well in our example problems, it cannot provide the guarantees offered by analytical proof.

15.
The problem of interaction selection in high-dimensional data analysis has recently received much attention. This note aims to address and clarify several fundamental issues in interaction selection for linear regression models, especially when the input dimension p is much larger than the sample size n. We first discuss how to give a formal definition of “importance” for main and interaction effects. Then we focus on two-stage methods, which are computationally attractive for high-dimensional data analysis but have thus far been regarded as heuristic. We revisit the counterexample of Turlach and provide new insight justifying two-stage methods from a theoretical perspective. Finally, we suggest new strategies for interaction selection under the marginality principle and provide some simulation results.
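The sketch below shows one common form of a two-stage procedure under the marginality (strong heredity) principle, purely for illustration: main effects are selected by the lasso in stage one, and stage two refits on the selected main effects plus their pairwise interactions, so every retained interaction has both parents in the model. The simulated design and the use of LassoCV are assumptions, not the note's proposal.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Toy data with p >> n, two main effects, and one interaction that respects
# the marginality principle (hypothetical design).
n, p = 150, 500
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + 2 * X[:, 0] * X[:, 1] + rng.normal(size=n)

# Stage 1: select main effects with the lasso.
fit1 = LassoCV(cv=5).fit(X, y)
main = np.flatnonzero(fit1.coef_)
print("stage 1 main effects:", main)

# Stage 2: refit on the selected main effects plus their pairwise
# interactions, so every kept interaction has both parents in the model.
pairs = list(combinations(main, 2))
Z = np.column_stack([X[:, main]] + [X[:, i] * X[:, j] for i, j in pairs])
fit2 = LassoCV(cv=5).fit(Z, y)
kept_inter = [pairs[k] for k in range(len(pairs))
              if abs(fit2.coef_[len(main) + k]) > 1e-8]
print("stage 2 interactions:", kept_inter)
```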

16.
Incremental modelling of data streams is of great practical importance, as shown by its applications in advertising and financial data analysis. We propose two incremental covariance matrix decomposition methods for compositional data. The first method, exact incremental covariance decomposition of compositional data (C-EICD), gives an exact decomposition result. The second method, covariance-free incremental covariance decomposition of compositional data (C-CICD), is an approximate algorithm that can efficiently handle high-dimensional cases. Based on these two methods, many frequently used compositional statistical models can be computed incrementally. We take multiple linear regression and principal component analysis as examples to illustrate the utility of the proposed methods via extensive simulation studies.
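As a simple illustration of the streaming setting, the sketch below applies a generic Welford-style incremental covariance update to centred log-ratio transformed compositions and checks it against the batch covariance. This is a stand-in for exposition only; it is not the paper's C-EICD or C-CICD decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def clr(x):
    # Centred log-ratio transform for a compositional vector (parts sum to 1).
    lx = np.log(x)
    return lx - lx.mean()

class IncrementalCovariance:
    """Streaming (Welford-style) mean/covariance update, used here only to
    illustrate incremental computation on a data stream."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.M2 += np.outer(delta, x - self.mean)

    def cov(self):
        return self.M2 / (self.n - 1)

# Toy stream of 3-part compositions drawn from a Dirichlet distribution.
stream = rng.dirichlet([2.0, 3.0, 5.0], size=1000)
inc = IncrementalCovariance(dim=3)
for row in stream:
    inc.update(clr(row))

batch = np.cov(np.array([clr(r) for r in stream]), rowvar=False)
print(np.allclose(inc.cov(), batch))
```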

17.
The real-parameter evolutionary Monte Carlo algorithm (EMC) has been proposed as an effective tool both for sampling from high-dimensional distributions and for stochastic optimization (Liang and Wong, 2001). EMC uses a temperature ladder similar to that of parallel tempering (PT; Geyer, 1991), but in contrast with PT it allows crossover moves between the parallel tempered MCMC chains. In the context of EMC, we first introduce four new moves that enhance its efficiency as measured by the effective sample size. Second, we introduce a practical strategy for determining the temperature range and placing the temperatures in the ladder used in EMC and PT. Finally, we prove the validity of the conditional sampling step of the snooker algorithm, a crossover move in EMC, which extends a result of Roberts and Gilks (1994).
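For readers unfamiliar with temperature ladders, here is a minimal parallel-tempering sketch on a bimodal target: each chain performs a random-walk Metropolis update at its own temperature, and adjacent chains occasionally exchange states. EMC's population crossover and snooker moves are not implemented; the ladder, proposal scales, and target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bimodal target with narrow, well-separated modes: a single random-walk
# chain at temperature 1 essentially never crosses between them.
def log_target(x):
    return np.logaddexp(-0.5 * ((x - 3.0) / 0.5) ** 2,
                        -0.5 * ((x + 3.0) / 0.5) ** 2)

temps = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # temperature ladder (assumed)
x = rng.normal(0.0, 1.0, size=temps.size)      # one chain per temperature
cold = []

for it in range(20000):
    # Within-temperature random-walk Metropolis updates.
    for k, T in enumerate(temps):
        prop = x[k] + rng.normal(0.0, np.sqrt(T))
        if np.log(rng.uniform()) < (log_target(prop) - log_target(x[k])) / T:
            x[k] = prop
    # Exchange move between a randomly chosen adjacent pair of temperatures.
    k = rng.integers(temps.size - 1)
    log_acc = (log_target(x[k + 1]) - log_target(x[k])) * (1.0 / temps[k] - 1.0 / temps[k + 1])
    if np.log(rng.uniform()) < log_acc:
        x[k], x[k + 1] = x[k + 1], x[k]
    cold.append(x[0])

cold = np.array(cold[5000:])
# With exchange moves the cold chain visits both modes roughly equally.
print("fraction of cold-chain samples in the right-hand mode:", np.mean(cold > 0))
```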

18.
Testing for differences between two groups is a fundamental problem in statistics, and developments in Bayesian nonparametrics and semiparametrics have renewed interest in approaches to this problem. Here we describe a new approach to developing such tests and introduce a class of tests that takes advantage of developments in Bayesian nonparametric computing. This class of tests uses the connection between the Dirichlet process (DP) prior and the Wilcoxon rank sum test, and extends the idea to the DP mixture prior. The tests developed here have appropriate frequentist sampling properties in large samples while having the potential to outperform the usual frequentist tests. Extensions to interval and right censoring are considered, and an application to a high-dimensional data set obtained from an RNA-Seq investigation demonstrates the practical utility of the method.

19.
L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling, and various algorithms have been proposed to solve the associated optimization problems. In particular, the coordinate descent algorithm has been shown to be effective in sparse regression modeling. Although the algorithm performs remarkably well on L1-type regularization problems, it is sensitive to outliers, since the procedure is based on inner products of predictor variables and partial residuals computed in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing in particular on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
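The sketch below implements the plain (non-robust) coordinate descent for the lasso that the paragraph refers to, maintaining a running residual so each coordinate update uses the inner product of one predictor with the partial residual. The robust modification proposed in the paper is not reproduced here; the outlier-contaminated toy data and the penalty value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Plain coordinate descent minimizing
    (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - X @ beta
    col_norm2 = (X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of predictor j with its partial residual,
            # recovered from the running residual (non-robust step).
            rho = X[:, j] @ resid + col_norm2[j] * beta[j]
            new_bj = soft_threshold(rho, n * lam) / col_norm2[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

# Toy sparse regression with a few gross outliers in the response, which
# distort the non-robust fit.
n, p = 100, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -3.0, 1.5]
y = X @ beta_true + 0.3 * rng.normal(size=n)
y[:5] += 30.0
print(np.flatnonzero(lasso_cd(X, y, lam=0.1))[:10])
```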

20.
We consider the problem of estimating a collection of integrals with respect to an unknown finite measure μ from noisy observations of some of the integrals. A new method to carry out Bayesian inference for the integrals is proposed. We use a Dirichlet or Gamma process as a prior for μ, and construct an approximation to the posterior distribution of the integrals using the sampling importance resampling algorithm and samples from a new multidimensional version of a Markov chain by Feigin and Tweedie. We prove that the Markov chain is positive Harris recurrent, and that the approximating distribution converges weakly to the posterior as the sample size increases, under a mild integrability condition. Applications to polymer chemistry and mathematical finance are given.
