Similar Documents (20 results)
1.
Statistical model learning problems are traditionally solved using either heuristic greedy optimization or stochastic simulation, such as Markov chain Monte Carlo or simulated annealing. Recently, there has been increasing interest in the use of combinatorial search methods, including those based on computational logic. Some of these methods are particularly attractive since they can also prove the global optimality of solutions, in contrast to stochastic algorithms that guarantee optimality only in the limit. Here we improve and generalize a recently introduced constraint-based method for learning undirected graphical models. The new method combines perfect elimination orderings with various strategies for solution pruning and offers a dramatic improvement in both time and memory complexity. We also show that the method is capable of efficiently handling a more general class of models, called stratified/labeled graphical models, which have an astronomically larger model space.
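To make the central combinatorial object concrete, the following sketch checks whether a given vertex ordering is a perfect elimination ordering; a graph is chordal (decomposable) exactly when such an ordering exists. This only illustrates the definition, not the authors' constraint-based learning method, and the example graphs are hypothetical.

```python
from itertools import combinations

def is_perfect_elimination_ordering(adj, order):
    """Check whether `order` is a perfect elimination ordering of the
    undirected graph given by adjacency sets `adj` (dict: vertex -> set).
    For each vertex, its neighbours that appear later in the ordering
    must form a clique; chordal graphs admit such an ordering."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        for a, b in combinations(later, 2):
            if b not in adj[a]:   # two later neighbours are not adjacent
                return False
    return True

# A 4-cycle with a chord is chordal; the plain 4-cycle is not.
chordal = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
cycle   = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_perfect_elimination_ordering(chordal, [1, 2, 3, 4]))  # True
print(is_perfect_elimination_ordering(cycle,   [1, 2, 3, 4]))  # False
```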

2.
With the influx of complex and detailed tracking data gathered from electronic tracking devices, the analysis of animal movement data has recently emerged as a cottage industry among biostatisticians, and new approaches of ever greater complexity continue to be added to the literature. In this paper, we review what we believe to be some of the most popular and most useful classes of statistical models used to analyse individual animal movement data. Specifically, we consider discrete-time hidden Markov models, more general state-space models and diffusion processes. We argue that these models should be core components in the toolbox for quantitative researchers working on stochastic modelling of individual animal movement. The paper concludes by offering some general observations on the direction of statistical analysis of animal movement. There is a trend in movement ecology towards arguably overly complex modelling approaches which are inaccessible to ecologists, unwieldy with large data sets or not based on mainstream statistical practice. Additionally, some analysis methods developed within the ecological community ignore fundamental properties of movement data, potentially leading to misleading conclusions about animal movement. Corresponding approaches, e.g. those based on Lévy walk-type models, continue to be popular despite having been largely discredited. We contend that there is a need for an appropriate balance between the extremes of being overly complex and being overly simplistic, whereby the discipline relies on models of intermediate complexity that are usable by general ecologists, grounded in well-developed statistical practice, and efficient to fit to large data sets.
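As a minimal illustration of the first model class, the sketch below evaluates the log-likelihood of a two-state hidden Markov model with the scaled forward algorithm. All parameter values are hypothetical, and the Gaussian emissions are a simplification (real step-length analyses typically use gamma or Weibull distributions).

```python
import numpy as np
from scipy.stats import norm

def hmm_loglik(steps, trans, means, sds, init):
    """Log-likelihood of a discrete-time HMM with Gaussian emissions,
    computed with the (scaled) forward algorithm."""
    alpha = init * norm.pdf(steps[0], means, sds)
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for y in steps[1:]:
        alpha = (alpha @ trans) * norm.pdf(y, means, sds)
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

rng = np.random.default_rng(1)
# Hypothetical 2-state ("encamped"/"exploratory") parameters.
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
means, sds = np.array([0.5, 3.0]), np.array([0.3, 1.0])
init = np.array([0.5, 0.5])
steps = rng.normal(0.5, 0.3, 200)   # fake step-length series
print(hmm_loglik(steps, trans, means, sds, init))
```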

3.
In this paper we investigate the application of stochastic complexity theory to classification problems. In particular, we define the notion of admissible models as a function of problem complexity, the number of data points N, and prior belief. This allows us to derive general bounds relating classifier complexity to data-dependent parameters such as sample size, class entropy and the optimal Bayes error rate. We discuss the application of these results to a variety of problems, including decision tree classifiers, Markov models for image segmentation, and feedforward multilayer neural network classifiers.

4.
The simplification of complex models which were originally envisaged to explain some data is considered as a discrete form of smoothing. In this sense, data-based model selection techniques lead to a minimal and unavoidable initial smoothing. The same techniques may also be used for further smoothing if this seems necessary. For deterministic data, the parametric models usually used for stochastic data also provide convenient notches in the process of smoothing. The usual discrepancies can be used to measure the degree of smoothing. The methods for tables of means and tables of frequencies are described in more detail, and examples of applications are given.

5.
6.
Probabilistic graphical models offer a powerful framework to account for the dependence structure between variables, which is represented as a graph. However, the dependence between variables may render inference tasks intractable. In this paper, we review techniques exploiting the graph structure for exact inference, borrowed from optimisation and computer science. They are built on the principle of variable elimination, whose complexity is dictated in an intricate way by the order in which variables are eliminated. The so-called treewidth of the graph characterises this algorithmic complexity: low-treewidth graphs can be processed efficiently. The first point that we illustrate is therefore the idea that for inference in graphical models, the number of variables is not the limiting factor, and it is worth checking the width of several tree decompositions of the graph before resorting to approximate methods. We show how algorithms providing an upper bound on the treewidth can be exploited to derive a 'good' elimination order enabling exact inference. The second point is that when the treewidth is too large, algorithms for approximate inference linked to the principle of variable elimination, such as loopy belief propagation and variational approaches, can lead to accurate results while being much less time consuming than Monte Carlo approaches. We illustrate the techniques reviewed in this article on benchmarks of inference problems in genetic linkage analysis and computer vision, as well as on hidden variable restoration in coupled hidden Markov models.
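A standard way to obtain such an upper bound on the treewidth is a greedy elimination heuristic. The sketch below implements the min-fill rule and reports the induced width of the resulting elimination order; the 3x3 grid example is hypothetical.

```python
def min_fill_order(adj):
    """Greedy min-fill: repeatedly eliminate the vertex whose neighbourhood
    needs the fewest fill-in edges to become a clique.  Returns the order
    and the induced width (an upper bound on the treewidth)."""
    adj = {v: set(nb) for v, nb in adj.items()}
    order, width = [], 0
    while adj:
        def fill(v):
            nb = list(adj[v])
            return sum(1 for i in range(len(nb)) for j in range(i + 1, len(nb))
                       if nb[j] not in adj[nb[i]])
        v = min(adj, key=fill)
        nb = list(adj[v])
        width = max(width, len(nb))
        for i in range(len(nb)):              # connect v's neighbours (fill-in)
            for j in range(i + 1, len(nb)):
                adj[nb[i]].add(nb[j]); adj[nb[j]].add(nb[i])
        for u in nb:
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order, width

# 3x3 grid graph as a toy example.
grid = {(i, j): set() for i in range(3) for j in range(3)}
for i in range(3):
    for j in range(3):
        for di, dj in ((0, 1), (1, 0)):
            if i + di < 3 and j + dj < 3:
                grid[(i, j)].add((i + di, j + dj))
                grid[(i + di, j + dj)].add((i, j))
order, width = min_fill_order(grid)
print(width)   # small induced width => exact variable elimination is feasible
```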

7.
Stochastic volatility models have been widely used in empirical finance, for example in option pricing and risk management. Recent advances in Markov chain Monte Carlo (MCMC) techniques have made it possible to fit stochastic volatility models of increasing complexity within the Bayesian framework. In this article, we propose a new Bayesian model selection procedure, based on the Bayes factor and a classical thermodynamic integration technique named path sampling, to select an appropriate stochastic volatility model. The performance of the developed procedure is illustrated with an application to a data set of daily pound/dollar exchange rates.
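Path sampling estimates a log marginal likelihood (and hence a log Bayes factor, as the difference of two such estimates) by integrating the expected log-likelihood under a sequence of power posteriors. The sketch below applies it to a conjugate normal toy model, where the power posteriors can be sampled directly and the exact answer is available for comparison; in a stochastic volatility model the draws would instead come from MCMC.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=50)        # toy data: N(theta, 1) likelihood
n, S, SS = len(y), y.sum(), (y ** 2).sum()

def loglik(theta):
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * (SS - 2 * theta * S + n * theta ** 2)

# Path sampling: log m = integral over t of E_{p_t}[log L], where p_t is the
# power posterior prop. to p(theta) L(theta)^t.  With a N(0,1) prior this is
# Gaussian, so we can draw from it directly.
ts = np.linspace(0, 1, 21)
E = []
for t in ts:
    prec = 1 + t * n                      # prior precision + t * n
    mu = t * S / prec
    draws = rng.normal(mu, np.sqrt(1 / prec), size=5000)
    E.append(loglik(draws).mean())
logm_ps = np.trapz(E, ts)                 # thermodynamic integration

logm_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1)
              - 0.5 * (SS - S ** 2 / (n + 1)))
print(logm_ps, logm_exact)                # should agree closely
```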

8.
A maximum likelihood estimation procedure is presented for the frailty model. The procedure is based on a stochastic Expectation Maximization algorithm which converges quickly to the maximum likelihood estimate. The usual expectation step is replaced by a stochastic approximation of the complete log-likelihood using simulated values of the unobserved frailties, whereas the maximization step follows the same lines as that of the Expectation Maximization algorithm. The procedure yields, at the same time, estimates of the marginal likelihood and of the observed Fisher information matrix. Moreover, this stochastic Expectation Maximization algorithm requires comparatively little computation time. A wide variety of multivariate frailty models, without any assumption on the covariance structure, can be studied. To illustrate this procedure, a Gaussian frailty model with two frailty terms is introduced. The numerical results, based on simulated data and on real bladder cancer data, are more accurate than those obtained by using the Expectation Maximization Laplace algorithm and the Monte Carlo Expectation Maximization algorithm. Finally, since frailty models are used in many fields such as ecology, biology and economics, the proposed algorithm has a wide spectrum of applications.
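The general recipe, replacing the E-step by simulation from the conditional distribution of the latent quantities and then running the usual M-step on the completed data, is easiest to see on a toy problem. The sketch below applies stochastic EM to right-censored normal data rather than to a frailty model; all numbers are hypothetical, and in practice one would average the final iterates, since the algorithm fluctuates around the MLE rather than converging pointwise.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)
mu_true, sd_true, c = 1.0, 2.0, 2.5
x = rng.normal(mu_true, sd_true, 500)
obs, cens = x[x <= c], (x > c).sum()      # right-censored sample

mu, sd = obs.mean(), obs.std()            # crude starting values
for _ in range(200):
    # stochastic E-step: impute censored values from the conditional
    # (truncated normal) distribution under the current parameters
    a = (c - mu) / sd
    imputed = truncnorm.rvs(a, np.inf, loc=mu, scale=sd, size=cens,
                            random_state=rng)
    full = np.concatenate([obs, imputed])
    mu, sd = full.mean(), full.std()      # M-step: complete-data MLE
print(mu, sd)   # hovers around the maximum likelihood estimate
```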

9.
Discrete choice models describe the choices made by decision makers among alternatives and play an important role in transportation planning, marketing research and other applications. The mixed multinomial logit (MMNL) model is a popular discrete choice model that captures heterogeneity in the preferences of decision makers through random coefficients. While Markov chain Monte Carlo methods provide the Bayesian analogue to classical procedures for estimating MMNL models, computations can be prohibitively expensive for large datasets. Approximate inference can be obtained using variational methods at a lower computational cost with competitive accuracy. In this paper, we develop variational methods for estimating MMNL models that allow random coefficients to be correlated in the posterior and can be extended easily to large-scale datasets. We explore three alternatives: (1) Laplace variational inference, (2) nonconjugate variational message passing and (3) stochastic linear regression. Their performance is compared using real and simulated data. To accelerate convergence for large datasets, we develop stochastic variational inference for MMNL models using each of the above alternatives. Stochastic variational inference allows data to be processed in minibatches by optimizing global variational parameters using stochastic gradient approximation. A novel strategy for increasing minibatch sizes adaptively within stochastic variational inference is proposed.
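The core stochastic variational inference loop, scaling the minibatch sufficient statistics up to the full data set and then taking a Robbins-Monro step on the global variational parameters, is sketched below for a conjugate Gaussian toy model rather than an MMNL model. The doubling schedule is an assumed stand-in for the paper's adaptive minibatch strategy.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000
y = rng.normal(1.5, 1.0, N)              # y_i ~ N(theta, 1), prior theta ~ N(0, 1)

# Natural parameters (precision * mean, precision) of the Gaussian q(theta).
eta1, eta2 = 0.0, 1.0                    # start at the prior
batch, kappa, tau = 10, 0.7, 10
for k in range(1, 301):
    idx = rng.integers(0, N, size=min(batch, N))
    # Noisy estimate of the full-data natural parameters, scaling the
    # minibatch sufficient statistics up by N / |B|.
    eta1_hat = (N / len(idx)) * y[idx].sum()
    eta2_hat = 1.0 + N
    rho = (k + tau) ** -kappa            # Robbins-Monro step size
    eta1 = (1 - rho) * eta1 + rho * eta1_hat
    eta2 = (1 - rho) * eta2 + rho * eta2_hat
    if k % 50 == 0:
        batch *= 2                       # adaptive minibatch growth (assumed)
print(eta1 / eta2, y.sum() / (1 + N))    # SVI mean vs exact posterior mean
```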

10.
The recent advent of modern technology has generated a large number of datasets which can frequently be modelled as functional data. This paper focuses on the problem of multiclass classification for stochastic diffusion paths. In this context we establish a closed formula for the optimal Bayes rule. We provide new statistical procedures, built either on the plug-in principle or on the empirical risk minimization principle, and show their consistency under mild conditions. We apply our methodologies to the parametric case and illustrate their accuracy through a simulation study.

11.

In this paper the use of the empirical Fisher information matrix as an estimator of the information matrix is considered in the context of response models and incomplete data problems. The introduction of an additional stochastic component into such models is shown to greatly increase the range of situations in which the estimator can be employed. In particular, the conditions for its use in incomplete data problems are shown to be the same as those needed to justify the use of the EM algorithm.
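A common concrete form of this estimator sums the outer products of per-observation score vectors at the parameter estimate. A minimal sketch, using the analytic scores of a normal model as a stand-in for the response models discussed in the paper:

```python
import numpy as np

def empirical_fisher(score, theta, data):
    """Empirical Fisher information: the sum over observations of the outer
    products of per-observation score vectors, evaluated at an estimate."""
    S = np.array([score(theta, x) for x in data])   # n x p matrix of scores
    return S.T @ S

# Example: N(mu, sigma^2); per-observation score w.r.t. (mu, sigma^2).
def score_normal(theta, x):
    mu, s2 = theta
    return np.array([(x - mu) / s2,
                     -0.5 / s2 + 0.5 * (x - mu) ** 2 / s2 ** 2])

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.5, 1000)
mle = np.array([x.mean(), x.var()])
I_hat = empirical_fisher(score_normal, mle, x)
print(np.linalg.inv(I_hat))   # approximate covariance matrix of the MLE
```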

12.
In this paper, we consider that the degradation of two performance characteristics (PCs) of a product can be modelled by stochastic processes joined through copula functions, with a different stochastic process governing the degradation of each PC. Different heterogeneous and homogeneous models are presented, combining copula functions with the stochastic processes most commonly used in degradation analysis as marginal distributions. This is an important aspect to consider because the degradation of each PC may differ in its nature. As the joint distributions of the proposed models are complex, the parameters of interest are estimated via MCMC. A simulation study is performed to compare heterogeneous and homogeneous models. In addition, the proposed models are applied to crack propagation data from two terminals of an electronic device, and some insights are provided about product reliability under heterogeneous models.
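The sketch below shows the basic construction for one of the many possible combinations: two gamma-process degradation paths whose increments are coupled through a Gaussian copula. The shape/scale parameters and the copula correlation are hypothetical.

```python
import numpy as np
from scipy.stats import norm, gamma

rng = np.random.default_rng(5)
T, rho = 50, 0.7                              # time steps, copula correlation
shape = np.array([0.8, 1.2])                  # hypothetical gamma-increment
scale = np.array([0.05, 0.03])                # parameters for the two PCs

# Gaussian copula: correlated normals -> uniforms -> gamma quantiles.
L = np.linalg.cholesky([[1, rho], [rho, 1]])
z = rng.standard_normal((T, 2)) @ L.T
u = norm.cdf(z)
inc = np.column_stack([gamma.ppf(u[:, j], shape[j], scale=scale[j])
                       for j in range(2)])
paths = inc.cumsum(axis=0)                    # two dependent gamma processes
print(paths[-1])                              # degradation levels at time T
```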

13.
This paper conducts a simulation-based comparison of several stochastic volatility models with leverage effects. Two new variants of asymmetric stochastic volatility models, which are subject to a logarithmic transformation of the squared asset returns, are proposed. The leverage effect is introduced into the model through correlation either between the innovations of the observation equation and the latent process, or between the logarithm of the squared asset returns and the latent process. Suitable Markov chain Monte Carlo algorithms are developed for parameter estimation and model comparison. Simulation results show that our proposed formulation of the leverage effect and the accompanying inference methods give rise to reasonable parameter estimates. Applications to two data sets uncover a negative correlation (which can be interpreted as a leverage effect) between the observed returns and volatilities, and a negative correlation between the logarithm of squared returns and volatilities.
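The first formulation of the leverage effect, correlated observation and state innovations, is easy to simulate, which is also how such comparisons generate their test data. A minimal sketch with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(6)
T, mu, phi, sigma_eta, rho = 2000, -1.0, 0.97, 0.15, -0.4

# Correlated innovations: leverage enters through corr(eps_t, eta_t) = rho.
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=T)
eps, eta = z[:, 0], z[:, 1]

h = np.empty(T)                               # latent log-volatility
h[0] = mu
for t in range(T - 1):
    h[t + 1] = mu + phi * (h[t] - mu) + sigma_eta * eta[t]
y = np.exp(h / 2) * eps                       # observed returns

# Sample check of the leverage effect: returns today vs volatility tomorrow.
print(np.corrcoef(y[:-1], h[1:])[0, 1])       # negative for rho < 0
```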

14.
This article proposes a simulation-based methodology for comparing theoretical models, with an application to two time series road accident models. The model comparison exercise helps to quantify the main differences and similarities between the two models and comprises three main stages: (1) simulation of time series through a true model with predefined properties; (2) estimation of the alternative model using the simulated data; (3) sensitivity analysis to quantify the effect of changes in the true model parameters on the alternative model's parameter estimates through analysis of variance (ANOVA). The proposed methodology is applied to two time series road accident models: the UCM (unobserved components model) and DRAG (Demand for Road Use, Accidents and their Severity). Assuming that the real data-generating process is the UCM, new datasets approximating the road accident data are generated, and DRAG models are estimated using the simulated data. Since these two methodologies are usually assumed to be equivalent, in the sense that both models accurately capture the true effects of the regressors, we specifically address the modelling of the stochastic trend through the alternative model. The stochastic trend is the time-varying component and is one of the crucial factors in time series road accident data. Theoretically, it can easily be modelled through the UCM, given its modelling properties; however, properly capturing the effect of a non-stationary component such as a stochastic trend in a stationary explanatory model such as DRAG is challenging. After obtaining the parameter estimates of the alternative model (DRAG), the estimates of the true and alternative models are compared and the differences are quantified through experimental design and ANOVA techniques. It is observed that the effects of the explanatory variables used in the UCM simulation are only partially captured by the respective DRAG coefficients. A priori, this could be attributed to multicollinearity, but the results of both the simulation of UCM data and the estimation of DRAG models reveal no significant static correlation among the regressors. Using ANOVA, it is instead determined that this regression coefficient estimation bias is caused by the stochastic trend present in the simulated data. Thus, the results of the methodological development suggest that the stochastic component present in the data should be treated accordingly through a preliminary, exploratory data analysis.
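A stylized version of the underlying phenomenon can be reproduced in a few lines: when an unobserved stochastic trend (a random walk) is omitted from a static regression on an integrated regressor, the coefficient estimates scatter widely around the true value instead of concentrating on it. This toy setup is an assumption for illustration, not the UCM/DRAG specification used in the article.

```python
import numpy as np

rng = np.random.default_rng(7)
T, beta_true, reps = 300, 0.5, 500
est = []
for _ in range(reps):
    trend = np.cumsum(rng.normal(0, 0.3, T))  # unobserved stochastic trend
    x = np.cumsum(rng.normal(0, 0.3, T))      # integrated explanatory variable
    y = trend + beta_true * x + rng.normal(0, 0.5, T)
    X = np.column_stack([np.ones(T), x])      # static regression ignoring trend
    est.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
est = np.array(est)
# The estimates are centred near beta_true but their spread does not vanish
# as T grows: the omitted stochastic trend contaminates the slope estimate.
print(est.mean(), est.std())
```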

15.
SEMIFAR forecasts, with applications to foreign exchange rates
SEMIFAR models, introduced in Beran (1997, Estimating trends, long-range dependence and nonstationarity, preprint), provide a semiparametric modelling framework that enables the data analyst to separate deterministic and stochastic trends as well as short- and long-memory components in an observed time series. A correct distinction between these components, and in particular the decision as to which components may be present in the data, has an important impact on forecasts. In this paper, forecasts and forecast intervals for SEMIFAR models are obtained. The forecasts are based on an extrapolation of the nonparametric trend function and optimal forecasts of the stochastic component. In the data analytical part of the paper, the proposed method is applied to foreign exchange rates from Europe and Asia.

16.
Based on a generalized cumulative damage approach with a stochastic process describing degradation, new accelerated life test models are presented in which both observed failures and degradation measures can be considered for parametric inference of system lifetime. Incorporating an accelerated test variable, we provide several new accelerated degradation models for failure based on the geometric Brownian motion or gamma process. It is shown that in most cases, our models for failure can be approximated closely by accelerated test versions of Birnbaum–Saunders and inverse Gaussian distributions. Estimation of model parameters and a model selection procedure are discussed, and two illustrative examples using real data for carbon-film resistors and fatigue crack size are presented.
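The inverse Gaussian connection can be checked directly by simulation: taking logs turns a geometric Brownian motion degradation path into Brownian motion with drift, whose first-passage time over a fixed threshold is inverse Gaussian with mean a/nu. The sketch below compares the empirical mean first-passage time with that value; all model parameters are hypothetical, and the discretisation slightly overestimates hitting times.

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, x0, D = 0.05, 0.1, 1.0, 1.5        # hypothetical degradation model
nu, a = mu - sigma ** 2 / 2, np.log(D / x0)   # reduce GBM to drifted BM
dt, n_paths, n_steps = 0.1, 5000, 4000

# First time the geometric Brownian motion degradation path crosses D.
logx = np.zeros(n_paths)
hit = np.full(n_paths, np.nan)
for k in range(1, n_steps + 1):
    logx += nu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    newly = np.isnan(hit) & (logx >= a)
    hit[newly] = k * dt
print(np.nanmean(hit), a / nu)   # empirical mean vs inverse-Gaussian mean a/nu
```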

17.
The global sensitivity analysis method used to quantify the influence of uncertain input variables on the variability in numerical model responses has already been applied to deterministic computer codes; deterministic means here that the same set of input variables always gives the same output value. This paper proposes a global sensitivity analysis methodology for stochastic computer codes, for which the result of each code run is itself random. The framework of the joint modeling of the mean and dispersion of heteroscedastic data is used. To deal with the complexity of computer experiment outputs, nonparametric joint models are discussed and a new Gaussian process-based joint model is proposed. The relevance of these models is analyzed based upon two case studies. Results show that the joint modeling approach yields accurate sensitivity index estimators even when heteroscedasticity is strong.

18.
A test for randomness based on a statistic related to the complexity of finite sequences is presented. Simulation of binary sequences under different stochastic models provides estimates of the power of the test. The results show that the test is sensitive to a variety of alternatives to randomness and suggest that the proposed test statistic is a reasonable measure of the stochastic complexity of a finite sequence of discrete random variables.
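A widely used statistic of this kind is the Lempel-Ziv (1976) complexity, which counts the distinct phrases encountered when parsing the sequence. A minimal sketch (one plausible choice of complexity measure; the paper's exact statistic may differ):

```python
import numpy as np

def lz_complexity(s):
    """Lempel-Ziv (1976) complexity: the number of phrases produced when
    scanning the sequence left to right, each new phrase being the shortest
    substring not seen as a substring of the preceding text."""
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # extend the candidate phrase while it occurs in the preceding text
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

rng = np.random.default_rng(9)
random_bits = ''.join(rng.choice(['0', '1'], 200))
periodic_bits = '01' * 100
# Random sequences yield high complexity; structured ones yield low values.
print(lz_complexity(random_bits), lz_complexity(periodic_bits))
```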

19.

Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise: first, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian-type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler–Maruyama discretisation) fails. The proposed ingredients can be incorporated into any type of ABC algorithm and can be applied directly to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.
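The overall structure of such a scheme, simulating trajectories with a method that preserves the invariant law, summarising them through invariant-measure quantities, and accepting parameter draws whose summaries lie close to the observed ones, is sketched below on a one-dimensional Ornstein-Uhlenbeck process. The exact Gaussian transition (which trivially preserves the invariant law) and the simple moment summaries are stand-ins for the paper's splitting schemes and invariant (spectral) density summaries; all tuning values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(10)
dt, T, sigma = 0.1, 500, 1.0

def simulate_ou(theta):
    """Ornstein-Uhlenbeck path via its exact Gaussian transition, which in
    particular preserves the invariant law N(0, sigma^2 / (2 theta))."""
    a = np.exp(-theta * dt)
    s = sigma * np.sqrt((1 - a ** 2) / (2 * theta))
    x = np.empty(T)
    x[0] = rng.normal(0, sigma / np.sqrt(2 * theta))
    for t in range(T - 1):
        x[t + 1] = a * x[t] + s * rng.normal()
    return x

def summary(x):
    # invariant-measure-based summaries: stationary variance and lag-1 acf
    return np.array([x.var(), np.corrcoef(x[:-1], x[1:])[0, 1]])

obs = summary(simulate_ou(1.5))             # pseudo-observed data, theta = 1.5
prior = rng.uniform(0.1, 5.0, 2000)         # prior draws for theta
dists = np.array([np.linalg.norm(summary(simulate_ou(th)) - obs)
                  for th in prior])
accepted = prior[dists < np.quantile(dists, 0.02)]   # ABC rejection step
print(accepted.mean())                      # ABC posterior mean, near 1.5
```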


20.
Discrete-time models are used in ecology to describe the dynamics of an age-structured population, and they can be introduced from either a deterministic or a stochastic viewpoint. We analyze a stochastic model in which the dynamics of the population are described by means of a projection matrix. In this statistical model, fertility rates and survival rates are unknown parameters, which are estimated using a Bayesian approach and also data cloning, a simulation-based method especially useful with complex hierarchical models.

Both methodologies are applied to real data from the population of Steller sea lions located on the Alaska coast over 1978–2004. The estimates obtained from these methods agree well with the non-missing actual values.
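For readers unfamiliar with projection matrices, the sketch below iterates a hypothetical three-age-class Leslie matrix, whose first row holds the fertility rates and whose subdiagonal holds the survival rates — exactly the parameters the paper estimates. The numbers are made up for illustration.

```python
import numpy as np

# Hypothetical 3-age-class projection (Leslie) matrix: first row holds
# fertility rates, the subdiagonal holds survival rates.
A = np.array([[0.0, 1.2, 0.8],
              [0.6, 0.0, 0.0],
              [0.0, 0.7, 0.0]])
n = np.array([100.0, 50.0, 20.0])            # initial age-class abundances

for _ in range(30):                          # project the population forward
    n = A @ n
growth = np.max(np.abs(np.linalg.eigvals(A)))  # dominant eigenvalue gives the
print(n, growth)                               # asymptotic growth rate
```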


