Similar Literature
20 similar records found
1.
We introduce two types of graphical log-linear models: label- and level-invariant models for triangle-free graphs. These models generalise symmetry concepts in graphical log-linear models and provide a tool with which to model symmetry in the discrete case. A label-invariant model is category-invariant and is preserved after permuting some of the vertices according to transformations that maintain the graph, whereas a level-invariant model equates expected frequencies according to a given set of permutations. These new models can both be seen as instances of a new type of graphical log-linear model termed the restricted graphical log-linear model, or RGLL, in which equality restrictions on subsets of main effects and first-order interactions are imposed. Their likelihood equations and graphical representation can be obtained from those derived for the RGLL models.
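To make the notion of equality restrictions concrete, here is a minimal sketch (not the authors' RGLL machinery) that fits a Poisson log-linear model to a 2x2 table in which the two main effects are forced to be equal by sharing a single design column; the counts are invented.

```python
import numpy as np

# Minimal sketch: a Poisson log-linear model for a 2x2 table in which the
# two main effects are restricted to be equal (a toy "restricted" model).
# The equality restriction is imposed by summing the two corresponding
# design columns into one shared column.
counts = np.array([42, 18, 17, 23], dtype=float)  # cells (0,0),(0,1),(1,0),(1,1)

a = np.array([0, 0, 1, 1])  # level of variable A per cell
b = np.array([0, 1, 0, 1])  # level of variable B per cell

# Unrestricted design: intercept, main effect A, main effect B, interaction.
# Restricted design: intercept, shared main effect (A + B), interaction.
X = np.column_stack([np.ones(4), a + b, a * b])

beta = np.zeros(X.shape[1])
for _ in range(50):  # IRLS / Fisher scoring for the Poisson likelihood
    mu = np.exp(X @ beta)
    W = np.diag(mu)
    z = X @ beta + (counts - mu) / mu          # working response
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)

print("fitted coefficients:", beta)
print("fitted expected frequencies:", np.exp(X @ beta))
```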

2.
In geophysical and environmental problems, it is common to have multiple variables of interest measured at the same location and time. These multiple variables typically exhibit dependence over space (and/or time). As a consequence, there is growing interest in developing models for multivariate spatial processes, in particular cross-covariance models. Moreover, many modern data sets, such as satellite data, cover a large portion of the Earth and thus require covariance models that are valid on a globe. We present a class of parametric covariance models for multivariate processes on a globe. The covariance models are flexible in capturing non-stationarity in the data, yet computationally feasible, and require moderate numbers of parameters. We apply our covariance model to surface temperature and precipitation data from an NCAR climate model output. We compare our model to the multivariate version of the Matérn cross-covariance function and to models based on coregionalization, and demonstrate the superior performance of our model in terms of AIC (and/or maximum log-likelihood values) and predictive skill. We also present some challenges in modelling the cross-covariance structure of the temperature and precipitation data. Based on the fitted results using the full data, we give the estimated cross-correlation structure between the two variables.
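The paper's own model is a flexible non-stationary class; as a simpler point of reference, the following sketch evaluates an ordinary Matérn covariance on chordal distance between points on a sphere, which keeps the Matérn family positive definite on the globe. All parameter values are illustrative.

```python
import numpy as np
from scipy.special import kv, gamma

# Minimal sketch (not the authors' model): a Matérn covariance evaluated on
# chordal distance between points on the sphere. Chordal distance keeps the
# Matérn class positive definite on the globe for any smoothness nu.
def chordal_dist(lat1, lon1, lat2, lon2, radius=6371.0):
    """Straight-line (through-the-Earth) distance, lat/lon in degrees."""
    p1, l1, p2, l2 = np.radians([lat1, lon1, lat2, lon2])
    x1 = np.array([np.cos(p1) * np.cos(l1), np.cos(p1) * np.sin(l1), np.sin(p1)])
    x2 = np.array([np.cos(p2) * np.cos(l2), np.cos(p2) * np.sin(l2), np.sin(p2)])
    return radius * np.linalg.norm(x1 - x2)

def matern(d, sigma2=1.0, rho=2000.0, nu=1.5):
    """Matérn covariance; sigma2, rho (km) and nu are illustrative values."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    c = np.full_like(d, sigma2)                 # covariance at distance zero
    pos = d > 0
    u = np.sqrt(2 * nu) * d[pos] / rho
    c[pos] = sigma2 * (2 ** (1 - nu) / gamma(nu)) * u ** nu * kv(nu, u)
    return c

d = chordal_dist(40.0, -105.0, 48.9, 2.3)       # Boulder to Paris, roughly
print(f"distance = {d:.0f} km, covariance = {matern(d)[0]:.4f}")
```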

3.
We propose an objective Bayesian method for the comparison of all Gaussian directed acyclic graphical models defined on a given set of variables. The method, which is based on the notion of fractional Bayes factor (BF), requires a single default (typically improper) prior on the space of unconstrained covariance matrices, together with a prior sample size hyper-parameter, which can be set to its minimal value. We show that our approach produces genuine BFs. The implied prior on the concentration matrix of any complete graph is a data-dependent Wishart distribution, and this in turn guarantees that Markov equivalent graphs are scored with the same marginal likelihood. We specialize our results to the smaller class of Gaussian decomposable undirected graphical models and show that in this case they coincide with those recently obtained using limiting versions of hyper-inverse Wishart distributions as priors on the graph-constrained covariance matrices.

4.
Nonlinear mixed-effects models are widely used for the analysis of longitudinal data, especially from pharmaceutical research. They use random effects, which are latent and unobservable variables, so the random-effects distribution is subject to misspecification in practice. In this paper, we first study the consequences of misspecifying the random-effects distribution in nonlinear mixed-effects models. Our study is focused on Gauss-Hermite quadrature, which is now the routine method for calculating the marginal likelihood in mixed models. We then present a formal diagnostic test to check the appropriateness of the assumed random-effects distribution in nonlinear mixed-effects models, which is very useful for real data analysis. Our findings show that the estimates of fixed-effects parameters in nonlinear mixed-effects models are generally robust to deviations from normality of the random-effects distribution, but the estimates of variance components are very sensitive to the distributional assumption of the random effects. Furthermore, a misspecified random-effects distribution will either overestimate or underestimate the predictions of random effects. We illustrate the results using a real data application from an intensive pharmacokinetic study.
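To fix ideas, here is a minimal sketch of the Gauss-Hermite quadrature computation the paper builds on: the marginal likelihood contribution of one cluster in a random-intercept logistic model (a simpler stand-in for the paper's nonlinear models), with illustrative parameter values and fake data.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Minimal sketch: Gauss-Hermite quadrature for the marginal likelihood of a
# random-intercept logistic model, y_ij ~ Bernoulli(logit^{-1}(beta0 + b_i)),
# b_i ~ N(0, tau^2). Values of beta0 and tau are illustrative.
nodes, weights = hermgauss(20)                  # 20-point GH rule for e^{-x^2}

def cluster_loglik(y, beta0, tau):
    """log of integral of prod_j p(y_ij | b) under b ~ N(0, tau^2)."""
    b = np.sqrt(2.0) * tau * nodes              # change of variables
    eta = beta0 + b[:, None]                    # one row per quadrature node
    p = 1.0 / (1.0 + np.exp(-eta))
    lik_at_nodes = np.prod(np.where(y, p, 1 - p), axis=1)
    return np.log(np.sum(weights * lik_at_nodes) / np.sqrt(np.pi))

rng = np.random.default_rng(0)
y = rng.random(8) < 0.6                         # one fake cluster of 8 binary obs
print("marginal log-likelihood contribution:", cluster_loglik(y, 0.3, 1.0))
```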

5.
In this paper we discuss graphical models for mixed types of continuous and discrete variables with incomplete data. We use a set of hyperedges to represent an observed data pattern. A hyperedge is a set of variables observed for a group of individuals. In a mixed graph with two types of vertices and two types of edges, dots and circles represent discrete and continuous variables respectively. A normal graph represents a graphical model and a hypergraph represents an observed data pattern. In terms of the mixed graph, we discuss decomposition of mixed graphical models with incomplete data, and we present a partial imputation method which can be used in the EM algorithm and the Gibbs sampler to speed their convergence. For a given mixed graphical model and an observed data pattern, we try to decompose a large graph into several small ones so that the original likelihood can be factored into a product of likelihoods with distinct parameters for small graphs. For the case that a graph cannot be decomposed due to its observed data pattern, we can impute missing data partially so that the graph can be decomposed.

6.
In recent years, significant progress has been made in developing statistically rigorous methods to implement clinically interpretable sensitivity analyses for assumptions about the missingness mechanism in clinical trials for continuous and (to a lesser extent) binary or categorical endpoints. Studies with time-to-event outcomes have received much less attention. However, such studies can be similarly challenged with respect to the robustness and integrity of primary analysis conclusions when a substantial number of subjects withdraw from treatment prematurely, prior to experiencing an event of interest. We discuss how the methods that are widely used for primary analyses of time-to-event outcomes could be extended in a clinically meaningful and interpretable way to stress-test the assumption of ignorable censoring. We focus on a 'tipping point' approach, the objective of which is to postulate sensitivity parameters with a clear clinical interpretation and to identify a setting of these parameters unfavorable enough towards the experimental treatment to nullify a conclusion that was favorable to that treatment. Robustness of primary analysis results can then be assessed based on the clinical plausibility of the scenario represented by the tipping point. We study several approaches for conducting such analyses based on multiple imputation using parametric, semi-parametric, and non-parametric imputation models and evaluate their operating characteristics via simulation. We argue that these methods are valuable tools for sensitivity analyses of time-to-event data and conclude that the method based on a piecewise exponential imputation model of survival has some advantages over the other methods studied here. Copyright © 2016 John Wiley & Sons, Ltd.
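The following sketch illustrates the tipping-point mechanics with a plain exponential imputation model (a simplification of the paper's piecewise exponential model); the data, hazards and grid of sensitivity parameters are all invented.

```python
import numpy as np

# Minimal sketch of a tipping-point analysis for time-to-event data: censored
# subjects in the experimental arm have residual time imputed with hazard
# inflated by a factor delta, and we look for the delta at which the treatment
# effect is no longer significant. Simulated data, exponential working model.
rng = np.random.default_rng(1)
n = 200
arm = np.repeat([0, 1], n // 2)                     # 0 = control, 1 = experimental
t = rng.exponential(1 / np.where(arm == 1, 0.05, 0.08))
c = rng.uniform(5, 30, n)                           # censoring times
time, event = np.minimum(t, c), t <= c

def log_hr_zstat(time, event, arm):
    """z-statistic of the log hazard ratio under exponential MLEs (lambda = d/T)."""
    stats = []
    for g in (0, 1):
        d, T = event[arm == g].sum(), time[arm == g].sum()
        stats.append((np.log(d / T), 1.0 / d))      # log-hazard and its variance
    return (stats[1][0] - stats[0][0]) / np.sqrt(stats[0][1] + stats[1][1])

lam = [event[arm == g].sum() / time[arm == g].sum() for g in (0, 1)]
for delta in [1.0, 1.5, 2.0, 3.0, 5.0]:
    zs = []
    for _ in range(200):                            # imputation draws
        rate = np.where(arm == 1, delta * lam[1], lam[0])
        t_imp = np.where(event, time, time + rng.exponential(1 / rate))
        zs.append(log_hr_zstat(t_imp, np.ones(n, dtype=bool), arm))
    # Averaging z across draws for illustration only; a full analysis would
    # pool the imputations with Rubin's rules.
    print(f"delta={delta:.1f}  mean z = {np.mean(zs):+.2f}")
```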

7.
Dead recoveries of marked animals are commonly used to estimate survival probabilities. Band-recovery models can be parameterized either by r (the probability of recovering a band conditional on death of the animal) or by f (the probability that an animal will be killed, retrieved, and have its band reported). The r parametrization can be implemented in a capture-recapture framework with two states (alive and newly dead), mortality being the transition probability between the two states. The authors show here that the f parametrization can also be implemented in a multistate framework by imposing simple constraints on some parameters. They illustrate it using data on the mallard and the snow goose. However, they mention that because it does not entirely separate the individual survival and encounter processes, the f parametrization must be used with care on reduced models, or in the presence of estimates at the boundary of the parameter space. As they show, a multistate framework allows the use of powerful software for model fitting or testing the goodness-of-fit of models; it also affords the implementation of complex models such as those based on mixtures of information or uncertain states.
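A minimal sketch of the two parameterizations for a single release cohort, with toy recovery counts: the cell probabilities are written in the r form, and f is then recovered as (1-S)r.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch contrasting the two band-recovery parameterizations for one
# cohort: P(band recovered j years after release) = S^j * (1-S) * r, which in
# the f-parameterization is S^j * f with f = (1-S)*r. Counts are invented.
recoveries = np.array([38, 22, 12, 6])          # recovered 0,1,2,3 years out
released = 1000
never = released - recoveries.sum()

def negloglik(params):
    S, r = params
    cell = S ** np.arange(4) * (1 - S) * r       # r-parameterization
    return -(recoveries * np.log(cell)).sum() - never * np.log(1 - cell.sum())

res = minimize(negloglik, x0=[0.6, 0.1], bounds=[(0.01, 0.99)] * 2)
S_hat, r_hat = res.x
print(f"S = {S_hat:.3f},  r = {r_hat:.3f},  f = (1-S)*r = {(1 - S_hat) * r_hat:.3f}")
```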

8.
We propose a new type of multivariate statistical model that permits non-Gaussian distributions as well as the inclusion of conditional independence assumptions specified by a directed acyclic graph. These models feature a specific factorisation of the likelihood that is based on pair-copula constructions and hence involves only univariate distributions and bivariate copulas, of which some may be conditional. We demonstrate maximum-likelihood estimation of the parameters of such models and compare them to various competing models from the literature. A simulation study investigates the effects of model misspecification and highlights the need for non-Gaussian conditional independence models. The proposed methods are finally applied to modeling financial return data. The Canadian Journal of Statistics 40: 86–109; 2012 © 2012 Statistical Society of Canada
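To show the flavour of a pair-copula factorisation, here is a minimal sketch of a three-variable D-vine density in which all pair-copulas are taken to be Gaussian (the paper allows arbitrary bivariate copulas); the correlation parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of a 3-variable D-vine pair-copula construction with
# Gaussian pair-copulas and standard normal margins.
# Density: f1 * f2 * f3 * c12 * c23 * c13|2.
def gauss_copula_logpdf(u, v, rho):
    x, y = norm.ppf(u), norm.ppf(v)
    return (-0.5 * np.log(1 - rho**2)
            - (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (2 * (1 - rho**2)))

def h(u, v, rho):
    """Conditional CDF C(u | v) of the Gaussian copula (the 'h-function')."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def dvine_logpdf(z, rho12, rho23, rho13_2):
    """log density at z = (z1, z2, z3) with standard normal margins assumed."""
    z = np.asarray(z)
    u = norm.cdf(z)
    lp = norm.logpdf(z).sum()                       # marginal terms
    lp += gauss_copula_logpdf(u[0], u[1], rho12)    # tree 1
    lp += gauss_copula_logpdf(u[1], u[2], rho23)
    lp += gauss_copula_logpdf(h(u[0], u[1], rho12), # tree 2: conditional copula
                              h(u[2], u[1], rho23), rho13_2)
    return lp

print("log-density at origin:", dvine_logpdf([0.0, 0.0, 0.0], 0.5, 0.4, 0.2))
```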

9.
In survey sampling, policymaking regarding the allocation of resources to subgroups (called small areas), or the determination of subgroups with specific properties in a population, should be based on reliable estimates. Information, however, is often collected at a different scale from that of these subgroups; hence, estimates can only be obtained from finer-scale data. Parametric mixed models are commonly used in small-area estimation. The relationship between predictors and response, however, may not be linear in some real situations. Recently, small-area estimation using a generalised linear mixed model (GLMM) with a penalised spline (P-spline) regression model for the fixed part of the model has been proposed to analyse cross-sectional responses, both normal and non-normal. However, there are many situations in which the responses in small areas are serially dependent over time. Such a situation is exemplified by a data set on the annual number of visits to physicians by patients seeking treatment for asthma in different areas of Manitoba, Canada. In cases where covariates that can possibly predict physician visits by asthma patients (e.g. age and genetic and environmental factors) may not have a linear relationship with the response, new models for analysing such data sets are required. In the current work, using both time-series and cross-sectional data methods, we propose P-spline regression models for small-area estimation under GLMMs. Our proposed model covers both normal and non-normal responses. In particular, the empirical best predictors of small-area parameters and their corresponding prediction intervals are studied, with the maximum likelihood estimation approach being used to estimate the model parameters. The performance of the proposed approach is evaluated using some simulations and also by analysing two real data sets (precipitation and asthma).
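As background for the fixed part of the model, the following sketch builds a cubic B-spline basis with a second-order difference penalty and solves the penalized least-squares problem; basis size, penalty order and smoothing parameter are illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline

# Minimal sketch of P-spline regression (the fixed part only; the paper
# embeds this in a GLMM for small-area estimation).
def bspline_basis(x, n_basis=20, degree=3):
    """Cubic B-spline design matrix with equally spaced interior knots."""
    inner = np.linspace(x.min(), x.max(), n_basis - degree + 1)
    knots = np.r_[[inner[0]] * degree, inner, [inner[-1]] * degree]
    return np.column_stack([
        BSpline.basis_element(knots[i:i + degree + 2], extrapolate=False)(x)
        for i in range(n_basis)
    ])

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 150)

B = np.nan_to_num(bspline_basis(x))             # basis evaluated at the data
D = np.diff(np.eye(B.shape[1]), 2, axis=0)      # 2nd-order difference penalty
lam = 1.0                                        # smoothing parameter (toy value)
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)  # penalized LS
print("fitted values at first 3 points:", (B @ coef)[:3])
```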

10.
A joint estimation approach for multiple high-dimensional Gaussian copula graphical models is proposed, which achieves estimation robustness by exploiting non-parametric rank-based correlation coefficient estimators. Although we focus on continuous data in this paper, the proposed method can be extended to deal with binary or mixed data. Based on a weighted minimisation problem, the estimators can be obtained by implementing second-order cone programming. Theoretical properties of the procedure are investigated. We show that the proposed joint estimation procedure leads to a faster convergence rate than estimating the graphs individually. It is also shown that the proposed procedure achieves exact graph structure recovery with probability tending to 1 under certain regularity conditions. Besides the theoretical analysis, we conduct numerical simulations to compare the estimation performance and graph recovery performance of some state-of-the-art methods, including both joint estimation methods and methods that estimate each graph individually. The proposed method is then applied to a gene expression data set, which illustrates its practical usefulness.
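A minimal single-graph sketch of the rank-based plug-in idea (the paper's contribution is joint estimation across several graphs, which this omits): estimate Kendall's tau, map it to a correlation via the sine transform, and feed the result to the graphical lasso.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.covariance import graphical_lasso

# Minimal sketch for one Gaussian copula graphical model: a rank-based
# correlation matrix plugged into the graphical lasso. Data are simulated
# from a Gaussian copula via a monotone transform.
rng = np.random.default_rng(3)
n, p = 300, 5
Z = rng.multivariate_normal(np.zeros(p), 0.5 * np.eye(p) + 0.5, size=n)
X = np.exp(Z)                                   # non-Gaussian margins

R = np.eye(p)
for i in range(p):
    for j in range(i + 1, p):
        tau, _ = kendalltau(X[:, i], X[:, j])
        R[i, j] = R[j, i] = np.sin(np.pi * tau / 2)   # tau -> correlation

cov_est, prec_est = graphical_lasso(R, alpha=0.1)
print("estimated sparsity pattern:\n", (np.abs(prec_est) > 1e-4).astype(int))
```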

11.
Empirical Bayes is a versatile approach to “learn from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes-full Bayes ridge regression approach for estimation of the posterior predictive interval.
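The simplest instance of "formal" empirical Bayes is worth spelling out; the sketch below maximizes the marginal likelihood over the prior variance in the normal means model and applies the resulting shrinkage, with simulated data.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Minimal sketch of formal empirical Bayes in the normal means model:
# x_i ~ N(theta_i, 1), theta_i ~ N(0, tau^2). The prior variance tau^2 is
# chosen by maximizing the marginal likelihood, then plugged into the
# posterior mean (a shrinkage estimator).
rng = np.random.default_rng(4)
p = 500
theta = rng.normal(0, 1.5, p)                   # true effects
x = theta + rng.normal(0, 1, p)                 # noisy observations

def neg_marginal_loglik(log_tau2):
    v = 1.0 + np.exp(log_tau2)                  # marginal variance of x_i
    return 0.5 * np.sum(np.log(2 * np.pi * v) + x**2 / v)

res = minimize_scalar(neg_marginal_loglik, bounds=(-10, 10), method="bounded")
tau2_hat = np.exp(res.x)
shrink = tau2_hat / (1.0 + tau2_hat)            # posterior mean multiplier
theta_hat = shrink * x

print(f"tau^2 estimate: {tau2_hat:.2f} (truth 2.25), shrinkage {shrink:.2f}")
print(f"MSE raw: {np.mean((x - theta)**2):.3f}  "
      f"MSE EB: {np.mean((theta_hat - theta)**2):.3f}")
```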

12.
A bank offering unsecured personal loans may be interested in several related outcome variables, including defaulting on the repayments, early repayment or failing to take up an offered loan. Current predictive models used by banks typically consider such variables individually. However, the fact that they are related to each other, and to many interrelated potential predictor variables, suggests that graphical models may provide an attractive alternative solution. We developed such a model for a data set of 15 variables measured on a set of 14 000 applications for unsecured personal loans. The resulting global model of behaviour enabled us to identify several previously unsuspected relationships of considerable interest to the bank. For example, we discovered important but obscure relationships between taking out insurance, prior delinquency with a credit card and delinquency with the loan.

13.
We extend the log-mean linear parameterization for binary data to discrete variables with an arbitrary number of levels and show that, also in this case, it can be used to parameterize bi-directed graph models. Furthermore, we show that the log-mean linear parameterization allows one to simultaneously represent marginal independencies among variables and marginal independencies that only appear when certain levels are collapsed into a single one. We illustrate the application of this property by means of an example based on genetic association studies involving single-nucleotide polymorphisms. More generally, this feature provides a natural way to reduce the parameter count, while preserving the independence structure, by means of substantive constraints that give additional insight into the association structure of the variables. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics

14.
We propose an ℓ1-penalized estimation procedure for high-dimensional linear mixed-effects models. The models are useful whenever there is a grouping structure among high-dimensional observations, that is, for clustered data. We prove a consistency and an oracle optimality result, and we develop an algorithm with provable numerical convergence. Furthermore, we demonstrate the performance of the method on simulated data and on a real high-dimensional data set.
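As a rough sketch of one ingredient (not the authors' algorithm): with the variance components held fixed, the fixed effects solve a lasso problem in a whitened model, which proximal gradient (ISTA) handles directly. All dimensions and parameter values are invented.

```python
import numpy as np

# Minimal sketch of an l1-penalized linear mixed model fit: with variance
# components fixed, whiten by V^{-1/2} and run ISTA on the resulting lasso.
# The paper's procedure also estimates the variance components; this doesn't.
rng = np.random.default_rng(5)
n_groups, m, p = 20, 5, 50                      # 20 clusters of 5; 50 covariates
n = n_groups * m
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
b = np.repeat(rng.normal(0, 1.0, n_groups), m)  # random intercepts
y = X @ beta_true + b + rng.normal(0, 0.5, n)

# V = sigma2*I + tau2*J is block diagonal; whiten each block with L^T where
# L L^T = V^{-1} (Cholesky of the block inverse).
sigma2, tau2 = 0.25, 1.0                        # assumed known here
J = np.ones((m, m))
L = np.linalg.cholesky(np.linalg.inv(sigma2 * np.eye(m) + tau2 * J))
W = np.kron(np.eye(n_groups), L.T)              # whitening matrix
Xw, yw = W @ X, W @ y

lam = 8.0                                       # illustrative penalty level
step = 1.0 / np.linalg.norm(Xw, 2) ** 2         # 1 / Lipschitz constant
beta = np.zeros(p)
for _ in range(500):                            # ISTA iterations
    grad = Xw.T @ (Xw @ beta - yw)
    z = beta - step * grad
    beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0)  # soft threshold

print("nonzero coefficients found at:", np.nonzero(np.abs(beta) > 1e-3)[0])
```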

15.
Directed acyclic graph (DAG) models—also called Bayesian networks—are widely used in probabilistic reasoning, machine learning and causal inference. If latent variables are present, then the set of possible marginal distributions over the remaining (observed) variables is generally not represented by any DAG. Larger classes of mixed graphical models have been introduced to overcome this; however, as we show, these classes are not sufficiently rich to capture all the marginal models that can arise. We introduce a new class of hypergraphs, called mDAGs, and a latent projection operation to obtain an mDAG from the margin of a DAG. We show that each distinct marginal of a DAG model is represented by at least one mDAG and provide graphical results towards characterizing equivalence of these models. Finally, we show that mDAGs correctly capture the marginal structure of causally interpreted DAGs under interventions on the observed variables.

16.
For certain classes of hierarchical models, it is easy to derive an expression for the joint moment-generating function (MGF) of data, whereas the joint probability density has an intractable form which typically involves an integral. The most important example is the class of linear models with non-Gaussian latent variables. Parameters in the model can be estimated by approximate maximum likelihood, using a saddlepoint-type approximation to invert the MGF. We focus on modelling heavy-tailed latent variables, and suggest a family of mixture distributions that behaves well under the saddlepoint approximation (SPA). It is shown that the well-known normalization issue renders the ordinary SPA useless in the present context. As a solution we extend the non-Gaussian leading term SPA to a multivariate setting, and introduce a general rule for choosing the leading term density. The approach is applied to mixed-effects regression, time-series models and stochastic networks and it is shown that the modified SPA is very accurate.
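A minimal univariate sketch of the saddlepoint inversion the paper extends: recovering a Gamma density from its cumulant generating function, for which the saddlepoint equation solves in closed form and the SPA is exact up to Stirling's factor.

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

# Minimal sketch of a univariate saddlepoint approximation: recover a density
# from its cumulant generating function. For Gamma(a, rate=lam),
# K(s) = -a*log(1 - s/lam), so K'(s) = a/(lam - s) = x solves in closed form.
a, lam = 3.0, 2.0

def K(s):  return -a * np.log(1 - s / lam)      # cumulant generating function
def K2(s): return a / (lam - s) ** 2            # K''(s)

def spa_density(x):
    s_hat = lam - a / x                         # saddlepoint: K'(s_hat) = x
    return np.exp(K(s_hat) - s_hat * x) / np.sqrt(2 * np.pi * K2(s_hat))

x = np.linspace(0.2, 5, 5)
print("SPA:  ", np.round(spa_density(x), 4))
print("exact:", np.round(gamma_dist.pdf(x, a, scale=1 / lam), 4))
```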

17.
Probabilistic graphical models offer a powerful framework to account for the dependence structure between variables, which is represented as a graph. However, the dependence between variables may render inference tasks intractable. In this paper, we review techniques exploiting the graph structure for exact inference, borrowed from optimisation and computer science. They are built on the principle of variable elimination, whose complexity is dictated in an intricate way by the order in which variables are eliminated. The so-called treewidth of the graph characterises this algorithmic complexity: low-treewidth graphs can be processed efficiently. The first point that we illustrate is therefore the idea that for inference in graphical models, the number of variables is not the limiting factor, and it is worth checking the width of several tree decompositions of the graph before resorting to an approximate method. We show how algorithms providing an upper bound on the treewidth can be exploited to derive a 'good' elimination order enabling exact inference. The second point is that when the treewidth is too large, algorithms for approximate inference linked to the principle of variable elimination, such as loopy belief propagation and variational approaches, can lead to accurate results while being much less time-consuming than Monte Carlo approaches. We illustrate the techniques reviewed in this article on benchmarks of inference problems in genetic linkage analysis and computer vision, as well as on hidden-variable restoration in coupled hidden Markov models.
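To illustrate the role of the elimination order, here is a minimal sketch of the min-fill heuristic, which returns an elimination order together with its induced width (an upper bound on the treewidth); the example graph is invented.

```python
import itertools

# Minimal sketch of the min-fill heuristic: greedily eliminate the vertex
# whose elimination adds the fewest fill-in edges, and report the induced
# width (an upper bound on the treewidth of the graph).
def min_fill_order(adj):
    """adj: dict vertex -> set of neighbours. Returns (order, induced_width)."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    order, width = [], 0
    while adj:
        def fill_edges(v):                       # neighbour pairs not yet adjacent
            return sum(1 for a, b in itertools.combinations(adj[v], 2)
                       if b not in adj[a])
        v = min(adj, key=fill_edges)
        width = max(width, len(adj[v]))          # clique size - 1 at elimination
        for a, b in itertools.combinations(adj[v], 2):
            adj[a].add(b)                        # connect v's neighbours
            adj[b].add(a)
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order, width

# A 4-cycle plus a pendant vertex: treewidth 2.
g = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3, 5}, 5: {4}}
print(min_fill_order(g))
```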

18.
We present an objective Bayes method for covariance selection in Gaussian multivariate regression models having a sparse regression and covariance structure, the latter being Markov with respect to a directed acyclic graph (DAG). Our procedure can be easily complemented with a variable selection step, so that variable and graphical model selection can be performed jointly. In this way, we offer a solution to a problem of growing importance especially in the area of genetical genomics (eQTL analysis). The input of our method is a single default prior, essentially involving no subjective elicitation, while its output is a closed form marginal likelihood for every covariate-adjusted DAG model, which is constant over each class of Markov equivalent DAGs; our procedure thus naturally encompasses covariate-adjusted decomposable graphical models. In realistic experimental studies, our method is highly competitive, especially when the number of responses is large relative to the sample size.

19.
In geostatistics and also in other applications in science and engineering, it is now common to perform updates on Gaussian process models with many thousands or even millions of components. These large-scale inferences involve modelling, representational and computational challenges. We describe a visualization tool for large-scale Gaussian updates, the ‘medal plot’. The medal plot shows the updated uncertainty at each observation location and also summarizes the sharing of information across observations, as a proxy for the sharing of information across the state vector (or latent process). As such, it reflects characteristics of both the observations and the statistical model. We illustrate with an application to assess mass trends in the Antarctic Ice Sheet, for which there are strong constraints from the observations and the physics.

20.
We investigate simulation methodology for Bayesian inference in Lévy-driven stochastic volatility (SV) models. Typically, Bayesian inference for such models is performed using Markov chain Monte Carlo (MCMC); this is often a challenging task. Sequential Monte Carlo (SMC) samplers are methods that can improve over MCMC; however, there are many user-set parameters to specify. We develop a fully automated SMC algorithm, which substantially improves over the standard MCMC methods in the literature. To illustrate our methodology, we consider a Heston model with an independent, additive variance gamma process in the returns equation. The driving gamma process can capture the stylized behaviour of many financial time series, and a discretized version, fitted in a Bayesian manner, has been found to be very useful for modelling equity data. We demonstrate that it is possible to draw exact inference, in the sense of no time-discretization error, from the Bayesian SV model.
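As a small illustration of the driving process, the sketch below simulates a variance gamma path as Brownian motion with drift evaluated at a gamma subordinator; parameter values are illustrative, not those fitted in the paper.

```python
import numpy as np

# Minimal sketch: simulate a variance gamma process as Brownian motion with
# drift time-changed by a gamma subordinator, X(t) = theta*G(t) + sigma*W(G(t)),
# where G has unit mean rate and variance nu per unit time.
def variance_gamma_path(T=1.0, n=1000, theta=-0.1, sigma=0.2, nu=0.3, seed=6):
    rng = np.random.default_rng(seed)
    dt = T / n
    dG = rng.gamma(shape=dt / nu, scale=nu, size=n)     # gamma time increments
    dX = theta * dG + sigma * np.sqrt(dG) * rng.normal(size=n)
    return np.concatenate([[0.0], np.cumsum(dX)])

path = variance_gamma_path()
print("terminal value:", path[-1], " realized variance:", np.sum(np.diff(path)**2))
```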
