首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 328 毫秒
1.
Abstract.  CG-regressions are multivariate regression models for mixed continuous and discrete responses that result from conditioning in the class of conditional Gaussian (CG) models. Their conditional independence structure can be read off a marked graph. The property of collapsibility, in this context, means that the multivariate CG-regression can be decomposed into lower dimensional regressions that are still CG and are consistent with the corresponding subgraphs. We derive conditions for this property that can easily be checked on the graph, and indicate computational advantages of this kind of collapsibility. Further, a simple graphical condition is given for checking whether a decomposition into univariate regressions is possible.  相似文献   

2.
We develop Bayesian models for density regression with emphasis on discrete outcomes. The problem of density regression is approached by considering methods for multivariate density estimation of mixed scale variables, and obtaining conditional densities from the multivariate ones. The approach to multivariate mixed scale outcome density estimation that we describe represents discrete variables, either responses or covariates, as discretised versions of continuous latent variables. We present and compare several models for obtaining these thresholds in the challenging context of count data analysis where the response may be over‐ and/or under‐dispersed in some of the regions of the covariate space. We utilise a nonparametric mixture of multivariate Gaussians to model the directly observed and the latent continuous variables. The paper presents a Markov chain Monte Carlo algorithm for posterior sampling, sufficient conditions for weak consistency, and illustrations on density, mean and quantile regression utilising simulated and real datasets.  相似文献   

3.
This paper focuses on a situation in which a set of treatments is associated with a response through a set of supplementary variables in linear models as well as discrete models. Under the situation, we demonstrate that the causal effect can be estimated more accurately from the set of supplementary variables. In addition, we show that the set of supplementary variables can include selection variables and proxy variables as well. Furthermore, we propose selection criteria for supplementary variables based on the estimation accuracy of causal effects. From graph structures based on our results, we can judge certain situations under which the causal effect can be estimated more accurately by supplementary variables and reliably evaluate the causal effects from observed data.  相似文献   

4.
Directed acyclic graph (DAG) models—also called Bayesian networks—are widely used in probabilistic reasoning, machine learning and causal inference. If latent variables are present, then the set of possible marginal distributions over the remaining (observed) variables is generally not represented by any DAG. Larger classes of mixed graphical models have been introduced to overcome this; however, as we show, these classes are not sufficiently rich to capture all the marginal models that can arise. We introduce a new class of hyper‐graphs, called mDAGs, and a latent projection operation to obtain an mDAG from the margin of a DAG. We show that each distinct marginal of a DAG model is represented by at least one mDAG and provide graphical results towards characterizing equivalence of these models. Finally, we show that mDAGs correctly capture the marginal structure of causally interpreted DAGs under interventions on the observed variables.  相似文献   

5.
Recent research has demonstrated that information learned from building a graphical model on the predictor set of a regularized linear regression model can be leveraged to improve prediction of a continuous outcome. In this article, we present a new model that encourages sparsity at both the level of the regression coefficients and the level of individual contributions in a decomposed representation. This model provides parameter estimates with a finite sample error bound and exhibits robustness to errors in the input graph structure. Through a simulation study and the analysis of two real data sets, we demonstrate that our model provides a predictive benefit when compared to previously proposed models. Furthermore, it is a highly flexible model that provides a unified framework for the fitting of many commonly used regularized regression models. The Canadian Journal of Statistics 47: 729–747; 2019 © 2019 Statistical Society of Canada  相似文献   

6.
Abstract. We propose an extension of graphical log‐linear models to allow for symmetry constraints on some interaction parameters that represent homologous factors. The conditional independence structure of such quasi‐symmetric (QS) graphical models is described by an undirected graph with coloured edges, in which a particular colour corresponds to a set of equality constraints on a set of parameters. Unlike standard QS models, the proposed models apply with contingency tables for which only some variables or sets of the variables have the same categories. We study the graphical properties of such models, including conditions for decomposition of model parameters and of maximum likelihood estimates.  相似文献   

7.
We introduce two types of graphical log‐linear models: label‐ and level‐invariant models for triangle‐free graphs. These models generalise symmetry concepts in graphical log‐linear models and provide a tool with which to model symmetry in the discrete case. A label‐invariant model is category‐invariant and is preserved after permuting some of the vertices according to transformations that maintain the graph, whereas a level‐invariant model equates expected frequencies according to a given set of permutations. These new models can both be seen as instances of a new type of graphical log‐linear model termed the restricted graphical log‐linear model, or RGLL, in which equality restrictions on subsets of main effects and first‐order interactions are imposed. Their likelihood equations and graphical representation can be obtained from those derived for the RGLL models.  相似文献   

8.
Multiple imputation has emerged as a widely used model-based approach in dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings which include a mix of continuous and discrete variables, correct specification of the imputation model could be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with accessibility to software packages that are capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers for pragmatically treating the discrete variables as continuous for imputation purposes, and subsequently rounding the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets.  相似文献   

9.
Cluster analysis is one of the most widely used method in statistical analyses, in which homogeneous subgroups are identified in a heterogeneous population. Due to the existence of the continuous and discrete mixed data in many applications, so far, some ordinary clustering methods such as, hierarchical methods, k-means and model-based methods have been extended for analysis of mixed data. However, in the available model-based clustering methods, by increasing the number of continuous variables, the number of parameters increases and identifying as well as fitting an appropriate model may be difficult. In this paper, to reduce the number of the parameters, for the model-based clustering mixed data of continuous (normal) and nominal data, a set of parsimonious models is introduced. Models in this set are extended, using the general location model approach, for modeling distribution of mixed variables and applying factor analyzer structure for covariance matrices. The ECM algorithm is used for estimating the parameters of these models. In order to show the performance of the proposed models for clustering, results from some simulation studies and analyzing two real data sets are presented.  相似文献   

10.
The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients.  相似文献   

11.
12.
While conjugate Bayesian inference in decomposable Gaussian graphical models is largely solved, the non-decomposable case still poses difficulties concerned with the specification of suitable priors and the evaluation of normalizing constants. In this paper we derive the DY-conjugate prior ( Diaconis & Ylvisaker, 1979 ) for non-decomposable models and show that it can be regarded as a generalization to an arbitrary graph G of the hyper inverse Wishart distribution ( Dawid & Lauritzen, 1993 ). In particular, if G is an incomplete prime graph it constitutes a non-trivial generalization of the inverse Wishart distribution. Inference based on marginal likelihood requires the evaluation of a normalizing constant and we propose an importance sampling algorithm for its computation. Examples of structural learning involving non-decomposable models are given. In order to deal efficiently with the set of all positive definite matrices with non-decomposable zero-pattern we introduce the operation of triangular completion of an incomplete triangular matrix. Such a device turns out to be extremely useful both in the proof of theoretical results and in the implementation of the Monte Carlo procedure.  相似文献   

13.
A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to be observed outcomes. The model accommodates multilevel data structures, general covariate effects and distinct link functions and error distributions for each underlying variable. Within the framework proposed, novel models are developed for clustered multiple binary, unordered categorical and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated by using data from a reproductive toxicity study.  相似文献   

14.
Multiple imputation has emerged as a popular approach to handling data sets with missing values. For incomplete continuous variables, imputations are usually produced using multivariate normal models. However, this approach might be problematic for variables with a strong non-normal shape, as it would generate imputations incoherent with actual distributions and thus lead to incorrect inferences. For non-normal data, we consider a multivariate extension of Tukey's gh distribution/transformation [38] to accommodate skewness and/or kurtosis and capture the correlation among the variables. We propose an algorithm to fit the incomplete data with the model and generate imputations. We apply the method to a national data set for hospital performance on several standard quality measures, which are highly skewed to the left and substantially correlated with each other. We use Monte Carlo studies to assess the performance of the proposed approach. We discuss possible generalizations and give some advices to practitioners on how to handle non-normal incomplete data.  相似文献   

15.
A joint estimation approach for multiple high‐dimensional Gaussian copula graphical models is proposed, which achieves estimation robustness by exploiting non‐parametric rank‐based correlation coefficient estimators. Although we focus on continuous data in this paper, the proposed method can be extended to deal with binary or mixed data. Based on a weighted minimisation problem, the estimators can be obtained by implementing second‐order cone programming. Theoretical properties of the procedure are investigated. We show that the proposed joint estimation procedure leads to a faster convergence rate than estimating the graphs individually. It is also shown that the proposed procedure achieves an exact graph structure recovery with probability tending to 1 under certain regularity conditions. Besides theoretical analysis, we conduct numerical simulations to compare the estimation performance and graph recovery performance of some state‐of‐the‐art methods including both joint estimation methods and estimation methods for individuals. The proposed method is then applied to a gene expression data set, which illustrates its practical usefulness.  相似文献   

16.
Gaussian graphical models represent the backbone of the statistical toolbox for analyzing continuous multivariate systems. However, due to the intrinsic properties of the multivariate normal distribution, use of this model family may hide certain forms of context-specific independence that are natural to consider from an applied perspective. Such independencies have been earlier introduced to generalize discrete graphical models and Bayesian networks into more flexible model families. Here, we adapt the idea of context-specific independence to Gaussian graphical models by introducing a stratification of the Euclidean space such that a conditional independence may hold in certain segments but be absent elsewhere. It is shown that the stratified models define a curved exponential family, which retains considerable tractability for parameter estimation and model selection.  相似文献   

17.
Abstract. We propose an objective Bayesian method for the comparison of all Gaussian directed acyclic graphical models defined on a given set of variables. The method, which is based on the notion of fractional Bayes factor (BF), requires a single default (typically improper) prior on the space of unconstrained covariance matrices, together with a prior sample size hyper‐parameter, which can be set to its minimal value. We show that our approach produces genuine BFs. The implied prior on the concentration matrix of any complete graph is a data‐dependent Wishart distribution, and this in turn guarantees that Markov equivalent graphs are scored with the same marginal likelihood. We specialize our results to the smaller class of Gaussian decomposable undirected graphical models and show that in this case they coincide with those recently obtained using limiting versions of hyper‐inverse Wishart distributions as priors on the graph‐constrained covariance matrices.  相似文献   

18.
Some general remarks are made about likelihood factorizations, distinguishing parameter-based factorizations and concentration-graph factorizations. Two parametric families of distributions for mixed discrete and continuous variables are discussed. Conditions on graphs are given for the circumstances under which their joint analysis can be split into separate analyses, each involving a reduced set of component variables and parameters. The result shows marked differences between the two families although both involve the same necessary condition on prime graphs. This condition is both necessary and sufficient for simplified estimation in Gaussian and for discrete log linear models.  相似文献   

19.
Multivariate Markov dependencies between different variables often can be represented graphically using acyclic digraphs (ADGs). In certain cases, though, different ADGs represent the same statistical model, thus leading to a set of equivalence classes of ADGs that constitute the true universe of available graphical models. Building upon the previously known formulas for counting the number of acyclic digraphs and the number of equivalence classes of size 1, formulas are developed to count ADG equivalence classes of arbitrary size, based on the chordal graph configurations that produce a class of that size. Theorems to validate the formulas as well as to aid in determining the appropriate chordal graphs to use for a given class size are included.  相似文献   

20.
Model checking with discrete data regressions can be difficult because the usual methods such as residual plots have complicated reference distributions that depend on the parameters in the model. Posterior predictive checks have been proposed as a Bayesian way to average the results of goodness-of-fit tests in the presence of uncertainty in estimation of the parameters. We try this approach using a variety of discrepancy variables for generalized linear models fitted to a historical data set on behavioural learning. We then discuss the general applicability of our findings in the context of a recent applied example on which we have worked. We find that the following discrepancy variables work well, in the sense of being easy to interpret and sensitive to important model failures: structured displays of the entire data set, general discrepancy variables based on plots of binned or smoothed residuals versus predictors and specific discrepancy variables created on the basis of the particular concerns arising in an application. Plots of binned residuals are especially easy to use because their predictive distributions under the model are sufficiently simple that model checks can often be made implicitly. The following discrepancy variables did not work well: scatterplots of latent residuals defined from an underlying continuous model and quantile–quantile plots of these residuals.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号