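A multiple graphical model represents the dependence relationships among the same set of random variables drawn from different classes: nodes represent random variables, edges represent direct associations between variables, and the graphical model for each class reflects both its own dependence structure and the information shared across classes. Using a joint estimation method for multiple graphical models, data from different individuals are classified according to their characteristics, the dependence structure among the variables within each class is assumed to follow a single Gaussian graphical model, and the group Lasso and graphical Lasso methods are applied to jointly estimate the graph structure of each class. Numerical simulations confirm the effectiveness of the joint estimation method. The multiple graphical model and the joint estimation method are then used to analyze the dependence structure of 13 macroeconomic indicators across 15 Chinese provinces. The results show that macroeconomic variables in provinces at different levels of economic development share common associations, reflecting the characteristics of China's current stage of economic development, while the dependence structure of each class reflects features unique to the economic development of the provinces in that class.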

This study considers a fully-parametric but uncongenial multiple imputation (MI) inference to jointly analyze incomplete binary response variables observed in correlated data settings. The multiple imputation model is specified as a fully-parametric model based on a multivariate extension of mixed-effects models. Dichotomized imputed datasets are then analyzed using joint GEE models where covariates are associated with the marginal mean of responses with response-specific regression coefficients, and a Kronecker product is used to accommodate the cluster-specific correlation structure for a given response variable and the correlation structure between multiple response variables. The validity of the proposed MI-based JGEE (MI-JGEE) approach is assessed through a Monte Carlo simulation study under different scenarios. The simulation results, which are evaluated in terms of bias, mean-squared error, and coverage rate, show that MI-JGEE has promising inferential properties even when the underlying multiple imputation is misspecified. Finally, data from the Adolescent Alcohol Prevention Trial are used for illustration.

In studies of complex disorders such as nicotine dependence, it is common that researchers assess multiple variables related to a disorder as well as other disorders that are potentially correlated with the primary disorder of interest. In this work, we refer to those variables and disorders broadly as multiple traits. The multiple traits may or may not have a common causal genetic variant. Intuitively, it may be more powerful to accommodate multiple traits in genetic traits, but the analysis of multiple traits is generally more complicated than the analysis of a single trait. Furthermore, it is not well documented how much power we may potentially gain by considering multiple traits. Our aim is to enhance our understanding of this important and practical issue. We considered a variety of correlation structures between traits and the disease locus. To focus on the effect of accommodating multiple traits, we examined genetic models that are relatively simple so that we can pinpoint the factors affecting the power. We conducted simulation studies to explore the performance of testing multiple traits simultaneously and the performance of testing a single trait at a time in family-based association studies. Our simulation results demonstrated that the performance of testing multiple traits simultaneously is better than that of testing each trait individually for almost all models considered. We also found that the power of association tests varies among the underlying models. The advantage of conducting a multiple traits test is minimized when some traits are influenced by the gene only through other traits; and it is maximized when there are causal relations between the traits and the gene, and among the traits themselves or when there are extraneous traits.

In geophysical and environmental problems, it is common to have multiple variables of interest measured at the same location and time. These multiple variables typically have dependence over space (and/or time). As a consequence, there is a growing interest in developing models for multivariate spatial processes, in particular, the cross-covariance models. On the other hand, many data sets these days cover a large portion of the Earth such as satellite data, which require valid covariance models on a globe. We present a class of parametric covariance models for multivariate processes on a globe. The covariance models are flexible in capturing non-stationarity in the data yet computationally feasible and require moderate numbers of parameters. We apply our covariance model to surface temperature and precipitation data from an NCAR climate model output. We compare our model to the multivariate version of the Matérn cross-covariance function and models based on coregionalization and demonstrate the superior performance of our model in terms of AIC (and/or maximum loglikelihood values) and predictive skill. We also present some challenges in modelling the cross-covariance structure of the temperature and precipitation data. Based on the fitted results using full data, we give the estimated cross-correlation structure between the two variables.

A flexible family of multivariate models, named multiple stable Tweedie (MST) models, is introduced and produces generalized variance functions which are products of powered components of the mean. These MST models are built from a fixed univariate stable Tweedie variable having a positive value domain, and the remaining random variables given the fixed one are also real independent Tweedie variables, with the same dispersion parameter equal to the fixed component. In this huge family of MST models, generalized variance estimators are explicitly pointed out by the maximum likelihood method and, moreover, computably presented for the uniform minimum variance and unbiased approach. The second estimator is brought from modified Lévy measures of MST which lead to some solutions of particular Monge–Ampère equations.

The partial least squares (PLS) approach first constructs new explanatory variables, known as factors (or components), which are linear combinations of available predictor variables. A small subset of these factors is then chosen and retained for prediction. We study the performance of PLS in estimating single-index models, especially when the predictor variables exhibit high collinearity. We show that PLS estimates are consistent up to a constant of proportionality. We present three simulation studies that compare the performance of PLS in estimating single-index models with that of sliced inverse regression (SIR). In the first two studies, we find that PLS performs better than SIR when collinearity exists. In the third study, we learn that PLS performs well even when there are multiple dependent variables, the link function is non-linear and the shape of the functional form is not known.

Additive varying coefficient models are a natural extension of multiple linear regression models, allowing the regression coefficients to be functions of other variables. These models are therefore flexible enough to capture more complex dependencies in data structures. In this paper we consider the problem of automatically selecting the significant variables among a large set of variables, when interest lies in a given response variable. In recent years several grouped regularization methods have been proposed and in this paper we present these under one unified framework in this varying coefficient model context. For each of the discussed grouped regularization methods we investigate the optimization problem to be solved, possible algorithms for doing so, and the variable and estimation consistency of the methods. We investigate the finite-sample performance of these methods, in a comparative study, and illustrate them on real data examples.

In this paper, we propose a new partial correlation, the so-called composite quantile partial correlation, to measure the relationship of two variables given other variables. We further use this correlation to screen variables in ultrahigh-dimensional varying coefficient models. Our proposed method is fast and robust against outliers and can be efficiently employed in both single index variable and multiple index variable varying coefficient models. Numerical results favor our proposed method.

An estimated 1 billion people suffer from hunger worldwide, and climate change, urbanization, and globalization have the potential to exacerbate this situation. Improved models for predicting food security are needed to understand these impacts and design interventions. However, food insecurity is the result of complex interactions between physical and socio-economic factors that can overwhelm linear regression models. More sophisticated data-mining approaches could provide an effective way to model these relationships and accurately predict food insecure situations. In this paper, we compare multiple regression and data-mining methods in their ability to predict the percent of a country's population that suffers from undernourishment using widely available predictor variables related to socio-economic settings, agricultural production and trade, and climate conditions. Averaging predictions from multiple models results in the lowest predictive error and provides an accurate method to predict undernourishment levels. Partial dependence plots are used to evaluate covariate influence and demonstrate the relationship between food insecurity and climatic and socio-economic variables. By providing insights into these relationships and a mechanism for predicting undernourishment using readily available data, statistical models like those developed here could be a useful tool for those tasked with understanding and addressing food insecurity.

In the case where non-experimental data are available from an industrial process and a directed graph for how various factors affect a response variable is known based on a substantive understanding of the process, we consider a problem in which a control plan involving multiple treatment variables is conducted in order to bring a response variable close to a target value with variation reduction. Using statistical causal analysis with linear (recursive and non-recursive) structural equation models, we configure an optimal control plan involving multiple treatment variables through causal parameters. Based on the formulation, we clarify the causal mechanism for how the variance of a response variable changes when the control plan is conducted. The results enable us to evaluate the effect of a control plan on the variance of a response variable from non-experimental data and provide a new application of linear structural equation models to engineering science.

A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to the observed outcomes. The model accommodates multilevel data structures, general covariate effects and distinct link functions and error distributions for each underlying variable. Within the framework proposed, novel models are developed for clustered multiple binary, unordered categorical and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated by using data from a reproductive toxicity study.

The data collection process in most observational and experimental studies yields different types of variables, leading to the use of joint models that are capable of handling multiple data types. Evaluation of various statistical techniques that have been developed for mixed data in simulated environments requires concurrent generation of multiple variables. In this article, I present an important augmentation to a unified framework proposed in our previously published work for simultaneously generating binary and nonnormal continuous data given the marginal characteristics and correlation structure, via fifth-order power polynomials that are known to extend the area covered in the skewness-elongation plane and to provide a better approximation to the probability density function of the continuous variables. I evaluate how well the improved methodology performs in comparison to the original one, in a simulated setting with illustrations of algorithmic steps. Although the relative gains for the associational quantities are not substantial, the augmented version appears to better capture the marginal quantities that are pertinent to the higher-order moments, as indicated by the very close resemblance between the specified and empirically computed quantities on average.

The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The performance of a newly proposed imputation method based on generalized additive models for location, scale, and shape (GAMLSS) is investigated. Although imputation methods based on predictive mean matching are virtually unbiased, they suffer from mild to moderate under-coverage, even in the experiment where all variables are jointly normally distributed. The GAMLSS method features better coverage than currently available methods.

Multiple imputation has emerged as a widely used model-based approach in dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings which include a mix of continuous and discrete variables, correct specification of the imputation model could be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with accessibility to software packages that are capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes, and subsequently round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets.
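
The conversion step in the preceding abstract can be illustrated with a minimal sketch. It assumes a one-hot coding of the ordinal levels and takes the continuous imputed indicator values as given; the abstract does not specify the exact coding, and the function names `indicator_codes` and `distance_round` are hypothetical, so this should not be read as the author's implementation.

```python
import numpy as np

def indicator_codes(n_levels):
    """Reference indicator vectors, one row per ordinal level.

    Assumption: one-hot coding (level k has a 1 in position k).
    """
    return np.eye(n_levels)

def distance_round(imputed, codes):
    """Assign each imputed indicator vector to the nearest ordinal level.

    imputed: array of shape (n_obs, n_levels) holding continuous imputed values.
    codes:   array of shape (n_levels, n_levels) holding reference indicators.
    Returns the 0-based index of the level with minimal Euclidean distance.
    """
    imputed = np.atleast_2d(np.asarray(imputed, dtype=float))
    # Pairwise Euclidean distances, shape (n_obs, n_levels)
    dists = np.linalg.norm(imputed[:, None, :] - codes[None, :, :], axis=2)
    return dists.argmin(axis=1)

codes = indicator_codes(3)
# Made-up imputed values for three observations of a 3-level ordinal variable
imputed = np.array([
    [0.9, 0.2, -0.1],
    [0.1, 0.4, 0.6],
    [0.3, 0.5, 0.1],
])
levels = distance_round(imputed, codes)
print(levels)  # [0 2 1]
```

With a one-hot coding the nearest indicator coincides with the largest imputed component; other codings of the ordinal levels would generally give different assignments.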

We propose a general Bayesian joint modeling approach to model mixed longitudinal outcomes from the exponential family for taking into account any differential misclassification that may exist among categorical outcomes. Under this framework, outcomes observed without measurement error are related to latent trait variables through generalized linear mixed effect models. The misclassified outcomes are related to the latent class variables, which represent unobserved real states, using mixed hidden Markov models (MHMMs). In addition to enabling the estimation of parameters in prevalence, transition and misclassification probabilities, MHMMs capture cluster level heterogeneity. A transition modeling structure allows the latent trait and latent class variables to depend on observed predictors at the same time period and also on latent trait and latent class variables at previous time periods for each individual. Simulation studies are conducted to make comparisons with traditional models in order to illustrate the gains from the proposed approach. The new approach is applied to data from the Southern California Children Health Study to jointly model questionnaire-based asthma state and multiple lung function measurements in order to gain better insight about the underlying biological mechanism that governs the inter-relationship between asthma state and lung function development.

This paper aims at evaluating different aspects of the Monte Carlo expectation-maximization algorithm for estimating heavy-tailed mixed logistic regression (MLR) models. As a novelty, it also proposes a multiple chain Gibbs sampler to generate from the latent variable distributions, thus obtaining independent samples. In heavy-tailed MLR models, the analytical forms of the full conditional distributions for the random effects are unknown. Four different Metropolis–Hastings algorithms are considered for generating from them. We also discuss stopping rules in order to obtain more efficient algorithms in heavy-tailed MLR models. The algorithms are compared through the analysis of simulated and Ascaris Suum data.

Many different biased regression techniques have been proposed for estimating parameters of a multiple linear regression model when the predictor variables are collinear. One particular alternative, latent root regression analysis, is a technique based on analyzing the latent roots and latent vectors of the correlation matrix of both the response and the predictor variables. It is the purpose of this paper to review the latent root regression estimator and to re-examine some of its properties and applications. It is shown that the latent root estimator is a member of a wider class of estimators for linear models

In a general parametric setup, a multivariate regression model is considered when responses may be missing at random while the explanatory variables and covariates are completely observed. Asymptotic optimality properties of maximum likelihood estimators for such models are linked to the Fisher information matrix for the parameters. It is shown that the information matrix is well defined for the missing-at-random model and that it plays the same role as in the complete-data linear models. Applications of the methodologic developments in hypothesis-testing problems, without any imputation of missing data, are illustrated. Some simulation results comparing the proposed method with Rubin's multiple imputation method are presented.

In electrical engineering, circuit designs are now often optimized via circuit simulation computer models. Typically, many response variables characterize the circuit's performance. Each response is a function of many input variables, including factors that can be set in the engineering design and noise factors representing manufacturing conditions. We describe a modelling approach which is appropriate for the simulator's deterministic input–output relationships. Non-linearities and interactions are identified without explicit assumptions about the functional form. These models lead to predictors to guide the reduction of the ranges of the designable factors in a sequence of experiments. Ultimately, the predictors are used to optimize the engineering design. We also show how a visualization of the fitted relationships facilitates an understanding of the engineering trade-offs between responses. The example used to demonstrate these methods, the design of a buffer circuit, has multiple targets for the responses, representing different trade-offs between the key performance measures.

Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their multivariate extension, though, is not straightforward, starting with the definition of multivariate quantiles. We propose here a flexible Bayesian quantile regression model for a multivariate response variable, in which we are able to define a structured additive framework for all predictor variables. We build on previous ideas considering a directional approach to define the quantiles of a response variable with multiple outputs, and we define noncrossing quantiles in every directional quantile model. We define a Markov chain Monte Carlo (MCMC) procedure for model estimation, where the noncrossing property is obtained by using a Gaussian process design to model the correlation between several quantile regression models. We illustrate the results of these models using two datasets: one on dimensions of inequality in the population, such as income and health; the second on scores of students in the Brazilian High School National Exam, considering three dimensions for the response variable.
