The purpose of this paper is to discuss response surface designs for multivariate generalized linear models (GLMs). Such models are considered whenever several response variables can be measured for each setting of a group of control variables, and the response variables are adequately represented by GLMs. The mean-squared error of prediction (MSEP) matrix is used to assess the quality of prediction associated with a given design. The MSEP incorporates both the prediction variance and the prediction bias, which results from using maximum likelihood estimates of the parameters of the fitted linear predictor. For a given design, quantiles of a scalar-valued function of the MSEP are obtained within a certain region of interest. The quantiles depend on the unknown parameters of the linear predictor. The dispersion of these quantiles over the space of the unknown parameters is determined and then depicted by the so-called quantile dispersion graphs. An application of the proposed methodology is presented using the special case of the bivariate binary distribution.
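A minimal Python sketch of how quantile dispersion envelopes can be computed, under simplifying assumptions that are not the paper's: a single (univariate) logistic GLM with one observation per design point, and only the variance part of the MSEP (the bias term is ignored). For each parameter value, quantiles of the approximate prediction variance over a grid of the region are taken; the minimum and maximum of each quantile across parameter values give the envelopes. Function names are hypothetical.

```python
import numpy as np

def logistic_pred_var(design, beta, grid):
    """Delta-method variance of the predicted mean at each grid point for a
    logistic GLM with intercept and one covariate (variance only, no bias)."""
    F = np.column_stack([np.ones(len(design)), design])
    G = np.column_stack([np.ones(len(grid)), grid])
    mu_d = 1.0 / (1.0 + np.exp(-F @ beta))
    # Fisher information X'WX with W = mu(1 - mu), one observation per point
    info = F.T @ (F * (mu_d * (1.0 - mu_d))[:, None])
    cov = np.linalg.inv(info)
    mu_g = 1.0 / (1.0 + np.exp(-G @ beta))
    deriv = mu_g * (1.0 - mu_g)  # d mu / d eta at each grid point
    return deriv**2 * np.einsum("ij,jk,ik->i", G, cov, G)

def quantile_dispersion(design, beta_samples, grid, probs):
    """Quantiles of prediction variance over the region for each parameter
    value; returns their min and max across parameter values."""
    Q = np.array([np.quantile(logistic_pred_var(design, b, grid), probs)
                  for b in beta_samples])
    return Q.min(axis=0), Q.max(axis=0)
```

Plotting both envelopes against `probs` gives one design's dispersion graph; comparing the envelopes of candidate designs is how such graphs are used.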

Vine copulas provide a flexible tool for capturing asymmetry when modeling multivariate distributions. This flexibility, however, comes at the cost of model complexity that grows exponentially. To alleviate this issue, the simplifying assumption (SA) is commonly adopted in applications of vine copula models. In this paper, generalized linear models (GLMs) are proposed for the parameters of the conditional bivariate copulas in order to relax the SA. Following the principle of parsimony, a regularization methodology is developed to control the number of parameters, leading to sparse vine copula models. The conventional vine copula with the SA, the proposed GLM-based vine copula, and the sparse vine copula are applied to several financial datasets, and the results show that the proposed models significantly outperform the model with the SA in terms of the Bayesian information criterion.
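A minimal Python sketch of letting a conditional pair-copula parameter depend on the conditioning value through a GLM-style link. The copula family (bivariate Gaussian) and the link (inverse Fisher-z, i.e. tanh) are assumptions for illustration, not necessarily the paper's choices; setting `beta1 = 0` recovers a constant parameter, which is the simplifying assumption. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_logpdf(u1, u2, rho):
    """Log-density of the bivariate Gaussian copula at (u1, u2)."""
    x1, x2 = norm.ppf(u1), norm.ppf(u2)
    one_minus = 1.0 - rho**2
    return (-0.5 * np.log(one_minus)
            - (rho**2 * (x1**2 + x2**2) - 2.0 * rho * x1 * x2) / (2.0 * one_minus))

def conditional_rho(v, beta0, beta1):
    """Hypothetical GLM-style link: rho(v) = tanh(beta0 + beta1 * v) stays in (-1, 1)."""
    return np.tanh(beta0 + beta1 * v)

def pair_loglik(u1, u2, v, beta0, beta1):
    """Log-likelihood of one conditional pair copula whose parameter varies with v."""
    rho = conditional_rho(v, beta0, beta1)
    return np.sum(gaussian_copula_logpdf(u1, u2, rho))
```

Maximizing `pair_loglik` over `(beta0, beta1)`, possibly with a penalty on `beta1`, is one way to realize the sparse, GLM-based relaxation described above.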

In real-data analysis, choosing the best subset of variables in a regression model is an important problem. Akaike's information criterion (AIC) is widely used for variable selection in many fields. When the sample size is not large, however, the AIC has a non-negligible bias that adversely affects variable selection. This paper considers a bias correction of the AIC for selecting variables in the generalized linear model (GLM). By changing the distribution and the link function, the GLM can express many statistical models in common applied use, such as the normal linear regression model, the logistic regression model, and the probit model. We obtain a simple expression for a bias-corrected AIC (corrected AIC, or CAIC) in GLMs and provide R code based on our formula. A numerical study shows that the CAIC performs better than the AIC for variable selection.
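To show what a small-sample correction to the AIC looks like, here is a Python sketch of the familiar Hurvich and Tsai correction for a Gaussian linear model fitted by OLS. This is a swapped-in, well-known special case, not the paper's CAIC formula for general GLMs; the function name is hypothetical.

```python
import numpy as np

def gaussian_ols_aic_aicc(X, y):
    """AIC and small-sample corrected AICc for a Gaussian linear model fitted by OLS."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n                                  # ML estimate of error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = p + 1                                         # coefficients plus the variance
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)      # correction grows as k approaches n
    return aic, aicc
```

Evaluating `aicc` over candidate column subsets of `X` and keeping the smallest value is the usual selection rule; the correction term matters most when `n` is small relative to `k`.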

In a linear mixed effects (LME) model for multivariate repeated measures data, the number of parameters grows rapidly. Estimating these parameters becomes a real problem as the number of response variables or the number of time points increases, and it becomes more intricate still when further random effects are added. A full multivariate analysis is not possible in a small-sample setting. We propose a method that estimates these parameters piecewise from smaller "baby" models, each fitted to a subset of the response variables, and then combines the pieces to obtain the parameter estimates for the "mother" model with all variables taken together. With this method one can calculate the fixed effects, the best linear unbiased predictions (BLUPs) of the random effects, and the BLUPs at each observation time for each response variable, which can be used to monitor the effectiveness of the treatment for each subject. The method is illustrated with an example of multiple response variables measured over multiple time points in a clinical trial in osteoporosis.

The variational approach to Bayesian inference enables simultaneous estimation of model parameters and model complexity. An interesting feature of this approach is that it also leads to an automatic choice of model complexity. Empirical results from the analysis of hidden Markov models with Gaussian observation densities illustrate this. If the variational algorithm is initialized with a large number of hidden states, redundant states are eliminated as the method converges to a solution, thereby leading to a selection of the number of hidden states. In addition, through the use of a variational approximation, the deviance information criterion for Bayesian model selection can be extended to the hidden Markov model framework. Calculation of the deviance information criterion provides a further tool for model selection, which can be used in conjunction with the variational approach.
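The deviance information criterion itself is simple to compute once posterior draws (for example, from a variational approximation) are available: DIC is the mean deviance plus the effective number of parameters p_D, which is the mean deviance minus the deviance at the posterior mean. A Python sketch for a toy normal-mean model with known scale, which is an assumption chosen for clarity rather than the hidden Markov setting of the paper:

```python
import numpy as np
from scipy.stats import norm

def deviance(y, mu, sigma):
    """D(theta) = -2 log p(y | theta) for i.i.d. normal data with mean mu."""
    return -2.0 * np.sum(norm.logpdf(y, loc=mu, scale=sigma))

def dic_from_draws(y, mu_draws, sigma):
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at posterior mean."""
    d_bar = np.mean([deviance(y, m, sigma) for m in mu_draws])
    d_hat = deviance(y, np.mean(mu_draws), sigma)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d
```

For this toy model p_D equals n times the spread of the draws divided by sigma squared, so it is close to 1 when the posterior variance is about sigma^2/n, matching the one free parameter.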

This paper describes an EM algorithm for maximum likelihood estimation in generalized linear models (GLMs) with continuous measurement error in the explanatory variables. The algorithm is an adaptation of that for nonparametric maximum likelihood (NPML) estimation in overdispersed GLMs described in Aitkin (Statistics and Computing 6: 251–262, 1996). The measurement error distribution can be of any specified form, though the implementation described assumes normal measurement error. Neither the reliability nor the distribution of the true score of the variables with measurement error has to be known, nor are instrumental variables or replication required. Standard errors can be obtained by omitting individual variables from the model, as in Aitkin (1996). Several examples with normal and Bernoulli response variables are given.

We consider prediction in regression models under squared-error loss for the random-x case with many explanatory variables. Model reduction is done by conditioning on only a small number of linear combinations of the original variables. The corresponding reduced model is then essentially the population model for the chemometricians' partial least squares algorithm. Estimation of the selection matrix under this model is briefly discussed, and analogous results for the case of a multivariate response are formulated. Finally, it is shown that the assumption of multivariate normality may be weakened to an elliptically symmetric distribution, and that some of the results hold without any distributional assumption at all.
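For readers unfamiliar with the sample algorithm whose population model is discussed above, here is a minimal Python sketch of PLS1 (single response) in the NIPALS style. Each component is a linear combination of the centred predictors, and prediction conditions only on those few combinations. This is a textbook form, not code from the paper; the function name is hypothetical.

```python
import numpy as np

def pls1(X, y, n_components):
    """NIPALS-style PLS1. Returns coefficients for centred data plus the means,
    so that y_hat = y_mean + (x - x_mean) @ beta."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector: direction of max covariance
        t = Xk @ w                      # score: one linear combination of the x's
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean
```

With as many components as predictors (and full-rank data) the fit coincides with OLS; the model reduction comes from keeping only a few components.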

Generalized linear models (GLMs) with error-in-covariates are useful in epidemiological research because non-normal response variables and inaccurate measurements are ubiquitous. The link function in a GLM is chosen by the user according to the type of response variable, and is frequently the canonical link. When covariates are measured with error, inference can be incorrect, and the problem is compounded by an incorrect choice of link function. In this article we propose three flexible approaches for handling error-in-covariates and estimating an unknown link simultaneously. The first uses a fully Bayesian (FB) hierarchical framework, treating the unobserved covariate as a latent variable to be integrated over. The second and third are approximate Bayesian approaches that use a Laplace approximation to marginalize the variables measured with error out of the likelihood. Our simulation results suggest that the FB approach is often a better choice than the approximate Bayesian approaches for adjusting for measurement error, particularly when the measurement error distribution is misspecified. The approaches are demonstrated on an application with a binary response.
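A minimal Python sketch of the one-dimensional Laplace approximation used to marginalize a latent covariate: log ∫ exp(h(x)) dx ≈ h(x̂) + ½ log(2π / −h''(x̂)), where x̂ maximizes h. The example integrand, a logistic likelihood term times an assumed normal distribution for the true covariate x given its observed value w, is hypothetical and only illustrates the idea; it is not the paper's model, which also estimates the link.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def laplace_log_integral(h, bounds=(-10.0, 10.0), eps=1e-4):
    """log of the integral of exp(h(x)) via the Laplace approximation at the mode."""
    res = minimize_scalar(lambda x: -h(x), bounds=bounds, method="bounded",
                          options={"xatol": 1e-10})
    x_hat = res.x
    # central finite-difference estimate of h''(x_hat); must be negative at a maximum
    h2 = (h(x_hat + eps) - 2.0 * h(x_hat) + h(x_hat - eps)) / eps**2
    return h(x_hat) + 0.5 * np.log(2.0 * np.pi / -h2)

def log_sigmoid(z):
    return -np.logaddexp(0.0, -z)

def h_example(x, y=1, w=0.5, tau=0.3, b0=-0.2, b1=1.5):
    """Hypothetical integrand: logistic log-likelihood of y at true covariate x,
    plus log density of x ~ N(w, tau^2) given the error-prone measurement w."""
    eta = b0 + b1 * x
    return y * log_sigmoid(eta) + (1 - y) * log_sigmoid(-eta) + norm.logpdf(x, w, tau)
```

The approximation is exact for Gaussian integrands and tends to be accurate when the integrand is sharply peaked, which is why it is a popular cheap alternative to full integration.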

Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for selection of variables, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessment of the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations, of multivariate binary as well as multivariate normal data, demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.
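A Python sketch of estimating the error rate of the selection procedure itself by leave-one-out: the stepwise selection is rerun on every training set, so the held-out point never influences which variables are chosen. The specific choices here (two-class LDA with pooled covariance and equal priors, greedy forward selection on apparent error that stops when the error no longer decreases) are assumptions for illustration, not necessarily the paper's procedure.

```python
import numpy as np

def lda_fit(X, y):
    """Two-class LDA with pooled covariance and equal priors."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X0) + len(X1) - 2)
    w = np.linalg.solve(S, m1 - m0)
    return w, w @ (m0 + m1) / 2.0

def lda_predict(model, X):
    w, c = model
    return (X @ w > c).astype(int)

def apparent_error(X, y, cols):
    model = lda_fit(X[:, cols], y)
    return np.mean(lda_predict(model, X[:, cols]) != y)

def forward_select(X, y, max_vars):
    """Greedy forward selection on resubstitution error."""
    chosen, best = [], np.inf
    while len(chosen) < max_vars:
        errs = {j: apparent_error(X, y, chosen + [j])
                for j in range(X.shape[1]) if j not in chosen}
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best:
            break                       # no improvement: stop adding variables
        chosen.append(j_best)
        best = errs[j_best]
    return chosen

def loo_error_of_procedure(X, y, max_vars):
    """Leave-one-out error of 'select, then fit LDA', reselecting in each fold."""
    mistakes = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        cols = forward_select(X[keep], y[keep], max_vars)
        model = lda_fit(X[keep][:, cols], y[keep])
        mistakes += int(lda_predict(model, X[i:i + 1, cols])[0] != y[i])
    return mistakes / len(y)
```

Computing the leave-one-out error only for a fixed, already-selected subset would reuse the held-out points during selection and typically understate the error.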

We derive explicit formulas for Sobol's sensitivity indices (SSIs) under generalized linear models (GLMs) with independent or multivariate normal inputs. We argue that the main-effect SSIs provide a powerful tool for variable selection under GLMs with identity links in polynomial regressions. We also show via examples that SSI-based variable selection gives results similar to those of the random forest algorithm, but without the computational burden of data permutation. Finally, applying our results to the problem of gene network discovery, our SSI analysis of a public microarray dataset identifies several novel higher-order gene–gene interactions missed by more standard inference methods. The relevant functions for SSI analysis derived here under GLMs with identity, log, and logit links are implemented and made available in the R package Sobol sensitivity.
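For the simplest case covered above, an identity link with a linear predictor Y = b0 + Σ β_i X_i and independent normal inputs with standard deviations σ_i, the main-effect index has the closed form S_i = β_i²σ_i² / Σ_j β_j²σ_j². The Python sketch below computes that closed form and checks it against a generic pick-freeze Monte Carlo estimator (the Saltelli-type first-order form). Function names are hypothetical, and the log and logit cases from the paper are not reproduced here.

```python
import numpy as np

def sobol_main_effects_linear(beta, sigma):
    """Closed form for a linear model with independent inputs of sd sigma_i."""
    contrib = (np.asarray(beta) * np.asarray(sigma)) ** 2
    return contrib / contrib.sum()

def sobol_main_effects_mc(f, mu, sigma, n, rng):
    """Pick-freeze Monte Carlo estimate of first-order indices for
    independent normal inputs N(mu_i, sigma_i^2)."""
    d = len(mu)
    A = rng.normal(mu, sigma, size=(n, d))
    B = rng.normal(mu, sigma, size=(n, d))
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]             # A with column i taken from B
        S[i] = np.mean(fB * (f(ABi) - fA)) / var_y
    return S
```

Inputs with a small main-effect index are candidates for removal, which is the variable-selection use described in the abstract.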

Given a set of possible models for variables X and a set of possible parameters for each model, the Bayesian estimate of the probability distribution for X given observed data is obtained by averaging over the possible models and their parameters. An often-used approximation for this estimate is obtained by selecting a single model and averaging over its parameters. The approximation is useful because it is computationally efficient, and because it provides a model that facilitates understanding of the domain. A common criterion for model selection is the posterior probability of the model. Another criterion for model selection, proposed by San Martini and Spezzafari (1984), is the predictive performance of a model for the next observation to be seen. From the standpoint of domain understanding, both criteria are useful, because one identifies the model that is most likely, whereas the other identifies the model that is the best predictor of the next observation. To highlight the difference, we refer to the posterior-probability and alternative criteria as the scientific criterion (SC) and engineering criterion (EC), respectively. When we are interested in predicting the next observation, the model-averaged estimate is at least as good as that produced by EC, which itself is at least as good as the estimate produced by SC. We show experimentally that, for Bayesian-network models containing discrete variables only, the predictive performance of the model average can be significantly better than that of single models selected by either criterion, and that differences between models selected by the two criteria can be substantial.
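A toy Python sketch of model averaging versus selecting by posterior probability, using two hypothetical models for binary data rather than Bayesian networks: M1 fixes the success probability at 0.5, and M2 gives it a uniform Beta(1, 1) prior. Each model's predictive probability for the next observation is weighted by its posterior model probability; picking the model with the largest weight corresponds to the scientific criterion.

```python
import numpy as np
from scipy.special import betaln

def bma_next_prob(n, k, prior=(0.5, 0.5)):
    """P(next observation = 1) averaged over two models, given k successes in n trials.
    Returns the averaged probability and the posterior model weights."""
    log_ml = np.array([
        n * np.log(0.5),                           # M1: p fixed at 0.5
        betaln(1.0 + k, 1.0 + n - k) - betaln(1.0, 1.0),  # M2: p ~ Beta(1, 1)
    ]) + np.log(prior)
    w = np.exp(log_ml - log_ml.max())
    w /= w.sum()                                   # posterior model probabilities
    pred = np.array([0.5, (k + 1.0) / (n + 2.0)])  # each model's predictive probability
    return float(w @ pred), w
```

For example, with 9 successes in 10 trials M2 gets most of the weight, and `np.argmax(w)` identifies the model SC would keep; the averaged prediction still blends in M1's 0.5.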

The location model is a familiar basis for discriminant analysis of mixtures of categorical and continuous variables. Its usual implementation involves second-order smoothing, using multivariate regression for the continuous variables and log-linear models for the categorical variables. In spite of the smoothing, these procedures still require many parameters to be estimated and this in turn restricts the categorical variables to a small number if implementation is to be feasible. In this paper we propose non-parametric smoothing procedures for both parts of the model. The number of parameters to be estimated is dramatically reduced and the range of applicability thereby greatly increased. The methods are illustrated on several data sets, and the performances are compared with a range of other popular discrimination techniques. The proposed method compares very favourably with all its competitors.

Variable selection is fundamental to high-dimensional multivariate generalized linear models. The smoothly clipped absolute deviation (SCAD) method performs variable selection and estimation simultaneously. The choice of its tuning parameter, which controls the complexity of the selected model, is critical. This article proposes a criterion for selecting the tuning parameter of the SCAD method in multivariate generalized linear models and shows that it identifies the true model consistently. Simulation studies are conducted to support the theoretical findings, and two real-data analyses are given to illustrate the proposed method.
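For reference, a Python sketch of the SCAD penalty of Fan and Li (2001) and its closed-form thresholding rule in the orthonormal-design case, with the conventional a = 3.7. The tuning parameter λ discussed above is `lam`; the paper's criterion for choosing it is not reproduced here.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near 0, quadratic transition, constant beyond a*lam."""
    t = np.abs(theta)
    return np.where(t <= lam, lam * t,
           np.where(t <= a * lam,
                    -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1)),
                    (a + 1) * lam**2 / 2))

def scad_threshold(z, lam, a=3.7):
    """Minimizer of 0.5*(z - b)^2 + p_lam(|b|) for a single coefficient."""
    z = np.asarray(z, dtype=float)
    t = np.abs(z)
    soft = np.sign(z) * np.maximum(t - lam, 0.0)        # lasso-like zone
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)  # transition zone
    return np.where(t <= 2 * lam, soft, np.where(t <= a * lam, mid, z))
```

Small coefficients are set exactly to zero, while large ones (|z| > aλ) are left unshrunk, which is why a larger `lam` yields a sparser, simpler model.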

The generalized estimating equations (GEE) approach has attracted considerable interest for the analysis of correlated response data. This paper considers a model selection criterion based on the multivariate quasi-likelihood (MQL) in the GEE framework; the GEE approach is closely related to the MQL. Using properties of differential geometry, we derive a necessary and sufficient condition for the uniqueness of the risk function based on the MQL. Furthermore, under this condition we formally derive the model selection criterion as an asymptotically unbiased estimator of the prediction risk, explicitly taking into account the effect of estimating the correlation matrix used in the GEE procedure.

In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Inheriting their asymptotic properties, our information criteria are consistent in variable selection in both the large-sample and the high-dimensional asymptotic frameworks. In numerical simulations, variable selection methods based on our information criteria choose the true set of variables with high probability in most cases.

When a number of distinct models contend for use in prediction, the choice of a single model can give rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue, but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space, which can greatly aid the interpretation of the model. It also reduces cost when measuring the variables is costly. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced-predictor-space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.

This article deals with semisupervised learning based on the naive Bayes assumption. A univariate Gaussian mixture density is used for continuous input variables, whereas a histogram-type density is adopted for discrete input variables. The EM algorithm is used to compute maximum likelihood estimates of the model parameters when the number of mixing components for each continuous input variable is fixed. We carry out model selection based on an information criterion to choose a parsimonious model among the various fitted models. A common density method is proposed for selecting significant input variables. Simulated and real datasets are used to illustrate the performance of the proposed method.

Cross-lagged panel studies are studies in which two or more variables are measured for a large number of subjects at each of several points in time. The variables divide naturally into two sets, and the purpose of the analysis is to estimate and test the cross-effects between the two sets. One approach to this analysis is to treat the cross-effects as parameters in regression equations. This study contributes to this approach by extending the regression model to a multivariate model that captures the correlation among the variables and allows the errors in the model to be correlated over time.
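A Python sketch of the baseline regression approach that the study extends: two cross-lagged equations, X_t = a0 + a1·X_{t-1} + a2·Y_{t-1} and Y_t = b0 + b1·Y_{t-1} + b2·X_{t-1}, fitted separately by OLS with observations pooled over subjects and time. The cross-effects are a2 and b2. The multivariate extension with errors correlated over time is not implemented here, and the function name is hypothetical.

```python
import numpy as np

def cross_lagged_ols(X, Y):
    """X, Y: arrays of shape (n_subjects, n_times).
    Returns (a0, a1, a2) for the X equation and (b0, b1, b2) for the Y equation."""
    x_prev, y_prev = X[:, :-1].ravel(), Y[:, :-1].ravel()
    x_next, y_next = X[:, 1:].ravel(), Y[:, 1:].ravel()
    ones = np.ones_like(x_prev)
    # X equation: own lag first, then the cross-lag of Y
    coef_x, *_ = np.linalg.lstsq(np.column_stack([ones, x_prev, y_prev]), x_next, rcond=None)
    # Y equation: own lag first, then the cross-lag of X
    coef_y, *_ = np.linalg.lstsq(np.column_stack([ones, y_prev, x_prev]), y_next, rcond=None)
    return coef_x, coef_y
```

Because each equation is fitted on its own, this baseline ignores correlation between the two equations' errors and across time points, which is exactly what the multivariate model is meant to capture.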

Panel count data arise in many fields, and a number of estimation procedures have been developed, along with two procedures for variable selection. In this paper, we discuss model selection and parameter estimation together. For the former, a focused information criterion (FIC) is presented; for the latter, a frequentist model average (FMA) estimation procedure is developed. A main advantage of the FIC, and what distinguishes it from existing model selection methods, is that it emphasizes the accuracy of the estimation of the parameters of interest rather than of all parameters. A further efficiency gain can be achieved by the FMA estimation procedure because, unlike existing methods, it takes into account the variability introduced at the model selection stage. Asymptotic properties of the proposed estimators are established, and a simulation study suggests that the proposed methods work well in practical situations. An illustrative example is also provided. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics

This paper investigates the focused information criterion and plug-in average for vector autoregressive models with local-to-zero misspecification. These methods have the advantage of focusing on a quantity of interest rather than aiming at overall model fit. Any (sufficiently regular) function of the parameters can be used as a quantity of interest. We determine the asymptotic properties and elaborate on the role of the locally misspecified parameters. In particular, we show that the inability to consistently estimate locally misspecified parameters translates into suboptimal selection and averaging. We apply this framework to impulse response analysis. A Monte Carlo simulation study supports our claims.
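Impulse responses are a typical example of a smooth function of VAR parameters that can serve as the focus. A Python sketch for a VAR(1), y_t = A·y_{t-1} + e_t, assumed here for simplicity (higher-order VARs can be handled through their companion form): the non-orthogonalized responses are Ψ_h = A^h, and the orthogonalized responses are Ψ_h·P, where P is the lower Cholesky factor of the error covariance Σ. Function names are hypothetical.

```python
import numpy as np

def var1_irf(A, horizon):
    """Psi_h = A^h for h = 0..horizon; entry [h, i, j] is the response of
    variable i at horizon h to a unit shock in variable j."""
    k = A.shape[0]
    out = np.empty((horizon + 1, k, k))
    out[0] = np.eye(k)
    for h in range(1, horizon + 1):
        out[h] = A @ out[h - 1]
    return out

def orthogonalized_irf(A, Sigma, horizon):
    """Psi_h @ P with P the lower Cholesky factor of the error covariance."""
    P = np.linalg.cholesky(Sigma)
    return var1_irf(A, horizon) @ P
```

A single entry such as `orthogonalized_irf(A, Sigma, 4)[4, 0, 1]` is one scalar quantity of interest of the kind on which a focused criterion would base selection or averaging.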


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417