Found 20 similar articles (search time: 46 ms)
We present APproximated Exhaustive Search (APES), which enables fast, approximate exhaustive variable selection in Generalised Linear Models (GLMs). While exhaustive variable selection remains the gold standard in many model selection contexts, traditional exhaustive variable selection suffers from computational feasibility issues. More precisely, there is often a high cost associated with computing maximum likelihood estimates (MLE) for all subsets of GLMs. Efficient algorithms for exhaustive searches exist for linear models, most notably the leaps-and-bounds algorithm and, more recently, the mixed integer optimisation (MIO) algorithm. The APES method learns from observational weights in a generalised linear regression super-model and reformulates the GLM problem as a linear regression problem. In this way, APES can approximate a true exhaustive search in the original GLM space. Where exhaustive variable selection is not computationally feasible, we propose a best-subset search, which also closely approximates a true exhaustive search. APES is available both as a standalone R package and as part of the existing mplot package.

In this paper, we revisit the alternative outlier model of Thompson [A note on restricted maximum likelihood estimation with an alternative outlier model, J. Roy. Stat. Soc. Ser. B 47 (1985), pp. 53–55] for detecting outliers in the linear model. Gumedze et al. [A variance shift model for detection of outliers in the linear mixed model, Comput. Statist. Data Anal. 54 (2010), pp. 2128–2144] called this model the variance shift outlier model (VSOM). The basic idea behind the VSOM is to detect observations with inflated variance and isolate them for further investigation. The VSOM is appealing because it downweights an outlier in the analysis, with the weighting determined automatically as part of the estimation procedure. We set up the VSOM as a linear mixed model and then use the likelihood ratio test (LRT) statistic as an objective measure for determining whether the weighting is required, i.e. whether the observation is an outlier. We also derive one-step updates of the variance parameter estimates based on observed, expected and average information matrices to obtain one-step LRT statistics, which usually require less computation. Both the fully iterated and one-step LRTs are functions of the squared standard residuals from the null model and can therefore be computed directly without the need to fit the VSOM. We investigate the properties of the likelihood ratio tests and compare them. An extension of the model to detect a group of outliers is also given. We illustrate the proposed methodology using simulated datasets and a real dataset.


We consider the effect of additive covariate error on the linear model in observational (radiation epidemiology) studies of exposure risk. Additive dose error affects the dose-response shape under general linear regression settings, covering identity-link GLM-type models and linear excess-relative-risk grouped-Poisson models. Under independent error, the condition for linear regression calibration is that the log of the dose density is a polynomial of degree at most two on an interval (the log-quadratic density condition), which covers the normal, exponential, and uniform distributions. When the condition is violated, dose error can turn a linear no-threshold (LNT) model into a low-dose-high-sensitivity model. Power density is also considered. A published example is given.

The purpose of this paper is to discuss response surface designs for multivariate generalized linear models (GLMs). Such models are considered whenever several response variables can be measured for each setting of a group of control variables, and the response variables are adequately represented by GLMs. The mean-squared error of prediction (MSEP) matrix is used to assess the quality of prediction associated with a given design. The MSEP incorporates both the prediction variance and the prediction bias, which results from using maximum likelihood estimates of the parameters of the fitted linear predictor. For a given design, quantiles of a scalar-valued function of the MSEP are obtained within a certain region of interest. The quantiles depend on the unknown parameters of the linear predictor. The dispersion of these quantiles over the space of the unknown parameters is determined and then depicted by the so-called quantile dispersion graphs. An application of the proposed methodology is presented using the special case of the bivariate binary distribution.

In this article, parametric robust regression approaches are proposed for making inferences about regression parameters in the setting of generalized linear models (GLMs). The proposed methods can test hypotheses on the regression coefficients in misspecified GLMs. More specifically, it is demonstrated that with large samples, the normal and gamma regression models can be properly adjusted to become asymptotically valid for inferences about regression parameters under model misspecification. These adjusted regression models can provide the correct type I and II error probabilities and the correct coverage probability for continuous data, as long as the true underlying distributions have finite second moments.

Heteroscedasticity is generally present when a linear regression model is applied to real-world problems. Accurately estimating the variance function of the error term in a heteroscedastic linear regression model is therefore of great importance for obtaining efficient estimates of the regression parameters and making valid statistical inferences. A method for estimating the variance function of heteroscedastic linear regression models is proposed in this article based on the variance-reduced local linear smoothing technique. Simulations and comparisons with other methods are conducted to assess the performance of the proposed method. The results demonstrate that the proposed method can accurately estimate the variance functions and therefore produce more efficient estimates of the regression parameters.

This paper proposes a new robust Bayes factor for comparing two linear models. The factor is based on a pseudo-model for outliers and is more robust to outliers than the Bayes factor based on the variance-inflation model for outliers. If an observation is considered an outlier for both models, this new robust Bayes factor equals the Bayes factor calculated after removing the outlier. If an observation is considered an outlier for one model but not the other, then this new robust Bayes factor equals the Bayes factor calculated without the observation, but a penalty is applied to the model considering the observation as an outlier. For moderate outliers where the variance-inflation model is suitable, the two Bayes factors are similar. The new Bayes factor uses a single robustness parameter to describe a priori belief in the likelihood of outliers. Real and synthetic data illustrate the properties of the new robust Bayes factor and highlight the inferior properties of Bayes factors based on the variance-inflation model for outliers.

Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered as a robust alternative. Missing data are inevitable in many situations, and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

This paper is concerned with the selection of explanatory variables in generalized linear models (GLMs). The class of GLMs is quite large and contains, e.g., ordinary linear regression, binary logistic regression, the probit model and Poisson regression with linear or log-linear parameter structure. We show that, through an approximation of the log likelihood and a certain data transformation, the variable selection problem in a GLM can be converted into variable selection in an ordinary (unweighted) linear regression model. As a consequence, no specific computer software for variable selection in GLMs is needed. Instead, any suitable variable selection program for linear regression can be used. We also present a simulation study which shows that the log likelihood approximation is very good in many practical situations. Finally, we briefly mention possible extensions to regression models outside the class of GLMs.
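
The conversion described in this abstract can be illustrated with a minimal sketch. It is not the authors' exact approximation or transformation: it assumes a logistic GLM, fits the full model by iteratively reweighted least squares (IRLS), and then uses the standard IRLS working response and weights to build an unweighted linear regression problem on which an exhaustive subset search is run. All function names and the simulated data are hypothetical.

```python
import itertools
import numpy as np


def working_data(X, y, beta):
    """Working response z and IRLS weights w of a logistic GLM at beta."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = np.clip(mu * (1.0 - mu), 1e-10, None)  # guard against zero weights
    z = eta + (y - mu) / w
    return z, w


def fit_logistic_irls(X, y, n_iter=25):
    """Fit the full logistic model by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z, w = working_data(X, y, beta)
        XtW = X.T * w  # X^T W with W = diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta


def linearise(X, y, beta_full):
    """Rescale rows by sqrt(w) so that weighted LS becomes unweighted LS."""
    z, w = working_data(X, y, beta_full)
    s = np.sqrt(w)
    return X * s[:, None], z * s


def best_subset(X_t, z_t, size):
    """Exhaustive search: the `size` covariates with the smallest RSS.

    Column 0 (the transformed intercept) is always kept.
    """
    best_rss, best_cols = np.inf, None
    for cols in itertools.combinations(range(1, X_t.shape[1]), size):
        idx = [0, *cols]
        coef, *_ = np.linalg.lstsq(X_t[:, idx], z_t, rcond=None)
        rss = float(np.sum((z_t - X_t[:, idx] @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_cols = rss, tuple(idx)
    return best_cols


# Simulated data (an assumption for illustration): only covariates 1 and 3 matter.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
beta_true = np.array([-0.5, 2.0, 0.0, -1.5, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

beta_full = fit_logistic_irls(X, y)
X_t, z_t = linearise(X, y, beta_full)
print(best_subset(X_t, z_t, size=2))
```

Because the transformed problem is an ordinary least-squares problem, the brute-force loop in `best_subset` could be replaced by any linear-model subset-selection routine.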

We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximate posterior for the model parameters based on the variational Bayes (VB) method. Compared to the EM algorithm, the VB method has the advantage that the variance for the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.


Model misspecification in generalized linear models (GLMs) usually occurs when the assumed linear predictor and/or link function is incorrect. This article discusses the effect of such misspecification on design selection for multinomial GLMs and proposes the use of quantile dispersion graphs to select robust designs. Due to misspecification in the model, parameter estimates are usually biased, and the designs are compared on the basis of their mean squared error of prediction. Several numerical examples, including a real data set, are presented to illustrate the proposed methodology.

The class of beta regression models proposed by Ferrari and Cribari-Neto [Beta regression for modelling rates and proportions, Journal of Applied Statistics 31 (2004), pp. 799–815] is useful for modelling data that assume values in the standard unit interval (0, 1). The dependent variable relates to a linear predictor that includes regressors and unknown parameters through a link function. The model is also indexed by a precision parameter, which is typically taken to be constant for all observations. Some authors have used, however, variable dispersion beta regression models, i.e., models that include a regression submodel for the precision parameter. In this paper, we show how to perform testing inference on the parameters that index the mean submodel without having to model the data precision. This strategy is useful as it is typically harder to model dispersion effects than mean effects. The proposed inference procedure is accurate even under variable dispersion. We present the results of extensive Monte Carlo simulations where our testing strategy is contrasted to that in which the practitioner models the underlying dispersion and then performs testing inference. An empirical application that uses real (not simulated) data is also presented and discussed.

In this article, we propose an outlier detection approach for the multiple regression model that uses the properties of a difference-based variance estimator. This type of difference-based variance estimator was originally used to estimate the error variance in a nonparametric regression model without estimating the nonparametric function. This article is the first to employ a difference-based error variance estimator to study the outlier detection problem in a multiple regression model. Our approach uses a leave-one-out type method based on the difference-based error variance. Existing outlier detection approaches based on leave-one-out are highly affected by other outliers, while ours is not, because our approach does not use the regression coefficient estimator. We compare our approach with several existing methods in a simulation study, which suggests that our approach outperforms them. The advantages of our approach are demonstrated using a real data application. Our approach can be extended to the nonparametric regression model for outlier detection.
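
As background for the difference-based idea mentioned in this abstract, the following minimal sketch shows the classical first-order difference estimator of the error variance in a one-covariate nonparametric regression. It is not the authors' leave-one-out outlier procedure. The function name and simulated data are hypothetical. The estimator never fits the mean function: for a smooth mean and data ordered by the covariate, successive differences of the response are dominated by noise, so half their mean square estimates the error variance.

```python
import numpy as np


def difference_variance(x, y):
    """First-order difference estimator of the error variance.

    Sorts the data by x and returns sum((y[i+1] - y[i])^2) / (2 (n - 1)).
    """
    order = np.argsort(x)
    d = np.diff(y[order])
    return float(np.sum(d ** 2) / (2.0 * (len(y) - 1)))


# Simulated data (an assumption for illustration): smooth mean plus noise.
rng = np.random.default_rng(1)
n = 2000
sigma = 0.3
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=sigma, size=n)

sigma2_hat = difference_variance(x, y)
print(sigma2_hat, sigma ** 2)  # the estimate should be close to sigma^2
```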

Using generalized linear models (GLMs), Jalaludin et al. (2006; J. Exposure Analysis and Epidemiology 16, 225–237) studied the association between the daily number of visits to emergency departments for cardiovascular disease by the elderly (65+) and five measures of ambient air pollution. Bayesian methods provide an alternative approach to classical time series modelling and are starting to be more widely used. This paper considers Bayesian methods using the dataset used by Jalaludin et al. (2006), and compares the results from Bayesian methods with those obtained by Jalaludin et al. (2006) using GLM methods.

This paper presents an extension of instrumental variable estimation to nonlinear regression models. For the linear model, the extended estimator is equivalent to the two-stage least squares estimator. The extended estimator is consistent for an important class of nonlinear models, including the logistic model, under relatively weak assumptions on the distribution of the measurement error. An example and simulation study are presented for the logistic regression model. The simulations suggest the estimator is reasonably efficient.
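
For the linear special case mentioned in this abstract, two-stage least squares can be sketched as follows. This illustrates only the standard linear estimator, not the paper's nonlinear extension. The function name and the simulated measurement-error setup (a second, independent noisy measurement used as the instrument) are assumptions made for illustration.

```python
import numpy as np


def two_stage_least_squares(y, W, Z):
    """2SLS: regress W on the instruments Z, then y on the fitted values."""
    gamma, *_ = np.linalg.lstsq(Z, W, rcond=None)     # stage 1
    W_hat = Z @ gamma
    beta, *_ = np.linalg.lstsq(W_hat, y, rcond=None)  # stage 2
    return beta


# Simulated data: the true covariate x is observed only with error.
rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
w = x + rng.normal(size=n)  # error-prone measurement used as regressor
t = x + rng.normal(size=n)  # independent second measurement (instrument)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

W = np.column_stack([np.ones(n), w])
Z = np.column_stack([np.ones(n), t])

beta_ols, *_ = np.linalg.lstsq(W, y, rcond=None)
beta_iv = two_stage_least_squares(y, W, Z)
print(beta_ols[1], beta_iv[1])  # OLS slope is attenuated; 2SLS slope is near 2
```

In this setup the naive least-squares slope is biased towards zero by the measurement error, while the instrument, whose error is independent of the error in `w`, removes the attenuation.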

Generalized linear models (GLMs) are widely studied as a way of dealing with complex response variables. For the analysis of categorical dependent variables with more than two response categories, multivariate GLMs are used to model the relationship between this polytomous response and a set of regressors. Traditional variable selection approaches have been proposed in the literature for the multivariate GLM with a canonical link function when the number of parameters is fixed. However, in many model selection problems, the number of parameters may be large and grow with the sample size. In this paper, we present a new selection criterion for the model with a diverging number of parameters. Under suitable conditions, the criterion is shown to be model selection consistent. A simulation study and a real data analysis are conducted to support the theoretical findings.

The use of graphical methods for comparing the quality of prediction throughout the design space of an experiment has been explored extensively for responses modeled with standard linear models. In this paper, fraction of design space (FDS) plots are adapted to evaluate designs for generalized linear models (GLMs). Since the quality of designs for GLMs depends on the model parameters, initial parameter estimates need to be provided by the experimenter. Consequently, an important question to consider is the design's robustness to user misspecification of the initial parameter estimates. FDS plots provide a graphical way of assessing the relative merits of different designs under a variety of types of parameter misspecification. Examples using logistic and Poisson regression models with their canonical links are used to demonstrate the benefits of the FDS plots.

We study the general linear model (GLM) with doubly exchangeable distributed errors for m observed random variables. The doubly exchangeable general linear model (DEGLM) arises when the m-dimensional error vectors are “doubly exchangeable” and jointly normally distributed, which is a much weaker assumption than that of independent and identically distributed error vectors made in the GLM, or classical GLM (CGLM). We estimate the parameters in the model and also find their distributions. We show that tests of the intercept and slope are possible in the DEGLM as a particular case, using the parametric bootstrap as well as a multivariate Satterthwaite approximation.

