1.

Performance of Variable Selection Methods in Regression Using Variations of the Bayesian Information Criterion

Tom Burr, Herb Fry, Brian McVey, Eric Sander, Joseph Cavanaugh, Andrew Neath 《Communications in Statistics - Simulation and Computation》2013,42(3):507-520

The Bayesian information criterion (BIC) is widely used for variable selection. We focus on the regression setting for which variations of the BIC have been proposed. A version that includes the Fisher Information matrix of the predictor variables performed best in one published study. In this article, we extend the evaluation, introduce a performance measure involving how closely posterior probabilities are approximated, and conclude that the version that includes the Fisher Information often favors regression models having more predictors, depending on the scale and correlation structure of the predictor matrix. In the image analysis application that we describe, we therefore prefer the standard BIC approximation because of its relative simplicity and competitive performance at approximating the true posterior probabilities.
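The "standard BIC approximation" to posterior model probabilities can be illustrated with a minimal Python sketch. It assumes equal prior probabilities for all candidate models and the usual definition BIC = -2 log L + k log n. The function names and the BIC values in the example are illustrative and are not taken from the study.

```python
import math


def bic(loglik, k, n):
    """Standard BIC: -2 * maximized log-likelihood + k * log(n)."""
    return -2.0 * loglik + k * math.log(n)


def bic_posterior_probs(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes a uniform prior over the candidate models, so that
    P(M_k | data) is approximately proportional to exp(-BIC_k / 2).
    """
    # Shift by the smallest BIC so exp() cannot underflow for every model.
    best = min(bics)
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative BIC values for three candidate regression models.
bics = [100.0, 102.0, 110.0]
probs = bic_posterior_probs(bics)
print([round(p, 3) for p in probs])
```

Under this approximation, a BIC difference of 2 between two models corresponds to posterior odds of e (about 2.7) in favour of the model with the smaller BIC.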

2.

Consistent Bayesian information criterion based on a mixture prior for possibly high-dimensional multivariate linear regression models

Haruki Kono, Tatsuya Kubokawa 《Scandinavian Journal of Statistics》2023,50(3):1022-1047

In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Inheriting their asymptotic properties, our information criteria are consistent in variable selection in both the large-sample and the high-dimensional asymptotic frameworks. In numerical simulations, variable selection methods based on our information criteria choose the true set of variables with high probability in most cases.

3.

Model selection criteria for dual-inflated data

Ting Hsiang Lin, Min-Hsiao Tsai 《Journal of Statistical Computation and Simulation》2016,86(13):2663-2672

Inflated data are prevalent in many situations, and a variety of inflated models and extensions have been derived to fit data with excessive counts of particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models, including Poisson regression and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data, and the performance of the IC was compared. The effects of sample size and of the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and the consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when the sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample sizes.
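AIC, BIC and CAIC differ only in the penalty they add to -2 log L. The sketch below assumes the common textbook forms: 2k for AIC, k log n for BIC, and k(log n + 1) for CAIC, following Bozdogan. The log-likelihoods and parameter counts for POI, ZIP and ZKIP are hypothetical numbers, chosen only to show how the heavier penalties can favour a smaller model. They are not results from the study.

```python
import math


def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)


def caic(loglik, k, n):
    # Consistent AIC (Bozdogan): penalty k * (log n + 1)
    return -2.0 * loglik + k * (math.log(n) + 1.0)


# Hypothetical maximized log-likelihoods and parameter counts (n = 200).
n = 200
fits = {"POI": (-420.0, 3), "ZIP": (-405.0, 6), "ZKIP": (-401.0, 9)}

choice = {
    "AIC": min(fits, key=lambda m: aic(*fits[m])),
    "BIC": min(fits, key=lambda m: bic(*fits[m], n)),
    "CAIC": min(fits, key=lambda m: caic(*fits[m], n)),
}
print(choice)
```

With these made-up values, AIC prefers the largest model, while BIC and CAIC settle on ZIP. This mirrors the over-fitting versus under-parameterizing tendencies described above.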

4.

A note on variable selection in functional regression via random subspace method

Łukasz Smaga, Hidetoshi Matsui 《Statistical Methods and Applications》2018,27(3):455-477

The variable selection problem is one of the most important tasks in regression analysis, especially in high-dimensional settings. In this paper, we study this problem in the context of the scalar response functional regression model, which is a linear model with a scalar response and functional regressors. The functional model can be represented as a multiple linear regression model via basis expansions of the functional variables. Based on this representation and the random subspace method of Mielniczuk and Teisseyre (Comput Stat Data Anal 71:725–742, 2014), two simple variable selection procedures for the scalar response functional regression model are proposed. The final functional model is selected using generalized information criteria. Monte Carlo simulation studies and a real data example show very satisfactory performance of the new variable selection methods in finite samples. Moreover, they suggest that the considered procedures outperform solutions found in the literature in terms of correctly selected models, false discovery rate control and prediction error.

5.

Variable selection of the quantile varying coefficient regression models

Weihua Zhao, Riquan Zhang, Yazhao Lv, Jicai Liu 《Journal of the Korean Statistical Society》2013,42(3):343-358

As a useful supplement to mean regression, quantile regression is a completely distribution-free approach and is more robust to heavy-tailed random errors. In this paper, a variable selection procedure for quantile varying coefficient models is proposed by combining local polynomial smoothing with adaptive group LASSO. With an appropriate selection of tuning parameters by the BIC criterion, the theoretical properties of the new procedure, including consistency in variable selection and the oracle property in estimation, are established. The finite sample performance of the newly proposed method is investigated through simulation studies and the analysis of Boston house price data. Numerical studies confirm that the newly proposed procedure (QKLASSO) has both robustness and efficiency for varying coefficient models irrespective of error distribution, which is a good alternative and necessary supplement to the KLASSO method.
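The procedure above tunes an adaptive group LASSO penalty with BIC. The sketch below illustrates only the BIC tuning step, using a much simpler stand-in that is not the authors' QKLASSO (which uses quantile loss, local polynomial smoothing and group penalties). It fits an ordinary least-squares LASSO by coordinate descent and chooses λ on a grid by minimizing n log(RSS/n) + df·log n, where df is taken to be the number of nonzero coefficients. It assumes centred, standardized predictors and a centred response. The toy data are constructed so that only the first predictor has a sizeable effect.

```python
import math


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1.

    Assumes each column of X is centred with mean(x_j ** 2) == 1 and that y is
    centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b for the starting value b = 0
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) x_j' (partial residual that excludes predictor j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + b[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def bic_tuned_lasso(X, y, lams):
    """Return (lam, coefficients) minimizing n*log(RSS/n) + df*log(n)."""
    n = len(y)
    best = None
    for lam in lams:
        b = lasso_cd(X, y, lam)
        rss = sum(
            (y[i] - sum(X[i][j] * b[j] for j in range(len(b)))) ** 2
            for i in range(n)
        )
        df = sum(1 for bj in b if bj != 0.0)
        crit = n * math.log(rss / n) + df * math.log(n)
        if best is None or crit < best[0]:
            best = (crit, lam, b)
    return best[1], best[2]


# Toy data: three orthogonal, standardized predictors; the first has a large
# effect, the second a tiny one, the third none. `e` is noise orthogonal to
# all three predictors.
xa = [1, -1, 1, -1, 1, -1, 1, -1]
xb = [1, 1, -1, -1, 1, 1, -1, -1]
xc = [1, 1, 1, 1, -1, -1, -1, -1]
e = [1, -1, -1, 1, 1, -1, -1, 1]
X = [[float(xa[i]), float(xb[i]), float(xc[i])] for i in range(8)]
y = [2.0 * xa[i] + 0.04 * xb[i] + 0.3 * e[i] for i in range(8)]

lam, coef = bic_tuned_lasso(X, y, [0.01, 0.05, 0.2, 0.5, 1.0])
print(lam, [round(c, 3) for c in coef])
```

Because the toy design is orthogonal, each LASSO fit is simply the soft-thresholded least-squares coefficients. BIC then trades the small loss in fit against dropping the tiny second effect.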

6.

Regression model selection—a residual likelihood approach

Peide Shi, Chih-Ling Tsai 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2002,64(2):237-252

Summary. We obtain the residual information criterion RIC, a selection criterion based on the residual log-likelihood, for regression models including classical regression models, Box–Cox transformation models, weighted regression models and regression models with autoregressive moving average errors. We show that RIC is a consistent criterion, and that simulation studies for each of the four models indicate that RIC provides better model order choices than the Akaike information criterion, corrected Akaike information criterion, final prediction error, $C_p$ and $R^2_{\mathrm{adj}}$, except when the sample size is small and the signal-to-noise ratio is weak. In this case, none of the criteria performs well. Monte Carlo results also show that RIC is superior to the consistent Bayesian information criterion BIC when the signal-to-noise ratio is not weak, and it is comparable with BIC when the signal-to-noise ratio is weak and the sample size is large.

7.

Empirical Bayes vs. fully Bayes variable selection

Wen Cui, Edward I. George 《Journal of Statistical Planning and Inference》2008

For the problem of variable selection for the normal linear model, fixed penalty selection criteria such as AIC, $C_p$, BIC and RIC correspond to the posterior modes of a hierarchical Bayes model for various fixed hyperparameter settings. Adaptive selection criteria obtained by empirical Bayes estimation of the hyperparameters have been shown by George and Foster [2000. Calibration and Empirical Bayes variable selection. Biometrika 87(4), 731–747] to improve on these fixed selection criteria. In this paper, we study the potential of alternative fully Bayes methods, which instead margin out the hyperparameters with respect to prior distributions. Several structured prior formulations are considered for which fully Bayes selection and estimation methods are obtained. Analytical and simulation comparisons with empirical Bayes counterparts are studied.

8.

Evaluating modified generalized information criterion in presence of multicollinearity

Ali Hussein Al-Marshadi, Abdullah Hamoud Alharby 《Communications in Statistics - Simulation and Computation》2017,46(8):6298-6307

When a regression model has many explanatory variables, some of them are likely to be intercorrelated. This gives rise to multicollinearity, which degrades the precision and accuracy of the coefficient estimates and makes the search for the best model tedious. To tackle such situations, model selection criteria are applied to select the model that best fits the data. The current study evaluates four unmodified and four modified versions of the generalized information criteria: the Akaike Information Criterion, Schwarz's Bayes Information Criterion, the Hannan-Quinn Information Criterion, and the Akaike Information Criterion corrected for small samples. A simulation study using SAS software was carried out to compare the unmodified and modified versions of the generalized information criteria, and to determine which of the four modified model selection criteria is best for identifying the best model when the no-collinearity assumption is violated. For the proposed simulation, samples of size 50 and 100 for three explanatory variables X₁, X₂, and X₃ are drawn from a normal distribution. Two levels of collinearity between X₁ and X₂ are considered: ρ = 0.6 and ρ = 0.8. The outcomes of the simulations are displayed in tables along with visual representations. The results revealed that the modified versions of the generalized information criteria are more sensitive than the unmodified versions in identifying models marred by high multicollinearity.
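The four unmodified criteria share the form -2 log L + penalty and differ only in the penalty. The sketch below uses the standard forms: 2k for AIC, k log n for SBC, 2k log log n for HQ, and 2k + 2k(k+1)/(n - k - 1) for AICc. It evaluates them for the two sample sizes in the study. The abstract does not specify the modified versions, so they are not reproduced here. Counting k = 5 parameters (intercept, three slopes and the error variance) is an assumption.

```python
import math


def penalty_terms(k, n):
    """Penalty each unmodified criterion adds to -2 * log-likelihood."""
    if n <= k + 1:
        raise ValueError("AICc requires n > k + 1")
    return {
        "AIC": 2.0 * k,
        "SBC": k * math.log(n),
        "HQ": 2.0 * k * math.log(math.log(n)),
        "AICc": 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1),
    }


def information_criteria(loglik, k, n):
    """All four criteria for one fitted model; smaller is better."""
    return {name: -2.0 * loglik + pen for name, pen in penalty_terms(k, n).items()}


# k = 5: intercept, slopes for X1, X2, X3, and the error variance (assumed).
for n in (50, 100):
    print(n, {name: round(v, 2) for name, v in penalty_terms(5, n).items()})
```

For these sample sizes the penalties are ordered AIC < AICc < HQ < SBC, so SBC is the most parsimonious of the four. The AICc correction shrinks toward AIC as n grows.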

9.

Forecasting time series of economic processes by model averaging across data frames of various lengths

Nikita A. Moiseev 《Journal of Statistical Computation and Simulation》2017,87(16):3111-3131

This paper presents an extension of mean-squared forecast error (MSFE) model averaging for integrating linear regression models computed on data frames of various lengths. The proposed method is considered a preferable alternative to best-model selection by efficiency criteria such as the Bayesian information criterion (BIC), Akaike information criterion (AIC), F-statistics and mean-squared error (MSE), as well as to Bayesian model averaging (BMA) and the naïve simple forecast average. The method is designed to handle possibly non-nested models with different numbers of observations and selects forecast weights by minimizing an unbiased estimator of the MSFE. The proposed method also yields forecast confidence intervals at a given significance level, which is not possible with other model averaging methods. In addition, out-of-sample simulation and empirical testing demonstrate the efficiency of this kind of averaging when forecasting economic processes.

10.

Bayesian model selection for join point regression with application to age-adjusted cancer rates (total citations: 3; self-citations: 0; citations by others: 3)

Ram C. Tiwari, Kathleen A. Cronin, William Davis, Eric J. Feuer, Binbing Yu, Siddhartha Chib 《Journal of the Royal Statistical Society. Series C, Applied statistics》2005,54(5):919-939

Summary. The method of Bayesian model selection for join point regression models is developed. Given a set of $K+1$ join point models $M_0, M_1, \ldots, M_K$ with $0, 1, \ldots, K$ join points respectively, the posterior distributions of the parameters and competing models $M_k$ are computed by Markov chain Monte Carlo simulations. The Bayes information criterion BIC is used to select the model $M_k$ with the smallest value of BIC as the best model. Another approach based on the Bayes factor selects the model $M_k$ with the largest posterior probability as the best model when the prior distribution of $M_k$ is discrete uniform. Both methods are applied to analyse the observed US cancer incidence rates for some selected cancer sites. The graphs of the join point models fitted to the data are produced by using the methods proposed and compared with the method of Kim and co-workers that is based on a series of permutation tests. The analyses show that the Bayes factor is sensitive to the prior specification of the variance $\sigma^2$, and that the model which is selected by BIC fits the data as well as the model that is selected by the permutation test and has the advantage of producing the posterior distribution for the join points. The Bayesian join point model and model selection method that are presented here will be integrated in the National Cancer Institute's join point software (http://www.srab.cancer.gov/joinpoint/) and will be available to the public.

11.

Particle swarm optimization-based variable selection in Poisson regression analysis via information complexity-type criteria

Haydar Koç, Emre Dünder, Tuba Koç, Mehmet Ali Cengiz 《Communications in Statistics - Theory and Methods》2018,47(21):5298-5306

Count responses are widely modelled with Poisson regression models. This paper addresses the problem of variable selection in Poisson regression analysis. Its main emphasis is to show the usefulness of information complexity-based criteria for Poisson regression. The particle swarm optimization (PSO) algorithm was adopted to minimize the information criteria. A real dataset example and two simulation studies, on highly collinear and weakly correlated datasets, were conducted. The results demonstrate the capability of information complexity-type criteria: they can be used effectively instead of classical criteria in count data modeling via the PSO algorithm.

12.

The effects of different choices of order for autoregressive approximation on the Gaussian likelihood estimates for ARMA models

M. O. Salau 《Statistical Papers》2003,44(1):89-105

This paper investigates, by means of Monte Carlo simulation, the effects of different choices of order for autoregressive approximation on the fully efficient parameter estimates for autoregressive moving average models. Four order selection criteria, AIC, BIC, HQ and PKK, were compared, and different model structures with varying sample sizes were used to contrast the performance of the criteria. Some asymptotic results which provide a useful guide for assessing the performance of these criteria are presented. The results of this comparison show that there are marked differences in the accuracy implied by these alternative criteria in small-sample situations, and that in such cases it is preferable to apply the BIC criterion, which leads to greater precision of the Gaussian likelihood estimates. Implications of the findings of this study for the estimation of time series models are highlighted.

13.

A note on model selection using information criteria for general linear models estimated using REML

Arunas Petras Verbyla 《Australian & New Zealand Journal of Statistics》2019,61(1):39-50

It is common practice to compare the fit of non‐nested models using the Akaike (AIC) or Bayesian (BIC) information criteria. The basis of these criteria is the log‐likelihood evaluated at the maximum likelihood estimates of the unknown parameters. For the general linear model (and the linear mixed model, which is a special case), estimation is usually carried out using residual or restricted maximum likelihood (REML). However, for models with different fixed effects, the residual likelihoods are not comparable and hence information criteria based on the residual likelihood cannot be used. For model selection, it is often suggested that the models are refitted using maximum likelihood to enable the criteria to be used. The first aim of this paper is to highlight that both the AIC and BIC can be used for the general linear model by using the full log‐likelihood evaluated at the REML estimates. The second aim is to provide a derivation of the criteria under REML estimation. This aim is achieved by noting that the full likelihood can be decomposed into a marginal (residual) and conditional likelihood and this decomposition then incorporates aspects of both the fixed effects and variance parameters. Using this decomposition, the appropriate information criteria for model selection of models which differ in their fixed effects specification can be derived. An example is presented to illustrate the results and code is available for analyses using the ASReml‐R package.

14.

Forecasting Performance of Information Criteria with Many Macro Series

Clive Granger, Yongil Jeon 《Journal of Applied Statistics》2004,31(10):1227-1240

Stock & Watson (1999) consider the relative quality of different univariate forecasting techniques. This paper extends their study on forecasting practice, comparing the forecasting performance of two popular model selection procedures, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). This paper considers several topics: how AIC and BIC choose lags in autoregressive models on actual series, how models so selected forecast relative to an AR(4) model, the effect of using a maximum lag on model selection, and the forecasting performance of combining AR(4), AIC, and BIC models with an equal weight.
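Lag selection can be made concrete with a minimal sketch that fits AR(p) models for p = 0, …, p_max to one series and returns the AIC- and BIC-selected orders. It assumes Yule-Walker estimation via the Levinson-Durbin recursion. It also assumes the common forms n log σ̂²_p + 2p for AIC and n log σ̂²_p + p log n for BIC, dropping constants shared by all orders. The paper's own estimation details may differ, and the simulated AR(2) series is purely illustrative.

```python
import math
import random


def sample_autocov(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Innovation variances sigma2[p] of Yule-Walker AR(p) fits, p = 0..max_order."""
    phi = []  # coefficients of the current AR(p - 1) fit
    sigma2 = [r[0]]
    for p in range(1, max_order + 1):
        acc = r[p] - sum(phi[j] * r[p - 1 - j] for j in range(p - 1))
        kappa = acc / sigma2[-1]  # partial autocorrelation at lag p
        phi = [phi[j] - kappa * phi[p - 2 - j] for j in range(p - 1)] + [kappa]
        sigma2.append(sigma2[-1] * (1.0 - kappa ** 2))
    return sigma2


def select_ar_order(x, max_order):
    """Return (AIC order, BIC order) for AR models of order 0..max_order."""
    n = len(x)
    sigma2 = levinson_durbin(sample_autocov(x, max_order), max_order)
    aic = [n * math.log(s) + 2 * p for p, s in enumerate(sigma2)]
    bic = [n * math.log(s) + p * math.log(n) for p, s in enumerate(sigma2)]
    return aic.index(min(aic)), bic.index(min(bic))


# Illustrative AR(2) series: x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + e_t.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(300):
    x.append(0.5 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))
print(select_ar_order(x[2:], max_order=8))
```

Since log n > 2 once n ≥ 8, BIC charges more per extra lag than AIC. On the same fits it therefore never selects a longer lag than AIC does.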

15.

On the convergence rate of model selection criteria

Ping Zhang 《Communications in Statistics - Theory and Methods》2013,42(10):2765-2775

The goal of the current paper is to compare consistent and inconsistent model selection criteria by looking at their convergence rates (to be defined in the first section). The prototypes of the two types of criteria are BIC (consistent) and AIC (inconsistent). For linear regression models with normally distributed errors, we show that the convergence rates for AIC and BIC are $O(n^{-1})$ and $O((n \log n)^{-1/2})$ respectively. When the error distributions are unknown, the two criteria become indistinguishable, both having convergence rate $O(n^{-1/2})$. We also argue that the BIC criterion has a nearly optimal convergence rate. The results partially justify some of the controversial simulation results in which inconsistent criteria seem to outperform consistent ones.

16.

Variable Selection in Joint Location and Scale Models of the Skew-t-Normal Distribution

Liu-Cang Wu 《Communications in Statistics - Simulation and Computation》2013,42(3):615-630

Variable selection is an important issue in all regression analyses. In this article, we investigate simultaneous variable selection in joint location and scale models of the skew-t-normal distribution when the dataset under consideration involves heavy-tailed and asymmetric outcomes. We propose a unified penalized likelihood method which can simultaneously select significant variables in the location and scale models. Furthermore, the proposed variable selection method can simultaneously perform parameter estimation and variable selection in the location and scale models. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. These estimators are compared by simulation studies.

17.

Variable selection in classification model via quadratic programming

Jun Huang, Wei Wang 《Communications in Statistics - Simulation and Computation》2018,47(7):1922-1939

Variable selection is an important decision process in consumer credit scoring. With the rapid growth of the credit industry, especially since the rise of e-commerce, a huge amount of information on customer behavior has become available that can make consumer credit scoring more informative. In this study, a hybrid quadratic programming model is proposed for consumer credit scoring problems through variable selection. The proposed model is then solved with a bisection method based on the Tabu search algorithm (BMTS), and its solution provides alternative subsets of variables of different sizes. The final subset of variables used in the consumer credit scoring model is selected based on both the size (number of variables in a subset) and the predictive (classification) accuracy rate. Simulation studies are used to measure the performance of the proposed model, illustrating its effectiveness for simultaneous variable selection and classification.

18.

Variable selection in finite mixture of regression models using the skew-normal distribution

Junhui Yin, Liucang Wu, Lin Dai 《Journal of Applied Statistics》2020,47(16):2941

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with asymmetric behavior. In this paper, we introduce a variable selection procedure for FMR models using the skew-normal distribution. With appropriate choice of the tuning parameters, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. To estimate the parameters of the model, a modified EM algorithm for numerical computations is developed. The methodology is illustrated through numerical experiments and a real data example.

19.

What is the effective sample size of a spatial point process?

Ian W. Renner, David I. Warton, Francis K.C. Hui 《Australian & New Zealand Journal of Statistics》2021,63(1):144-158

Point process models are a natural approach for modelling data that arise as point events. In the case of Poisson counts, these may be fitted easily as a weighted Poisson regression. Point processes lack the notion of sample size. This is problematic for model selection, because various classical criteria such as the Bayesian information criterion (BIC) are a function of the sample size, n, and are derived in an asymptotic framework where n tends to infinity. In this paper, we develop an asymptotic result for Poisson point process models in which the observed number of point events, m, plays the role that sample size does in the classical regression context. Following from this result, we derive a version of BIC for point process models, and when fitted via penalised likelihood, conditions for the LASSO penalty that ensure consistency in estimation and the oracle property. We discuss challenges extending these results to the wider class of Gibbs models, of which the Poisson point process model is a special case.

20.

Variable selection in joint location and scale models of the skew-normal distribution

Liu-Cang Wu, Zhong-Zhan Zhang, Deng-Ke Xu 《Journal of Statistical Computation and Simulation》2013,83(7):1266-1278

A regression model with skew-normal errors provides a useful extension of ordinary normal regression models when the data set under consideration involves asymmetric outcomes. Variable selection is an important issue in all regression analyses, and in this paper, we investigate simultaneous variable selection in joint location and scale models of the skew-normal distribution. We propose a unified penalized likelihood method which can simultaneously select significant variables in the location and scale models. Furthermore, the proposed variable selection method can simultaneously perform parameter estimation and variable selection in the location and scale models. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. Simulation studies and a real example are used to illustrate the proposed methodologies.