Similar articles
Found 20 similar articles; search took 78 ms.
1.
In real‐data analysis, deciding the best subset of variables in regression models is an important problem. Akaike's information criterion (AIC) is often used in order to select variables in many fields. When the sample size is not so large, the AIC has a non‐negligible bias that will detrimentally affect variable selection. The present paper considers a bias correction of AIC for selecting variables in the generalized linear model (GLM). The GLM can express a number of statistical models by changing the distribution and the link function, such as the normal linear regression model, the logistic regression model, and the probit model, which are currently commonly used in a number of applied fields. In the present study, we obtain a simple expression for a bias‐corrected AIC (corrected AIC, or CAIC) in GLMs. Furthermore, we provide an ‘R’ code based on our formula. A numerical study reveals that the CAIC has better performance than the AIC for variable selection.

2.
Two different forms of Akaike's information criterion (AIC) are compared for selecting the smooth terms in penalized spline additive mixed models. The conditional AIC (cAIC) has been used traditionally as a criterion for both estimating penalty parameters and selecting covariates in smoothing, and is based on the conditional likelihood given the smooth mean and on the effective degrees of freedom for a model fit. By comparison, the marginal AIC (mAIC) is based on the marginal likelihood from the mixed‐model formulation of penalized splines which has recently become popular for estimating smoothing parameters. To the best of the authors' knowledge, the use of mAIC for selecting covariates for smoothing in additive models is new. In the competing models considered for selection, covariates may have a nonlinear effect on the response, with the possibility of group‐specific curves. Simulations are used to compare the performance of cAIC and mAIC in model selection settings that have correlated and hierarchical smooth terms. In moderately large samples, both formulations of AIC perform extremely well at detecting the function that generated the data. The mAIC does better for simple functions, whereas the cAIC is more sensitive to detecting a true model that has complex and hierarchical terms.

3.
We derive two types of Akaike information criterion (AIC)‐like model‐selection formulae for the semiparametric pseudo‐maximum likelihood procedure. We first adapt the arguments leading to the original AIC formula, related to empirical estimation of a certain Kullback–Leibler information distance. This gives a significantly different formula compared with the AIC, which we name the copula information criterion. However, we show that such a model‐selection procedure cannot exist for copula models with densities that grow very fast near the edge of the unit cube. This problem affects most popular copula models. We then derive what we call the cross‐validation copula information criterion, which exists under weak conditions and is a first‐order approximation to exact cross validation. This formula is very similar to the standard AIC formula but has slightly different motivation. A brief illustration with real data is given.

4.
Inflated data are prevalent in many situations, and a variety of inflated models with extensions have been derived to fit data with excessive counts of some particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models, including Poisson regression and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data, and the performance of the IC was compared. The effects of sample size and the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample size.
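For reference, the three criteria compared in this abstract are commonly defined as penalized log-likelihoods. The sketch below assumes the standard textbook forms (AIC = −2 log L + 2k, BIC = −2 log L + k ln n, and Bozdogan's consistent AIC = −2 log L + k(ln n + 1)); the abstract does not state which exact forms the authors used, and the function name `information_criteria` is hypothetical.

```python
import math


def information_criteria(log_lik, k, n):
    """Return AIC, BIC and consistent AIC (CAIC) for a fitted model.

    log_lik: maximized log-likelihood of the model
    k:       number of estimated parameters
    n:       sample size
    Assumes the standard textbook definitions; smaller values are better.
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1.0),
    }


# Illustrative values only (not taken from the paper).
ic = information_criteria(log_lik=-100.0, k=3, n=50)
```

Because the BIC and CAIC penalties grow with ln n while the AIC penalty stays at 2 per parameter, BIC and CAIC favour smaller models as the sample grows, which is consistent with the under-parameterization the abstract reports for ZIP and ZKIP.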

5.
6.
Stock & Watson (1999) consider the relative quality of different univariate forecasting techniques. This paper extends their study on forecasting practice, comparing the forecasting performance of two popular model selection procedures, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). This paper considers several topics: how AIC and BIC choose lags in autoregressive models on actual series, how models so selected forecast relative to an AR(4) model, the effect of using a maximum lag on model selection, and the forecasting performance of combining AR(4), AIC, and BIC models with an equal weight.
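The lag-selection step described in this abstract can be illustrated with a minimal sketch. It assumes ordinary least squares fits of AR(p) models on a common estimation sample and the Gaussian forms AIC = n ln σ̂² + 2k and BIC = n ln σ̂² + k ln n; the function name `select_ar_order`, the simulated AR(2) series, and the maximum lag of 8 are all hypothetical and are not taken from the paper.

```python
import numpy as np


def select_ar_order(y, max_lag):
    """Return (aic_order, bic_order) among AR(p) models, p = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    target = y[max_lag:]  # same estimation sample for every candidate order
    n = target.size
    scores = []
    for p in range(max_lag + 1):
        # Design matrix: intercept plus lags 1..p of the series.
        cols = [np.ones(n)]
        cols += [y[max_lag - k : len(y) - k] for k in range(1, p + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        sigma2 = resid @ resid / n  # ML estimate of the error variance
        k_params = p + 1            # intercept + p autoregressive coefficients
        aic = n * np.log(sigma2) + 2 * k_params
        bic = n * np.log(sigma2) + k_params * np.log(n)
        scores.append((aic, bic))
    scores = np.array(scores)
    return int(scores[:, 0].argmin()), int(scores[:, 1].argmin())


# Simulated AR(2) series, for illustration only.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

aic_order, bic_order = select_ar_order(y, max_lag=8)
```

Since ln n exceeds 2 for any sample of 8 or more observations, the BIC penalty per parameter is larger than the AIC penalty, so on the same set of nested fits the BIC-selected lag is never longer than the AIC-selected lag.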

7.
This paper derives Akaike information criterion (AIC), corrected AIC, the Bayesian information criterion (BIC) and Hannan and Quinn’s information criterion for approximate factor models assuming a large number of cross-sectional observations and studies the consistency properties of these information criteria. It also reports extensive simulation results comparing the performance of the extant and new procedures for the selection of the number of factors. The simulation results show the difficulty of determining which criterion performs best. In practice, it is advisable to consider several criteria at the same time, especially Hannan and Quinn’s information criterion, Bai and Ng’s ICp2 and BIC3, and Onatski’s and Ahn and Horenstein’s eigenvalue-based criteria. The model-selection criteria considered in this paper are also applied to Stock and Watson’s two macroeconomic data sets. The results differ considerably depending on the model-selection criterion in use, but evidence suggesting five factors for the first data and five to seven factors for the second data is obtainable.

8.
Model selection criteria are frequently developed by constructing estimators of discrepancy measures that assess the disparity between the 'true' model and a fitted approximating model. The Akaike information criterion (AIC) and its variants result from utilizing Kullback's directed divergence as the targeted discrepancy. The directed divergence is an asymmetric measure of separation between two statistical models, meaning that an alternative directed divergence can be obtained by reversing the roles of the two models in the definition of the measure. The sum of the two directed divergences is Kullback's symmetric divergence. In the framework of linear models, a comparison of the two directed divergences reveals an important distinction between the measures. When used to evaluate fitted approximating models that are improperly specified, the directed divergence which serves as the basis for AIC is more sensitive towards detecting overfitted models, whereas its counterpart is more sensitive towards detecting underfitted models. Since the symmetric divergence combines the information in both measures, it functions as a gauge of model disparity which is arguably more balanced than either of its individual components. With this motivation, the paper proposes a new class of criteria for linear model selection based on targeting the symmetric divergence. The criteria can be regarded as analogues of AIC and two of its variants: 'corrected' AIC or AICc and 'modified' AIC or MAIC. The paper examines the selection tendencies of the new criteria in a simulation study and the results indicate that they perform favourably when compared to their AIC analogues.

9.
We compare properties of parameter estimators under Akaike information criterion (AIC) and 'consistent' AIC (CAIC) model selection in a nested sequence of open population capture-recapture models. These models consist of product multinomials, where the cell probabilities are parameterized in terms of survival ($\phi_i$) and capture ($p_i$) probabilities for each time interval $i$. The sequence of models is derived from 'treatment' effects that might be (1) absent, model $H_0$; (2) only acute, model $H_{2p}$; or (3) acute and chronic, lasting several time intervals, model $H_3$. Using a $3^5$ factorial design, 1000 repetitions were simulated for each of 243 cases. The true number of parameters ranged from 7 to 42, and the sample size ranged from approximately 470 to 55 000 per case. We focus on the quality of the inference about the model parameters and model structure that results from the two selection criteria. We use achieved confidence interval coverage as an integrating metric to judge what constitutes a 'properly parsimonious' model, and contrast the performance of these two model selection criteria for a wide range of models, sample sizes, parameter values and study interval lengths. AIC selection resulted in models in which the parameters were estimated with relatively little bias. However, these models exhibited asymptotic sampling variances that were somewhat too small, and achieved confidence interval coverage that was somewhat below the nominal level. In contrast, CAIC-selected models were too simple, the parameter estimators were often substantially biased, the asymptotic sampling variances were substantially too small and the achieved coverage was often substantially below the nominal level. An example case illustrates a pattern: with 20 capture occasions, 300 previously unmarked animals are released at each occasion, and the survival and capture probabilities in the control group on each occasion were 0.9 and 0.8 respectively using model $H_3$. There was a strong acute treatment effect on the first survival ($\phi_1$) and first capture probability ($p_2$), and smaller, chronic effects on the second and third survival probabilities ($\phi_2$ and $\phi_3$) as well as on the second capture probability ($p_3$); the sample size for each repetition was approximately 55 000. CAIC selection led to a model with exactly these effects in only nine of the 1000 repetitions, compared with 467 times under AIC selection. Under CAIC selection, even the two acute effects were detected only 555 times, compared with 998 for AIC selection. AIC selection exhibited a balance between underfitted and overfitted models (270 versus 263), while CAIC tended strongly to select underfitted models. CAIC-selected models were overly parsimonious and poor as a basis for statistical inferences about important model parameters or structure. We recommend the use of the AIC and not the CAIC for analysis and inference from capture-recapture data sets.

10.
The autoregressive model is a popular tool for analysing time-dependent data, where selection of the order parameter is imperative. Two commonly used selection criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are known to suffer from potential overfitting and underfitting, respectively. To our knowledge, no criterion in the literature performs satisfactorily under various situations. Therefore, in this paper, we focus on forecasting the future values of an observed time series and propose an adaptive idea that combines the advantages of AIC and BIC while mitigating their weaknesses, based on the concept of generalized degrees of freedom. Instead of applying a fixed criterion to select the order parameter, we propose an approximately unbiased estimator of mean squared prediction errors based on a data perturbation technique for fairly comparing AIC and BIC. The selected criterion is then used to determine the final order parameter. Some numerical experiments are performed to show the superiority of the proposed method, and a real data set of the retail price index of China from 1952 to 2008 is also used for illustration.

11.
In a recent volume of this journal, Holden [Testing the normality assumption in the Tobit Model, J. Appl. Stat. 31 (2004) pp. 521–532] presents Monte Carlo evidence comparing several tests for departures from normality in the Tobit Model. This study adds to the work of Holden by considering another test, and several information criteria, for detecting departures from normality in the Tobit Model. The test given here is a modified likelihood ratio statistic based on a partially adaptive estimator of the Censored Regression Model using the approach of Caudill [A partially adaptive estimator for the Censored Regression Model based on a mixture of normal distributions, Working Paper, Department of Economics, Auburn University, 2007]. The information criteria examined include Akaike’s Information Criterion (AIC), the Consistent AIC (CAIC), the Bayesian information criterion (BIC), and Akaike’s BIC (ABIC). In terms of fewest ‘rejections’ of a true null, the best performance is exhibited by the CAIC and the BIC, although, like some of the statistics examined by Holden, there are computational difficulties with each.

12.
Monte Carlo experiments are conducted to compare the Bayesian and sample theory model selection criteria in choosing the univariate probit and logit models. We use five criteria: the deviance information criterion (DIC), predictive deviance information criterion (PDIC), Akaike information criterion (AIC), weighted, and unweighted sums of squared errors. The first two criteria are Bayesian while the others are sample theory criteria. The results show that if data are balanced none of the model selection criteria considered in this article can distinguish the probit and logit models. If data are unbalanced and the sample size is large the DIC and AIC choose the correct models better than the other criteria. We show that if unbalanced binary data are generated by a leptokurtic distribution the logit model is preferred over the probit model. The probit model is preferred if unbalanced data are generated by a platykurtic distribution. We apply the model selection criteria to the probit and logit models that link the ups and downs of the returns on S&P500 to the crude oil price.

13.
In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Inheriting their asymptotic properties, our information criteria are consistent in variable selection in both the large-sample and the high-dimensional asymptotic frameworks. In numerical simulations, variable selection methods based on our information criteria choose the true set of variables with high probability in most cases.

14.
We derive and investigate a variant of AIC, the Akaike information criterion, for model selection in settings where the observed data is incomplete. Our variant is based on the motivation provided for the PDIO (‘predictive divergence for incomplete observation models’) criterion of Shimodaira (1994, in: Selecting Models from Data: Artificial Intelligence and Statistics IV, Lecture Notes in Statistics, vol. 89, Springer, New York, pp. 21–29). However, our variant differs from PDIO in its ‘goodness-of-fit’ term. Unlike AIC and PDIO, which require the computation of the observed-data empirical log-likelihood, our criterion can be evaluated using only complete-data tools, readily available through the EM algorithm and the SEM (‘supplemented’ EM) algorithm of Meng and Rubin (Journal of the American Statistical Association 86 (1991) 899–909). We compare the performance of our AIC variant to that of both AIC and PDIO in simulations where the data being modeled contains missing values. The results indicate that our criterion is less prone to overfitting than AIC and less prone to underfitting than PDIO.

15.
In statistical analysis, one of the most important tasks is to select the relevant explanatory variables that best explain the dependent variable. Variable selection methods are usually performed within regression analysis. Variable selection is implemented so as to minimize an information criterion (IC) in regression models. Information criteria directly affect the predictive power and the estimation of the selected models. There are numerous information criteria in the literature, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria have been modified to improve the performance of the selected models. BIC has been extended with alternative modifications involving the prior and the information matrix. Information matrix-based BIC (IBIC) and scaled unit information prior BIC (SPBIC) are efficient criteria of this kind. In this article, we propose performing variable selection via a differential evolution (DE) algorithm that minimizes IBIC and SPBIC in linear regression analysis. We conclude that these alternative criteria are very useful for variable selection. We also illustrate the efficiency of this combination with various simulation and application studies.

16.
In this article, we propose a new empirical information criterion (EIC) for model selection which penalizes the likelihood of the data by a non-linear function of the number of parameters in the model. It is designed to be used where there are a large number of time series to be forecast. However, a bootstrap version of the EIC can be used where there is a single time series to be forecast. The EIC provides a data-driven model selection tool that can be tuned to the particular forecasting task.

We compare the EIC with other model selection criteria including Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC). The comparisons show that for the M3 forecasting competition data, the EIC outperforms both the AIC and BIC, particularly for longer forecast horizons. We also compare the criteria on simulated data and find that the EIC does better than existing criteria in that case also.

17.
This paper presents an extension of mean-squared forecast error (MSFE) model averaging for integrating linear regression models computed on data frames of various lengths. The proposed method is considered a preferable alternative to best-model selection by various efficiency criteria such as the Bayesian information criterion (BIC), Akaike information criterion (AIC), F-statistics and mean-squared error (MSE), as well as to Bayesian model averaging (BMA) and the naïve simple forecast average. The method is developed to deal with possibly non-nested models having different numbers of observations and selects forecast weights by minimizing the unbiased estimator of MSFE. The proposed method also yields forecast confidence intervals with a given significance level, which is not possible with other model averaging methods. In addition, out-of-sample simulation and empirical testing demonstrate the efficiency of this kind of averaging when forecasting economic processes.

18.
Selection of a parsimonious model as a basis for statistical inference from capture-recapture data is critical, especially when using open models in the analysis of multiple, interrelated data sets (e.g. males and females, with two to three age classes, over three to five areas and 10-15 years). The global (i.e. most general) model for such data sets might contain hundreds of survival and recapture parameters. Here, we focus on a series of nested models of the Cormack-Jolly-Seber type wherein the likelihood arises from products of multinomial distributions whose cell probabilities are reparameterized in terms of survival ($\phi$) and mean capture ($p$) probabilities. This paper presents numerical results on two information-theoretic methods for model selection when the capture probabilities are heterogeneous over individual animals: Akaike's Information Criterion (AIC) and a dimension-consistent criterion (CAIC), derived from a Bayesian viewpoint. Quality of model selection was evaluated based on the relative Euclidean distance between standardized $\hat{\theta}$ and $\theta$ (parameter $\theta$ is vector-valued and contains the survival ($\phi$) and mean capture ($p$) probabilities); this quantity ($\mathrm{RSS} = \sum_i \{(\hat{\theta}_i - \theta_i)/\theta_i\}^2$) is a sum of squared bias and variance. Thus, the quality of inference (RSS) was judged by comparing the performance of the two information criteria and the use of the true model (used to generate the data), in relation to the model that provided the smallest RSS. We found that heterogeneity in the capture probabilities had a negligible effect on model selection using AIC or CAIC. Model size increased as sample size increased with both AIC- and CAIC-selected models.

19.
Variable selection in the presence of outliers may be performed by using a robust version of Akaike's information criterion (AIC). In this paper, explicit expressions are obtained for such criteria when S- and MM-estimators are used. The performance of these criteria is compared with the existing AIC based on M-estimators and with the classical non-robust AIC. In a simulation study and in data examples, we observe that the proposed AIC with S and MM-estimators selects more appropriate models in case outliers are present.

20.
In linear mixed‐effects (LME) models, if a fitted model has more random‐effect terms than the true model, a regularity condition required in the asymptotic theory may not hold. In such cases, the marginal Akaike information criterion (AIC) is positively biased for (−2) times the expected log‐likelihood. The asymptotic bias of the maximum log‐likelihood as an estimator of the expected log‐likelihood is evaluated for LME models with balanced design in the context of parameter‐constrained models. Moreover, bias‐reduced marginal AICs for LME models based on a Monte Carlo method are proposed. The performance of the proposed criteria is compared with existing criteria by using example data and by a simulation study. It was found that the bias of the proposed criteria was smaller than that of the existing marginal AIC when a larger model was fitted and that the probability of choosing a smaller model incorrectly was decreased.
