Similar Articles
20 similar articles found.
1.
2.
Inflated data are prevalent in many situations, and a variety of inflated models and extensions have been derived to fit data with excessive counts of particular responses. The family of information criteria (IC) is used to compare the fit of competing models for selection purposes, yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models, including Poisson regression (POI) and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data and the selection performance of the IC was compared. The effects of sample size and the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and the consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) when the true model is simple (i.e. POI). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high accuracy when the sample size and the proportion of zero observations were small. The AIC tended to over-fit the data when the true model was POI, whereas the BIC and CAIC tended to under-parameterize when it was ZIP or ZKIP. It is therefore desirable to study other model selection criteria for dual-inflated data with small sample sizes.
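For orientation, the three criteria compared above have simple closed forms once a model's maximized log-likelihood is known. The sketch below (Python, not the authors' code) uses the conventional definitions AIC = -2*loglik + 2k, BIC = -2*loglik + k*ln(n) and consistent CAIC = -2*loglik + k*(ln(n) + 1); the example log-likelihood values are purely illustrative.

```python
import math

def information_criteria(loglik: float, k: int, n: int) -> dict:
    """Standard information criteria from a maximized log-likelihood.

    loglik -- maximized log-likelihood of the fitted model
    k      -- number of estimated parameters
    n      -- number of observations
    """
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)  # consistent AIC (Bozdogan)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}

# Example: compare two hypothetical fits; the model with the smallest
# criterion value is preferred under that criterion (values are made up).
print(information_criteria(loglik=-512.3, k=4, n=200))   # e.g. a Poisson fit
print(information_criteria(loglik=-498.7, k=7, n=200))   # e.g. a ZIP fit
```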

3.
Model selection aims to find the best model. Most of the usual criteria are based on goodness of fit and parsimony, and aim to maximize a transformed version of the likelihood. The situation is less clear when two models are equivalent: are they close to the unknown true model, or are they far from it? Based on simulations, we study the results of Vuong's test, Cox's test, the AIC and the BIC, and the ability of these four procedures to discriminate between models.
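As a rough illustration of how non-nested candidates such as these can be compared directly, the sketch below computes the classical Vuong statistic from per-observation log-likelihood contributions of two fitted models. This is a generic sketch of Vuong's (1989) unadjusted test, not the simulation setup of the paper, and the function and variable names are placeholders.

```python
import numpy as np
from scipy import stats

def vuong_statistic(loglik1: np.ndarray, loglik2: np.ndarray) -> tuple[float, float]:
    """Vuong's closeness test for two non-nested models.

    loglik1, loglik2 -- per-observation log-likelihood contributions
    Returns (z, p): a large positive z favours model 1, a large negative z
    favours model 2; values near zero mean the models are indistinguishable.
    """
    m = loglik1 - loglik2                       # pointwise log-likelihood ratios
    n = m.size
    z = np.sqrt(n) * m.mean() / m.std(ddof=1)   # asymptotically N(0, 1)
    p = 2 * stats.norm.sf(abs(z))               # two-sided p-value
    return z, p
```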

4.
Monte Carlo experiments are conducted to compare Bayesian and sampling-theory model selection criteria for choosing between the univariate probit and logit models. We use five criteria: the deviance information criterion (DIC), the predictive deviance information criterion (PDIC), the Akaike information criterion (AIC), and the weighted and unweighted sums of squared errors. The first two criteria are Bayesian, while the others are sampling-theory criteria. The results show that if the data are balanced, none of the model selection criteria considered in this article can distinguish between the probit and logit models. If the data are unbalanced and the sample size is large, the DIC and AIC choose the correct model better than the other criteria. We show that if unbalanced binary data are generated by a leptokurtic distribution, the logit model is preferred over the probit model; the probit model is preferred if unbalanced data are generated by a platykurtic distribution. We apply the model selection criteria to probit and logit models that link the ups and downs of the returns on the S&P 500 to the crude oil price.
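For context on the first two criteria, the DIC is usually computed from MCMC output as DIC = D_bar + p_D with p_D = D_bar - D(theta_bar), where D(theta) = -2 log L(theta). The sketch below is a minimal illustration of that standard definition, assuming the user supplies a log-likelihood function and an array of posterior draws; it is not the authors' implementation.

```python
import numpy as np

def dic(loglik_fn, posterior_draws: np.ndarray) -> float:
    """Deviance information criterion from MCMC output.

    loglik_fn       -- function theta -> log-likelihood of the data at theta
    posterior_draws -- array of shape (n_draws, n_params)
    """
    deviances = np.array([-2.0 * loglik_fn(theta) for theta in posterior_draws])
    d_bar = deviances.mean()                          # posterior mean deviance
    d_at_mean = -2.0 * loglik_fn(posterior_draws.mean(axis=0))
    p_d = d_bar - d_at_mean                           # effective number of parameters
    return d_bar + p_d                                # smaller DIC is preferred
```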

5.
The conceptual predictive statistic, Cp, is a widely used criterion for model selection in linear regression. Cp serves as an estimator of a discrepancy, a measure that reflects the disparity between the generating model and a fitted candidate model. This discrepancy, based on scaled squared error loss, is asymmetric: an alternative measure is obtained by reversing the roles of the two models in the definition. We propose a variant of the Cp statistic based on estimating a symmetrized version of the discrepancy targeted by Cp. We claim that the resulting criterion provides better protection against overfitting than Cp, since the symmetric discrepancy is more sensitive in detecting overspecification than its asymmetric counterpart. We support this claim with simulation results. Finally, we demonstrate the practical utility of the new criterion by discussing a modeling application based on data collected in a cardiac rehabilitation program at the University of Iowa Hospitals and Clinics.
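For reference, the conventional (asymmetric) Cp that the proposed criterion modifies is typically computed as Cp = SSE_p / sigma2_full - n + 2p, with the error variance estimated from the full model. The sketch below implements only this standard form; the symmetrized variant proposed in the paper is not reproduced here.

```python
def mallows_cp(sse_p: float, p: int, sigma2_full: float, n: int) -> float:
    """Conventional Mallows' Cp for a candidate subset model.

    sse_p       -- residual sum of squares of the candidate model
    p           -- number of parameters in the candidate model (incl. intercept)
    sigma2_full -- error variance estimate from the full model
    n           -- number of observations
    A well-specified candidate model should give Cp close to p.
    """
    return sse_p / sigma2_full - n + 2 * p
```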

6.
Variable selection in the presence of outliers may be performed using a robust version of Akaike's information criterion (AIC). In this paper, explicit expressions are obtained for such criteria when S- and MM-estimators are used. The performance of these criteria is compared with that of the existing AIC based on M-estimators and with the classical non-robust AIC. In a simulation study and in data examples, we observe that the proposed AIC with S- and MM-estimators selects more appropriate models when outliers are present.

7.
This paper deals with the application of model selection criteria to data generated by ARMA processes. The recently introduced modified divergence information criterion is used and compared with traditional selection criteria such as the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). The appropriateness of the selected models is assessed for one- and five-step-ahead predictions using the normalized mean squared forecast error (NMSFE).

8.
In real-data analysis, deciding on the best subset of variables in a regression model is an important problem. Akaike's information criterion (AIC) is often used for variable selection in many fields. When the sample size is not large, the AIC has a non-negligible bias that adversely affects variable selection. The present paper considers a bias correction of the AIC for selecting variables in the generalized linear model (GLM). The GLM can express a number of statistical models by changing the distribution and the link function, including the normal linear regression model, the logistic regression model and the probit model, which are commonly used in many applied fields. In the present study, we obtain a simple expression for a bias-corrected AIC (corrected AIC, or CAIC) in GLMs, and we provide R code based on our formula. A numerical study reveals that the CAIC performs better than the AIC for variable selection.
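The GLM-specific bias-corrected AIC derived in the paper is not reproduced here. As a point of comparison only, the sketch below shows the classical small-sample correction of Sugiura and Hurvich–Tsai, AICc = AIC + 2k(k+1)/(n - k - 1), which illustrates the general idea of strengthening the penalty when n is small relative to k.

```python
def aicc(loglik: float, k: int, n: int) -> float:
    """Classical small-sample corrected AIC (Sugiura; Hurvich & Tsai).

    Note: this is NOT the GLM-specific corrected AIC derived in the paper;
    it is shown only to illustrate the form of a small-sample correction.
    """
    aic = -2.0 * loglik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)
```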

9.
Stock & Watson (1999) consider the relative quality of different univariate forecasting techniques. This paper extends their study of forecasting practice by comparing the forecasting performance of two popular model selection procedures, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Several topics are considered: how the AIC and BIC choose lags in autoregressive models fitted to actual series, how the models so selected forecast relative to an AR(4) model, the effect of imposing a maximum lag on model selection, and the forecasting performance of combining the AR(4), AIC and BIC models with equal weights.
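A minimal sketch of the lag-selection step being compared is given below: each candidate AR(p) model is fitted by conditional least squares and its Gaussian AIC and BIC are computed. The simulated series, maximum lag and helper name are illustrative, a more careful comparison would fit all candidate orders on a common effective sample, and this is not the authors' implementation.

```python
import numpy as np

def select_ar_order(y: np.ndarray, max_lag: int) -> dict:
    """Score AR(p) models, p = 1..max_lag, by AIC and BIC."""
    results = {}
    for p in range(1, max_lag + 1):
        # Build the lagged design matrix (constant + p lags).
        Y = y[p:]
        X = np.column_stack([np.ones(len(Y))] + [y[p - j:-j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        n_eff = len(Y)
        sigma2 = resid @ resid / n_eff
        loglik = -0.5 * n_eff * (np.log(2 * np.pi * sigma2) + 1)
        k = p + 2  # AR coefficients + constant + error variance
        results[p] = {"AIC": -2 * loglik + 2 * k,
                      "BIC": -2 * loglik + k * np.log(n_eff)}
    return results

# Example with a simulated AR(2) series (illustrative only).
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()
scores = select_ar_order(y, max_lag=8)
print(min(scores, key=lambda p: scores[p]["AIC"]),   # order chosen by AIC
      min(scores, key=lambda p: scores[p]["BIC"]))   # order chosen by BIC
```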

10.
It is common practice to compare the fit of non-nested models using the Akaike (AIC) or Bayesian (BIC) information criteria. The basis of these criteria is the log-likelihood evaluated at the maximum likelihood estimates of the unknown parameters. For the general linear model (and the linear mixed model, which is a special case), estimation is usually carried out using residual or restricted maximum likelihood (REML). However, for models with different fixed effects, the residual likelihoods are not comparable, and hence information criteria based on the residual likelihood cannot be used. For model selection, it is often suggested that the models be refitted using maximum likelihood so that the criteria can be used. The first aim of this paper is to highlight that both the AIC and BIC can be used for the general linear model by using the full log-likelihood evaluated at the REML estimates. The second aim is to provide a derivation of the criteria under REML estimation. This is achieved by noting that the full likelihood can be decomposed into a marginal (residual) and a conditional likelihood; this decomposition then incorporates aspects of both the fixed effects and the variance parameters. Using this decomposition, the appropriate information criteria for selecting among models that differ in their fixed-effects specification can be derived. An example is presented to illustrate the results, and code is available for analyses using the ASReml-R package.

11.
When selecting a model, robustness is a desirable property. However, most model selection criteria that are based on the Kullback–Leibler divergence tend to perform poorly when the data are contaminated by outliers. In this paper, we derive and investigate a family of criteria that generalize the Akaike information criterion (AIC). When applied to a polynomial regression model in the uncontaminated case, the performance of this family of criteria is asymptotically equal to that of the AIC. Moreover, the proposed criteria tend to maintain sufficient levels of performance even in the presence of outliers.

12.
SUMMARY We compare properties of parameter estimators under Akaike information criterion (AIC) and 'consistent' AIC (CAIC) model selection in a nested sequence of open population capture-recapture models. These models consist of product multinomials, where the cell probabilities are parameterized in terms of survival (φ_i) and capture (p_i) probabilities for each time interval i. The sequence of models is derived from 'treatment' effects that might be (1) absent, model H_0; (2) only acute, model H_2p; or (3) acute and chronic, lasting several time intervals, model H_3. Using a 3^5 factorial design, 1000 repetitions were simulated for each of 243 cases. The true number of parameters ranged from 7 to 42, and the sample size ranged from approximately 470 to 55 000 per case. We focus on the quality of the inference about the model parameters and model structure that results from the two selection criteria. We use achieved confidence interval coverage as an integrating metric to judge what constitutes a 'properly parsimonious' model, and contrast the performance of these two model selection criteria for a wide range of models, sample sizes, parameter values and study interval lengths. AIC selection resulted in models in which the parameters were estimated with relatively little bias. However, these models exhibited asymptotic sampling variances that were somewhat too small, and achieved confidence interval coverage that was somewhat below the nominal level. In contrast, CAIC-selected models were too simple, the parameter estimators were often substantially biased, the asymptotic sampling variances were substantially too small and the achieved coverage was often substantially below the nominal level. An example case illustrates the pattern: with 20 capture occasions, 300 previously unmarked animals are released at each occasion, and the survival and capture probabilities in the control group on each occasion were 0.9 and 0.8, respectively, under model H_3. There was a strong acute treatment effect on the first survival (φ_1) and first capture probability (p_2), and smaller, chronic effects on the second and third survival probabilities (φ_2 and φ_3) as well as on the second capture probability (p_3); the sample size for each repetition was approximately 55 000. CAIC selection led to a model with exactly these effects in only nine of the 1000 repetitions, compared with 467 times under AIC selection. Under CAIC selection, even the two acute effects were detected only 555 times, compared with 998 for AIC selection. AIC selection exhibited a balance between underfitted and overfitted models (270 versus 263), whereas CAIC tended strongly to select underfitted models. CAIC-selected models were overly parsimonious and poor as a basis for statistical inference about important model parameters or structure. We recommend the use of the AIC and not the CAIC for analysis and inference from capture-recapture data sets.

13.
To measure the distance between a robust function evaluated under the true regression model and under a fitted model, we propose a generalized Kullback–Leibler information. Using this generalization, we develop three robust model selection criteria, AICR*, AICCR* and AICCR, that allow the selection of candidate models that not only fit the majority of the data but also take non-normally distributed errors into account. The AICR* and AICCR criteria can unify most existing Akaike information criteria; three examples of such unification are given. Simulation studies are presented to illustrate the relative performance of each criterion.

14.
In linear mixed-effects (LME) models, if a fitted model has more random-effect terms than the true model, a regularity condition required by the asymptotic theory may not hold. In such cases, the marginal Akaike information criterion (AIC) is positively biased for (−2) times the expected log-likelihood. The asymptotic bias of the maximum log-likelihood, as an estimator of the expected log-likelihood, is evaluated for LME models with balanced designs in the context of parameter-constrained models. Moreover, bias-reduced marginal AICs for LME models based on a Monte Carlo method are proposed. The performance of the proposed criteria is compared with that of existing criteria using example data and a simulation study. The bias of the proposed criteria was smaller than that of the existing marginal AIC when a larger model was fitted, and the probability of incorrectly choosing a smaller model was reduced.

15.
Slack-variable models are compared against Scheffé's polynomial model for mixture experiments. The notion of model equivalence and the use of various diagnostic measures provide effective tools for making such comparisons, particularly when the experimental region is highly constrained. It is demonstrated that the choice of the best-fitting model, through variable selection, depends on which mixture component is selected as the slack variable and on the size of the fitted model. In addition, the equivalence of two well-known representations of a complete mixture model is established. Two numerical examples are presented.

16.
This paper derives the Akaike information criterion (AIC), corrected AIC, the Bayesian information criterion (BIC) and Hannan and Quinn's information criterion for approximate factor models, assuming a large number of cross-sectional observations, and studies the consistency properties of these information criteria. It also reports extensive simulation results comparing the performance of extant and new procedures for selecting the number of factors. The simulation results show the difficulty of determining which criterion performs best. In practice, it is advisable to consider several criteria at the same time, especially Hannan and Quinn's information criterion, Bai and Ng's IC_p2 and BIC_3, and Onatski's and Ahn and Horenstein's eigenvalue-based criteria. The model-selection criteria considered in this paper are also applied to Stock and Watson's two macroeconomic data sets. The results differ considerably depending on the model-selection criterion in use, but there is evidence suggesting five factors for the first data set and five to seven factors for the second.
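As a sketch of how a factor-count criterion of this kind is applied in practice, the code below estimates factors by principal components and minimizes log V(k) plus a penalty of the form commonly quoted for Bai and Ng's IC_p2, namely k*((N+T)/(NT))*log(min(N,T)). This is an assumption-laden illustration rather than a faithful reproduction of any of the criteria studied in the paper; consult the original references for exact definitions.

```python
import numpy as np

def ic_p2(X: np.ndarray, k_max: int) -> int:
    """Select the number of factors with a criterion of the Bai–Ng IC_p2 type.

    X is a (T x N) panel, assumed demeaned column-by-column.
    The penalty used here, k * ((N + T) / (N * T)) * log(min(N, T)),
    is the commonly quoted IC_p2 penalty; see Bai & Ng (2002) for
    exact definitions and regularity conditions.
    """
    T, N = X.shape
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    total = (X ** 2).sum()
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    best_k, best_val = 0, np.inf
    for k in range(1, k_max + 1):
        # Residual variance of the rank-k principal-component approximation.
        v_k = (total - (s[:k] ** 2).sum()) / (N * T)
        val = np.log(v_k) + k * penalty
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```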

17.
The analysis of residuals may reveal various functional forms suitable for the regression model. In this paper, we investigate selection criteria for identifying important regression variables. In doing so, we use statistical selection and ranking procedures and derive an appropriate criterion to measure the influence and bias of the reduced models. We show that the reduced models are based on noncentrality parameters that provide a measure of goodness of fit for the fitted models. We also discuss the relationship between influence diagnostics and the statistic proposed earlier by Gupta and Huang (J. Statist. Plann. Inference 20 (1988) 155–167), and introduce a new measure for detecting influential data as an alternative to Cook's measure.

18.
This paper is concerned with the problem of constructing a good predictive distribution, relative to Kullback–Leibler information, in a linear regression model. The problem is equivalent to the simultaneous estimation of the regression coefficients and the error variance under a complicated risk, which yields a new and challenging issue in a decision-theoretic framework. An estimator of the variance is incorporated into a loss for estimating the regression coefficients. Several estimators of the variance and of the regression coefficients are proposed and shown to improve on the usual benchmark estimators, both analytically and numerically. Finally, the prediction problem is related to information criteria for model selection such as the Akaike information criterion (AIC). Several AIC variants are thus obtained from the proposed and improved estimators and are compared numerically with the AIC as model selection procedures.

19.
The goal of the current paper is to compare consistent and inconsistent model selection criteria by looking at their convergence rates (defined in the first section). The prototypes of the two types of criteria are the AIC and the BIC, respectively. For linear regression models with normally distributed errors, we show that the convergence rates for the AIC and BIC are O(n^(-1)) and O((n log n)^(-1/2)), respectively. When the error distributions are unknown, the two criteria become indistinguishable, both having convergence rate O(n^(-1/2)). We also argue that the BIC criterion has a nearly optimal convergence rate. The results partially justify some controversial simulation results in which inconsistent criteria seem to outperform consistent ones.

20.
Clustered binary data are common in medical research and can be fitted with a logistic regression model with random effects, which belongs to a wider class of models called generalized linear mixed models. Likelihood-based estimation of the model parameters often has to handle intractable integration, which has led to several estimation methods designed to overcome this difficulty. The penalized quasi-likelihood (PQL) method is popular and computationally efficient in most cases. The expectation–maximization (EM) algorithm yields maximum-likelihood estimates but requires computing a possibly intractable integral in the E-step, and variants of the EM algorithm for evaluating the E-step have been introduced. The Monte Carlo EM (MCEM) method computes the E-step by approximating the expectation with Monte Carlo samples, while the modified EM (MEM) method approximates the expectation using Laplace's method. All of these methods involve several layers of approximation, so the corresponding parameter estimates contain errors (large or small) induced by the approximations. Understanding and quantifying this discrepancy theoretically is difficult because of the complexity of the approximations in each method, even when the focus is restricted to clustered binary data. As a competing computational alternative, we also consider a non-parametric maximum-likelihood (NPML) method. We review and compare the PQL, MCEM, MEM and NPML methods for clustered binary data via a simulation study, which will be useful for researchers choosing an estimation method for their analysis.
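To make the 'intractable integration' concrete: in a random-intercept logistic model, each cluster's marginal likelihood integrates the Bernoulli likelihood over the random effect, and the Monte Carlo flavour of the EM-type methods replaces that integral with an average over simulated random effects. The sketch below approximates a single cluster's marginal log-likelihood in this way; it is not a full PQL/MCEM/MEM/NPML implementation, and all inputs are illustrative.

```python
import numpy as np

def cluster_marginal_loglik(y, X, beta, sigma_u, n_mc=5000, rng=None):
    """Monte Carlo approximation of one cluster's marginal log-likelihood
    in a random-intercept logistic model: the integral of the Bernoulli
    likelihood over u ~ N(0, sigma_u^2) is replaced by an average over draws.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(0.0, sigma_u, size=n_mc)        # random-effect draws
    eta = X @ beta                                 # fixed-effect linear predictor
    logit = eta[None, :] + u[:, None]              # (n_mc, n_obs)
    # Bernoulli log-likelihood of the cluster's responses for every draw of u
    loglik_given_u = (y * logit - np.logaddexp(0.0, logit)).sum(axis=1)
    # log-mean-exp over the Monte Carlo draws (numerically stable)
    m = loglik_given_u.max()
    return m + np.log(np.exp(loglik_given_u - m).mean())

# Illustrative use for a single cluster with 5 binary observations.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(5), rng.standard_normal(5)])
y = np.array([1, 0, 1, 1, 0])
print(cluster_marginal_loglik(y, X, beta=np.array([0.2, 0.8]), sigma_u=1.0, rng=rng))
```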
