Similar literature
1.
Model choice is one of the most crucial aspects of any statistical data analysis. It is well known that most models are just approximations to the true data-generating process, but among such approximations our goal is to select the ‘best’ one. Researchers typically consider a finite number of plausible models in statistical applications, and the related statistical inference depends on the chosen model. Hence, model comparison is required to identify the ‘best’ model among several such candidate models. This article considers the problem of model selection for spatial data. The issue of model selection for spatial models has been addressed in the literature by the use of traditional information criteria-based methods, even though such criteria have been developed based on the assumption of independent observations. We evaluate the performance of some of the popular model selection criteria via Monte Carlo simulation experiments using small to moderate samples. In particular, we compare the performance of some of the most popular information criteria, such as the Akaike information criterion (AIC), the Bayesian information criterion, and the corrected AIC, in selecting the true model. The ability of these criteria to select the correct model is evaluated under several scenarios. This comparison is made using various spatial covariance models ranging from stationary isotropic to nonstationary models.
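As a minimal sketch of how the three criteria named in this abstract are computed, the following uses their standard textbook definitions. The candidate names, log-likelihood values, and sample size are hypothetical numbers chosen for illustration; they are not taken from the article, and nothing here accounts for spatial dependence.

```python
import math


def information_criteria(log_lik: float, k: int, n: int) -> dict:
    """Return AIC, BIC, and corrected AIC (AICc) for one fitted model.

    log_lik: maximized log-likelihood
    k:       number of estimated parameters
    n:       sample size
    Smaller values indicate the preferred model.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    return {"AIC": aic, "BIC": bic, "AICc": aicc}


# Hypothetical candidates: (name, maximized log-likelihood, parameter count).
candidates = [("isotropic", -120.4, 3), ("anisotropic", -118.9, 5)]
n_obs = 50  # hypothetical sample size

scores = {name: information_criteria(ll, k, n_obs) for name, ll, k in candidates}
best_by_bic = min(scores, key=lambda name: scores[name]["BIC"])
```

With these illustrative numbers the richer model gains too little likelihood to offset its two extra parameters, so each criterion prefers the smaller model. In a simulation study of the kind described, this selection step would be repeated over many generated data sets and the frequency of choosing the true model recorded.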

2.
Abstract

In the model selection problem, the consistency of the selection criterion has often been discussed. This paper derives a family of criteria based on a robust statistical divergence family by using a generalized Bayesian procedure. The proposed family can achieve both consistency and robustness at the same time since it has good performance with respect to contamination by outliers under appropriate circumstances. We show the selection accuracy of the proposed criterion family compared with the conventional methods through numerical experiments.

3.
We study bandwidth selection for a class of semi-parametric models. The proper choice of optimal bandwidth minimizes the prediction errors of the model. We provide a detailed derivation of our procedure and the corresponding computation algorithms. Our proposed method simplifies the computation of the cross-validation criteria and facilitates more complicated inference and analysis in practice. A data set from the Wisconsin Diabetes Registry has been analysed as an illustration.
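To make the idea of choosing a bandwidth by cross-validation concrete, here is a rough sketch using a plain Nadaraya-Watson smoother with a Gaussian kernel. This is a generic illustration and not the authors' semi-parametric procedure; the function names, the toy data, and the bandwidth grid are all assumptions made for the example.

```python
import math


def loo_cv_score(x, y, h):
    """Mean leave-one-out squared prediction error of a Nadaraya-Watson
    smoother with a Gaussian kernel and bandwidth h."""
    total = 0.0
    for i in range(len(x)):
        num = den = 0.0
        for j in range(len(x)):
            if j == i:
                continue  # leave observation i out of its own prediction
            w = math.exp(-0.5 * ((x[i] - x[j]) / h) ** 2)
            num += w * y[j]
            den += w
        pred = num / den if den > 0 else 0.0
        total += (y[i] - pred) ** 2
    return total / len(x)


def select_bandwidth(x, y, grid):
    """Return the bandwidth in `grid` with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loo_cv_score(x, y, h))


# Toy data (hypothetical): a noiseless sine curve on an even grid.
x = [i / 20 for i in range(21)]
y = [math.sin(2 * math.pi * xi) for xi in x]
grid = [0.02, 0.05, 0.1, 0.3, 1.0]
h_best = select_bandwidth(x, y, grid)
```

The double loop costs O(n^2) per candidate bandwidth, which is the kind of expense that motivates computational shortcuts for cross-validation criteria.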

4.
Predictive criteria, including the adjusted squared multiple correlation coefficient, the adjusted concordance correlation coefficient, and the predictive error sum of squares, are available for model selection in the linear mixed model. These criteria all involve some sort of comparison of observed values and predicted values, adjusted for the complexity of the model. The predicted values can be conditional on the random effects or marginal, i.e., based on averages over the random effects. These criteria have not been investigated for model selection success.

We used simulations to investigate selection success rates for several versions of these predictive criteria as well as several versions of Akaike's information criterion and the Bayesian information criterion, and the pseudo F-test. The simulations involved the simple scenario of selection of a fixed parameter when the covariance structure is known.

Several variance–covariance structures were used. For compound symmetry structures, higher success rates for the predictive criteria were obtained when marginal rather than conditional predicted values were used. Information criteria had higher success rates when a certain term (normally left out in SAS MIXED computations) was included in the criteria. Various penalty functions were used in the information criteria, but these had little effect on success rates. The pseudo F-test performed as expected. For the autoregressive with random effects structure, the results were the same except that success rates were higher for the conditional version of the predictive error sum of squares.

Characteristics of the data, such as the covariance structure, parameter values, and sample size, greatly impacted performance of various model selection criteria. No one criterion was consistently better than the others.

5.
This article deals with model comparison as an essential part of generalized linear modelling in the presence of covariates missing not at random (MNAR). We provide an evaluation of the performances of some of the popular model selection criteria, particularly of deviance information criterion (DIC) and weighted L (WL) measure, for comparison among a set of candidate MNAR models. In addition, we seek to provide deviance and quadratic loss-based model selection criteria with alternative penalty terms targeting directly the MNAR models. This work is motivated by the need in the literature to understand the performances of these important model selection criteria for comparison among a set of MNAR models. A Monte Carlo simulation experiment is designed to assess the finite sample performances of these model selection criteria in the context of interest under different scenarios for missingness amounts. Some naturally driven DIC and WL extensions are also discussed and evaluated.

6.
7.
Monte Carlo experiments are conducted to compare the Bayesian and sample theory model selection criteria in choosing between the univariate probit and logit models. We use five criteria: the deviance information criterion (DIC), the predictive deviance information criterion (PDIC), the Akaike information criterion (AIC), and the weighted and unweighted sums of squared errors. The first two criteria are Bayesian, while the others are sample theory criteria. The results show that if data are balanced, none of the model selection criteria considered in this article can distinguish the probit and logit models. If data are unbalanced and the sample size is large, the DIC and AIC choose the correct models better than the other criteria. We show that if unbalanced binary data are generated by a leptokurtic distribution, the logit model is preferred over the probit model. The probit model is preferred if unbalanced data are generated by a platykurtic distribution. We apply the model selection criteria to the probit and logit models that link the ups and downs of the returns on the S&P 500 to the crude oil price.

8.
The main focus of our paper is to compare the performance of different model selection criteria used for multivariate reduced rank time series. We consider one of the most commonly used reduced rank models, that is, the reduced rank vector autoregression (RRVAR(p, r)) introduced by Velu et al. [Reduced rank models for multiple time series. Biometrika. 1986;73(1):105–118]. In our study, the most popular model selection criteria are included. The criteria are divided into two groups, that is, simultaneous selection and two-step selection criteria. Methods from the former group select both an autoregressive order p and a rank r simultaneously, while in the case of two-step criteria, first an optimal order p is chosen (using model selection criteria intended for the unrestricted VAR model) and then an optimal rank r of coefficient matrices is selected (e.g. by means of sequential testing). The model selection criteria considered include well-known information criteria (such as the Akaike information criterion, Schwarz criterion, Hannan–Quinn criterion, etc.) as well as widely used sequential tests (e.g. the Bartlett test) and the bootstrap method. An extensive simulation study is carried out in order to investigate the efficiency of all model selection criteria included in our study. The analysis takes into account 34 methods, including 6 simultaneous methods and 28 two-step approaches. In order to carefully analyse how different factors affect the performance of model selection criteria, we consider over 150 simulation settings. In particular, we investigate the influence of the following factors: time series dimension, different covariance structures, different levels of correlation among components and different levels of noise (variance). Moreover, we analyse the prediction accuracy associated with the application of the RRVAR model and compare it with results obtained for the unrestricted vector autoregression.
In this paper, we also present a real data application of model selection criteria for the RRVAR model using Polish macroeconomic time series data observed in the period 1997–2007.

9.
Several estimators of squared prediction error have been suggested for use in model and bandwidth selection problems. Among these are cross-validation, generalized cross-validation and a number of related techniques based on the residual sum of squares. For many situations with squared error loss, e.g. nonparametric smoothing, these estimators have been shown to be asymptotically optimal in the sense that in large samples the estimator minimizing the selection criterion also minimizes squared error loss. However, cross-validation is known not to be asymptotically optimal for some ‘easy’ location problems. We consider selection criteria based on estimators of squared prediction risk for choosing between location estimators. We show that criteria based on adjusted residual sum of squares are not asymptotically optimal for choosing between asymptotically normal location estimators that converge at rate $n^{1/2}$ but are when the rate of convergence is slower. We also show that leave-one-out cross-validation is not asymptotically optimal for choosing between $\sqrt{n}$-differentiable statistics but leave-$d$-out cross-validation is optimal when $d \to \infty$ at the appropriate rate.

10.
The purpose of this paper is threefold. First, we obtain the asymptotic properties of the modified model selection criteria proposed by Hurvich et al. (1990. Improved estimators of Kullback-Leibler information for autoregressive model selection in small samples. Biometrika 77, 709–719) for autoregressive models. Second, we provide some highlights on the better performance of these modified criteria. Third, we extend the modification introduced by these authors to model selection criteria commonly used in the class of self-exciting threshold autoregressive (SETAR) time series models. We show the improvements of the modified criteria in their finite sample performance. In particular, for small and medium sample sizes the frequency of selecting the true model improves for the consistent criteria and the root mean square error (RMSE) of prediction improves for the efficient criteria. These results are illustrated via simulation with SETAR models in which we assume that the threshold and the parameters are unknown.

11.
In this article, we propose a new empirical information criterion (EIC) for model selection which penalizes the likelihood of the data by a non-linear function of the number of parameters in the model. It is designed to be used where there are a large number of time series to be forecast. However, a bootstrap version of the EIC can be used where there is a single time series to be forecast. The EIC provides a data-driven model selection tool that can be tuned to the particular forecasting task.

We compare the EIC with other model selection criteria including Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC). The comparisons show that for the M3 forecasting competition data, the EIC outperforms both the AIC and BIC, particularly for longer forecast horizons. We also compare the criteria on simulated data and find that the EIC does better than existing criteria in that case as well.

12.
Summary. The classical approach to statistical analysis is usually based upon finding values for model parameters that maximize the likelihood function. Model choice in this context is often also based on the likelihood function, but with the addition of a penalty term for the number of parameters. Though models may be compared pairwise by using likelihood ratio tests, for example, various criteria such as the Akaike information criterion have been proposed as alternatives when multiple models need to be compared. In practical terms, the classical approach to model selection usually involves maximizing the likelihood function associated with each competing model and then calculating the corresponding criterion value(s). However, when large numbers of models are possible, this quickly becomes infeasible unless a method that simultaneously maximizes over both parameter and model space is available. We propose an extension to the traditional simulated annealing algorithm that allows for moves that not only change parameter values but also move between competing models. This transdimensional simulated annealing algorithm can therefore be used to locate models and parameters that minimize criteria such as the Akaike information criterion, but within a single algorithm, removing the need for large numbers of simulations to be run. We discuss the implementation of the transdimensional simulated annealing algorithm and use simulation studies to examine its performance in realistically complex modelling situations. We illustrate our ideas with a pedagogic example based on the analysis of an autoregressive time series and two more detailed examples: one on variable selection for logistic regression and the other on model selection for the analysis of integrated recapture–recovery data.

13.
We address the issue of model selection in beta regressions with varying dispersion. The model consists of two submodels: one for the mean and one for the dispersion. Our focus is on the selection of the covariates for each submodel. Our Monte Carlo evidence reveals that the joint selection of covariates for the two submodels is not accurate in finite samples. We introduce two new model selection criteria that explicitly account for varying dispersion and propose a fast two-step model selection scheme which is considerably more accurate and computationally less costly than the usual joint model selection. Monte Carlo evidence is presented and discussed. We also present the results of an empirical application.

14.
In this article, we consider the order estimation of autoregressive models with incomplete data using the expectation–maximization (EM) algorithm-based information criteria. The criteria take the form of a penalization of the conditional expectation of the log-likelihood. The evaluation of the penalization term generally involves numerical differentiation and matrix inversion. We introduce a simplification of the penalization term for autoregressive model selection and we propose a penalty factor based on a resampling procedure in the criteria formula. The simulation results show the improvements yielded by the proposed method when compared with the classical information criteria for model selection with incomplete data.

15.
In this paper, we consider the problem of model selection for infinite variance time series. We introduce a group of model selection criteria based on a general loss function Ψ. This family includes various generalizations of predictive least squares and AIC. Parameter estimation is carried out using Ψ. We use two loss functions commonly used in robust estimation and show that certain criteria outperform the conventional approach based on least squares or Yule-Walker estimation for heavy-tailed innovations. Our conclusions are based on a comprehensive study of the performance of competing criteria for a wide selection of AR(2) models. We also consider the performance of these techniques when the ‘true’ model is not contained in the family of candidate models.

16.
There has been significant new work published recently on the subject of model selection. Notably, Rissanen (1986, 1987, 1988) has introduced new criteria based on the notion of stochastic complexity, and Hurvich and Tsai (1989) have introduced a bias-corrected version of Akaike's information criterion. In this paper, a Monte Carlo study is conducted to evaluate the relative performance of these new model selection criteria against the commonly used alternatives. In addition, we compare the performance of all the criteria in a number of situations not considered in earlier studies: robustness to distributional assumptions, collinearity among regressors, and non-stationarity in a time series. The evaluation is based on the number of times the correct model is chosen and the out-of-sample prediction error. The results of this study suggest that Rissanen's criteria are sensitive to the assumptions and choices that need to be made in their application, and so are sometimes unreliable. While many of the criteria often perform satisfactorily, across experiments the Schwarz Bayesian Information Criterion (and the related Bayesian Estimation Criterion of Geweke-Meese) seems to consistently outperform the other alternatives considered.

17.
We deal with parametric inference and selection problems for jump components in discretely observed diffusion processes with jumps. We prepare several competing parametric models for the Lévy measure that might be misspecified, and select the best model from the standpoint of information criteria. We construct quasi-information criteria (QIC), which are approximations of the information criteria based on continuous observations.

18.
ABSTRACT

In this paper, we investigate the consistency of the Expectation Maximization (EM) algorithm-based information criteria for model selection with missing data. The criteria correspond to a penalization of the conditional expectation of the complete data log-likelihood given the observed data and with respect to the missing data conditional density. We present asymptotic properties related to maximum likelihood estimation in the presence of incomplete data and we provide sufficient conditions for the consistency of model selection by minimizing the information criteria. Their finite sample performance is illustrated through simulation and real data studies.
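The verbal description of these criteria can be written out in symbols. The following is one possible reading, stated as an assumption: the notation ($y_{\mathrm{obs}}$, $y_{\mathrm{mis}}$, $Q$, $c_n$) is introduced here for illustration and is not taken from the paper, whose exact penalty term may differ.

```latex
% Conditional expectation of the complete-data log-likelihood,
% taken over the missing data given the observed data:
Q(\theta \mid \theta') =
  \mathbb{E}\!\left[ \log f(y_{\mathrm{obs}}, y_{\mathrm{mis}}; \theta)
  \,\middle|\, y_{\mathrm{obs}}; \theta' \right]

% Generic penalized criterion evaluated at the estimate \hat{\theta},
% with c_n a penalty that grows with model complexity:
\mathrm{IC}(\hat{\theta}) = -2\, Q(\hat{\theta} \mid \hat{\theta}) + c_n(\hat{\theta})
```

Under this reading, the candidate model with the smallest $\mathrm{IC}$ is selected, and $Q$ is available as a by-product of the E-step of the EM algorithm at convergence.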

20.
Varying-coefficient models (VCMs) are useful tools for analysing longitudinal data. They can effectively describe the relationship between predictors and repeatedly measured responses. VCMs estimated by regularization methods are strongly affected by the values of the regularization parameters, and therefore selecting these values is a crucial issue. In order to choose these parameters objectively, we derive model selection criteria for evaluating VCMs from the viewpoints of information-theoretic and Bayesian approaches. Models are estimated by the method of regularization with basis expansions, and then they are evaluated by model selection criteria. We demonstrate the effectiveness of the proposed criteria through Monte Carlo simulations and real data analysis.
