Similar Articles
1.
A New Method for Selecting the Spatial Weight Matrix   Total citations: 1, self-citations: 0, citations by others: 1
Yinghua Ren  Wanhai You 《Statistical Research》2012,29(6):99-105
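The choice of the spatial weight matrix has long been a difficult problem in spatial econometrics, and whether the weight matrix is chosen correctly determines the model's final estimates. Within the spatial lag model framework, this paper recasts the selection of the spatial weight matrix as a variable selection problem and then performs the variable selection with the CWB method. An empirical study of the agglomeration mechanism of urban service industries in China shows that the spatial weight matrix chosen by the proposed method is fairly reasonable, which in turn reduces the estimation bias caused by misspecifying the spatial weight matrix. In large samples, the method can reduce computational cost very effectively.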
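A minimal sketch, assuming each region is described by planar coordinates (all distinct): two common candidate weight matrices, row-standardized inverse-distance and k-nearest-neighbour, are built, and each yields a spatial lag W y. Under the recasting described above, each such lag would enter the spatial lag model as one candidate regressor. The helper names, the two weighting schemes and the toy data are illustrative assumptions; the CWB selection step itself is not reproduced.

```python
import math


def inverse_distance_weights(coords, power=1.0):
    """Row-standardized inverse-distance weight matrix with a zero diagonal."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(coords[i], coords[j]) ** power
        s = sum(w[i])
        w[i] = [v / s for v in w[i]]  # each row sums to one
    return w


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbour weight matrix (ties broken by index)."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            w[i][j] = 1.0 / k
    return w


def spatial_lag(w, y):
    """W y: the candidate regressor contributed by weight matrix W."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


coords = [(0, 0), (1, 0), (0, 1), (5, 5)]
y = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "inv_dist": inverse_distance_weights(coords),
    "knn2": knn_weights(coords, 2),
}
# One lagged vector per candidate matrix; these become the variables to select from.
lags = {name: spatial_lag(w, y) for name, w in candidates.items()}
```

Note that in a spatial lag model W y is endogenous, so plain least squares on these lags would be biased; the sketch only builds the candidate set.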

2.
We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD), also known as $$L_2$$-Boosting, under a normality assumption for the model errors. The CFGD is one of the most versatile tools for analyzing data, because it scales well to high-dimensional data sets, allows a very flexible definition of additive regression models and incorporates built-in variable selection. Because of this variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component to update, necessitates adaptations and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by $$L_2$$-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques that restrict the response space to sets more complex than polyhedra. We apply our framework to an additive model for sales prices of residential apartments and investigate the properties of our concepts in simulation studies.
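The basic fitting loop of component-wise $$L_2$$-Boosting can be sketched as follows, assuming simple linear base learners on centered predictors and a fixed step length `nu`: each iteration fits every predictor separately to the current residuals by least squares and updates only the best-fitting one, so the same component may be chosen many times. The post-selection inference framework proposed in the paper is not shown; function names and defaults are illustrative.

```python
def l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise L2-Boosting with one linear base learner per predictor.

    X is a list of columns (each a list of floats), assumed centered.
    Returns the intercept, the coefficient vector and the sequence of
    selected component indices.
    """
    n = len(y)
    intercept = sum(y) / n
    u = [yi - intercept for yi in y]  # current residuals
    coef = [0.0] * len(X)
    path = []
    for _ in range(n_iter):
        best = None
        for j, xj in enumerate(X):
            # Least-squares fit of the residuals on predictor j alone.
            b = sum(a * r for a, r in zip(xj, u)) / sum(a * a for a in xj)
            rss = sum((r - b * a) ** 2 for a, r in zip(xj, u))
            if best is None or rss < best[0]:
                best = (rss, j, b)
        _, j, b = best
        coef[j] += nu * b  # shrunken update of the winning component only
        u = [r - nu * b * a for a, r in zip(X[j], u)]
        path.append(j)
    return intercept, coef, path


X = [[-2.0, -1.0, 0.0, 1.0, 2.0], [1.0, -1.0, 0.0, -1.0, 1.0]]
y = [-1.0, 1.0, 3.0, 5.0, 7.0]  # y = 3 + 2 * X[0]
intercept, coef, path = l2_boost(X, y)
```

The small step `nu` is what makes the algorithm "slow learning": the coefficient of a selected component only approaches its least-squares value after many updates, and stopping early acts as regularization.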

3.
We derive two types of Akaike information criterion (AIC)‐like model‐selection formulae for the semiparametric pseudo‐maximum likelihood procedure. We first adapt the arguments leading to the original AIC formula, related to empirical estimation of a certain Kullback–Leibler information distance. This gives a significantly different formula compared with the AIC, which we name the copula information criterion. However, we show that such a model‐selection procedure cannot exist for copula models with densities that grow very fast near the edge of the unit cube. This problem affects most popular copula models. We then derive what we call the cross‐validation copula information criterion, which exists under weak conditions and is a first‐order approximation to exact cross validation. This formula is very similar to the standard AIC formula but has slightly different motivation. A brief illustration with real data is given.

4.
In this article, we propose a new empirical information criterion (EIC) for model selection which penalizes the likelihood of the data by a non-linear function of the number of parameters in the model. It is designed to be used where there are a large number of time series to be forecast. However, a bootstrap version of the EIC can be used where there is a single time series to be forecast. The EIC provides a data-driven model selection tool that can be tuned to the particular forecasting task.

We compare the EIC with other model selection criteria, including Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC). The comparisons show that for the M3 forecasting competition data, the EIC outperforms both the AIC and BIC, particularly for longer forecast horizons. We also compare the criteria on simulated data and find that the EIC outperforms the existing criteria there as well.
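For reference, AIC and BIC differ only in the penalty added to -2 log L: 2k for AIC and k log n for BIC. The sketch below, assuming a Gaussian model whose maximized log-likelihood is expressed through the residual sum of squares, makes the penalty a pluggable function; the EIC's data-driven nonlinear penalty is not reproduced, and the function names are illustrative.

```python
import math


def gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood, with sigma^2 estimated as rss / n."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def information_criterion(loglik, k, penalty):
    """Generic criterion -2 log L + penalty(k); smaller is better."""
    return -2.0 * loglik + penalty(k)


def aic(loglik, k):
    return information_criterion(loglik, k, lambda m: 2.0 * m)


def bic(loglik, k, n):
    return information_criterion(loglik, k, lambda m: m * math.log(n))


ll = gaussian_loglik(rss=10.0, n=100)
```

Because log n exceeds 2 once n is at least 8, BIC penalizes each extra parameter more heavily than AIC in all but tiny samples, which is why it tends to pick smaller models.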

5.
Model selection is one of the most pervasive problems in generalized linear models. We propose a model selection criterion based on deviance, called the deviance-based criterion (DBC). The DBC is obtained by penalizing the difference between the deviance of the fitted model and that of the full model. Under certain weak conditions, the DBC is shown to be a consistent model selection criterion, in the sense that, with probability approaching one, the selected model asymptotically equals the optimal model relating the response and the predictors. The use of the DBC for link function selection is also discussed. We compare the proposed criterion with existing methods, and its small-sample efficiency is evaluated in a simulation study.

6.
The theoretical foundation for a number of model selection criteria is established in the context of inhomogeneous point processes and under various asymptotic settings: infill, increasing domain and combinations of these. For inhomogeneous Poisson processes we consider Akaike's information criterion and the Bayesian information criterion, and in particular we identify the point process analogue of ‘sample size’ needed for the Bayesian information criterion. Considering general inhomogeneous point processes we derive new composite likelihood and composite Bayesian information criteria for selecting a regression model for the intensity function. The proposed model selection criteria are evaluated using simulations of Poisson processes and cluster point processes.

7.
Zhouping Li  Yiming Liu 《Statistics》2017,51(5):1006-1022
In estimation of multiplicative or accelerated failure time models, the relative error criterion has been recognized as an alternative to the squared or absolute error criterion. The general relative error criterion introduced by Chen et al. [Least product relative error estimation. J Multivariate Anal. 2016;144:91–98] is a unified framework for efficient estimation, which includes least absolute relative error estimation and least product relative error estimation as special cases. In this paper, by combining empirical likelihood with the general relative error criterion in the multiplicative model, we develop a new empirical likelihood method for inference on the unknown parameters in a high-dimensional setting. Limiting theory is established for the proposed empirical likelihood statistic. We conduct simulation studies and a real data analysis to evaluate the effectiveness of the proposed method.
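The two special cases named above can be written down directly for the multiplicative model $$Y = \exp(X^\top\beta)\,\varepsilon$$ with positive responses. As they are usually defined, the LARE loss sums the two relative errors $$|(Y-\exp(X^\top\beta))/Y|$$ and $$|(Y-\exp(X^\top\beta))/\exp(X^\top\beta)|$$, while the LPRE loss multiplies them, which simplifies to $$Y/\exp(X^\top\beta) + \exp(X^\top\beta)/Y - 2$$. The general relative error criterion and the empirical likelihood construction of the paper are not shown; the function names are hypothetical.

```python
import math


def lare_loss(y, xb):
    """Least absolute relative error loss for one observation (requires y > 0)."""
    fitted = math.exp(xb)
    return abs((y - fitted) / y) + abs((y - fitted) / fitted)


def lpre_loss(y, xb):
    """Least product relative error loss; equals y/fitted + fitted/y - 2."""
    fitted = math.exp(xb)
    return abs((y - fitted) / y) * abs((y - fitted) / fitted)


def total_loss(loss, ys, xs, beta):
    """Sum a per-observation relative-error loss over a dataset."""
    return sum(
        loss(y, sum(b * x for b, x in zip(beta, row))) for y, row in zip(ys, xs)
    )
```

The LPRE form is smooth and convex in the linear predictor, which is one reason it is attractive computationally compared with the absolute-value form of LARE.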

8.
This article considers panel data models in the presence of a large number of potential predictors and unobservable common factors. The model is estimated by the regularization method together with the principal components procedure. We propose a panel information criterion for selecting the regularization parameter and the number of common factors under a diverging number of predictors. Under the correct model specification, we show that the proposed criterion consistently identifies the true model. If the model is instead misspecified, the proposed criterion achieves asymptotically efficient model selection. Simulation results confirm these theoretical arguments.

9.
This paper considers model averaging for the ordered probit and nested logit models, which are widely used in empirical research. Within the frameworks of these models, we examine a range of model averaging methods, including the jackknife method, which is proved to have an optimal asymptotic property in this paper. We conduct a large-scale simulation study to examine the behaviour of these model averaging estimators in finite samples, and draw comparisons with model selection estimators. Our results show that while neither averaging nor selection is a consistently better strategy, model selection results in the poorest estimates far more frequently than averaging, and more often than not, averaging yields superior estimates. Among the averaging methods considered, the one based on a smoothed version of the Bayesian information criterion frequently produces the most accurate estimates. In three real data applications, we demonstrate the usefulness of model averaging in mitigating problems associated with the ‘replication crisis’ that commonly arises with model selection.
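A common way to build averaging weights from an information criterion is $$w_m \propto \exp(-\mathrm{IC}_m/2)$$; with BIC this gives smoothed-BIC weights. The following sketch, assuming the criterion values have already been computed for each candidate model, normalizes those weights stably and averages per-model estimates. It is a generic illustration rather than the exact estimator studied in the paper, and the jackknife method is not shown.

```python
import math


def smoothed_ic_weights(ic_values):
    """Model-averaging weights proportional to exp(-IC_m / 2).

    Subtracting the minimum before exponentiating avoids overflow and underflow
    without changing the normalized weights.
    """
    best = min(ic_values)
    raw = [math.exp(-(v - best) / 2.0) for v in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def average_estimates(estimates, weights):
    """Weighted average of per-model estimates of the same quantity."""
    return sum(w * e for w, e in zip(weights, estimates))


w = smoothed_ic_weights([100.0, 102.0, 110.0])
```

Unlike selection, which puts all weight on the model with the smallest criterion, these weights spread mass over models whose criterion values are close, here giving the second model a weight ratio of exp(-1) relative to the best one.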

10.
One of the fundamental issues in analyzing microarray data is to determine which genes are expressed and which ones are not for a given group of subjects. In datasets where many genes are expressed and many are not expressed (i.e., underexpressed), a bimodal distribution for the gene expression levels often results, where one mode of the distribution represents the expressed genes and the other mode represents the underexpressed genes. To model this bimodality, we propose a new class of mixture models that use a random threshold value to accommodate bimodality in the gene expression distribution. Theoretical properties of the proposed model are carefully examined. We use this new model to examine the problem of differential gene expression between two groups of subjects, develop prior distributions, and derive a new criterion for determining which genes are differentially expressed between the two groups. Prior elicitation is carried out using empirical Bayes methodology in order to estimate the threshold value as well as elicit the hyperparameters for the two-component mixture model. The new gene selection criterion is shown in several simulations to have excellent false positive rate and false negative rate properties. A gastric cancer dataset is used to motivate and illustrate the proposed methodology.

11.
Generalized linear models (GLMs) are widely studied for dealing with complex response variables. For the analysis of categorical dependent variables with more than two response categories, multivariate GLMs are used to model the relationship between this polytomous response and a set of regressors. In the literature, traditional variable selection approaches have been proposed for the multivariate GLM with a canonical link function when the number of parameters is fixed. However, in many model selection problems, the number of parameters may be large and grow with the sample size. In this paper, we present a new selection criterion for the model with a diverging number of parameters. Under suitable conditions, the criterion is shown to be model selection consistent. A simulation study and a real data analysis are conducted to support the theoretical findings.

12.
The problem of constructing classification methods based on both labeled and unlabeled data sets is considered for analyzing data with complex structures. We introduce a semi-supervised logistic discriminant model with Gaussian basis expansions. Unknown parameters in the logistic model are estimated by a regularization method combined with the EM algorithm. To select the adjusted parameters, we derive a model selection criterion from a Bayesian viewpoint. Numerical studies are conducted to investigate the effectiveness of the proposed modeling procedures.

13.
Selecting the relevant predictor variables for building a model is an important problem in multiple linear regression. Variable selection methods based on the ordinary least squares estimator fail to select the set of relevant variables in the presence of outliers and leverage points. In this article, we propose a new robust variable selection criterion for selecting the relevant variables in the model and establish its consistency. The performance of the proposed method is evaluated through a simulation study and real data.

14.
This article deals with semisupervised learning based on the naive Bayes assumption. A univariate Gaussian mixture density is used for continuous input variables, whereas a histogram-type density is adopted for discrete input variables. The EM algorithm is used to compute the maximum likelihood estimators of the model parameters when the number of mixing components is fixed for each continuous input variable. We then use an information criterion to choose a parsimonious model among the various fitted models. A common density method is proposed for selecting significant input variables. Simulated and real datasets are used to illustrate the performance of the proposed method.
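A minimal sketch of the EM step for one continuous input, assuming a univariate Gaussian mixture with a fixed number of components and user-supplied starting values, followed by a BIC (with 3k - 1 free parameters for k components) that could be used to compare fits with different k. The naive Bayes combination across inputs, the handling of unlabeled data and the common density method are not shown, and the paper's choice of information criterion may differ.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(xs, means, variances, weights, n_iter=100):
    """EM for a univariate Gaussian mixture with a fixed number of components.

    Starting values are copied, so the caller's lists are not modified.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability that observation x came from component j.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted mixing proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
    return means, variances, weights


def mixture_bic(xs, means, variances, weights):
    """BIC = -2 log L + (3k - 1) log n for a k-component univariate mixture."""
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for m, v, w in zip(means, variances, weights)))
        for x in xs
    )
    return -2.0 * loglik + (3 * len(means) - 1) * math.log(len(xs))


xs = [-5.2, -5.0, -4.8, 4.8, 5.0, 5.2]
means, variances, weights = em_gaussian_mixture(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```

In practice the loop would stop once the log-likelihood stops increasing, and a component whose responsibilities all vanish would make `nj` zero; both cases are left out for brevity.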

15.
Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for selection of variables, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessment of the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations, of multivariate binary as well as multivariate normal data, demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.
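The key point above is that a leave-one-out estimate of the selection procedure's error must redo the variable selection on every fold. The sketch below assumes a deliberately simplified procedure: pick the single variable with the lowest apparent (resubstitution) error, then classify with one-dimensional LDA under equal priors, which reduces to assigning an observation to the nearer class mean. This is only the first step of a forward stepwise search, not the paper's full procedure, and the helper names are hypothetical.

```python
def nearest_mean_rule(train, labels, j):
    """1-D LDA with equal priors on feature j: pick the class with the nearer mean."""
    m = {}
    for c in (0, 1):
        vals = [x[j] for x, y in zip(train, labels) if y == c]
        m[c] = sum(vals) / len(vals)
    return lambda x: 0 if abs(x[j] - m[0]) <= abs(x[j] - m[1]) else 1


def apparent_error(train, labels, j):
    """Resubstitution error rate of the rule built on feature j."""
    rule = nearest_mean_rule(train, labels, j)
    return sum(rule(x) != y for x, y in zip(train, labels)) / len(labels)


def select_variable(train, labels):
    """First step of forward selection: the feature with the lowest apparent error."""
    return min(range(len(train[0])), key=lambda j: apparent_error(train, labels, j))


def loo_error_of_procedure(data, labels):
    """Leave-one-out error estimate that repeats the selection inside every fold."""
    errors = 0
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        tr_labels = labels[:i] + labels[i + 1:]
        j = select_variable(train, tr_labels)  # selection sees only the training fold
        rule = nearest_mean_rule(train, tr_labels, j)
        errors += rule(data[i]) != labels[i]
    return errors / len(data)


data = [(0.0, 3.0), (0.5, 1.0), (1.0, 2.0), (4.0, 2.0), (4.5, 3.0), (5.0, 1.0)]
labels = [0, 0, 0, 1, 1, 1]
```

Selecting the variable once on the full data and then running leave-one-out only on the classifier would leak information from the held-out point into the selection, which is the optimistic bias the paper warns about.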

16.
This article proposes a mixture double autoregressive model by introducing the flexibility of mixture models to the double autoregressive model, a novel conditional heteroscedastic model recently proposed in the literature. To make it more flexible, the mixing proportions are further assumed to be time varying, and probabilistic properties including strict stationarity and higher order moments are derived. Inference tools including the maximum likelihood estimation, an expectation–maximization (EM) algorithm for searching the estimator and an information criterion for model selection are carefully studied for the logistic mixture double autoregressive model, which has two components and is encountered more frequently in practice. Monte Carlo experiments give further support to the new models, and the analysis of an empirical example is also reported.
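The conditional density of a two-component mixture double autoregressive model of order one can be sketched as below. The assumptions are ours, not the paper's: each component has mean $$\phi_k y_{t-1}$$ and variance $$\omega_k + \alpha_k y_{t-1}^2$$ with $$\omega_k > 0$$ and $$\alpha_k \ge 0$$, and the time-varying mixing proportion is a logistic function of $$y_{t-1}$$ with hypothetical parameters `g0` and `g1`. The paper's exact specification may differ. The log-likelihood conditions on the first observation.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixing_prob(y_prev, g0, g1):
    """Hypothetical logistic mixing proportion driven by the lagged observation."""
    return 1.0 / (1.0 + math.exp(-(g0 + g1 * y_prev)))


def cond_density(y, y_prev, comp1, comp2, g0, g1):
    """Two-component mixture DAR(1) density of y_t given y_{t-1}.

    Each component is a tuple (phi, omega, alpha): mean phi * y_prev and
    variance omega + alpha * y_prev**2.
    """
    p = mixing_prob(y_prev, g0, g1)
    dens = 0.0
    for w, (phi, omega, alpha) in ((p, comp1), (1.0 - p, comp2)):
        dens += w * normal_pdf(y, phi * y_prev, omega + alpha * y_prev ** 2)
    return dens


def log_likelihood(ys, comp1, comp2, g0, g1):
    """Conditional log-likelihood of the series given its first observation."""
    return sum(
        math.log(cond_density(ys[t], ys[t - 1], comp1, comp2, g0, g1))
        for t in range(1, len(ys))
    )


comp1, comp2 = (0.5, 1.0, 0.2), (-0.3, 0.5, 0.4)
ll = log_likelihood([0.1, -0.3, 0.5, 0.2], comp1, comp2, 0.2, -0.5)
```

An EM algorithm for this model would treat the component label at each time point as missing data, exactly as in an ordinary mixture, with the twist that the mixing proportion changes with $$y_{t-1}$$.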

17.
Qingsong Yao et al. 《Statistical Research》2018,35(5):119-128
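This paper considers model averaging estimation for the family of nonlinear GARCH models. When the set of candidate models contains nonlinear GARCH-family models with different structures, we construct a model averaging estimator for the nonlinear GARCH family and give a corresponding criterion for choosing the weights. Under certain regularity conditions, we prove that this model averaging estimator is asymptotically optimal, in the sense that it asymptotically attains the truly optimal KL divergence. Monte Carlo simulations show that, in most settings, the proposed model averaging estimator achieves a smaller relative KL divergence. As an application of model averaging for the nonlinear GARCH family, we estimate the daily volatility of the Shanghai Composite Index from June 1, 2016 to June 1, 2017; compared with existing model selection and model averaging methods, the proposed model averaging method is more accurate.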
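A rough sketch of the ingredients, assuming GARCH(1,1) and GJR-GARCH(1,1) as two members of the candidate set and weights already chosen on the simplex: each model produces a conditional-variance path, and the averaged estimate is their weighted combination. The KL-based weight-selection criterion from the paper is not reproduced, and all parameter values are illustrative.

```python
def garch_variance(returns, omega, alpha, beta, sigma2_0):
    """GARCH(1,1): sigma2_t = omega + alpha * e_{t-1}^2 + beta * sigma2_{t-1}."""
    out = [sigma2_0]
    for e in returns[:-1]:
        out.append(omega + alpha * e * e + beta * out[-1])
    return out


def gjr_variance(returns, omega, alpha, gamma, beta, sigma2_0):
    """GJR-GARCH(1,1): adds gamma * e_{t-1}^2 when the previous shock is negative."""
    out = [sigma2_0]
    for e in returns[:-1]:
        leverage = gamma * e * e if e < 0 else 0.0
        out.append(omega + alpha * e * e + leverage + beta * out[-1])
    return out


def averaged_variance(paths, weights):
    """Pointwise weighted average of candidate variance paths."""
    assert abs(sum(weights) - 1.0) < 1e-12 and min(weights) >= 0.0
    return [sum(w * p[t] for w, p in zip(weights, paths)) for t in range(len(paths[0]))]


r = [0.01, -0.02, 0.015]
g = garch_variance(r, 1e-5, 0.1, 0.8, 2e-4)
j = gjr_variance(r, 1e-5, 0.1, 0.05, 0.8, 2e-4)
avg = averaged_variance([g, j], [0.5, 0.5])
```

With `gamma = 0` the GJR recursion collapses to plain GARCH(1,1), so the two candidates differ only in how they react to negative shocks; the averaging weights decide how much of that asymmetry enters the final volatility estimate.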

18.
This paper studies the Bridge estimator for a high-dimensional panel data model with heterogeneous varying coefficients, where the random errors are assumed to be serially correlated and cross-sectionally dependent. We establish oracle efficiency and the asymptotic distribution of the Bridge estimator, when the number of covariates increases to infinity with the sample size in both dimensions. A BIC-type criterion is also provided for tuning parameter selection. We further generalise the marginal Bridge estimator for our model to asymptotically correctly identify the covariates with zero coefficients even when the number of covariates is greater than the sample size under a partial orthogonality condition. The finite sample performance of the proposed estimator is demonstrated by simulated data examples, and an empirical application with the US stock dataset is also provided.

19.
Recent literature provides many computational and modeling approaches for covariance matrix estimation in penalized Gaussian graphical models, but relatively little study has been devoted to the choice of the tuning parameter. This paper tries to fill this gap by focusing on shrinkage parameter selection when estimating sparse precision matrices with the penalized likelihood approach. Previous approaches typically used K-fold cross-validation for this purpose. In this paper, we first derive the generalized approximate cross-validation for tuning parameter selection, which is not only a more computationally efficient alternative but also achieves a smaller error rate for model fitting than leave-one-out cross-validation. For consistency in selecting the nonzero entries of the precision matrix, we employ a Bayesian information criterion that provably identifies the nonzero conditional correlations in the Gaussian model. Our simulations demonstrate the general superiority of the two proposed selectors over leave-one-out cross-validation, 10-fold cross-validation and the Akaike information criterion.
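The BIC mentioned above is commonly written as $$n\{\operatorname{tr}(S\hat\Theta) - \log\det\hat\Theta\} + \log(n)\,|E|$$, where $$S$$ is the sample covariance, $$\hat\Theta$$ the estimated precision matrix and $$|E|$$ the number of nonzero off-diagonal entries in its upper triangle. A pure-Python sketch follows, assuming $$\hat\Theta$$ is symmetric positive definite and using a Cholesky factorization for the log-determinant. The exact form used in the paper may differ, and the generalized approximate cross-validation is not shown.

```python
import math


def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def trace_product(s, theta):
    """tr(S Theta) for square matrices stored as nested lists."""
    n = len(s)
    return sum(s[i][k] * theta[k][i] for i in range(n) for k in range(n))


def precision_bic(s, theta, n_obs, tol=1e-10):
    """-2 x Gaussian log-likelihood (up to constants) + log(n) x number of edges."""
    p = len(theta)
    edges = sum(1 for i in range(p) for j in range(i + 1, p) if abs(theta[i][j]) > tol)
    return n_obs * (trace_product(s, theta) - log_det_spd(theta)) + math.log(n_obs) * edges


S = [[2.0, 0.5], [0.5, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
```

In a tuning loop, one would fit the penalized estimator for each candidate shrinkage parameter and keep the one with the smallest BIC; the log(n) term per edge is what drives the consistent recovery of the sparsity pattern.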

20.
In this article, we study model selection and model averaging in quantile regression. Under general conditions, we develop a focused information criterion and a frequentist model average estimator for the parameters in quantile regression model, and examine their theoretical properties. The new procedures provide a robust alternative to the least squares method or likelihood method, and a major advantage of the proposed procedures is that when the variance of random error is infinite, the proposed procedure works beautifully while the least squares method breaks down. A simulation study and a real data example are presented to show that the proposed method performs well with a finite sample and is easy to use in practice.
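Quantile regression replaces squared error with the check loss $$\rho_\tau(u) = u\{\tau - I(u<0)\}$$. A small sketch, assuming only the intercept-only case: minimizing the summed check loss over a constant recovers a sample $$\tau$$-quantile, which, unlike the mean, is not dragged by an extreme observation. The focused information criterion and the model average estimator are not shown, and the function names are illustrative.

```python
def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(ys, tau):
    """Minimize sum_i rho_tau(y_i - q) over constants q.

    The objective is convex and piecewise linear with kinks at the data points,
    so a minimizer is always attained at one of the observations.
    """
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, tau) for y in ys))


ys = [1, 2, 3, 4, 100]
median = sample_quantile_by_check_loss(ys, 0.5)  # the mean would be 22
```

This robustness is the intuition behind the claim above: the check loss grows only linearly in the residual, so heavy-tailed errors with infinite variance do not destabilize the fit the way they do for least squares.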
