Similar Documents
20 similar documents found.
1.
We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are “close” to the chosen “base model,” and compares the error rate of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure “Next-Door analysis” since it examines models “next” to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library. The Canadian Journal of Statistics 48: 447–470; 2020 © 2020 Statistical Society of Canada
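A minimal Python sketch of the leave-one-predictor-out refitting loop described above, using scikit-learn's LassoCV as a stand-in for the authors' R/glmnet implementation; the synthetic data, the nested cross-validation for error estimation, and the 5% deterioration threshold are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.5, 1.0] + [0.0] * (p - 3))
y = X @ beta + rng.normal(size=n)

# Fit the "base model" with cross-validated lasso and estimate its error.
base = LassoCV(cv=5).fit(X, y)
chosen = np.flatnonzero(base.coef_)
base_err = -cross_val_score(LassoCV(cv=5), X, y,
                            scoring="neg_mean_squared_error", cv=5).mean()

# Next-Door loop: drop each chosen predictor, refit, compare error rates.
for j in chosen:
    X_minus = np.delete(X, j, axis=1)
    err_j = -cross_val_score(LassoCV(cv=5), X_minus, y,
                             scoring="neg_mean_squared_error", cv=5).mean()
    # 5% deterioration cutoff is purely illustrative.
    status = "indispensable" if err_j > base_err * 1.05 else "acceptable alternative"
    print(f"drop x{j}: CV error {err_j:.3f} vs base {base_err:.3f} -> {status}")
```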

2.
The results of analyzing experimental data using a parametric model may depend heavily on the chosen model for the regression and variance functions, and moreover on a possibly underlying preliminary transformation of the variables. In this paper we propose and discuss a complex procedure consisting of the simultaneous selection of parametric regression and variance models from a relatively rich model class, together with Box-Cox variable transformations, by minimization of a cross-validation criterion. For this it is essential to introduce modifications of the standard cross-validation criterion adapted to each of the following objectives: 1. estimation of the unknown regression function, 2. prediction of future values of the response variable, 3. calibration, or 4. estimation of some parameter with a certain meaning in the corresponding field of application. Our idea of a criterion-oriented combination of procedures (which, if applied at all, are usually applied independently or sequentially) is expected to lead to more accurate results. We show how the accuracy of the parameter estimators can be assessed by a “moment-oriented bootstrap procedure,” an essential modification of the “wild bootstrap” of Härdle and Mammen that uses more accurate variance estimates. This new procedure, and its refinement by a bootstrap-based pivot (“double bootstrap”), is also used for the construction of confidence, prediction and calibration intervals. Programs written in Splus which realize our strategy for nonlinear regression modelling and parameter estimation are described as well. The performance of the selected model is discussed, and the behaviour of the procedures is illustrated, e.g., by an application in radioimmunological assay.
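A toy illustration of one ingredient of the procedure: selecting a Box-Cox transformation of the response by minimizing a cross-validation criterion. A simple linear fit stands in for the richer regression/variance model class, and predictions are mapped back to the original scale so errors are comparable across candidate exponents; the data and the λ grid are assumptions.

```python
import numpy as np
from scipy.special import inv_boxcox
from scipy.stats import boxcox
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n = 150
x = rng.uniform(1, 5, size=n)
y = np.exp(0.5 * x + rng.normal(scale=0.2, size=n))   # positive response

best = (np.inf, None)
for lam in np.linspace(-1, 2, 13):                     # candidate Box-Cox exponents
    errs = []
    for tr, te in KFold(5, shuffle=True, random_state=0).split(x):
        y_tr = boxcox(y[tr], lmbda=lam)                # transform training response
        m = LinearRegression().fit(x[tr, None], y_tr)
        pred = inv_boxcox(m.predict(x[te, None]), lam) # back to the original scale
        errs.append(np.mean((y[te] - pred) ** 2))
    score = np.mean(errs)
    if score < best[0]:
        best = (score, lam)
print(f"selected lambda = {best[1]:.2f}, CV error = {best[0]:.3f}")
```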

3.
李昕  谭莹 《统计研究》2019,36(7):26-38
After comparing three accounting definitions of capital flight, this paper combines the indirect method with an equity-balance adjustment to estimate the scale of capital outflows from China over 1982-2016. The results show that outflows have grown year by year since the 2008 global financial crisis, and at an accelerating pace in recent years. Building an unrestricted VAR model, the paper then analyzes the structural and cyclical “push” and “pull” factors behind the outflows. The empirical results indicate that, from the outbreak of the 2008 crisis until the turn in the Federal Reserve's monetary cycle in 2013, the main drivers of China's capital outflows were domestic “push” factors such as RMB depreciation expectations and cyclical adjustment of the Chinese economy. Since 2014, the further intensification of outflows has reflected the joint action of external “pull” factors, notably the quickening pace of the US recovery, and domestic “push” factors such as the adjustment of China's real-estate market, with the external “pull” factors being the principal trigger. Against this backdrop of accelerating capital outflows, the pace of China's financial opening should be gradual and prudent.
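A minimal sketch of fitting an unrestricted VAR of the kind used in the push/pull analysis, with statsmodels; the three series are synthetic stand-ins for the paper's actual variables, and the lag order and horizon are illustrative choices.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
T = 120
data = pd.DataFrame({
    "capital_outflow": np.cumsum(rng.normal(size=T)),   # synthetic, nonstationary
    "rmb_depreciation_expectation": rng.normal(size=T),
    "us_growth": rng.normal(size=T),
})

model = VAR(data)
res = model.fit(maxlags=4, ic="aic")     # lag order chosen by AIC
print(res.summary())

# Impulse responses trace how a shock to one "push"/"pull" factor
# propagates to capital outflows over 10 periods.
irf = res.irf(10)
```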

4.
A new general method of combining estimators is proposed in order to obtain an estimator with “improved” small-sample properties. It is based on a specification test statistic and incorporates some well-known methods like preliminary testing. It is used to derive an alternative estimator for the slope in the simple errors-in-variables model, combining OLS and the modified instrumental variable estimator by Fuller. Small-sample properties of the new estimator are investigated by means of a Monte Carlo study.
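A toy sketch of the pretest idea of switching between two estimators on the basis of a specification test, here choosing between OLS and a simple instrumental variable estimator for an errors-in-variables slope. A crude Hausman-type contrast with an arbitrary threshold stands in for the paper's test statistic, and Fuller's modification is not implemented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=n)                        # instrument
x_true = z + rng.normal(size=n)
x = x_true + rng.normal(scale=0.8, size=n)    # observed with measurement error
y = 1.0 + 2.0 * x_true + rng.normal(size=n)   # true slope = 2

def slope_ols(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def slope_iv(z, x, y):
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

b_ols, b_iv = slope_ols(x, y), slope_iv(z, x, y)
# If OLS and IV disagree "too much", distrust OLS (attenuated by measurement
# error) and use IV; the 0.2 threshold is purely illustrative.
combined = b_iv if abs(b_ols - b_iv) > 0.2 else b_ols
print(f"OLS {b_ols:.3f}, IV {b_iv:.3f}, combined {combined:.3f}")
```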

5.
Heavily right-censored time to event, or survival, data arise frequently in research areas such as medicine and industrial reliability. Recently, there have been suggestions that auxiliary outcomes which are more fully observed may be used to “enhance” or increase the efficiency of inferences for a primary survival time variable. However, efficiency gains from this approach have mostly been very small. Most of the situations considered have involved semiparametric models, so in this note we consider two very simple fully parametric models. In the one case involving a correlated auxiliary variable that is always observed, we find that efficiency gains are small unless the response and auxiliary variable are very highly correlated and the response is heavily censored. In the second case, which involves an intermediate stage in a three-stage model of failure, the efficiency gains can be more substantial. We suggest that careful study of specific situations is needed to identify opportunities for “enhanced” inferences, but that substantial gains seem more likely when auxiliary information involves structural information about the failure process.

6.
Variable selection is an effective methodology for dealing with models with numerous covariates. We consider methods of variable selection for the semiparametric Cox proportional hazards model under the progressive Type-II censoring scheme. The Cox proportional hazards model is used to model the influence coefficients of the environmental covariates. By applying Breslow’s “least information” idea, we obtain a profile likelihood function to estimate the coefficients. Lasso-type penalized profile likelihood estimation as well as a stepwise variable selection method are explored as means of finding the important covariates. Numerical simulations are conducted and the Veterans’ Administration Lung Cancer data are used to evaluate the performance of the proposed method.
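A minimal sketch of lasso-penalized Cox proportional hazards fitting with the lifelines package, under ordinary right censoring rather than the paper's progressive Type-II scheme; the simulated data, column names, and penalty strength are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n, p = 300, 6
X = rng.normal(size=(n, p))
hazard = np.exp(0.8 * X[:, 0] - 0.5 * X[:, 1])   # only x0 and x1 matter
t = rng.exponential(1.0 / hazard)                # event times
c = rng.exponential(2.0, size=n)                 # censoring times

df = pd.DataFrame(X, columns=[f"x{j}" for j in range(p)])
df["duration"] = np.minimum(t, c)
df["event"] = (t <= c).astype(int)

# l1_ratio=1.0 gives a pure lasso penalty on the regression coefficients.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="duration", event_col="event")
print(cph.params_.round(3))   # near-zero entries suggest droppable covariates
```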

7.
When the target variable exhibits a semicontinuous behavior (a point mass in a single value and a continuous distribution elsewhere), parametric “two-part models” have been extensively used and investigated. The applications have mainly been related to nonnegative variables with a point mass in zero (zero-inflated data). In this article, a semiparametric Bayesian two-part model for dealing with such variables is proposed. The model allows a semiparametric expression for the two parts of the model by using Dirichlet processes. A motivating example, based on grape wine production in Tuscany (an Italian region), is used to show the capabilities of the model. Finally, two simulation experiments evaluate the model. Results show a satisfactory performance of the suggested approach for modeling and predicting semicontinuous data when parametric assumptions are not reasonable.
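A frequentist toy version of a two-part model for zero-inflated data, to fix ideas: a logistic model for the point mass at zero plus a log-linear model for the positive part. This is a simple stand-in; the paper's model is Bayesian with Dirichlet-process parts, and the noise scale used in the back-transformation below is known only because the data are simulated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=(n, 1))
is_pos = rng.random(n) < 1 / (1 + np.exp(-x[:, 0]))   # P(y > 0) rises with x
y = np.where(is_pos, np.exp(0.5 + x[:, 0] + rng.normal(scale=0.3, size=n)), 0.0)

# Part 1: model the point mass at zero.
part1 = LogisticRegression().fit(x, (y > 0).astype(int))

# Part 2: model the continuous distribution of the positive values.
pos = y > 0
part2 = LinearRegression().fit(x[pos], np.log(y[pos]))

# Predicted mean combines both parts: E[y] = P(y>0) * E[y | y>0].
x_new = np.array([[0.5]])
p_pos = part1.predict_proba(x_new)[0, 1]
mean_pos = np.exp(part2.predict(x_new)[0] + 0.5 * 0.3**2)  # lognormal mean, sigma known here
print(f"P(y>0)={p_pos:.2f}, E[y|y>0]~{mean_pos:.2f}, E[y]~{p_pos * mean_pos:.2f}")
```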

8.
Many Bayes factors have been proposed for comparing population means in two-sample (independent samples) studies. Recently, Wang and Liu presented an “objective” Bayes factor (BF) as an alternative to a “subjective” one presented by Gönen et al. Their report was evidently intended to show the superiority of their BF based on the “undesirable behavior” of the latter. A wonderful aspect of Bayesian models is that they provide an opportunity to “lay all cards on the table.” What distinguishes the various BFs in the two-sample problem is the choice of priors (cards) for the model parameters. This article discusses desiderata of BFs that have been proposed, and proposes a new criterion to compare BFs, whether subjectively or objectively determined. A BF may be preferred if it correctly classifies the data as coming from the correct model most often. The criterion is based on a famous result in classification theory to minimize the total probability of misclassification. This criterion is objective, easily verified by simulation, shows clearly the effects (positive or negative) of assuming particular priors, provides new insights into the appropriateness of BFs in general, and provides a new answer to the question, “Which BF is best?”
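A sketch of the proposed criterion in simulation form: generate data alternately under each competing model, compute a Bayes factor, and record how often it classifies the data to the generating model. A simple BIC approximation with a plugged-in common scale stands in for the specific BFs under discussion, so the numbers are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def bic_bf(x1, x2):
    """Approximate BF_12 for H1: equal means vs H2: unequal means via BIC,
    using a common plugged-in scale for simplicity."""
    n = len(x1) + len(x2)
    pooled = np.concatenate([x1, x2])
    ll1 = stats.norm.logpdf(pooled, pooled.mean(), pooled.std()).sum()
    ll2 = (stats.norm.logpdf(x1, x1.mean(), pooled.std()).sum()
           + stats.norm.logpdf(x2, x2.mean(), pooled.std()).sum())
    # H2 has one extra parameter, so BF_12 ~ exp((2*ll1 - 2*ll2 + log n) / 2).
    return np.exp((2 * ll1 - 2 * ll2 + np.log(n)) / 2)

correct, reps = 0, 2000
for r in range(reps):
    under_h2 = r % 2 == 1                 # alternate the generating model
    shift = 0.8 if under_h2 else 0.0
    x1 = rng.normal(0, 1, 30)
    x2 = rng.normal(shift, 1, 30)
    says_h2 = bic_bf(x1, x2) < 1.0        # BF_12 < 1 favors H2
    correct += says_h2 == under_h2
print(f"correct classification rate: {correct / reps:.3f}")
```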

9.
李腊生 《统计研究》1996,13(4):60-64
Starting from the basic model of rational expectations and its policy implications, this paper examines policy effectiveness through the structure and underlying assumptions of the rational expectations model. The results show that the “policy ineffectiveness” proposition of rational expectations theory cannot be accepted. Taking into account the disequilibrium character of actual economic operation, the author then constructs a non-Walrasian equilibrium model with an auxiliary expectation variable, and concludes that whether economic policies are effective depends on the state of the economy.

10.
This article provides a strategy to identify the existence and direction of a causal effect in a generalized nonparametric and nonseparable model identified by instrumental variables. The causal effect concerns how the outcome depends on the endogenous treatment variable. The outcome variable, treatment variable, other explanatory variables, and the instrumental variable can be essentially any combination of continuous, discrete, or “other” variables. In particular, it is not necessary to have any continuous variables, none of the variables need to have large support, and the instrument can be binary even if the corresponding endogenous treatment variable and/or outcome is continuous. The outcome can be mismeasured or interval-measured, and the endogenous treatment variable need not even be observed. The identification results are constructive, and can be empirically implemented using standard estimation results.
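The flavor of instrument-based identification with a binary instrument can be shown with the classical Wald estimator, a much simpler special case than the paper's general nonseparable setting; the data-generating process and the effect size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
z = rng.integers(0, 2, size=n)            # binary instrument
u = rng.normal(size=n)                    # unobserved confounder
d = (0.5 * z + u + rng.normal(size=n) > 0.5).astype(int)   # endogenous treatment
y = 1.5 * d + u + rng.normal(size=n)      # true effect = 1.5

# Wald estimator: ratio of the reduced-form contrast to the first-stage contrast.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(f"Wald/IV estimate of the treatment effect: {wald:.3f}")
```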

11.
马文杰  胡玥 《统计研究》2022,39(1):59-74
Private placements are an important refinancing tool for listed companies, and improving the regulatory regime so that the private-placement market develops steadily and in an orderly way is a key link in comprehensively deepening reform and achieving high-quality development. Combining the mechanism that determines a “reasonable” placement discount with a two-tier stochastic frontier model, this paper proposes a method for effectively distinguishing “reasonable” from “unreasonable” discounts and carries out an empirical analysis on data from China's private-placement market. The results show that since the “90% rule” on issue pricing took effect, the share of “under-discounted” issues has been far higher than the share of “over-discounted” ones, and that “excessively premium” issues significantly push up share prices to effect tunneling. A regulatory approach that merely restricts the discount ratio neither effectively curbs tunneling nor avoids dampening investors' willingness to participate in private placements. The paper both usefully extends research on private-placement discounts and provides a policy reference for further optimizing the regulation of placement discounts.

12.
If the experimental design (or lack of design) results in a model which is not of full rank, the problem of variable selection becomes rather complex. This is due to the fact that, in less-than-full-rank models, not every linear combination of regression parameters is estimable. In this paper, we present a procedure for testing all “testable” subsets of a complete set of regression parameters, using a technique based on Scheffé's method (1959). A class of “adequate” subsets of regression parameters is obtained in a manner similar to that of Aitkin (1974). The proposed procedure is illustrated with an example.
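In a less-than-full-rank model, a linear combination c'β is estimable exactly when c lies in the row space of the design matrix X. A small numpy check of that condition (illustrative of the estimability issue, not the paper's Scheffé-based testing procedure):

```python
import numpy as np

def is_estimable(X, c):
    """c'beta is estimable iff c is in the row space of X, i.e.
    appending c as a row does not increase the rank."""
    return np.linalg.matrix_rank(np.vstack([X, c])) == np.linalg.matrix_rank(X)

# One-way layout with two groups: columns are (intercept, group1, group2).
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)        # rank 2 < 3: not full rank
print(is_estimable(X, np.array([0, 1, -1])))  # group contrast: True
print(is_estimable(X, np.array([0, 1, 0])))   # single group effect: False
```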

13.
When confronted with multiple covariates and a response variable, analysts sometimes apply a variable-selection algorithm to the covariate-response data to identify a subset of covariates potentially associated with the response, and then wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, the parameters of interest could be estimated by using standard inference methods to fit the postulated marginal model to the independent data set. However, when applied to the same data set used by the variable selector, standard (“naive”) methods can lead to distorted inferences. The authors develop testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and response variable, based on the same data set used for variable selection. They provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess and compare their performance to a sample-splitting approach. The methods are illustrated with data from a recent AIDS study. The Canadian Journal of Statistics 37: 625–644; 2009 © 2009 Statistical Society of Canada
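The sample-splitting comparator that the authors benchmark against is easy to sketch: select variables on one half of the data, then fit the marginal model with standard inference on the other, untouched half. This is the comparator, not the authors' proposed method; the data and split ratio are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n, p = 400, 20
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

X_sel, X_inf, y_sel, y_inf = train_test_split(X, y, test_size=0.5, random_state=0)

# Step 1: variable selection on the first half only.
sel = np.flatnonzero(LassoCV(cv=5).fit(X_sel, y_sel).coef_)

# Step 2: ordinary OLS inference is valid on the second half, since it
# played no role in the selection (with this strong signal, x0 and x1
# are reliably selected).
ols = sm.OLS(y_inf, sm.add_constant(X_inf[:, sel])).fit()
print(ols.summary())
```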

14.
Continuous-variable panel models are widely used in social and business research to assess the relationships between variables over time given measurements at several waves on a single sample of individuals. Emphasis is often placed on estimating and testing the cross-effects, which are the regression coefficients indicating the lagged effect of one variable on another. For the simple two-variable model we consider the problem of incorporating the contemporaneous relationship between the variables into the model. Three extensions of the “independent regressions” model are considered. Their similarities and differences are examined. The use of the models is illustrated by examining data on attitudes toward the criminal justice system and capital punishment for a panel of petit jurors.
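A toy version of the cross-lagged ("cross-effects") regressions for a two-variable panel: each variable at wave t is regressed on both variables at wave t-1. The data are synthetic, and the paper's extensions for the contemporaneous association are not implemented here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n, waves = 200, 3
x = np.zeros((n, waves)); z = np.zeros((n, waves))
x[:, 0], z[:, 0] = rng.normal(size=n), rng.normal(size=n)
for t in range(1, waves):
    x[:, t] = 0.6 * x[:, t-1] + 0.2 * z[:, t-1] + rng.normal(scale=0.5, size=n)
    z[:, t] = 0.1 * x[:, t-1] + 0.7 * z[:, t-1] + rng.normal(scale=0.5, size=n)

# Cross-effect of z on x: coefficient on z_{t-1} in the equation for x_t.
# Raveling row-major keeps each person's consecutive waves aligned.
lagged = sm.add_constant(np.column_stack([x[:, :-1].ravel(), z[:, :-1].ravel()]))
fit = sm.OLS(x[:, 1:].ravel(), lagged).fit()
print(fit.params)   # [intercept, stability effect of x, cross-effect of z]
```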

15.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
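A sketch of the first two stages of the algorithm: a discrete wavelet transform of each spectrum (PyWavelets), followed by screening coefficients on their correlation with the response. The genetic-algorithm search stage is omitted, and the synthetic spectra, wavelet choice, decomposition level, and correlation threshold are all assumptions.

```python
import numpy as np
import pywt

rng = np.random.default_rng(10)
n, wavelengths = 40, 256                  # n << p: the "curse of dimensionality"
spectra = rng.normal(size=(n, wavelengths)).cumsum(axis=1)   # smooth-ish curves
y = spectra[:, 100] - 0.5 * spectra[:, 200] + rng.normal(scale=0.1, size=n)

# Stage 1: wavelet-transform each spectrum (wavelet and level are tunable).
coeffs = np.array([np.concatenate(pywt.wavedec(s, "db4", level=4))
                   for s in spectra])

# Stage 2: keep only coefficients well-correlated with the response.
cors = np.array([abs(np.corrcoef(coeffs[:, j], y)[0, 1])
                 for j in range(coeffs.shape[1])])
keep = np.flatnonzero(cors > 0.5)
print(f"{coeffs.shape[1]} coefficients -> {keep.size} retained for the GA search")
```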

16.
The author considers a reparameterized version of the Bayesian ordinal cumulative link regression model as a tool for exploring relationships between covariates and “cutpoint” parameters. The use of this parameterization allows one to fit models using the leapfrog hybrid Monte Carlo method, and to bypass latent variable data augmentation and the slow convergence of the cutpoints which it usually entails. The proposed Gibbs sampler is not model specific and can be easily modified to handle different link functions. The approach is illustrated by considering data from a pediatric radiology study.

17.
Ranked set sampling is a procedure which may be used to improve the precision of the estimator of the mean. It is useful in cases where the variable of interest is much more difficult to measure than to order. However, even if ordering is difficult, but there is an easily ranked concomitant variable available, then it may be used to “judgment order” the original variable. The amount of increase in the precision of the estimator is dependent upon the correlation between the two variables.
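A numpy sketch of ranked set sampling with judgment ordering by a concomitant variable: within each set, only the unit whose concomitant has a given rank is actually measured. The set size, cycle count, and correlation are illustrative; as the abstract notes, the precision gain grows with the correlation between the two variables.

```python
import numpy as np

rng = np.random.default_rng(11)
k, cycles = 3, 200                 # set size and number of cycles
rho = 0.9                          # correlation between Y and concomitant C

measured = []
for _ in range(cycles):
    for r in range(k):             # rank to select in this set
        c = rng.normal(size=k)                                   # cheap concomitant
        y = rho * c + np.sqrt(1 - rho**2) * rng.normal(size=k)   # costly Y
        pick = np.argsort(c)[r]    # judgment-order by C, measure the rank-r unit
        measured.append(y[pick])

rss_mean = np.mean(measured)
srs_mean = np.mean(rng.normal(size=len(measured)))   # same-size SRS comparator
print(f"RSS estimate {rss_mean:.4f} vs SRS estimate {srs_mean:.4f} (true mean 0)")
```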

18.
孙文浩等 《统计研究》2021,38(6):102-115
Studying how government tax cuts can raise the innovation capacity of high-tech “zombie firms” has major theoretical and practical significance for advancing China's supply-side structural reform and implementing the innovation-driven development strategy. Using the national innovation survey enterprise database for 2008-2014 and drawing on the general-equilibrium model of ABBGH (2005), this paper defines high-tech “zombie firms” and a method for identifying them. The main findings are as follows. First, tax cuts for high-tech “zombie firms” significantly promote firm innovation, with an especially strong “leverage effect” for innovation-oriented “zombie firms.” Second, the innovation-promoting effect of tax cuts is significantly larger for high-tech “zombie firms” than for non-high-tech ones. Third, granting larger tax cuts to innovation-oriented “zombie firms” that favor investment in R&D fixed assets and to efficiency-oriented “zombie firms” that favor basic research is more effective at stimulating innovation, and is an important route to reviving high-tech “zombie firms.” The government can exploit the contrast between the “heavy assets, light research” strategy of innovation-oriented “zombie firms” and the “heavy research, light assets” strategy of efficiency-oriented ones to optimize tax incentives, providing a new governance framework for steady, sustained and efficient supply-side structural reform and innovation-driven development.

19.
Several authors have discussed Kalman filtering procedures using a mixture of normals as a model for the distributions of the noise in the observation and/or the state space equations. Under this model, resulting posteriors involve a mixture of normal distributions, and a “collapsing method” must be found in order to keep the recursive procedure simple. We prove that the Kullback-Leibler distance between the mixture posterior and that of a single normal distribution is minimized when we choose the mean and variance of the single normal distribution to be the mean and variance of the mixture posterior. Hence, “collapsing by moments” is optimal in this sense. We then develop the resulting optimal algorithm for “Kalman filtering” for this situation, and illustrate its performance with an example.
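The optimal “collapsing by moments” step is simple to state in code: replace a normal mixture with the single normal whose mean and variance equal the mixture's. This is a direct transcription of that one result, not the full Kalman filter recursion; the example mixture is arbitrary.

```python
import numpy as np

def collapse(weights, means, variances):
    """Collapse a mixture of normals to one normal by moment matching,
    which minimizes KL(mixture || normal) over the normal's parameters."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    mean = np.dot(w, m)
    var = np.dot(w, v + m**2) - mean**2   # law of total variance
    return mean, var

print(collapse([0.7, 0.3], [0.0, 4.0], [1.0, 2.0]))   # -> (1.2, 4.66)
```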

20.
In searching for the “best” growth inhibitor, we decided to consider growth inhibition in terms of the lengths of the terminal sprouts, since it is logical to infer that the trees with the longer sprouts (after a 20-month period) will most likely be the ones that will need trimming in the future. Additionally, we reasoned that if a particular treatment produced a smaller proportion of “long” sprouts, then it would be a more effective growth inhibitor. It was then necessary to define what was meant by “long”. After consultation with foresters we chose cutoff lengths of 15.0, 25.0 and 35.0 cm. Hence the response variable was chosen to be the proportion of the terminal sprouts on a tree that exceeded a specified cutoff length. By varying the cutoff lengths, we would minimize the effect of the arbitrariness involved in choosing one particular length.
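Computing the response variable described above, the per-tree proportion of terminal sprouts exceeding each cutoff, is a one-line calculation per cutoff; the sprout lengths below are synthetic, while the cutoffs come from the text.

```python
import numpy as np

rng = np.random.default_rng(12)
# Sprout lengths (cm) for one tree after the 20-month period (synthetic).
sprouts = rng.gamma(shape=2.0, scale=12.0, size=35)

for cutoff in (15.0, 25.0, 35.0):          # cutoffs chosen with foresters
    prop = np.mean(sprouts > cutoff)
    print(f"proportion of sprouts longer than {cutoff:.0f} cm: {prop:.2f}")
```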

