Similar articles
1.
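Wang Xiaoyan et al., 《统计研究》 (Statistical Research), 2014, 31(9): 107-112
Variable selection is an important step in statistical modeling; choosing appropriate variables yields robust models that are structurally simple and predict accurately. For logistic regression, this paper proposes a new penalized method for bi-level variable selection, the adaptive Sparse Group Lasso (adSGL). Its distinguishing feature is that it screens variables on the basis of their group structure, selecting both between groups and within groups. Its advantage is that it penalizes individual coefficients and group coefficients to different degrees, which avoids over-penalizing large coefficients and thereby improves the estimation and prediction accuracy of the model. The difficulty in fitting is that the penalized likelihood function is not strictly convex, so the paper solves the model with a group coordinate descent algorithm and establishes a criterion for choosing the tuning parameters. Simulations show that, compared with the representative existing methods Sparse Group Lasso, Group Lasso, and Lasso, adSGL both improves bi-level selection accuracy and reduces model error. Finally, adSGL is applied to credit card credit scoring, where it achieves higher classification accuracy and robustness than logistic regression.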
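
The bi-level idea in this abstract can be illustrated with a generic weighted sparse-group-lasso penalty. The abstract does not give the adSGL formula, so the sketch below is an assumption: the function name, the `group_weights` and `coef_weights` arguments, and the square-root group-size scaling are illustrative choices, not the authors' definition.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam_group, lam_coef,
                               group_weights=None, coef_weights=None):
    """Weighted sparse-group-lasso penalty (illustrative, not the paper's formula).

    beta          : list of coefficients
    groups        : list of index lists that partition range(len(beta))
    lam_group     : tuning parameter for the between-group (L2 norm) term
    lam_coef      : tuning parameter for the within-group (L1) term
    *_weights     : hypothetical adaptive weights; all ones gives plain SGL
    """
    if group_weights is None:
        group_weights = [1.0] * len(groups)
    if coef_weights is None:
        coef_weights = [1.0] * len(beta)

    # Between-group term: weighted L2 norm of each group's coefficients,
    # scaled by sqrt(group size) as is common for group penalties.
    group_term = sum(
        w * math.sqrt(len(idx)) * math.sqrt(sum(beta[j] ** 2 for j in idx))
        for w, idx in zip(group_weights, groups)
    )
    # Within-group term: weighted L1 norm of the individual coefficients.
    coef_term = sum(w * abs(b) for w, b in zip(coef_weights, beta))
    return lam_group * group_term + lam_coef * coef_term
```

In an adaptive scheme the weights are usually taken to be smaller for coefficients or groups with large initial estimates, which is one way to avoid over-penalizing large coefficients; how adSGL chooses them is not stated in the abstract.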

2.
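Big data are characterized by heterogeneous sources, high dimensionality, and sparsity. How to uncover the heterogeneity and commonality across datasets while reducing dimension and removing noise is one of the goals and challenges of big data analysis. Integrative analysis analyzes multiple independent datasets simultaneously, avoiding the model instability caused by sample differences due to factors such as region and time, and is an effective way to study heterogeneity in big data. Its key feature is to treat the coefficients of each explanatory variable across all datasets as a group and to shrink the coefficient groups through a penalty function, thereby studying the associations among variables and achieving dimension reduction. This paper reviews the principles, algorithms, and current state of research of penalized integrative analysis from three angles: integrative analysis of homogeneous data, integrative analysis of heterogeneous data, and integrative analysis that accounts for network structure. Simulations show that Group Bridge, Group MCP, and Composite MCP all perform well under weak, moderate, and strong correlation, with Group Bridge giving the lowest and most stable number of false positives. Finally, integrative analysis is applied to household medical expenditure under the New Rural Cooperative Medical Scheme, whose data come from heterogeneous sources, and to cancer gene data, which show typical big data features such as ultra-high dimensionality and small sample size, yielding some meaningful conclusions.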

3.
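Zhang Jingxiao, Liu Yanping, 《统计研究》 (Statistical Research), 2012, 29(9): 95-102
This paper gives a fairly comprehensive review of regularization methods for curve selection in functional generalized linear models and compares the properties of the various methods. It finds that curve selection in functional generalized linear models exhibits a grouping effect and may also have high-dimensional characteristics. Simulations further show that Group Bridge, Group MCP, Elastic Net, and Mnet give good numerical results.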

4.
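Wang Na, 《统计研究》 (Statistical Research), 2016, 33(11): 56-62
To study whether big data can help predict the price of carbon emission allowances, this paper discusses the roles of structured data and unstructured information in forecasting carbon prices. The structured data are international carbon spot prices, carbon futures prices, and exchange rates; the unstructured information consists of the Baidu search index and media index. Since treating every explanatory variable equally is unreasonable when there are many of them, the paper proposes a network-structured autoregressive distributed lag (ADL) model, which takes the network relationships among explanatory variables into account during parameter estimation and variable selection. The empirical analysis shows that the network-structured ADL model clearly outperforms the other models, achieves higher forecasting accuracy, and is better suited to big-data-based prediction.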

5.
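A feature selection model for imbalanced data based on AUC regression   Total citations: 1 (self-citations: 0, citations by others: 1)
To address generalization and feature selection for imbalanced data, an AUC regression model with the MCP penalty (MCP-AUCR) is proposed. The model uses an optimization objective that takes information from all thresholds into account, so it can handle imbalanced data and selects features well. After discussing the definition and principle of the model, the paper proposes a corresponding cyclic coordinate descent training algorithm and verifies its good properties through numerical simulation. A financial early-warning model based on MCP-AUCR is then built for listed companies in the machinery, equipment, and instruments sector of the Chinese stock market. The results show that this model selects important, interpretable financial indicators and predicts effectively, significantly outperforming traditional models.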

6.
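Based on the varying-coefficient model, this article studies variable selection. The coefficient functions in the model are approximated with B-spline functions and combined with the LASSO, SCAD, and MCP penalties, and variable selection is carried out with a group coordinate descent algorithm. The three penalties are compared by simulation. The simulation results confirm the effectiveness of the proposed method and show that MCP and SCAD outperform LASSO.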
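
The three penalties compared in this abstract have standard closed forms, sketched below for a single coefficient. This is not code from the article: the function names are illustrative, and the defaults `a=3.7` and `gamma=3.0` are conventional choices assumed here rather than values reported by the authors.

```python
def lasso_penalty(t, lam):
    """LASSO penalty: grows linearly in |t| without bound."""
    return lam * abs(t)


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (requires a > 2): linear, then quadratic, then constant."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(t, lam, gamma=3.0):
    """MCP penalty (requires gamma > 1): concave up to gamma*lam, then constant."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2


if __name__ == "__main__":
    # Compare the three penalties on a small and a large coefficient.
    for coef in (0.5, 5.0):
        print(coef, lasso_penalty(coef, 1.0), scad_penalty(coef, 1.0),
              mcp_penalty(coef, 1.0))
```

Unlike LASSO, both SCAD and MCP level off for large coefficients, so large effects are shrunk less. This is the usual explanation offered for results like the one reported here.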

7.
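I. The multicollinearity phenomenon. The multiple regression model is a common and effective tool for economic forecasting and analysis. In practice, however, attention is often paid only to the economic relationship between the explanatory variables and the explained variable, while the correlation among the explanatory variables is overlooked. In fact, the relationships among economic variables are intricate, and a change in one variable is often constrained by several others; the explanatory variables...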

8.
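Based on the mutual information between the explanatory variables and the response variable, this article proposes a new variable selection method, MI-SIS. The method can handle ultra-high-dimensional problems in which the number of explanatory variables p is far larger than the sample size n, namely p = O(exp(n^ε)) with ε > 0. It is also a model-free variable selection method, in that it does not rely on model assumptions. Numerical simulations and an empirical study show that MI-SIS can effectively detect weak signals in small samples.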
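
The screening step described in this abstract can be sketched as: estimate the mutual information between each explanatory variable and the response, rank the variables, and keep the top few. The abstract does not say which mutual information estimator MI-SIS uses, so the equal-width histogram estimator, the function names, and the `n_bins` and `keep` parameters below are assumptions made for illustration.

```python
import math
from collections import Counter


def _bin_index(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=5):
    """Plug-in (histogram) estimate of I(X; Y) in nats."""
    bx, by = _bin_index(x, n_bins), _bin_index(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum p(i, j) * log(p(i, j) / (p(i) * p(j))) over the occupied cells.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def mi_screen(columns, y, keep, n_bins=5):
    """Return the indices of the `keep` columns with the largest estimated MI."""
    scores = [mutual_information(col, y, n_bins) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:keep]
```

Because the ranking uses only a marginal dependence measure, no regression model has to be specified, which matches the model-free property claimed in the abstract. The histogram estimator is a simple stand-in and can be biased in small samples.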

9.
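In modeling practice there is no unique way to represent the dependence between the explained variable and the explanatory variables, so actual modeling involves choosing among several candidate model forms and running diagnostic tests on the chosen form. Testing the functional form specification covers two aspects: specifying the concrete functional form of the conditional expectation function (the conditional mean equation), and testing the specified form. The article discusses both.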

10.
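《统计与信息论坛》 (Statistics & Information Forum), 2019, (10): 100-107
The outcomes of a firm's operations management may show up in several dimensions, and these outcomes are influenced by a number of factors. Measuring quantitatively whether the factors and their combinations differ in their effects on operational outcomes helps firms allocate and use resources rationally. Using the basic principles of statistical regression analysis, the paper discusses differences in the regression effects of multiple factors and their combinations on multiple business objectives, including: comparing the regression effects of different groups of explanatory variables on the same explained variable, comparing the effects of the same group of explanatory variables on different explained variables, and comparing the effects of different groups of explanatory variables on all explained variables. An application with real data illustrates the approach.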

11.
Structural vector autoregressive analysis for cointegrated variables   Total citations: 1 (self-citations: 0, citations by others: 1)
Summary: Vector autoregressive (VAR) models are capable of capturing the dynamic structure of many time series variables. Impulse response functions are typically used to investigate the relationships between the variables included in such models. In this context the relevant impulses or innovations or shocks to be traced out in an impulse response analysis have to be specified by imposing appropriate identifying restrictions. Taking into account the cointegration structure of the variables offers interesting possibilities for imposing identifying restrictions. Therefore VAR models which explicitly take into account the cointegration structure of the variables, so-called vector error correction models, are considered. Specification, estimation and validation of reduced form vector error correction models are briefly outlined, and imposing structural short- and long-run restrictions within these models is discussed. I thank an anonymous reader for comments on an earlier draft of this paper that helped me to improve the exposition.

12.
Motivated by an entropy inequality, we propose for the first time a penalized profile likelihood method for simultaneously selecting significant variables and estimating unknown coefficients in multiple linear regression models. The new method is robust to outliers or errors with heavy tails and works well even for errors with infinite variance. Our proposed approach outperforms the adaptive lasso in both theory and practice. The simulation studies show that (i) the new approach has a higher probability of correctly selecting the exact model than the least absolute deviation lasso and the adaptively penalized composite quantile regression approach and (ii) exact model selection via our proposed approach is robust regardless of the error distribution. An application to a real dataset is also provided.

13.
Many recent multiple testing papers have provided more efficient and/or robust methodology for control of a particular error rate. However, different multiple testing scenarios call for the control of different error rates. Hence, the procedure possessing the desired optimality and/or robustness properties may not be applicable to the problem at hand. This paper provides a general method for extending any multiple testing procedure to control any error rate, thereby allowing for the procedure possessing the desired properties to be used to control the most relevant error rate. As an example, two popular procedures that were originally designed to control the marginal and positive False Discovery Rate are extended to control the False Discovery Rate and Family-wise Error Rate. It is shown that optimality and/or robustness properties of the original procedure are retained when it is modified using the proposed method.

14.
Dimensionality reduction is one of the important preprocessing steps in high-dimensional data analysis. In this paper we propose a supervised manifold learning method that uses the information in continuous dependent variables to distinguish the intrinsic neighbourhood and the extrinsic neighbourhood of data samples, and constructs two graphs according to these two kinds of neighbourhoods. Following the idea of Laplacian eigenmaps, we show that the neighbourhood structure can be preserved or even improved on the low-dimensional manifold. Our approach has two important characteristics: (i) it uses dependent variables to find an informative low-dimensional projection which is robust to noisy independent variables and (ii) the objective function simultaneously enlarges the distance between dissimilar samples and pushes similar samples close to each other according to the graph constructed with the help of continuous dependent variables. Our experiments demonstrate that our method is more effective than its traditional rivals.

15.
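The rapid development of computing has greatly eased data collection and storage. Many companies have accumulated large volumes of data, while data dimensionality keeps growing and noise variables multiply, so one of the key problems in modeling is to screen a small number of important variables out of a high-dimensional set. For proportion data whose response takes values in the interval (0,1), regularized Beta regression is proposed, and maximum likelihood estimation under the LASSO, SCAD, and MCP penalties is studied together with its asymptotic properties. Simulations show that MCP outperforms SCAD and LASSO, and that SCAD also outperforms LASSO as the sample size grows. Finally, the method is applied to the determinants of dividend yields of Chinese listed companies.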

16.
Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for selection of variables, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessment of the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations, of multivariate binary as well as multivariate normal data, demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.

17.
Cluster analysis methods are based on measures of 'distance' between objects. Sometimes the objects have an internal structure, and use of this can be made when defining such distances. This leads to non-standard cluster analysis methods. We illustrate with an application in which the objects are themselves classes and the aim is to produce clusters of classes which minimize the error rate of a supervised classification rule. For supervised classification problems with more than a handful of classes, there may exist groups of classes which are well separated from other groups, even though individual classes are not all well separated. In such cases, the overall misclassification rate is a crude measure of performance and more subtle measures, taking note of subgroup separation, are desirable. The fact that points can be assigned accurately to groups, if not to individual classes, can sometimes be practically useful.

18.
The measurement error model (MEM) is an important model in statistics because, in a regression problem, measurement error in the explanatory variable seriously affects statistical inference if it is ignored. In this paper, we revisit the MEM when both the response and the explanatory variables are additionally subject to rounding errors. In line with recent developments, a normal mixture distribution is used to increase robustness to misspecification of the distribution of the explanatory variables in measurement error regression. This paper proposes a new method for estimating the model parameters. It can be proved that the estimates obtained by the new method are consistent and asymptotically normal.

19.
Constructing a pair-copula using the minimum information approach is an appropriate and flexible way to survey the dependency structure between variables of interest. The minimum information pair-copula method approximates a multivariate copula by applying constraints between the variables of interest that are elicited from the data itself or from experts’ judgment. In a minimum information pair-copula, selecting the basis constraints is a challenge. In this article, we apply genetic algorithms as a heuristic way to select basis constraints that optimize the approximated pair-copula. The results show that our method optimizes model selection criteria and leads to a better pair-copula approximation. Finally, we apply our proposed method to approximate the pair-copula density in a real dataset.
