Similar Documents
20 similar documents retrieved.
1.
This paper first reviews the evolution of the classification of China's financial industry and, drawing on the International Standard Industrial Classification (ISIC Rev. 4), Singapore's financial industry classification, and Hong Kong's financial industry classification, proposes improvements to the financial industry categories in the Industrial Classification for National Economic Activities (GB/T 4754–2011) issued in 2011. Second, under the improved classification, it analyzes the shortcomings of the method used to calculate China's quarterly value added of the financial industry and proposes a proportion model and a regression model as improved calculation methods. Finally, an empirical study on one region's financial industry data for the 2008 census year finds that both models give fairly accurate results.
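The abstract does not spell out the two models. The sketch below shows one plausible reading in Python: a "proportion model" allocates the known annual value added to quarters in proportion to a related quarterly indicator, while a "regression model" predicts quarterly value added from that indicator using past quarters. The indicator series and all figures are made up for illustration.

```python
import numpy as np

# Hypothetical data: four quarters of a related indicator (e.g. interest income)
# and the known annual value added of the financial industry.
indicator = np.array([120.0, 135.0, 128.0, 142.0])   # quarterly indicator
annual_va = 900.0                                     # annual value added

# Proportion model: allocate the annual total by the indicator's quarterly share.
share = indicator / indicator.sum()
va_proportion = annual_va * share

# Regression model: fit quarterly value added on the indicator using past
# quarters where both are observed, then predict the current quarters.
hist_indicator = np.array([100.0, 110.0, 105.0, 118.0, 115.0, 125.0, 119.0, 131.0])
hist_va        = np.array([180.0, 200.0, 190.0, 214.0, 208.0, 226.0, 215.0, 238.0])
X = np.column_stack([np.ones_like(hist_indicator), hist_indicator])
beta, *_ = np.linalg.lstsq(X, hist_va, rcond=None)
va_regression = beta[0] + beta[1] * indicator

print("proportion model:", np.round(va_proportion, 1))
print("regression model:", np.round(va_regression, 1))
```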

2.
Class imbalance frequently arises in classification problems in data mining, and most classification methods achieve unsatisfactory accuracy on the minority class of an imbalanced data set. This article analyzes the performance of the multiple criteria linear programming (MCLP) classification method on imbalanced data sets and then, from a modeling perspective, proposes a weighted MCLP classification model for imbalanced data. The effectiveness of the weighted MCLP model is analyzed theoretically, and it is compared empirically with other methods.
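The exact weighted MCLP formulation is not given in the abstract. The sketch below solves a simplified linear-programming classifier in the same spirit: each observation's misclassification slack is weighted by its class so that minority-class errors cost more. The data and weights are synthetic, and this is not the authors' model.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Imbalanced two-class data: 200 majority (label -1) vs 20 minority (label +1).
n_maj, n_min, p = 200, 20, 2
A = np.vstack([rng.normal(0.0, 1.0, (n_maj, p)),
               rng.normal(2.0, 1.0, (n_min, p))])
y = np.concatenate([-np.ones(n_maj), np.ones(n_min)])

# Class weights: minority slacks cost more, mimicking the weighted-MCLP idea.
w = np.where(y > 0, n_maj / n_min, 1.0)

# Variables: projection direction x (p entries, free) and slacks alpha (n, >= 0).
# Constraint y_i * (a_i . x) >= 1 - alpha_i, rewritten as
# -y_i * a_i . x - alpha_i <= -1.
n = len(y)
c = np.concatenate([np.zeros(p), w])                 # minimize sum of w_i * alpha_i
A_ub = np.hstack([-(y[:, None] * A), -np.eye(n)])
b_ub = -np.ones(n)
bounds = [(-10, 10)] * p + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x_hat = res.x[:p]
pred = np.sign(A @ x_hat)
print("direction:", np.round(x_hat, 3),
      "minority-class accuracy:", (pred[y > 0] == 1).mean())
```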

3.
Variable Selection and Determination of the Number of Hidden Units in Artificial Neural Networks   (total citations: 2; self-citations: 0; citations by others: 2)
An optimization problem is set up by minimizing the sum of squared errors over all training samples for a multiple-hidden-layer network; it is solved and a flowchart of the computation is drawn. Trevor et al. argue that too many hidden units is better than too few and that cross-validated estimation of the (hidden-unit) regularization parameter is unnecessary; another common practice is to use classification trees to pick variables as inputs for neural network modeling. Starting from the differences and connections between artificial neural networks and multivariate statistics, traditional regression, and other data mining tools, this paper argues that these views and practices deserve scrutiny. Using the ZIP-code example, it shows that more hidden units is not necessarily better than fewer; in practical data analysis the required number of hidden units can be determined by cross-validation combined with expert judgment; variables selected by classification trees bring little benefit to neural networks; pruning variables with a classification tree to reduce computation is less effective than reducing the number of hidden units; and the "from simple to general" idea for incomplete problems does not contradict selecting all variables in complete problems. Building on Le Cun et al.'s idea of local connectivity for effectively reducing the number of weights, the paper proposes building a distributed neural network model system by randomly selecting artificial variables.
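As a concrete illustration of choosing the number of hidden units by cross-validation, the sketch below scores single-hidden-layer networks of several widths on scikit-learn's built-in digits data (a small stand-in for the ZIP-code digits mentioned above); it is not the author's original experiment.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Candidate numbers of hidden units; pick the width with the best CV accuracy.
for h in (5, 10, 20, 40, 80):
    clf = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"hidden units = {h:3d}   CV accuracy = {scores.mean():.3f}")
```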

4.
This paper applies classification and regression tree (CART) mining to 870 coronary heart disease inpatient cases from the Xi'an Social Medical Insurance Fund Center covering January 2005 to August 2006. The analysis shows that whether surgery was performed, length of stay, and hospital level are the decisive factors for the pooled-fund payment, and upper limits of the pooled payment are given for the different classes of this disease, providing a scientific basis for formulating pooled medical payment policy.
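A minimal regression-tree sketch of the kind of analysis described, run on fabricated records (surgery indicator, length of stay, hospital level, pooled payment) rather than the Xi'an insurance data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
n = 870

# Synthetic coronary-heart-disease inpatient records.
surgery = rng.integers(0, 2, n)            # 1 = operated
los = rng.integers(3, 30, n)               # length of stay in days
hospital_level = rng.integers(1, 4, n)     # hospital grade 1, 2, 3
payment = (3000 + 8000 * surgery + 250 * los
           + 1500 * (hospital_level - 1) + rng.normal(0, 800, n))

X = np.column_stack([surgery, los, hospital_level])
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50).fit(X, payment)

# The printed tree shows which factors drive the pooled payment and the
# average payment in each leaf (a rough guide to class-specific limits).
print(export_text(tree, feature_names=["surgery", "los", "hospital_level"]))
```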

5.
Zhou Wei et al. 《统计研究》(Statistical Research), 2015, 32(7): 81-86
Remote sensing imagery is a form of big data. Estimating crop sown area from remote sensing usually relies on regression or calibration estimators, which combine ground sample data with remote sensing classification information. For most regression estimators, however, crop area estimates for a provincial population can meet the precision requirement only at the provincial level and cannot be disaggregated to smaller areas such as counties and townships. Using 2011 ground survey sample data for Heilongjiang Province combined with remote sensing classification results, this paper builds a unit-level small area model in the form of a multivariate regression with multiple response variables, with the small area effects specified as fixed. Based on this regression estimation approach, the sown area of the main crops can be estimated by county, and the county estimates sum to the provincial total implied by the regression model. The precision assessment (coefficient of variation, C.V.) of the county-level small area estimates for maize, rice, and soybean in Heilongjiang shows that, on average, they meet the county-level precision requirement. The results indicate that small area estimation is well suited to multi-level estimation of crop sown area for the province as a whole and by county.
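A stripped-down illustration of the unit-level regression idea: fit ground-measured crop areas on remote-sensing classified areas over sampled units, predict every unit, and aggregate so that county totals add up to the provincial total. Synthetic data for a single crop; not the authors' fixed-effect small area model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_units, n_counties = 5000, 10
county = rng.integers(0, n_counties, n_units)

# Remote-sensing classified area per unit and the "true" ground area,
# which the satellite classification measures with error.
rs_area = rng.uniform(10, 100, n_units)
true_area = 5 + 0.9 * rs_area + rng.normal(0, 5, n_units)

# Ground survey observes only a sample of units.
sample = rng.choice(n_units, 500, replace=False)
model = LinearRegression().fit(rs_area[sample, None], true_area[sample])

# Predict every unit and aggregate: county totals sum exactly to the
# provincial total implied by the same regression.
pred = model.predict(rs_area[:, None])
county_totals = np.bincount(county, weights=pred, minlength=n_counties)
print("county totals:", np.round(county_totals, 0))
print("province total:", round(pred.sum(), 0), "=", round(county_totals.sum(), 0))
```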

6.
More on the Handling of Outlying Data (Yang Guiyuan, Anhui Institute of Finance and Trade). Issue No. 7 (1995) of China Statistics published Comrade Liu Jinfu's article "How to Handle Outlying Data" (hereafter "Liu's article"). The author takes a different view of Liu's opinions on identifying outlying data in regression analysis and discusses them here. Following the example in Liu's article, let x denote a child's age (in months) and y...

7.
Data mining functionalities are an important aspect of data mining research and application; they specify the types of patterns to be sought in a data mining task. At present these functionalities mainly handle traditional data, and research on functional data is still limited. This article discusses several types of functional data patterns that can be mined, including data description, classification, clustering, and regression.

8.
National image is a comprehensive reflection of a country's hard and soft power and strongly affects its performance on the international stage. Many factors influence national image in practice. Starting from the concept and data of international competitiveness, this paper conducts an empirical analysis of the factors affecting the national image of the world's major countries and regions. Using data on national image indicators from the international competitiveness database of the International Institute for Management Development (IMD) in Lausanne, we apply quantile regression to reveal the objective factors behind different levels of national image and compare the results with ordinary least squares regression, obtaining more comprehensive information and providing an empirical basis for improving China's national image.
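A hedged sketch of the comparison described: quantile regression at several quantiles next to OLS, using statsmodels on made-up competitiveness indicators (the IMD variables themselves are not reproduced here; "econ", "gov", and "image" are hypothetical names).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 120

# Fabricated country-level data: two competitiveness indicators and an
# "image" score whose dispersion grows with the first indicator.
df = pd.DataFrame({"econ": rng.uniform(0, 10, n), "gov": rng.uniform(0, 10, n)})
df["image"] = 20 + 3 * df["econ"] + 2 * df["gov"] + rng.normal(0, 1 + df["econ"], n)

ols = smf.ols("image ~ econ + gov", data=df).fit()
print("OLS:", ols.params.round(2).to_dict())

# Quantile regression shows how the effects differ across the distribution.
for q in (0.25, 0.5, 0.75):
    qr = smf.quantreg("image ~ econ + gov", data=df).fit(q=q)
    print(f"q = {q}:", qr.params.round(2).to_dict())
```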

9.
Wang Quanzhong 《统计研究》(Statistical Research), 2007, 24(2): 81-83
Abstract: Drawing on concrete examples, this paper introduces two types of logistic regression models for the statistical analysis of correlated categorical data and analyzes their connections and differences.
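The abstract does not name the two model classes. A common pairing for correlated (clustered) binary data is an ordinary logit that ignores the correlation versus a GEE logit that accounts for it, sketched below on synthetic clustered data; this is only one plausible reading.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_clusters, m = 100, 5

# Clustered binary responses: a shared cluster effect induces correlation.
cluster = np.repeat(np.arange(n_clusters), m)
x = rng.normal(size=n_clusters * m)
u = np.repeat(rng.normal(0, 1, n_clusters), m)
prob = 1 / (1 + np.exp(-(0.5 + 1.0 * x + u)))
y = rng.binomial(1, prob)
df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

# Model 1: ordinary logistic regression (ignores within-cluster correlation).
logit = smf.logit("y ~ x", data=df).fit(disp=0)

# Model 2: GEE logistic regression with exchangeable working correlation.
gee = sm.GEE.from_formula("y ~ x", groups="cluster", data=df,
                          family=sm.families.Binomial(),
                          cov_struct=sm.cov_struct.Exchangeable()).fit()

print("logit coef:", logit.params.round(3).to_dict())
print("GEE   coef:", gee.params.round(3).to_dict())
```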

10.
Liu Hanzhong 《统计研究》(Statistical Research), 2007, 24(11): 74-79
Abstract: Theoretical work shows that many economic variables follow asymmetric threshold autoregressive (TAR) or momentum threshold autoregressive (M-TAR) data generating processes, so asymmetric unit root testing has become one of the main research directions in this area. This paper presents a systematic simulation study of the size and power of the Enders-Granger asymmetric unit root test under GARCH(1,1)-normal errors. The results show that TAR or M-TAR models with GARCH(1,1)-normal errors have a substantial impact on the size and power of the test.
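A compact Monte Carlo sketch in the spirit of the study: generate a unit-root series with GARCH(1,1)-normal errors, compute a TAR-type F statistic for ρ1 = ρ2 = 0, and compare the rejection rate against a critical value simulated under i.i.d. normal errors. Parameter values are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
T, n_rep = 200, 1000

def garch_errors(T, omega=0.05, a=0.3, b=0.65):
    """GARCH(1,1) errors with standard normal innovations."""
    e, h = np.zeros(T), np.zeros(T)
    h[0] = omega / (1 - a - b)
    e[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, T):
        h[t] = omega + a * e[t - 1] ** 2 + b * h[t - 1]
        e[t] = np.sqrt(h[t]) * rng.standard_normal()
    return e

def tar_f_stat(y):
    """F statistic for rho1 = rho2 = 0 in a simple TAR test regression."""
    dy, ylag = np.diff(y), y[:-1]
    above = (ylag >= 0).astype(float)
    X = np.column_stack([above * ylag, (1 - above) * ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    rss1 = np.sum((dy - X @ beta) ** 2)
    rss0 = np.sum(dy ** 2)
    return ((rss0 - rss1) / 2) / (rss1 / (len(dy) - 2))

# Critical value simulated under the null with i.i.d. normal errors.
crit = np.quantile([tar_f_stat(np.cumsum(rng.standard_normal(T)))
                    for _ in range(n_rep)], 0.95)

# Empirical size under the null with GARCH(1,1) errors.
rej = np.mean([tar_f_stat(np.cumsum(garch_errors(T))) > crit
               for _ in range(n_rep)])
print(f"5% critical value: {crit:.2f}   empirical size under GARCH: {rej:.3f}")
```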

11.
Nonparametric seemingly unrelated regression provides a powerful alternative to parametric seemingly unrelated regression for relaxing the linearity assumption. The existing methods are limited, particularly with sharp changes in the relationship between the predictor variables and the corresponding response variable. We propose a new nonparametric method for seemingly unrelated regression, which adopts a tree-structured regression framework, offers satisfactory prediction accuracy and interpretability, places no restriction on the inclusion of categorical variables, and is less vulnerable to the curse of dimensionality. Moreover, an important feature is constructing a unified tree-structured model for multivariate data, even though the predictor variables corresponding to the response variable are entirely different. This unified model can offer revelatory insights such as underlying economic meaning. We propose the key factors of tree-structured regression, which are an impurity function detecting complex nonlinear relationships between the predictor variables and the response variable, split rule selection with negligible selection bias, and tree size determination solving underfitting and overfitting problems. We demonstrate our proposed method using simulated data and illustrate it using data from the Korea stock exchange sector indices.

12.
We derive the optimal regression function (i.e., the best approximation in the L2 sense) when the vector of covariates has a random dimension. Furthermore, we consider applications of these results to problems in statistical regression and classification with missing covariates. It will be seen, perhaps surprisingly, that the correct regression function for the case with missing covariates can sometimes perform better than the usual regression function corresponding to the case with no missing covariates. This is because even if some of the covariates are missing, an indicator random variable δ, which is always observable, and is equal to 1 if there are no missing values (and 0 otherwise), may have far more information and predictive power about the response variable Y than the missing covariates do. We also propose kernel-based procedures for estimating the correct regression function nonparametrically. As an alternative estimation procedure, we also consider the least-squares method.
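A toy simulation of the point about δ: when missingness is driven by something strongly related to Y, the always-observed indicator δ can predict Y better than the covariate it flags. The data-generating process and numbers below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Latent factor U drives both the response and whether X goes missing.
u = rng.standard_normal(n)
x = rng.standard_normal(n)
y = x + 3 * u + rng.standard_normal(n)
delta = (u <= 0).astype(int)            # 1 = X observed, 0 = X missing

# "Usual" regression function E[Y | X], usable only when X is observed.
slope, intercept = np.polyfit(x, y, 1)
mse_usual = np.mean((y - (intercept + slope * x)) ** 2)

# Regression on the always-observed indicator delta alone.
pred_delta = np.where(delta == 1, y[delta == 1].mean(), y[delta == 0].mean())
mse_delta = np.mean((y - pred_delta) ** 2)

print(f"MSE of E[Y | X]     (needs X):       {mse_usual:.2f}")   # roughly 10
print(f"MSE of E[Y | delta] (always usable): {mse_delta:.2f}")   # roughly 5.3
```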

13.
This paper relates computational commutative algebra to tree classification with binary covariates. With a single classification variable, properties of uniqueness of a tree polynomial are established. In a binary multivariate context, it is shown how trees for many response variables can be made into a single ideal of polynomials for computations. Finally, a new sequential algorithm is proposed for uniform conditional sampling. The algorithm combines the lexicographic Groebner basis with importance sampling and it can be used for conditional comparisons of regulatory network maps. The binary state space leads to an explicit form for the design ideal, which leads to useful radical and extension properties that play a role in the algorithms.

14.
The aim of this article is to improve the quality of cookie production by classifying cookies as good or bad from the curves of resistance of dough observed during the kneading process. As the predictor variable is functional, functional classification methodologies such as functional logit regression and functional discriminant analysis are considered. A P-spline approximation of the sample curves is proposed to improve the classification ability of these models and to suitably estimate the relationship between the quality of cookies and the resistance of dough. Inference results on the functional parameters and related odds ratios are obtained using the asymptotic normality of the maximum likelihood estimators under the classical regularity conditions. Finally, the classification results are compared with alternative functional data analysis approaches such as componentwise classification on the logit regression model.
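A simplified sketch of the functional logit idea: expand each sample curve in an (unpenalized) B-spline basis and feed the coefficients to a logistic regression. This stands in for the paper's P-spline approach, and the dough-resistance curves are replaced by synthetic curves.

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n, grid = 200, np.linspace(0, 1, 100)

# Synthetic "resistance" curves: bad items get an extra bump mid-process.
labels = rng.integers(0, 2, n)
curves = (np.sin(2 * np.pi * grid)
          + labels[:, None] * 0.8 * np.exp(-80 * (grid - 0.5) ** 2)
          + rng.normal(0, 0.2, (n, len(grid))))

# Cubic B-spline basis on [0, 1]; each column is one basis function on the grid.
k, inner = 3, np.linspace(0, 1, 8)
knots = np.concatenate([[0] * k, inner, [1] * k])
n_basis = len(knots) - k - 1
basis = np.column_stack([BSpline(knots, np.eye(n_basis)[j], k)(grid)
                         for j in range(n_basis)])

# Project every curve onto the basis (least squares) and classify the coefficients.
coefs, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)
clf = LogisticRegression(max_iter=1000).fit(coefs.T, labels)
print("training accuracy:", round(clf.score(coefs.T, labels), 3))
```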

15.
Variable selection is one of the most important tasks in regression analysis, especially in a high-dimensional setting. In this paper, we study this problem in the context of the scalar response functional regression model, which is a linear model with scalar response and functional regressors. The functional model can be represented by a multiple linear regression model via basis expansions of the functional variables. Based on this model and the random subspace method of Mielniczuk and Teisseyre (Comput Stat Data Anal 71:725-742, 2014), two simple variable selection procedures for the scalar response functional regression model are proposed. The final functional model is selected using generalized information criteria. Monte Carlo simulation studies and a real data example show very satisfactory finite-sample performance of the new variable selection methods. Moreover, they suggest that the considered procedures outperform solutions found in the literature in terms of correctly selected model, false discovery rate control, and prediction error.

16.
We are concerned with cumulative regression models for an ordered categorical response variable Y. We propose two methods to build partial residuals from regression on a subset Z1 of covariates Z, which take into account the ordinal character of the response. The first method makes use of a multivariate GLM representation of the model and produces residual measures for diagnostic purposes. The second uses a latent continuous variable model and yields new (adjusted) ordinal data Y*. Both methods are illustrated by a data set from forestry.

17.
In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model mis-specification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is lack of information for the underlying regression structure. The cSIRS procedure is also robust in the presence of outliers or extreme values as it merely uses the rank of the censored response variable. We establish both the sure screening and the ranking consistency properties for the cSIRS procedure when the number of covariates p satisfies p = o{exp(an)}, where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set.
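A bare-bones version of rank-based marginal screening (without the censoring and inverse-probability-weighting parts of cSIRS): rank each covariate's Spearman correlation with the response and keep the top few. Synthetic ultrahigh-dimensional data; the screening-set size rule is a common convention, not the paper's.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(8)
n, p = 200, 5000

# Only the first three covariates are truly related to the response.
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.standard_normal(n)

# Marginal screening: rank covariates by |Spearman correlation| with y
# (a rank-based statistic is what makes the screen robust to outliers).
scores = np.array([abs(spearmanr(X[:, j], y)[0]) for j in range(p)])
d = int(n / np.log(n))                      # a common screening-set size
keep = np.argsort(scores)[::-1][:d]

print("top 10 selected indices:", np.sort(keep[:10]))
print("truly relevant {0, 1, 2} retained:", {0, 1, 2} <= set(keep.tolist()))
```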

18.
Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy when there are many irrelevant variables and the number of predictors exceeds the number of observations. We propose the multistep regression tree with adaptive variable selection to handle this problem. The variable selection step and the fitting step comprise the multistep method.

The multistep generalized unbiased interaction detection and estimation (GUIDE) with adaptive forward selection (fg) algorithm, as a variable selection tool, performs better than some well-known variable selection algorithms such as efficacy adaptive regression tube hunting (EARTH), FSR (false selection rate), LSCV (least squares cross-validation), and LASSO (least absolute shrinkage and selection operator) for the regression problem. The results of a simulation study show that fg outperforms the other algorithms in terms of selection result and computation time. It generally selects the important variables correctly with relatively few irrelevant variables, which gives good prediction accuracy with less computation time.

19.
Many algorithms originating from decision trees have been developed for classification problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy, namely high misclassification rates, when there are many irrelevant variables. We propose multi-step classification trees with adaptive variable selection (the multi-step GUIDE classification tree (MG) and the multi-step CRUISE classification tree (MC)) to handle this problem. The variable selection step and the fitting step comprise the multi-step method.

We compare the performance of classification trees in the presence of irrelevant variables. MG and MC perform better than Random Forest and C4.5 with an extremely noisy dataset. Furthermore, the prediction accuracy of our proposed algorithm is relatively stable even when the number of irrelevant variables increases, while that of other algorithms worsens.

20.
Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response. Thus selecting a small number of relevant variables for classification has received a great deal of attention. Generally when the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are called for on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method in the multinomial probit regression model for multi-class classification. The performance of our proposed method is demonstrated with one simulated data set and three well-known gene expression profiling data sets: breast cancer data, leukemia data, and small round blue-cell tumors. The results show that, compared with other methods, our method is able to select the relevant variables and can obtain competitive classification accuracy with a small subset of relevant genes.
