Similar Documents
20 similar documents found (search time: 738 ms)
1.
Based on an analysis of response rates in a credit-card direct-mail campaign, this paper discusses the differences between the Logistic model and the classification tree model in variable selection, and attempts to explain the causes of these differences from several angles. The author argues that neither method is absolutely superior; the appropriate model must be chosen according to the specific scenario and the characteristics of each model. The classification tree model tends to overfit the training set and is sensitive to the influence of individual variables; in risk-factor analysis its results emphasize risk factors more strongly, and it identifies outliers well. The Logistic model is easily affected by dependence among the explanatory variables and, with the added influence of categorical variables, tends to select too many variables or factors; it is sensitive to outliers but insensitive to noise points. The difference in discriminant functions is the key factor behind the differences in variable selection.
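As a rough illustration of the contrast described in this entry, the sketch below fits both model families on synthetic response-style data (not the paper's credit-card data) and compares which variables each keeps. The L1 penalty, importance threshold, and data-generating process are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, p = 2000, 6
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)      # collinear pair: x1 ~ x0
logit = X[:, 0] + 0.5 * X[:, 2] - 1.0             # only x0 and x2 drive response
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Logistic model: an L1 penalty does the variable screening.
lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
lr_selected = np.flatnonzero(np.abs(lr.coef_[0]) > 1e-6)

# Classification tree: impurity-based importances rank the variables.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_selected = np.flatnonzero(tree.feature_importances_ > 0.05)

print("logistic keeps:", lr_selected)
print("tree keeps:    ", tree_selected)
```

With the collinear pair, the two screens typically disagree on which twin they keep, which is the kind of divergence the abstract discusses.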

2.
A discussion of using factor analysis to classify variables   (Cited by: 5; self-citations: 0; others: 5)
Factor analysis is a multivariate statistical method that condenses variables with complex interrelationships into a small number of factors, so as to reproduce the relationships between the original variables and the factors. In current academic practice, the application of factor analysis is generally limited to reducing the dimensionality of a set of variables, thereby reducing the complexity of the problem; for classifying variables, the commonly used method is R-type cluster analysis. This paper shows, however, that factor analysis can equally well be used to classify variables, and that the theoretical essence of this approach is consistent with R-type cluster analysis. The paper first introduces the basic idea of using factor analysis to classify variables, and then verifies the feasibility and reliability of the method through an empirical analysis.
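The variable-classification idea can be sketched with a principal-factor style decomposition of the correlation matrix: each variable is assigned to the factor on which it loads most heavily. The two-block synthetic data and the use of unrotated loadings are assumptions for illustration, not the paper's empirical example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
# Six observed variables in two blocks, each driven by one latent factor
# (different noise levels keep the two block eigenvalues well separated).
X = np.column_stack([
    f1 + 0.3 * rng.normal(size=n),
    f1 + 0.3 * rng.normal(size=n),
    f1 + 0.3 * rng.normal(size=n),
    f2 + 0.7 * rng.normal(size=n),
    f2 + 0.7 * rng.normal(size=n),
    f2 + 0.7 * rng.normal(size=n),
])

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]              # keep the two largest factors
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

# Classify each variable by the factor on which it loads most heavily:
# the factor-analytic analogue of R-type clustering of variables.
groups = np.argmax(np.abs(loadings), axis=1)
print("variable groups:", groups)
```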

3.
This paper studies model selection for structural equation models with ordered categorical variables, applying a statistic based on a Bayesian criterion as the selection measure for this class of models. An empirical example illustrates the approach and presents the model-selection results obtained from Bayes factors.

4.
Combining basis-function approximation with smooth-threshold estimating equations, this paper proposes a fast variable-selection method for varying-coefficient models. The method performs coefficient estimation and variable selection simultaneously and does not require solving any convex optimization problem; compared with existing methods, it therefore greatly reduces the computational burden in practice.
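A minimal sketch of the basis-expansion step only (the smooth-threshold selection part is omitted): expanding the coefficient function in a simple polynomial basis turns the varying-coefficient model into an ordinary least-squares problem. The cubic basis and the data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
u = rng.uniform(0, np.pi, size=n)      # index variable of the coefficient
x = rng.normal(size=n)
beta = np.sin(u)                       # true varying coefficient beta(u)
y = beta * x + 0.1 * rng.normal(size=n)

# Basis expansion: beta(u) ~ sum_k c_k B_k(u) with a cubic polynomial basis,
# which converts y = beta(u) * x + e into a linear model in the c_k.
B = np.column_stack([np.ones(n), u, u**2, u**3])
Z = B * x[:, None]                     # regressors x * B_k(u)
c, *_ = np.linalg.lstsq(Z, y, rcond=None)

beta_hat = B @ c
print("max abs error in beta(u):", np.max(np.abs(beta_hat - beta)))
```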

5.
In nonlinear models, testing a moderating effect cannot rely solely on the coefficient of the interaction term. Two remedies currently exist: constructing better test statistics, and presenting the effect graphically. Existing research, however, does not further distinguish between discrete and continuous moderators in nonlinear models, and a simple, effective test for continuous moderators has been lacking. This paper classifies moderators into four types along the model and data dimensions, reviews the testing methods and open problems for each type, and, for the continuous case, proposes splitting the continuous moderator at multiple points and using three-dimensional plots to display the continuous variation of the moderating effect intuitively.
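The multi-point split can be sketched as follows: fit a logistic model with an interaction term and evaluate the slope of the logit in the focal predictor at several values of the moderator, instead of reading only the interaction coefficient. The moderator grid, the synthetic data, and the near-unpenalised scikit-learn fit are assumptions; the paper's three-dimensional plots are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 3000
x = rng.normal(size=n)                 # focal predictor
m = rng.normal(size=n)                 # continuous moderator
logit = 0.5 * x + 0.2 * m + 0.8 * x * m
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([x, m, x * m])
fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)   # near-unpenalised
b = np.r_[fit.intercept_, fit.coef_[0]]    # [const, x, m, x*m]

# Multi-point split of the continuous moderator: the effect of x on the
# logit is b_x + b_xm * m0, evaluated at several moderator values m0.
for m0 in (-1.0, 0.0, 1.0):
    slope = b[1] + b[3] * m0
    print(f"m = {m0:+.1f}: d(logit)/dx = {slope:.2f}")
```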

6.
In canonical correlation analysis, deriving the expressions of the canonical variables does not complete the task: one must also, for example, determine the number of canonical variable pairs and select variables. For the number of canonical pairs, shortcomings of the commonly used chi-square test and redundancy analysis are identified, and a new algorithm is proposed. For the selection of original variables, three possible approaches are put forward. Finally, the conclusions are verified on body-measurement data.
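For the number-of-pairs question, the standard baseline is the sequential chi-square (Bartlett) test that this entry critiques. The sketch below computes canonical correlations by whitening plus SVD and runs that sequential test on synthetic data with one shared dimension; the data and dimensions are illustrative, and the entry's own new algorithm is not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 500
z = rng.normal(size=n)                       # one shared latent dimension
X = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Singular values of the whitened cross-covariance = canonical correlations.
r = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)

# Bartlett's sequential chi-square test for the number of canonical pairs.
p, q = X.shape[1], Y.shape[1]
pvals = []
for k in range(len(r)):
    stat = -(n - 1 - (p + q + 1) / 2) * np.sum(np.log(1 - r[k:] ** 2))
    df = (p - k) * (q - k)
    pvals.append(stats.chi2.sf(stat, df))
    print(f"H0: only the first {k} pairs matter  p = {pvals[-1]:.4f}")
```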

7.
Support vector regression (SVR) is an important data-mining method in machine learning. Most existing research on SVR is based on quadratic programming theory, with the hyperparameters chosen by cross-validation or heuristic algorithms; such estimation is not only computationally expensive but also does not support subsequent statistical inference. This paper studies SVR from a Bayesian perspective: by introducing two latent variables, the ε-insensitive loss of SVR is represented as a double normal-scale mixture and a likelihood function is constructed, and with suitably chosen priors a Gibbs sampling algorithm is obtained for the parameters of interest and the hyperparameters. To select important variables and the optimal model, 0-1 indicator variables are introduced and a spike-and-slab prior is placed on the regression parameters, yielding a Bayesian variable-selection algorithm. Numerical simulations demonstrate the effectiveness of the proposed algorithm and show good robustness under non-normal errors. Finally, the method is applied to house-price data, with meaningful results.

8.
User search-volume data from shopping websites are a new type of data source that has emerged in recent years. After choosing keywords appropriately and applying seasonal adjustment and holiday treatment to the data, a timely forecasting model for China's national and urban CPI is built on this source. The model takes a distributed-lag model as its basis and uses Elastic-Net shrinkage estimation to perform variable selection, with the optimal penalty factor and tuning parameter chosen by K-fold cross-validation. Empirical results show a significant causal relationship between the search-volume variables and CPI; the resulting forecasting model is economically interpretable and forecasts CPI with good accuracy. In terms of model mean squared error, Elastic-Net variable selection clearly outperforms stepwise regression, and the urban CPI model outperforms the national CPI model.
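The shrinkage-plus-cross-validation step can be sketched with scikit-learn's ElasticNetCV on a synthetic distributed-lag design. The keyword selection, seasonal adjustment, and holiday treatment are omitted, and real time-series work would use ordered (expanding-window) folds rather than the default CV splits; all numbers here are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(5)
T, L = 200, 6                        # series length, maximum lag
s = rng.normal(size=T + L)           # a synthetic search-volume index
# Distributed-lag design: column j holds the lag-(j+1) values of s.
X = np.column_stack([s[L - j:T + L - j] for j in range(1, L + 1)])
y = 1.2 * X[:, 0] + 0.6 * X[:, 1] + 0.1 * rng.normal(size=T)  # lags 1-2 matter

# Elastic-Net shrinkage with K-fold CV over the penalty path.
fit = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
kept = np.flatnonzero(np.abs(fit.coef_) > 1e-3)
print("lags kept:", kept + 1)
print("coefficients:", np.round(fit.coef_, 2))
```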

9.
Traditional cross-classified credibility models are computationally complex and, when prior information on the structural parameters is insufficient, cannot yield unbiased posterior estimates of the parameters. Using MCMC simulation and GLMM methods, an empirical analysis of the cross-classified credibility model demonstrates its effectiveness. The results show that the MCMC approach can dynamically simulate the posterior distributions of the parameters and improve estimation precision, while the GLMM approach greatly simplifies the computation, is convenient to operate, allows model choice using graphics and other diagnostic tools, and supports evaluation of the model's practical usefulness.

10.
Improving research efficiency is a core problem to be solved in the scientific research and innovation work of Chinese universities. To address problems of traditional DEA models in evaluating university research efficiency (efficient decision-making units cannot be fully ranked, and cross-efficiency values are not unique), the TOPSIS idea is introduced to build a DEA cross-efficiency model based on an ideal decision-making unit, and the entropy weight method is used to aggregate the efficiency values of the decision-making units, yielding an improved DEA cross-efficiency model. Factor analysis is used to construct the evaluation indicator system, eliminating correlation among indicators to satisfy the technical requirements of the DEA model. The improved model is applied to evaluate the research efficiency of the 40 Chinese universities in the "first-class university" construction scheme; the results show that the overall research efficiency of these 40 universities is rather low, leaving considerable room for improvement.
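The entropy-weight aggregation step can be sketched on a toy cross-efficiency matrix. The DEA stage that would produce the matrix is omitted, and the numbers are purely illustrative.

```python
import numpy as np

def entropy_weights(M):
    """Entropy-weight method: indicators with more dispersion across
    decision-making units carry more information and get larger weights.
    M: (units x indicators) matrix of positive scores."""
    P = M / M.sum(axis=0)                         # column-wise proportions
    n = M.shape[0]
    e = -(P * np.log(P)).sum(axis=0) / np.log(n)  # entropy of each indicator
    d = 1 - e                                     # degree of diversification
    return d / d.sum()

# Toy cross-efficiency matrix: 4 units scored under 3 evaluation schemes.
E = np.array([[0.9, 0.8, 0.7],
              [0.6, 0.9, 0.8],
              [0.7, 0.6, 0.9],
              [0.5, 0.7, 0.6]])
w = entropy_weights(E)
scores = E @ w                                    # aggregated efficiency
print("weights:", np.round(w, 3))
print("scores: ", np.round(scores, 3))
```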

11.
Income and wealth data are typically modelled by some variant of the classical Pareto distribution. Often, in practice, the observed data are truncated with respect to some unobserved covariate. In this paper, a hidden truncation formulation of this scenario is proposed and analysed. For this purpose, a bivariate Pareto (IV) distribution is assumed for the variable of interest and the unobserved covariate. Some important distributional properties of the resulting model, as well as associated inferential methods, are studied. Finally, an example is used to illustrate the results developed here. In this setting, it is noted that hidden truncation on the left does not result in any new model, but hidden truncation on the right does. The properties and fit of such a model pose a challenging problem, and that is the focus of this work.

12.
Analysis of spacing differences among the levels of ordered categorical variables in data mining, with applications   (Cited by: 1; self-citations: 0; others: 1)
Building on the cumulative logistic regression model, this paper addresses the problem of unequal spacing among the levels of an ordered multi-category variable: a statistical test is proposed, and instrumental dummy variables are introduced to improve the logistic model. Application in practice has produced good results.

13.
The problem of deciding whether an intercept model or a no-intercept model is more appropriate for a given set of data has no simple solution. Often, the underlying physical situation will suggest an appropriate model; however, there may still be interest in assessing which model best fits the data or is the better predictor. In this article a different interpretation of regression through the origin is derived: that of a full fit to the original data set augmented by one further point. Examination of the leverage and influence of the augmented data point can provide help in comparing the models.
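A minimal numerical comparison of the two candidate models (not the article's augmented-point derivation): fit both by least squares and compare residual sums of squares, keeping in mind that the intercept fit can never have larger RSS on the same data, which is exactly why model choice needs diagnostics beyond RSS.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)   # true line passes through origin

# No-intercept fit (regression through the origin).
b0, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

# Intercept fit on the same data.
A = np.column_stack([np.ones_like(x), x])
b1, *_ = np.linalg.lstsq(A, y, rcond=None)

rss_origin = np.sum((y - x * b0[0]) ** 2)
rss_full = np.sum((y - A @ b1) ** 2)
print(f"origin fit: slope = {b0[0]:.3f}, RSS = {rss_origin:.3f}")
print(f"full fit:   intercept = {b1[0]:.3f}, slope = {b1[1]:.3f}, "
      f"RSS = {rss_full:.3f}")
```

The full model's RSS is lower by construction, so the article's leverage/influence view of the augmented point is what actually discriminates between the models.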

14.
Multi-layer perceptrons (MLPs), a common type of artificial neural networks (ANNs), are widely used in computer science and engineering for object recognition, discrimination and classification, and have more recently found use in process monitoring and control. Training such networks is not a straightforward optimisation problem, and we examine features of these networks which contribute to the optimisation difficulty.

Although the original perceptron, developed in the late 1950s (Rosenblatt 1958; Widrow and Hoff 1960), had a binary output from each node, this was not compatible with back-propagation and similar training methods for the MLP. Hence the output of each node (and the final network output) was made a differentiable function of the network inputs. We reformulate the MLP model with the original perceptron in mind, so that each node in the hidden layers can be considered as a latent (that is, unobserved) Bernoulli random variable. This maintains the property of binary output from the nodes, and with an imposed logistic regression of the hidden-layer nodes on the inputs, the expected output of our model is identical to the MLP output with a logistic sigmoid activation function (for the case of one hidden layer).

We examine the usual MLP objective function (the sum of squares) and show its multi-modal form and the corresponding optimisation difficulty. We also construct the likelihood for the reformulated latent variable model and maximise it by standard finite mixture ML methods using an EM algorithm, which provides stable ML estimates from random starting positions without the need for regularisation or cross-validation. Over-fitting of the number of nodes does not affect this stability. This algorithm is closely related to the EM algorithm of Jordan and Jacobs (1994) for the Mixture of Experts model.

We conclude with some general comments on the relation between the MLP and latent variable models.

15.
Properties of the Weibull cumulative exposure model   (Cited by: 1; self-citations: 0; others: 1)
This article investigates some properties of the Weibull cumulative exposure model on multiple-step step-stress accelerated life test data. Although the model incorporates a probabilistic version of Miner's rule in order to express the effect of cumulative damage in fatigue, our results show that this alone is not sufficient to express degradation of specimens: the shape parameter must be larger than 1. For a random variable obeying the model, its mean and standard deviation are investigated over various sets of parameter values. In addition, a way of checking the validity of the model is illustrated through an example of maximum likelihood estimation on an actual data set concerning time to breakdown of cross-linked polyethylene-insulated cables.
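One common formulation of the cumulative exposure idea (Nelson's model, stated here as an assumption about the article's setup) accumulates scaled exposure across the stress steps and plugs the total into a Weibull CDF:

```python
import numpy as np

def cum_exposure_cdf(t, breaks, etas, beta):
    """Weibull CDF under a step-stress cumulative exposure model.
    breaks: times at which the stress changes; etas: Weibull scale in
    each step (one more scale than break points); beta: shape."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    edges = np.concatenate([[0.0], breaks, [np.inf]])
    u = np.zeros_like(t)
    for i, eta in enumerate(etas):
        lo, hi = edges[i], edges[i + 1]
        u += np.clip(t - lo, 0.0, hi - lo) / eta   # exposure accrued in step i
    return 1 - np.exp(-u ** beta)

# Three steps with increasing stress (hence decreasing scale).
tt = np.linspace(0, 30, 301)
F = cum_exposure_cdf(tt, breaks=[10, 20], etas=[50, 20, 8], beta=1.5)
print("F(10), F(20), F(30) =", np.round(F[[100, 200, 300]], 3))
```

The scales, break points, and shape value are illustrative; the article's point is that fits with beta no larger than 1 fail to express the observed degradation.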

16.
The prediction of sea state from field measurements of wave and meteorological factors is of interest for navigation safety and fisheries. Various statistical methods have been considered for predicting the distribution of sea surface elevation. However, predicting sea state in the transitional situation, when waves are being developed by blowing wind, has remained a difficult problem, because the statistical expression of the dynamic mechanism in this situation is very complicated. In this article, we address this problem through the development of a statistical model. More precisely, we develop a model for predicting the time-varying distribution of sea surface elevation, based on a non-homogeneous hidden Markov model in which the time-varying structures are influenced by wind speed and wind direction. Our prediction experiments suggest that the proposed model improves prediction accuracy compared with a homogeneous hidden Markov model. Furthermore, we found that the prediction accuracy is influenced by the choice of circular distribution in the circular hidden Markov model for the directional time series of wind-direction data.

17.
A crucial issue for principal components analysis (PCA) is determining the number of principal components needed to capture the variability of usually high-dimensional data. In this article, dimension detection for PCA is formulated as a variable-selection problem in regression, and the adaptive LASSO is used for the variable selection. Simulations demonstrate that this approach is more accurate than existing methods in some cases and competitive in others. The performance of the model is also illustrated on a real example.

18.
In most software reliability models that utilize the nonhomogeneous Poisson process (NHPP), the intensity function of the counting process is usually assumed to be continuous and monotone. However, for various practical reasons, there may exist change points in the intensity function, so the assumption of a continuous and monotone intensity function may be unrealistic in many real situations. In this article, a Bayesian change-point approach using beta mixtures for modeling the intensity function with possible change points is proposed. A hidden Markov model with non-constant transition probabilities is applied to the beta mixture to detect change points in the parameters. Estimation and interpretation of the model are illustrated using the Naval Tactical Data System (NTDS) data. The proposed change-point model is also compared with competing models via marginal likelihood; it has the highest marginal likelihood and outperforms the competing models.

19.
Introducing a shape parameter to an exponential model is nothing new, and there are many ways to do so; the different methods result in a variety of weighted exponential (WE) distributions. In this article, we introduce a shape parameter to an exponential model using the idea of Azzalini, which results in a new class of WE distributions. The probability density function (PDF) of this new WE model is very close in shape to the PDFs of the Weibull, gamma and generalized exponential distributions, so the model can be used as an alternative to any of these distributions. It is observed that this model can also be obtained as a hidden truncation model. Different properties of the new model are discussed and compared with the corresponding properties of these well-known distributions. Two data sets are analysed for illustrative purposes, and in both cases the new model fits better than the Weibull, gamma and generalized exponential distributions.
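One known Azzalini-type weighted exponential density (the Gupta-Kundu form, assumed here to be the class this entry refers to) can be written down and checked numerically:

```python
import numpy as np
from scipy.integrate import quad

def we_pdf(x, a, lam):
    """Weighted exponential density (Gupta-Kundu form):
    f(x) = (a+1)/a * lam * exp(-lam*x) * (1 - exp(-a*lam*x)),  x > 0,
    with shape a > 0 and scale parameter lam > 0."""
    return (a + 1) / a * lam * np.exp(-lam * x) * (1 - np.exp(-a * lam * x))

a, lam = 2.0, 1.5
total, _ = quad(we_pdf, 0, np.inf, args=(a, lam))
print("integral of pdf:", round(total, 6))   # a valid density integrates to 1
```

Note that f(0) = 0 and the density is unimodal, which is what makes the shape resemble Weibull, gamma, or generalized exponential PDFs.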

20.
This article tests the Fourier flexible form on quarterly U.S. monetary data. The data have been prescreened for consistency with the general axiom of revealed preference, and subindexes are formed using the Divisia approach. In this article, the global Fourier model fits well, although there is a potential problem of overfitting and certain data points exhibit behavior inconsistent with the model. The elasticities are variable over time, particularly around business-cycle troughs. It appears that financial asset demand surfaces are highly nonlinear and the many unsuccessful existing attempts to estimate money demand may not have worked well for this reason.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号