19 similar articles found (search time: 207 ms)
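In penalized spline regression, the coefficients of the truncated power basis all receive the same penalty weight, so the fit tracks the true function poorly when the data are locally heterogeneous. This article constructs a new local penalized spline regression method from the variances of the data points on either side of each knot, which handles locally heterogeneous data well. The method penalizes regions where the data fluctuate little more heavily and regions where the data fluctuate strongly more lightly, which makes the penalty local. Simulation results show that, when the data are locally heterogeneous, the new local penalized spline fits better than the global penalized spline and the smoothing spline.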
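
For reference, the penalized spline criterion that this abstract refers to can be written as follows. This is the standard textbook form with a truncated power basis of degree $p$ and knots $\kappa_1,\dots,\kappa_K$; the symbols, including the knot-specific weights $\lambda_k$, are notation assumed here, and the abstract does not give the formula that maps the local variances to the $\lambda_k$.

```latex
f(x) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^{K} u_k (x - \kappa_k)_+^p,
\qquad
\min_{\beta,\, u} \; \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \sum_{k=1}^{K} \lambda_k u_k^2 .
```

Setting $\lambda_k = \lambda$ for every $k$ gives the global (equal-weight) penalty; a local penalty lets $\lambda_k$ be larger where the data fluctuate little and smaller where they fluctuate strongly.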

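Based on the characteristics of air quality data, and building on curves fitted with a B-spline basis, this work brings both the information in the curve itself and the information in its changes into the analysis, constructs a weighted curve depth index, and explores a method for detecting anomalous curves. Compared with existing methods that consider only discrete-point information and the curve itself, this detection method better matches the characteristics of air quality data, can handle missing values, and can identify both global and local anomalies. The method is applied to the nitrogen dioxide level curves of air quality monitoring sites in Lanzhou, and the results show that it identifies anomalies more effectively.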

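Methods for Assessing Statistical Data Quality Based on Classical Econometric Models   Cited by: 6 (self-citations: 2, citations by others: 4)
Liu Hong (刘洪), Huang Yan (黄燕). Statistical Research (《统计研究》), 2009, 26(3): 91-96
Grounded in economic theory and taking the economic system as a whole, this paper builds an econometric model from the factors that influence the object of study and, under the given model, applies outlier tests and statistical diagnostics to assess data quality quantitatively. A suitable model is chosen to simulate how the object under examination evolves; anomalous data (outliers) are identified and tested for whether they are significantly anomalous; the anomalous data are then cross-checked against multiple sources and their causes analyzed to judge data quality further. An empirical analysis of the quality of China's statistical data is also carried out.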

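Huang Chao (黄超), Huang Lili (黄丽丽). Statistics & Decision (《统计与决策》), 2012, (22): 154-156
Accurately forecasting exchange rate data with long memory is of considerable theoretical and practical importance. This article constructs a new class of biorthogonal wavelet kernel functions based on spline wavelets and builds the corresponding support vector machine model. Fractional differencing is used to remove the long memory of the exchange rate data, and forecasts are studied for two exchange rate series, euro to US dollar and euro to Japanese yen. The results show that the biorthogonal wavelet kernel support vector machine effectively avoids overfitting, and that both its goodness of fit and its forecast accuracy exceed those of the orthogonal wavelet kernel and Gaussian kernel support vector machines.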
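
The fractional differencing step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual binomial expansion of (1 - B)^d, truncated at the start of the sample; the function names and the toy series are hypothetical, and nothing here reproduces the authors' data or their choice of d.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients w_0, ..., w_{n_terms-1} of the expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Recursion for the binomial series: w_k = w_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply the filter to a list of floats, using only the lags available."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


# Toy input (made up for illustration): d = 1 reduces to ordinary first
# differences after the first observation, while 0 < d < 0.5 keeps a slowly
# decaying weight on distant lags.
toy_series = [1.0, 3.0, 6.0, 10.0]
first_diff = frac_diff(toy_series, 1)
half_diff = frac_diff(toy_series, 0.5)
```

In practice d would be estimated from the series rather than fixed in advance, and the weights are often truncated once they fall below a tolerance.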

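Regression analysis is one of the important methods in data mining. This article studies a data mining method based on a semiparametric Beta regression model combined with penalized spline estimation. When the dependent variable takes values in the interval (0,1) (or in some other interval), data mining with the semiparametric Beta regression model not only gives good interpretability but also uncovers useful information hidden in the data. Experimental results confirm the effectiveness of the method.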

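This article introduces recursive methods for parameter estimation in nonlinear regression models and gives their MATLAB implementation for nonlinear model parameter estimation. Computer simulation shows that the recursive estimation methods fit better than traditional data-fitting methods.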

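As sample sizes grow and complexity increases, the information contained in data becomes more diverse, and ordinary parametric fitting no longer meets the need. Nonparametric fitting arose in response, and spline fitting is a typical example. A spline function is constructed and rewritten in the vector form Y=Xβ+ε, and the interpolating function is obtained by least squares estimation of the parameters. Using data on the yields of wealth management products, and considering aspects such as the structure of investment direction, the term structure, and the issue amount, the nonparametric fitting model is used to fit three yield curves for wealth management products with three different risk coefficients; the trends of the curves are compared, and future periods are analyzed and forecast.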

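To study the validity of China's macro-level statistical indicators, this article takes the CPI as the dependent variable and determines the effective lag length through stationarity and unit root tests; it then uses cointegration analysis and Granger tests to determine the independent variables that affect the CPI, and carries out regression fitting and data quality assessment. The results show that the errors in the CPI data come mainly from random factors rather than from systematic deviation between the model and the actual data, so the model's fit and forecasting performance can be judged to be very good. The pattern of fluctuation in the relative error shows that the relative error of the CPI data fluctuates within the interval [-2%, 2%] and is concentrated mainly within [-1%, 1%]. Overall, therefore, the quality of China's CPI statistics is highly credible.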

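Electricity consumption forecasting can usually be handled with a generalized additive model (GAM), but when the data set is very large, fitting the model on a computer becomes very difficult or even infeasible. This paper therefore gives a practical method for fitting generalized additive models to large data sets, in which the smooth terms are represented by penalized regression splines. The method obtains the factors of the model matrix by iterative updating, and only requires that a sub-block of the model matrix can be formed on the computer at any one time. The paper shows that the method estimates the smoothing parameters efficiently. When new data arrive, the electricity forecasting model must be refitted and updated continually, and the autocorrelation of the new electricity consumption series must be handled. The paper gives methods for dealing with these problems, together with their computer implementation. The approach makes it possible to estimate generalized additive models for large data sets on an ordinary mid-range computer. Finally, an empirical study of electricity consumption forecasting in France shows that the reduced-rank spline smoothing method also handles complex models well.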

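This article estimates the term structure of interest rates using McCulloch's cubic spline method. Tests on treasury bond trading data from the Shanghai Stock Exchange show that the method fits well, is simple to apply, and is highly adaptable, so it can estimate different types of term structure.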

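The data snooping effect is a common phenomenon in finance research. Although it attracted scholars' attention early on, limitations in research methods mean that there has so far been no systematic study of it in China. To address the data snooping effect, this work uses stationary bootstrap simulation to discuss the statistical principles and computational steps of the reliability test. More than 2,000 models are constructed and estimated by recursive least squares to study whether technical trading rules can predict movements in the CSI 300 index; the dynamic evolution of the reliability test's P-value is then examined to verify the conclusions of the theoretical analysis, and the data snooping effect is effectively overcome by increasing the length of the sample used for prediction. The discussion of block selection in the stationary bootstrap simulation shows that running the empirical analysis with several block lengths at once makes the conclusions more robust.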
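
The stationary bootstrap resampling that this abstract relies on can be sketched as follows. This is a minimal illustration assuming the common formulation with geometrically distributed block lengths and circular wrap-around; the function name, the parameter `mean_block`, and the seed are hypothetical, and the abstract does not say which block lengths the authors used.

```python
import random


def stationary_bootstrap_indices(n, mean_block, rng):
    """Draw n resampling indices with geometric block lengths of mean `mean_block`."""
    p = 1.0 / mean_block          # probability of starting a new block at each step
    idx = [rng.randrange(n)]      # the first block starts at a random position
    while len(idx) < n:
        if rng.random() < p:
            idx.append(rng.randrange(n))     # start a new block at a random position
        else:
            idx.append((idx[-1] + 1) % n)    # continue the current block, wrapping around
    return idx


# Example: one resample of a toy return series (values made up for illustration).
rng = random.Random(12345)
returns = [0.01, -0.02, 0.005, 0.013, -0.007, 0.002]
resample = [returns[i] for i in stationary_bootstrap_indices(len(returns), 3, rng)]
```

Repeating the draw with several values of `mean_block` is one way to check how sensitive a result is to the block length, which is the kind of robustness check the abstract describes.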

Estimates from an EM algorithm are somewhat sensitive to the initial values of the estimates, and this sensitivity is likely to increase as the model becomes larger and more complicated. In this paper, we examine how the estimates fluctuate during an EM procedure for a recursive model of categorical variables. We find that the fluctuation takes place mostly during the initial stage of the procedure and that it can be reduced by applying a Bayes method of estimation. Both real and simulated data are used for illustration.

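To address the choice of functional form for the income distribution, this work proposes an "adaptive" approach to fitting a sequence of income distributions, gives a B-spline-based functional form for the income distribution, and estimates its parameters by least squares. It fits the sequence of income distributions of China's urban residents over the years; derives China's Lorenz curves and Gini coefficients for 1996-2009; characterizes, in functional terms, the dynamic trend in which the income gap widens while urban residents' income levels keep rising; and confirms that the trajectory of the urban income gap shows a "stepwise" widening.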
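
The link between a Lorenz curve and the Gini coefficient used in this abstract can be shown with a short sketch. It works from an empirical Lorenz curve built directly from a list of incomes and integrates it with the trapezoid rule; this is an assumption made for illustration and not the authors' B-spline-based procedure. The function names and the toy incomes are hypothetical.

```python
def lorenz_points(incomes):
    """Return (population share, income share) points of the empirical Lorenz curve."""
    xs = sorted(incomes)
    total = sum(xs)
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(points):
    """Gini = 1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    area = 0.0
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        area += (p1 - p0) * (l0 + l1) / 2.0
    return 1.0 - 2.0 * area


# Toy incomes (made up): perfect equality versus one person holding everything.
equal_gini = gini_from_lorenz(lorenz_points([5, 5, 5, 5]))
unequal_gini = gini_from_lorenz(lorenz_points([0, 0, 0, 10]))
```

With a fitted distribution function in place of raw incomes, the same area computation applies to the Lorenz curve derived from that function.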

Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model for functional response curves with a set of functional covariates. Their method addresses two main problems: modelling a nonlinear, nonparametric regression relationship, and modelling the covariance structure and the mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity depends on factors such as region and an individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follow a Gaussian process functional regression model as a lower-level model, and we introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model depends on the information related to each batch. The method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model is also used for curve clustering, with a focus on clustering functional relationships between the response curve and the covariates, i.e., the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

Several approaches have been suggested for fitting linear regression models to censored data. These include Cox's proportional hazard models based on quasi-likelihoods. Methods of fitting based on least squares and maximum likelihood have also been proposed. The methods proposed so far all require special-purpose optimization routines. We describe an approach here which requires only a modified standard least squares routine.

We present methods for fitting a linear regression model to censored data by least squares and by the method of maximum likelihood. In the least squares method, the censored values are replaced by their expectations, and the residual sum of squares is minimized. Several variants are suggested for the way in which the expectation is calculated. A parametric approach (assuming a normal error model) and two non-parametric approaches are described. We also present a method for solving the maximum likelihood equations in the estimation of the regression parameters in the censored regression situation. It is shown that the solutions can be obtained by a recursive algorithm which needs only a least squares routine for optimization. The suggested procedures gain considerably in computational efficiency. The Stanford Heart Transplant data are used to illustrate the various methods.
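
The idea of replacing censored values by their expectations and refitting by least squares can be sketched as below for the parametric (normal error) case. This is only an illustration under stated assumptions: right censoring, a single covariate, and an error standard deviation `sigma` treated as known. The function names and the toy data are hypothetical, and the sketch is not claimed to match the paper's algorithm in detail.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z):
    """Upper tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def censored_mean(mu, sigma, c):
    """E[Y | Y > c] for Y ~ N(mu, sigma^2), the expectation of a right-censored value."""
    z = (c - mu) / sigma
    return mu + sigma * normal_pdf(z) / normal_sf(z)


def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def fit_censored(x, y, is_censored, sigma, n_iter=50):
    """Alternate between imputing censored responses and refitting by least squares."""
    intercept, slope = ols_line(x, y)  # naive starting fit that ignores censoring
    for _ in range(n_iter):
        filled = [
            censored_mean(intercept + slope * xi, sigma, yi) if cens else yi
            for xi, yi, cens in zip(x, y, is_censored)
        ]
        intercept, slope = ols_line(x, filled)
    return intercept, slope


# Toy data (made up): the last response is right-censored at 5.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 5.0]
naive = ols_line(x, y)
adjusted = fit_censored(x, y, [False, False, False, True], sigma=1.0)
```

Each pass needs nothing beyond a standard least squares fit, which is the computational appeal the abstract points to; a fuller version would also update `sigma` between passes.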

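Ouyang Minhua (欧阳敏华), Zhang Guijun (章贵军). Statistical Research (《统计研究》), 2016, 33(12): 101-109
Within the STAR model framework, and allowing the time series to contain a linear deterministic trend component, this paper constructs a recursively detrended unit root test statistic and derives its asymptotic distribution. Taking initial conditions into account, it then compares in detail the finite-sample properties of the recursively detrended, OLS-detrended, and GLS-detrended unit root test statistics. If the effect of the initial condition is ignored, the power of the GLS-detrended and recursively detrended statistics is significantly higher than that of the OLS-detrended statistic. As the initial condition grows, the power of the GLS-detrended statistic falls sharply, whereas the power of the recursively detrended statistic remains fairly stable and has a greater advantage in larger samples.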

The Hosmer–Lemeshow (H–L) test is widely used to assess the goodness-of-fit of a logistic regression model. However, the H–L test is sensitive to the sample size and to the number of groups used in the test, so caution is needed when interpreting an H–L test with a large sample size. In this paper, we propose a simple test procedure to evaluate the fit of a logistic regression model with a large sample size, in which a bootstrap method is used and the test result is determined by the power of the H–L test at the target sample size. Simulation studies show that the proposed method can effectively standardize the power of the H–L test under the pre-specified level of type I error. Application to two datasets illustrates the usefulness of the proposed method.
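
For readers unfamiliar with the statistic itself, a minimal sketch of the basic H–L computation follows. It covers only the grouping and the chi-square sum, not the bootstrap procedure proposed in the abstract. The function name and the toy inputs are hypothetical, and the sketch assumes at least as many observations as groups and group mean probabilities strictly between 0 and 1.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return the H-L chi-square statistic and its usual degrees of freedom (groups - 2)."""
    # Sort observations by predicted probability and cut them into equal-sized groups.
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)   # observed number of events in the group
        expected = sum(p[i] for i in idx)   # expected number of events in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, groups - 2


# Toy inputs (made up): outcomes and fitted probabilities from some logistic model.
y_toy = [0, 0, 1, 0, 1, 1, 0, 1]
p_toy = [0.1, 0.2, 0.4, 0.3, 0.7, 0.8, 0.5, 0.9]
stat, df = hosmer_lemeshow(y_toy, p_toy, groups=4)
```

Because the statistic grows with the sample size for any fixed amount of misfit, very large samples tend to reject even well-fitting models, which is the sensitivity the abstract sets out to address.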

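This article tests the nonparametric ACD model empirically using intraday trading data from China's securities market. The nonparametric ACD model does not depend on the functional form of the conditional mean or on the distribution of the error term, and is therefore more general. The empirical analysis is carried out from several angles. The nonparametric analysis shows that the data cannot be described by a linear ACD model, and that the shape of the nonparametrically fitted surface suggests specifying the ACD model in some nonlinear functional form.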

The B-spline representation is a common tool for improving the fit of smooth nonlinear functions; it provides a fit in the form of a piecewise polynomial. The regions that define the pieces are separated by a sequence of knots. The main difficulty in this type of modeling is the choice of the number and the locations of these knots. The Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm provides a way to select both simultaneously by treating the knots as free parameters. This algorithm belongs to the family of MCMC techniques that allow simulation from target distributions on spaces of varying dimension. The aim of the present investigation is to use this algorithm in the analysis of survival time, for the Cox model in particular. In the Cox model, the relation between the hazard ratio function and the covariates is assumed to be log-linear, and this assumption is too restrictive. We therefore propose to use the RJMCMC algorithm to model the log hazard ratio function by a B-spline representation with an unknown number of knots at unknown locations. The method is illustrated with two real data sets: the Stanford heart transplant data and lung cancer survival data. Another application of RJMCMC is the selection of significant covariates, for which a simulation study is performed.
