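Regression analysis is one of the important methods in data mining. This article studies a data mining method based on the semiparametric Beta regression model combined with penalized spline estimation. When the response variable takes values in the interval (0, 1) (or in some other bounded interval), data mining with the semiparametric Beta regression model not only gives results that are easy to interpret but also uncovers useful information hidden in the data. Experimental results verify the effectiveness of the method.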

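Support vector regression (SVR) is an important data mining method in machine learning. Most current research on SVR is based on quadratic programming theory, with the hyperparameters of the model chosen by cross-validation or by some intelligent algorithms. However, SVR estimation based on quadratic programming is computationally expensive and does not allow subsequent statistical inference. This article studies SVR from a Bayesian perspective: by introducing two latent variables, the ε-insensitive loss function of SVR is expressed as a double normal scale mixture model, from which a likelihood function is constructed; with suitable prior distributions, a Gibbs sampling algorithm is obtained for the parameters of interest and the hyperparameters. To screen important variables and select the optimal model, 0-1 indicator variables are introduced and a Spike and Slab prior is placed on the regression parameters, yielding a Bayesian variable selection algorithm. Numerical simulations demonstrate the effectiveness of the proposed algorithm, which also shows good robustness under non-normal errors. Finally, the proposed method is applied to housing price data, with meaningful results.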
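
As a minimal sketch of the ε-insensitive loss named in the abstract above: residuals inside a tube of half-width ε cost nothing, and residuals outside it are penalized linearly. The function names and the toy numbers are illustrative assumptions, not taken from the article, and the Bayesian scale-mixture representation itself is not reproduced here.

```python
def eps_insensitive_loss(residual: float, eps: float) -> float:
    """Return max(0, |residual| - eps): zero inside the tube, linear outside."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return max(0.0, abs(residual) - eps)


def total_loss(y_true, y_pred, eps):
    """Sum the loss over paired observations and predictions."""
    return sum(eps_insensitive_loss(y - f, eps) for y, f in zip(y_true, y_pred))


# Hypothetical toy data: only the second and third points fall outside the tube.
loss = total_loss([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1)
```

Because the loss is flat near zero, small residuals do not influence the fit, which is one reason the abstract can report robustness under non-normal errors.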

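Zhang Jingxiao, Liu Yanping. Statistical Research (《统计研究》), 2012, 29(9): 95-102
This paper gives a fairly comprehensive review of regularization methods for curve selection in functional generalized linear models and compares the properties of the various methods. It finds that the curve selection problem in functional generalized linear models exhibits a grouping effect and may also have high-dimensional features. Simulations further show that Group Bridge, Group MCP, Elastic Net and Mnet give good numerical results.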

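Song Peng et al. Statistical Research (《统计研究》), 2020, 37(7): 116-128
Estimation of high-dimensional covariance matrices has become a basic problem in the statistical analysis of big data. Traditional methods require the data to be normally distributed and do not account for outliers, so they no longer meet practical needs, and more robust estimators are urgently needed. For high-dimensional covariance matrices, a robust mean-median estimation method based on grouping subsamples has been proposed and is simple to implement, but the matrix it produces is not positive definite and sparse. To address this, the paper introduces a central regularization algorithm that remedies the shortcoming of the original method: by imposing an L1-norm penalty on the off-diagonal elements of the estimated matrix during the solution process, the estimate becomes positive definite and sparse, which significantly improves its practical value. In numerical simulations, the proposed centrally regularized robust estimator achieves higher estimation accuracy and more closely matches the sparsity structure of the true matrix. In the subsequent empirical portfolio analysis, compared with the traditional sample covariance matrix, the mean-median estimator and the RA-LASSO method, the minimum-variance portfolio built from the centrally regularized robust estimator has less volatile returns.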

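This article uses Bayesian methods to study bi-level variable selection, between groups and within groups, in quantile regression. Based on the asymmetric Laplace distribution and Bayesian inference, and combining Spike-and-Slab priors for the between-group and within-group coefficients, it proposes a Bayesian bi-level variable selection method for quantile regression and gives an easily implemented Gibbs posterior sampling algorithm. Extensive numerical simulations and an empirical analysis verify the effectiveness of the proposed method.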

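For high-dimensional mixed-effects models, this paper proposes a doubly regularized quantile regression method. By applying L1 regularization penalties to the random-effect and fixed-effect coefficients simultaneously, the method both selects important explanatory variables and removes the bias caused by individual random fluctuations. The alternating iterative algorithm used for parameter estimation resolves the difficulty of determining two tuning parameters at once and is also fast. Simulation results show that the method is highly robust to the error type and performs well under different degrees of model sparsity, especially in the high-dimensional case where the explanatory variables outnumber the observations. To make it easier to choose the optimal regularization parameters in practice, the paper also compares two parameter selection criteria. Finally, the new method is applied to an education data set to identify the important factors affecting student achievement at each quantile.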
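
The abstract above does not state its objective function, so the following is only a sketch of what a doubly L1-penalized quantile objective could look like: the standard quantile check loss on the residuals plus separate L1 penalties on the fixed-effect and random-effect coefficients. The function names, argument names and the two tuning parameters `lam_fixed` and `lam_random` are assumptions made for illustration.

```python
def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return u * (tau - (1.0 if u < 0 else 0.0))


def penalized_objective(residuals, fixed, random, tau, lam_fixed, lam_random):
    """Check loss summed over residuals plus two separate L1 penalties."""
    fit = sum(check_loss(r, tau) for r in residuals)
    pen_fixed = lam_fixed * sum(abs(b) for b in fixed)
    pen_random = lam_random * sum(abs(a) for a in random)
    return fit + pen_fixed + pen_random
```

Keeping the two penalties separate mirrors the abstract's point that two tuning parameters have to be chosen; an alternating scheme would update one coefficient block while holding the other fixed.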

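Non-convergence and improper estimates arising in the estimation of structural equation models can be attributed to mathematically ill-posed problems. Tikhonov regularization is an effective method for solving ill-posed problems. This article combines Tikhonov regularization with parameter estimation for structural equation models, which raises the rate of correct convergence and speeds up convergence, providing a useful reference for improving estimation methods for structural equation models.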

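In semicontinuous two-part regression models, each regression part may involve a large number of candidate variables, which gives rise to a variable selection problem. This article studies variable selection for the Bernoulli-Normal two-part regression model. It first proposes a variable selection method based on the Lasso penalty; since the Lasso estimator does not have the Oracle property, it then proposes a method based on the adaptive Lasso penalty. Simulation results show that both methods can perform variable selection for the Bernoulli-Normal regression model, and that the adaptive Lasso usually selects variables better than the Lasso.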

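Regression depth (RD) is a method for linear models. Given the observations, the RD method seeks the fit with maximal depth. Geometrically, it is the fit that distributes the observations most evenly on the two sides of the line (or hyperplane). This article briefly summarizes the basic concepts and robustness properties of regression depth and uses an example to compare regression depth with least squares regression.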

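Jiang Wenjie. Zhejiang Statistics (《浙江统计》), 2000, (12): 12-13
In econometric analysis, regression models are frequently used to describe, simulate and forecast changes in economic phenomena. In practice, the phenomenon under study is affected not only by quantitative variables such as output, sales volume, income, price and cost, but often also by qualitative factors such as political, economic and natural conditions and the social and cultural background. Changes in these factors often have a considerable effect on the predictive accuracy of a regression model. When building a regression model, one therefore needs to consider both the quantitative variables and the qualitative factors, which means the qualitative factors have to be quantified and introduced into the model. 1. What is a dummy variable? A dummy variable quantifies the influence of a non-quantitative, qualitative factor so that it can be introduced into a regression model.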
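
To make the idea of quantifying a qualitative factor concrete, here is a small sketch of dummy coding. The helper `dummy_code`, the `region` variable and its levels are hypothetical; the article excerpt does not specify a coding scheme, so the common choice of dropping one baseline level is assumed.

```python
def dummy_code(values, baseline):
    """Map a categorical column to 0/1 indicator columns, omitting the baseline level."""
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError("baseline must be one of the observed levels")
    kept = [lev for lev in levels if lev != baseline]
    rows = [[1 if v == lev else 0 for lev in kept] for v in values]
    return kept, rows


# Hypothetical qualitative factor with three levels; "east" is the baseline.
region = ["east", "west", "north", "east"]
columns, design = dummy_code(region, baseline="east")
# columns -> ['north', 'west']
# design  -> [[0, 0], [0, 1], [1, 0], [0, 0]]
```

Dropping the baseline keeps the indicator columns from summing to the intercept column, so each coefficient reads as a shift relative to the baseline level.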

Beta Regression for Modelling Rates and Proportions
This paper proposes a regression model where the response is beta distributed using a parameterization of the beta law that is indexed by mean and dispersion parameters. The proposed model is useful for situations where the variable of interest is continuous and restricted to the interval (0, 1) and is related to other variables through a regression structure. The regression parameters of the beta regression model are interpretable in terms of the mean of the response and, when the logit link is used, of an odds ratio, unlike the parameters of a linear regression that employs a transformed response. Estimation is performed by maximum likelihood. We provide closed-form expressions for the score function, for Fisher's information matrix and its inverse. Hypothesis testing is performed using approximations obtained from the asymptotic normality of the maximum likelihood estimator. Some diagnostic measures are introduced. Finally, practical applications that employ real data are presented and discussed.
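
The following sketch illustrates a mean-indexed beta log-likelihood with a logit link of the kind the abstract describes. It assumes the usual reparameterization a = μφ, b = (1 − μ)φ of the Beta(a, b) density, with φ treated as a precision-type parameter; the abstract itself only says "mean and dispersion parameters", so that labeling, along with all function names, is an assumption.

```python
import math


def beta_logpdf(y: float, mu: float, phi: float) -> float:
    """Log density of Beta(mu * phi, (1 - mu) * phi) at y in (0, 1)."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(beta, phi, X, y):
    """Sum of beta log densities with mean inv_logit(x_i' beta)."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        total += beta_logpdf(yi, inv_logit(eta), phi)
    return total
```

A maximum likelihood fit would maximize `log_likelihood` over `beta` and `phi` with a numerical optimizer; under the logit link, exponentiating a coefficient gives the odds-ratio reading mentioned in the abstract.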

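To explore Bayesian inference for quantile regression with proportion data, a quantile regression model is first built on the Tobit model; a Bayesian hierarchical model is then obtained by choosing suitable prior distributions, and the posterior distributions of the parameters are derived and used for Gibbs sampling. Numerical simulations verify the effectiveness of the proposed Bayesian inference method for analysing proportion data. Finally, the Bayesian method is applied to heroin use data from California, USA, revealing the factors that influence the frequency of drug use at different quantile levels.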

Penalized regression methods have for quite some time been a popular choice for addressing challenges in high dimensional data analysis. Despite their popularity, their application to time series data has been limited. This paper concerns bridge penalized methods in a linear regression time series model. We first prove consistency, sparsity and asymptotic normality of bridge estimators under a general mixing model. Next, as a special case of mixing errors, we consider bridge regression with autoregressive and moving average (ARMA) error models and develop a computational algorithm that can simultaneously select important predictors and the orders of ARMA models. Simulated and real data examples demonstrate the effective performance of the proposed algorithm and the improvement over ordinary bridge regression.

In many regression problems, predictors are naturally grouped. For example, when a set of dummy variables is used to represent categorical variables, or a set of basis functions of continuous variables is included in the predictor set, it is important to carry out a feature selection both at the group level and at individual variable levels within the group simultaneously. To incorporate the group and variables within-group information into a regularized model fitting, several regularization methods have been developed, including the Cox regression and the conditional mean regression. Complementary to earlier works, the simultaneous group and within-group variables selection method is examined in quantile regression. We propose a hierarchically penalized quantile regression, and show that the hierarchical penalty possesses the oracle property in quantile regression, as well as in the Cox regression. The proposed method is evaluated through simulation studies and a real data application.

Stepwise variable selection procedures are computationally inexpensive methods for constructing useful regression models for a single dependent variable. At each step a variable is entered into or deleted from the current model, based on the criterion of minimizing the error sum of squares (SSE). When there is more than one dependent variable, the situation is more complex. In this article we propose variable selection criteria for multivariate regression which generalize the univariate SSE criterion. Specifically, we suggest minimizing some function of the estimated error covariance matrix: the trace, the determinant, or the largest eigenvalue. The computations associated with these criteria may be burdensome. We develop a computational framework based on the use of the SWEEP operator which greatly reduces these calculations for stepwise variable selection in multivariate regression.
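
As a rough illustration of why the SWEEP operator suits stepwise selection, the sketch below sweeps an augmented cross-product matrix [[X'X, X'y], [y'X, y'y]] on the predictor pivots; afterwards the last column holds the coefficients and the bottom-right entry holds the SSE. The function `sweep`, the sign convention used (one common variant) and the toy data are assumptions, not the article's own framework. With several response columns, the bottom-right block would hold the error sums of squares and cross-products instead of a single SSE.

```python
def sweep(a, k):
    """Return a copy of the square matrix `a` swept on pivot index k."""
    n = len(a)
    d = a[k][k]
    if d == 0:
        raise ZeroDivisionError("cannot sweep on a zero pivot")
    out = [row[:] for row in a]
    # Scale the pivot row by the pivot.
    for j in range(n):
        out[k][j] = a[k][j] / d
    # Eliminate the pivot column from every other row.
    for i in range(n):
        if i == k:
            continue
        b = a[i][k]
        for j in range(n):
            out[i][j] = a[i][j] - b * out[k][j]
        out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


# Toy data (hypothetical): y = 1 + 2x exactly, with x = 0, 1, 2.
# Rows and columns are ordered intercept, x, y.
crossprod = [[3.0, 3.0, 9.0],
             [3.0, 5.0, 13.0],
             [9.0, 13.0, 35.0]]
swept = sweep(sweep(crossprod, 0), 1)
intercept, slope, sse = swept[0][2], swept[1][2], swept[2][2]
```

Entering a variable costs one sweep on its pivot, so a stepwise search can update the fit without refitting from scratch.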

In this article, we study the stepwise AIC method for variable selection and compare it with other stepwise methods, such as partial F, partial correlation, and semi-partial correlation, in linear regression modeling. We then show mathematically that the stepwise AIC method and the other stepwise methods lead to the same method as partial F. Hence, there are more reasons to use the stepwise AIC method than the other stepwise methods for variable selection, since the stepwise AIC method is a model selection method that can be easily managed, widely extended to more general models, and applied to non-normally distributed data. We also treat problems that always appear in applications, namely validation of the selected variables and collinearity.


Errors-in-variables (EIV) regression is often used to gauge the linear relationship between two variables that both suffer from measurement and other errors, for example in the comparison of two measurement platforms (e.g., RNA sequencing vs. microarray). Scientists are often at a loss as to which EIV regression model to use, for there are infinitely many choices. We provide sound guidelines toward viable solutions to this dilemma by introducing two general nonparametric EIV regression frameworks: the compound regression and the constrained regression. It is shown that these approaches are equivalent to each other and to the general parametric structural modeling approach. The advantages of these methods lie in their intuitive geometric representations, their distribution-free nature, and their ability to offer candidate solutions with various optimal properties when the ratio of the error variances is unknown. Each includes the classic nonparametric regression methods of ordinary least squares, geometric mean regression (GMR), and orthogonal regression as special cases. Under these general frameworks, one can readily uncover some surprising optimal properties of the GMR, and truly comprehend the benefit of data normalization. Supplementary materials for this article are available online.

In ridge regression, the estimation of the ridge parameter is an important issue. This article generalizes some methods for estimating the ridge parameter for the probit ridge regression (PRR) model based on the work of Kibria et al. (2011; Kibria, B. M. G., Månsson, K. and Shukur, G., Performance of some logistic ridge regression parameters, Computational Economics, DOI: 10.1007/s10614-011-9275-x). The performance of these new estimators is judged by calculating the mean squared error (MSE) using Monte Carlo simulations. In the design of the experiment, we chose to vary the sample size and the number of regressors. Furthermore, we generate explanatory variables that are linear combinations of other regressors, which is a common situation in economics. In an empirical application regarding Swedish job search data, we also illustrate the benefits of the new method.

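Tian Maozai, Mei Bo. Statistical Research (《统计研究》), 2019, 36(8): 114-128
Taking the structural features of functional data into account, this paper considers two classes of quantile regression models with functional variables (a functional response on scalar covariates, and a functional response on functional covariates) and, based on the definition of functional tilted quantile curves, constructs a new functional tilted quantile regression model. For the second class of models, the model coefficients are expanded in spline basis functions and the functional covariates in functional principal component basis functions, giving the basic form of the tilted quantile regression model. Parameters are estimated by a component-wise gradient Boosting algorithm that minimizes a weighted asymmetric loss function, which improves computational efficiency. It is proved theoretically that the coefficient estimators of the tilted quantile regression models are asymptotically normal. Simulation and empirical results show that the tilted quantile regression model fits better than existing pointwise quantile regression models. Under the integrated mean squared prediction error criterion, the proposed model has uniformly good predictive ability.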

We extend nonparametric regression models with local linear least squares fitting using kernel weights to the case of linear and circular predictors. We derive the asymptotic properties of the conditional bias and variance of bivariate local linear least squares kernel estimators. A small simulation study and a real experiment are given.

