Similar Articles
20 similar articles found.
1.
The Barrodale and Roberts algorithm for least absolute value (LAV) regression and the algorithm proposed by Bartels and Conn both have the advantage that they are often able to skip across points at which the conventional simplex-method algorithms for LAV regression would be required to carry out an (expensive) pivot operation.

We indicate here that this advantage holds in the Bartels-Conn approach for a wider class of problems: the minimization of piecewise linear functions. We show how LAV regression, restricted LAV regression, general linear programming and least maximum absolute value regression can all be easily expressed as piecewise linear minimization problems.
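To make the piecewise linear viewpoint concrete, here is a minimal sketch (not the Barrodale-Roberts or Bartels-Conn method itself) expressing LAV regression as a general linear program, one of the reformulations mentioned above, and solving it with SciPy. The data are synthetic and the formulation is the textbook one.

```python
# LAV regression as an LP: minimize sum(e+) + sum(e-)
# subject to X @ beta + e+ - e- = y, with e+, e- >= 0 and beta free.
import numpy as np
from scipy.optimize import linprog

def lav_regression(X, y):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])   # cost only on error parts
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])        # X beta + e+ - e- = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.laplace(size=50)
print(lav_regression(X, y))                             # approx. [1.0, 2.0]
```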

2.
Q. F. Xu, C. Cai & X. Huang. Statistics, 2019, 53(1): 26-42
In recent decades, quantile regression has received increasing attention from academics and practitioners. However, most existing computational algorithms are effective only for problems of small or moderate size and cannot solve quantile regression with large-scale data reliably and efficiently. To this end, we propose a new algorithm that implements quantile regression on large-scale data using the sparse exponential transform (SET) method. The algorithm constructs a well-conditioned basis and a sampling matrix to reduce the number of observations, then solves a quantile regression problem on the reduced matrix to obtain an approximate solution. Through simulation studies and an empirical analysis of a 5% sample of the US 2000 Census data, we demonstrate the efficiency of the SET-based algorithm. Numerical results indicate that the new algorithm is effective in terms of computation time and performs well for large-scale quantile regression.
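The SET construction itself is not spelled out in the abstract, so the sketch below substitutes plain uniform row sampling for the well-conditioned-basis sampling step; it only illustrates the overall reduce-then-solve pattern. The sampling scheme and the sample size `m` are illustrative assumptions, not the paper's.

```python
# Reduce-then-solve quantile regression: sample rows, fit exactly on the sample.
# Uniform sampling here is a stand-in for the paper's SET-based sampling matrix.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def subsampled_quantreg(X, y, q=0.5, m=10_000, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=min(m, len(y)), replace=False)
    return QuantReg(y[idx], X[idx]).fit(q=q)            # approximate solution
```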

3.
In this article, we employ the variational Bayesian method to study parameter estimation for a linear regression model in which some regressors follow Gaussian distributions with nonzero prior means. We obtain an analytical expression for the posterior parameter distribution and then propose an iterative algorithm for the model. Simulations are carried out to test the performance of the proposed algorithm, and the results confirm both its effectiveness and its reliability.

4.
Multivariate adaptive regression spline fitting, or MARS (Friedman 1991), provides a useful methodology for flexible adaptive regression with many predictors. The MARS methodology produces an estimate of the mean response that is a linear combination of adaptively chosen basis functions. Recently, a Bayesian version of MARS has been proposed (Denison, Mallick and Smith 1998a; Holmes and Denison 2002), combining the MARS methodology with the benefits of Bayesian methods for accounting for model uncertainty, to achieve improvements in predictive performance. In implementations of the Bayesian MARS approach, Markov chain Monte Carlo methods are used for computation: at each iteration of the algorithm it is proposed to change the current model by (a) adding a basis function (birth step), (b) deleting a basis function (death step), or (c) altering an existing basis function (change step). In the algorithm of Denison, Mallick and Smith (1998a), when a birth step is proposed, the type of basis function is determined by simulation from the prior. This works well in problems with a small number of predictors, is simple to program, and leads to a simple form for the Metropolis-Hastings acceptance probabilities. However, in problems with very large numbers of predictors, many of them useless, it may be difficult to find interesting interactions with such an approach. The original MARS algorithm of Friedman (1991) uses a heuristic of building up higher-order interactions from lower-order ones, which greatly reduces the complexity of the search for good basis functions to add to the model. While we do not exactly follow the intuition of the original MARS algorithm in this paper, we suggest a similar idea in which the Metropolis-Hastings proposals of Denison, Mallick and Smith (1998a) are altered to allow dependence on the current model. Our modification allows more rapid identification and exploration of important interactions, especially in problems with very large numbers of predictor variables and many useless predictors. Performance of the algorithms is compared in simulation studies.
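For readers unfamiliar with the basis functions being born and killed here: a MARS basis function is a product of hinge functions. The sketch below generates one at random, drawing the predictor indices uniformly, which mimics "simulation from the prior" in Denison, Mallick and Smith (1998a); the paper's modification would instead bias these draws toward the current model. Everything here is illustrative, not the authors' code.

```python
# Generate a random MARS-style basis function: a product of hinges
# max(0, s * (x_j - t)) over a few randomly chosen predictors.
import numpy as np

def random_hinge_basis(X, rng, max_degree=2):
    degree = rng.integers(1, max_degree + 1)                 # interaction order
    js = rng.choice(X.shape[1], size=degree, replace=False)  # predictors "from the prior"
    basis = np.ones(X.shape[0])
    for j in js:
        t = rng.choice(X[:, j])                              # knot at an observed value
        s = rng.choice([-1.0, 1.0])                          # hinge direction
        basis *= np.maximum(0.0, s * (X[:, j] - t))
    return basis
```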

5.
Count data with excess zeros often occur in areas such as public health, epidemiology, psychology, sociology, engineering, and agriculture. Zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression are useful for modeling such data, but zero-inflation and correlation may occur simultaneously because of a hierarchical study design or the data collection procedure, a combination that single-level ZIP or ZINB models do not fully address. In this paper, multilevel ZINB regression is used to overcome these problems. Parameters are estimated by an expectation-maximization algorithm in conjunction with penalized likelihood and restricted maximum likelihood estimates for the variance components. Alternative modeling strategies, namely the ZIP distribution, are also considered. The proposed model is applied to data on decayed, missing, and filled teeth of children aged 12 years.
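statsmodels offers single-level ZIP and ZINB fits out of the box; a minimal ZIP example on simulated data is shown below for orientation. Note this is not the paper's multilevel model: the random effects, penalized likelihood, and REML variance components would have to be added on top.

```python
# Single-level zero-inflated Poisson fit (no multilevel structure).
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
lam = np.exp(X @ np.array([0.5, 0.3]))
structural_zero = rng.random(n) < 0.3                 # zero-inflation component
y = np.where(structural_zero, 0, rng.poisson(lam))

fit = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1))).fit(disp=0)
print(fit.params)                                     # inflation and count parameters
```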

6.
Feature selection often constitutes one of the central aspects of a scientific investigation. Among penalized-regression methods for feature selection, the smoothly clipped absolute deviation (SCAD) penalty is particularly useful because it satisfies the oracle property. However, its estimation algorithms, such as the local quadratic approximation and the concave-convex procedure, are not computationally efficient. In this paper, we propose an efficient penalization path algorithm. Through numerical examples on real and simulated data, we illustrate that our path algorithm can be useful for feature selection in regression problems.
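For background on the penalty itself (not the authors' path algorithm): the univariate SCAD rule of Fan and Li (2001), which coordinate-wise procedures apply repeatedly, can be written as below with the conventional a = 3.7.

```python
# Univariate SCAD thresholding: minimizes 0.5*(z - b)^2 + SCAD(b; lam, a) over b.
import numpy as np

def scad_threshold(z, lam, a=3.7):
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)                 # soft-thresholding region
    if az <= a * lam:
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)  # interpolation region
    return z                                                   # no shrinkage for large |z|
```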

7.
L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve the resulting optimization problems, and the coordinate descent algorithm in particular has been shown to be effective in sparse regression modeling. Although coordinate descent performs remarkably well on L1-type problems, it is sensitive to outliers, since the procedure is based on inner products of predictor variables and partial residuals computed in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing especially on high-dimensional regression modeling in the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and a real data analysis are conducted to examine its efficiency. We observe that the robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
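To see which step the robustification targets, here is plain (non-robust) coordinate descent for the lasso; the inner product of the predictor with the partial residual in the middle of the loop is exactly the quantity the paper replaces with a robust version. Columns are assumed roughly standardized.

```python
# Non-robust coordinate descent for the lasso (the baseline being robustified).
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()                          # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            r_j = r + X[:, j] * beta[j]                 # partial residual
            z = X[:, j] @ r_j / n                       # non-robust inner product
            new = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j] / n)
            r += X[:, j] * (beta[j] - new)              # keep residual current
            beta[j] = new
    return beta
```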

8.
A substantial fraction of statistical analyses, and in particular of statistical computing, is done under the heading of multiple linear regression: the fitting of equations to multivariate data using the least squares technique for estimating parameters. The optimality properties of these estimates are derived in an ideal setting which is not often realized in practice.

Frequently, we do not have "good" data, in the sense that the errors are non-normal or the variance is non-homogeneous. The data may contain outliers or extremes which are not easily detectable; the variables may not be in the proper functional form; and we may not have the linearity the model assumes.

Prior to the mid-sixties, regression programs provided just the basic least squares computations, plus possibly a step-wise algorithm for variable selection. The increased interest in regression, prompted by dramatic improvements in computers, has led to a vast literature describing alternatives to least squares, improved variable selection methods, and extensive diagnostic procedures.

The purpose of this paper is to summarize and illustrate some of these recent developments. In particular, we review some of the potential problems with regression data, discuss the statistics and techniques used to detect these problems, and consider some of the proposed solutions. An example is presented to illustrate the effectiveness of these diagnostic methods in revealing such problems and the potential consequences of employing the proposed methods.
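Many of the diagnostics surveyed in papers like this one are a few lines in statsmodels; the snippet below computes leverages, externally studentized residuals, and Cook's distances for an OLS fit with one planted outlier. The data are illustrative.

```python
# Standard regression diagnostics via statsmodels' influence tools.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(100, 3)))
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=100)
y[0] += 10                                            # plant an outlier

infl = sm.OLS(y, X).fit().get_influence()
print(infl.hat_matrix_diag[:5])                       # leverages
print(infl.resid_studentized_external[:5])            # studentized residuals
print(infl.cooks_distance[0][:5])                     # Cook's distances
```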

9.
We study an EM algorithm for the regression function in a nonparametric heteroscedastic model and, based on the EM algorithm, obtain an estimate of the conditional regression function. An empirical analysis of the relationship between rural residents' food consumption expenditure and net income shows that the EM-based estimator fits the data better than the least squares estimator; the Engel coefficient is also fitted and its trend analyzed.

10.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The approach consists of penalized maximum likelihood estimation carried out by a robust expectation-maximization (EM)-like algorithm, which we derive for polynomial, spline, and B-spline regression mixtures. The approach is unsupervised in two senses: (i) it simultaneously infers the model parameters and the optimal number of mixture components from the data as the learning proceeds, rather than in a two-stage scheme as in standard model-based clustering with model selection criteria applied afterward, and (ii) it does not require accurate initialization, unlike the standard EM algorithm for regression mixtures. The approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, and confirm its benefit for practical applications.
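As a reference point for the algorithm being improved on, here is a bare-bones EM for a Gaussian mixture of linear regressions with a fixed number of components K. The paper's contribution, penalizing the likelihood so that surplus components are annihilated and the number of components is inferred during learning, is not reproduced here.

```python
# Standard EM for a K-component mixture of linear regressions (fixed K).
import numpy as np

def em_mix_reg(X, y, K, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(K, 1.0 / K)                          # mixing proportions
    beta = rng.normal(size=(K, p))
    sigma2 = np.ones(K)
    for _ in range(n_iter):
        # E-step: component responsibilities for each observation
        dens = np.stack([
            pi[k] / np.sqrt(2 * np.pi * sigma2[k])
            * np.exp(-0.5 * (y - X @ beta[k]) ** 2 / sigma2[k])
            for k in range(K)], axis=1)
        tau = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        for k in range(K):
            w = tau[:, k]
            XtW = X.T * w
            beta[k] = np.linalg.solve(XtW @ X, XtW @ y)
            sigma2[k] = w @ (y - X @ beta[k]) ** 2 / w.sum()
        pi = tau.mean(axis=0)
    return pi, beta, sigma2
```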

11.
In this article, a non-iterative posterior sampling algorithm for the linear quantile regression model based on the asymmetric Laplace distribution is proposed. The algorithm combines the inverse Bayes formulae, sampling/importance resampling, and the expectation-maximization algorithm to obtain approximately independent and identically distributed samples from the observed posterior distribution, which eliminates the convergence problems of iterative Gibbs sampling and avoids the difficulty of evaluating standard errors in the EM algorithm. Numerical results from simulations and an application to the classical Engel data show that the non-iterative sampling algorithm is more effective than Gibbs sampling and the EM algorithm.

12.
Gradient boosting explains the working mechanism of boosting algorithms by assuming that the space spanned by the base learners is a continuous function space; in practice, however, the base-learner space formed under finite samples is not necessarily continuous. To address this problem, we start from the additive-model viewpoint and, based on squared loss, propose a new method of resampled boosted regression trees. The method is a stagewise update algorithm for a weighted additive model. Experimental results show that it can markedly improve on a single regression tree, reduce prediction error, and attain lower prediction error than the L2Boost algorithm.
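A rough sketch of the idea, as far as the abstract describes it: stagewise L2 boosting of shallow regression trees in which each tree is fitted to the current residuals on a bootstrap resample. The paper's specific weighting scheme is not reproduced; the learning rate, tree depth, and round count below are illustrative assumptions.

```python
# Stagewise L2 boosting with bootstrap-resampled regression trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def resampled_l2boost(X, y, n_rounds=200, nu=0.1, max_depth=2, seed=0):
    rng = np.random.default_rng(seed)
    pred = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(y), size=len(y))    # bootstrap resample
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X[idx], (y - pred)[idx])             # fit current residuals
        pred += nu * tree.predict(X)                  # shrunken additive update
        trees.append(tree)
    return trees
```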

13.
Gradient boosting (GB) was introduced as a powerful approach to both classification and regression problems. Boosting with the L2 loss has been studied intensively in both theory and practice. However, the L2 loss is not appropriate for learning distributional functionals beyond the conditional mean, such as conditional quantiles. There is a large literature on conditional quantile prediction using various methods, including machine learning techniques such as random forests and boosting. Simulation studies reveal that the weakness of random forests lies in predicting central quantiles and that of GB lies in predicting extremes. Is there an algorithm that enjoys the advantages of both random forests and boosting, so that it performs well across all quantiles? In this article, we propose such a boosting algorithm, called random GB, which combines the merits of both random forests and GB. Empirical results are presented to support the superiority of this algorithm in predicting conditional quantiles.
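For orientation, plain gradient boosting under the check loss is already available in scikit-learn, as below; the random GB variant proposed in the article is not, so this shows only the GB baseline it is compared against.

```python
# Baseline quantile gradient boosting (the GB side of the comparison).
from sklearn.ensemble import GradientBoostingRegressor

gb_q90 = GradientBoostingRegressor(loss="quantile", alpha=0.9)  # 0.9-quantile
# gb_q90.fit(X_train, y_train); gb_q90.predict(X_test)
```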

14.
The asymmetric Laplace likelihood naturally arises in the estimation of conditional quantiles of a response variable given covariates. The estimation of its parameters entails unconstrained maximization of a concave and non-differentiable function over the real space. In this note, we describe a maximization algorithm based on the gradient of the log-likelihood that generates a finite sequence of parameter values along which the likelihood increases. The algorithm can be applied to the estimation of mixed-effects quantile regression, Laplace regression with censored data, and other models based on Laplace likelihood. In a simulation study and in a number of real-data applications, the proposed algorithm has shown notable computational speed.
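To make the objective concrete: up to constants, the asymmetric Laplace log-likelihood is minus a sum of check losses, and its non-differentiability sits at zero residuals. The sketch below evaluates the log-likelihood and takes a plain subgradient ascent step; the paper's algorithm is a more careful gradient-based search with guaranteed likelihood increase, which this does not reproduce.

```python
# Asymmetric Laplace log-likelihood for quantile tau, plus one subgradient step.
import numpy as np

def check_loss(u, tau):
    return u * (tau - (u < 0))                        # rho_tau(u), kinked at u = 0

def ald_loglik(beta, X, y, tau, sigma=1.0):
    u = y - X @ beta
    return len(y) * np.log(tau * (1 - tau) / sigma) - check_loss(u, tau).sum() / sigma

def subgradient_step(beta, X, y, tau, lr=1e-3):
    u = y - X @ beta
    g = X.T @ (tau - (u < 0))                         # a subgradient of the log-likelihood
    return beta + lr * g
```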

15.
We motivate the success of AdaBoost (ADA) in classification problems by appealing to an importance sampling perspective. Based on this insight, we propose the weighted bagging (WB) algorithm, a regularization method that naturally extends ADA to solve both classification and regression problems. WB uses one part of the available data to build models and a separate part to modify the weights of observations. The method is used with classification and regression trees and is compared with ADA, boosting, bagging, random forests, and support vector machines. We apply these methods to several real data sets and report results of simulations. These applications and simulations show the effectiveness of WB.

16.
This article puts forward a new method for solving linear quantile regression problems, based on an EM algorithm that uses a location-scale mixture representation of the asymmetric Laplace error distribution. A closed-form EM estimator of the unknown parameter vector β is obtained. Simulations are conducted to illustrate the performance of the proposed method and demonstrate that the algorithm performs well. Finally, the classical Engel data are fitted and bootstrap confidence intervals for the estimators are provided.

17.
Many areas of statistical modeling are plagued by the "curse of dimensionality," in which there are more variables than observations. This is especially true when developing functional regression models where the independent data set is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model simply by taking enough samples (so that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed in this way could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. The algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, an intermediate processing step removes variables with low correlation to the response data. Finally, a genetic algorithm performs a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow the algorithm to develop the regression model for each response variable independently, so as to model each variable optimally. We demonstrate the method on the familiar biscuit dough data set, which has been used in a similar context by several researchers. The results demonstrate both the flexibility and the power of the algorithm: for each response variable a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
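Only the first two stages lend themselves to a short sketch: a discrete wavelet decomposition of each spectrum (via PyWavelets) and a simple correlation screen. The genetic-algorithm search and the information-theoretic objective are omitted; the wavelet name, decomposition level, and cutoff are illustrative choices, not the paper's.

```python
# Stage 1: wavelet features per spectrum; stage 2: correlation screening.
import numpy as np
import pywt

def wavelet_features(spectra, wavelet="db4", level=3):
    return np.array([np.concatenate(pywt.wavedec(s, wavelet, level=level))
                     for s in spectra])

def correlation_screen(F, y, keep=50):
    r = np.abs([np.corrcoef(F[:, j], y)[0, 1] for j in range(F.shape[1])])
    return np.argsort(r)[::-1][:keep]                 # best-correlated feature indices
```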

18.
The mode of a distribution provides an important summary of data and is often estimated on the basis of a non-parametric kernel density estimator. This article develops a new data analysis tool, called modal linear regression, for exploring high-dimensional data. Modal linear regression models the conditional mode of a response Y given a set of predictors x as a linear function of x; it differs from standard linear regression, which models the conditional mean (as opposed to the mode) of Y as a linear function of x. We propose an expectation-maximization algorithm to estimate the regression coefficients of modal linear regression, and we establish asymptotic properties of the proposed estimator without assuming symmetry of the error density. Our empirical studies with simulated and real data demonstrate that the proposed modal regression gives shorter predictive intervals than mean linear regression, median linear regression, and MM-estimators.
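With a Gaussian kernel, the EM iteration for modal linear regression alternates kernel weighting and weighted least squares, as sketched below; the bandwidth h is a user-chosen tuning constant, and the paper's selection rule for it is not reproduced.

```python
# EM-type iteration for modal linear regression (Gaussian kernel, bandwidth h).
import numpy as np

def modal_linreg(X, y, h=1.0, n_iter=100):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # initialize at OLS
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)               # E-step: kernel weights
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)      # M-step: weighted least squares
    return beta
```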

19.
In modeling count data with multivariate predictors, we often encounter problems with clustering of observations and interdependency of predictors. We propose to use principal components of the predictors to mitigate the multicollinearity problem, and, to abate the information loss due to dimension reduction, a semiparametric link between the count dependent variable and the principal components is postulated. Clustering of observations is accounted for in the model as a random component, and the model is estimated via the backfitting algorithm. A simulation study illustrates the advantages of the proposed model over standard Poisson regression in a wide range of scenarios.
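A fully parametric stand-in for orientation: principal components of the predictors fed into a Poisson GLM. The paper's model differs in using a semiparametric link, a random cluster effect, and backfitting, none of which are reproduced here.

```python
# PCA-reduced predictors in a parametric Poisson GLM (simplified stand-in).
import statsmodels.api as sm
from sklearn.decomposition import PCA

def pc_poisson(X, y, n_components=3):
    Z = PCA(n_components=n_components).fit_transform(X)   # mitigate multicollinearity
    Z = sm.add_constant(Z)
    return sm.GLM(y, Z, family=sm.families.Poisson()).fit()
```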

20.
A robust regression methodology is proposed via M-estimation. The approach adapts to the tail behavior and skewness of the distribution of the random error terms, providing a reliable analysis under a broad class of distributions. This is accomplished by allowing the objective function used to determine the regression parameter estimates to be selected in a data-driven manner. The asymptotic properties of the proposed estimator are established, and a numerical algorithm is provided to implement the methodology. The finite-sample performance of the proposed approach is exhibited through simulation, and the approach is used to analyze two motivating data sets.
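For contrast with the adaptive proposal, a fixed-objective M-estimator (Huber's rho) is a one-liner in statsmodels; the paper's point is precisely that this objective would instead be selected from the data. The simulated heavy-tailed data are illustrative.

```python
# Fixed-objective robust regression baseline: Huber M-estimation via IRLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=2, size=200)  # heavy tails

rlm_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(rlm_fit.params)                                 # robust coefficient estimates
```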
