Similar Articles
 20 similar articles found
1.
In this paper, we propose a Bayesian variable selection method for linear regression models with high-order interactions. Our method automatically enforces the heredity constraint: a higher-order interaction term can enter the model only if both of its parent terms are in the model. Building on the stochastic search variable selection of George and McCulloch (1993), we propose a novel hierarchical prior that fully encodes the heredity constraint while simultaneously controlling the degree of sparsity. We develop a Markov chain Monte Carlo (MCMC) algorithm, obtained by modifying the shotgun stochastic search algorithm of Hans et al. (2007), that explores the model space efficiently while respecting the heredity constraint. The performance of the new model is demonstrated through comparisons with other methods. Numerical studies on both real data and simulations show that the new method finds relevant variables more effectively when higher-order interaction terms are considered.
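As an illustration of the (strong) heredity constraint described above, here is a minimal sketch of a heredity check, not the authors' MCMC sampler; the term encoding and variable names are assumptions for the example.

# Sketch: strong-heredity check for interaction models (illustrative only).
# A model is a set of terms; an interaction ("x1*x2") is admissible only
# if both parent main effects ("x1", "x2") are already in the model.

def respects_heredity(model_terms: set) -> bool:
    """Return True if every interaction term has both parents present."""
    for term in model_terms:
        if "*" in term:  # interaction term, e.g. "x1*x2"
            parents = term.split("*")
            if not all(p in model_terms for p in parents):
                return False
    return True

# Example usage with hypothetical terms:
print(respects_heredity({"x1", "x2", "x1*x2"}))  # True
print(respects_heredity({"x1", "x1*x2"}))        # False: parent "x2" missing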

2.
The mixture transition distribution (MTD) model was introduced by Raftery to meet the need for parsimony in modeling high-order Markov chains in discrete time. The particularity of this model is that the effect of each lag upon the present is considered separately and additively, so that the number of parameters required is drastically reduced. However, the parameter estimation methods proposed for the MTD model to date remain problematic because of the large number of constraints on the parameters. In this article, an iterative procedure, the expectation-maximization (EM) algorithm, is developed under the principle of maximum likelihood estimation (MLE) to estimate the MTD parameters. Applications show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, for high-order MTD models fitted to DNA sequences, the EM parameter estimates outperform the corresponding fully parametrized Markov chain in terms of the Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.
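The EM updates are not spelled out in the abstract; the following is a minimal sketch of one plausible EM iteration for an MTD model with a single transition matrix shared across lags, written from the standard description of the model rather than from the paper, so all names and details are assumptions.

import numpy as np

def mtd_em_step(x, lam, Q):
    """One EM iteration for an MTD model with shared transition matrix Q.
    x: integer-coded sequence; lam: lag weights (length L); Q: m x m matrix.
    Model: P(x_t | past) = sum_g lam[g] * Q[x[t-g-1], x[t]].
    """
    L, m = len(lam), Q.shape[0]
    T = len(x)
    lam_new = np.zeros(L)
    counts = np.zeros((m, m))
    for t in range(L, T):
        # E-step: responsibility of each lag g for the transition at time t.
        probs = np.array([lam[g] * Q[x[t - g - 1], x[t]] for g in range(L)])
        r = probs / probs.sum()
        # Accumulate M-step sufficient statistics.
        lam_new += r
        for g in range(L):
            counts[x[t - g - 1], x[t]] += r[g]
    lam_new /= lam_new.sum()
    Q_new = counts / counts.sum(axis=1, keepdims=True)  # row-normalize
    return lam_new, Q_new

# Toy run on a random binary sequence (illustrative only):
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=500)
lam, Q = np.array([0.6, 0.4]), np.full((2, 2), 0.5)
for _ in range(20):
    lam, Q = mtd_em_step(x, lam, Q)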

3.
It is common in regression discontinuity analysis to control for third, fourth, or higher-degree polynomials of the forcing variable. There appears to be a perception that such methods are theoretically justified, even though they can lead to evidently nonsensical results. We argue that controlling for global high-order polynomials in regression discontinuity analysis is a flawed approach with three major problems: it leads to noisy estimates, sensitivity to the degree of the polynomial, and poor coverage of confidence intervals. We recommend researchers instead use estimators based on local linear or quadratic polynomials or other smooth functions.
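As a hedged illustration of the recommended alternative, the sketch below estimates the discontinuity with a generic local linear fit on each side of the cutoff using a triangular kernel; it is not the authors' implementation, and the bandwidth and data are assumptions.

import numpy as np

def local_linear_rd(x, y, cutoff=0.0, bandwidth=1.0):
    """Sketch: RD effect via separate local linear fits on each side of the
    cutoff, weighted by a triangular kernel (illustrative, fixed bandwidth)."""
    est = {}
    for side, mask in (("left", x < cutoff), ("right", x >= cutoff)):
        xs, ys = x[mask], y[mask]
        w = np.clip(1 - np.abs(xs - cutoff) / bandwidth, 0, None)  # triangular
        X = np.column_stack([np.ones_like(xs), xs - cutoff])
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * ys))
        est[side] = beta[0]  # intercept = fitted value at the cutoff
    return est["right"] - est["left"]

# Hypothetical example: a true jump of 2 at x = 0.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 1000)
y = 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 1, 1000)
print(local_linear_rd(x, y, bandwidth=0.5))  # approximately 2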

4.
A seemingly unrelated regression (SUR) model is defined by a system of linear regression equations whose disturbances are contemporaneously correlated across equations. The disturbances can also be serially correlated within each equation of the system, in which case estimating the SUR model becomes more complicated. Several methods have been proposed for estimating SUR models with low-order autoregressive (AR) disturbances. In this article, SUR models with high-order AR disturbances are considered and a tapering approach is examined for this situation. Two modified estimation methods are obtained using this approach. A comprehensive Monte Carlo simulation study is performed to compare the small-sample efficiencies of the modified methods with others given in the literature.

5.
Monte Carlo simulation is used to examine the applicability of various HAC estimators to spurious regression between stationary processes. The study finds that prewhitened HAC has a clear advantage over kernel-weighted HAC. Further analysis shows that the persistence of the explanatory variable affects HAC performance more than the persistence of the response variable. When the data-generating process is a high-order autoregression and the sample size is not very large, the rejection rate of the prewhitening method increases with the AR order; only with a large sample and the BIC lag-selection criterion does the rejection rate of prewhitened HAC approach the nominal test level.
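For context, a kernel (Newey-West) HAC covariance of the kind compared above can be requested in standard software; the minimal statsmodels sketch below is illustrative (the data and lag choice are assumptions) and does not implement the prewhitening step discussed in the paper.

import numpy as np
import statsmodels.api as sm

# Two independent stationary AR(1) series: any "relationship" is spurious.
rng = np.random.default_rng(2)
def ar1(n, phi):
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

y, x = ar1(300, 0.8), ar1(300, 0.8)
X = sm.add_constant(x)
# Kernel (Newey-West) HAC standard errors; maxlags is an assumed choice.
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(fit.summary())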

6.
Interaction effects are very common in practice but have received little attention in the logistic regression literature; this is especially true for higher-order interactions, which conventional logistic regression typically ignores. We propose a model selection procedure that implements an association rules analysis. We proceed by (1) exploring the combinations of input variables that have a significant impact on the response (via association rules analysis); (2) selecting the potential (low- and high-order) interactions; (3) converting these potential interactions into new dummy variables; and (4) performing variable selection among all the input variables and the newly created dummy variables (interactions) to build the optimal logistic regression model. Our model selection procedure establishes the optimal combination of main effects and potential interactions. Comparisons are made through thorough simulations, which show that the proposed method outperforms the existing methods in all cases. A real-life example is discussed in detail to demonstrate the proposed method.
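Steps (3) and (4) above can be sketched generically (the association-rules stage is omitted): candidate interactions are converted into product columns and passed to a penalized logistic regression for selection. The candidate list, penalty choice, and data here are assumptions, not the authors' procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 500
X = rng.integers(0, 2, size=(n, 3)).astype(float)  # binary inputs x0, x1, x2

# Step (3): convert candidate interactions into new dummy columns.
candidates = [(0, 1), (0, 1, 2)]  # assumed output of the rules analysis
dummies = np.column_stack([X[:, list(c)].prod(axis=1) for c in candidates])
X_full = np.column_stack([X, dummies])

# Simulated response that truly depends on x0, x1 and the x0*x1 interaction.
logit = -1 + X[:, 0] + X[:, 1] + 2 * X[:, 0] * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Step (4): L1-penalized logistic regression performs the selection.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_full, y)
print(model.coef_)  # near-zero coefficients correspond to dropped terms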

7.
The least absolute shrinkage and selection operator (lasso) has been widely used in regression analysis. Based on the piecewise linear property of the solution path, least angle regression provides an efficient algorithm for computing the solution paths of lasso. Group lasso is an important generalization of lasso that can be applied to regression with grouped variables. However, the solution path of group lasso is not piecewise linear and hence cannot be obtained by least angle regression. By transforming the problem into a system of differential equations, we develop an algorithm for efficient computation of group lasso solution paths. Simulation studies are conducted for comparing the proposed algorithm to the best existing algorithm: the groupwise-majorization-descent algorithm.
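Neither the ODE formulation nor the groupwise-majorization-descent algorithm is detailed in the abstract; for orientation, the sketch below shows the groupwise soft-thresholding (proximal) operator at the heart of many group lasso algorithms. It is a generic illustration, not the path algorithm of the paper.

import numpy as np

def group_soft_threshold(z, lam):
    """Proximal operator of the (unweighted) group lasso penalty:
    shrink the whole coefficient group z toward zero, or zero it out."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1 - lam / norm) * z

# Example: one group survives shrinkage, the other is removed entirely.
print(group_soft_threshold(np.array([3.0, 4.0]), lam=1.0))  # scaled by 0.8
print(group_soft_threshold(np.array([0.3, 0.4]), lam=1.0))  # all zeros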

8.
We propose a simple algorithm for estimating LAD (least sum of absolute deviations) regression coefficients. The algorithm finds the best subset of points with which to obtain the global optimizer through a backtracking process that progressively narrows the search space, so the computational complexity is reduced to a practical level. To evaluate the performance of the algorithm for LAD regression, we conducted simulation studies. The results show that the algorithm is very competitive in the computation time required to reach the same or similar results as other methods, even under heavy censoring. Moreover, because the algorithm is simple and easy to understand, it can be implemented easily for various objective functions that are variations of LAD.
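The abstract does not give the algorithm itself; as a baseline for comparison, the sketch below solves the same LAD problem exactly as a linear program (the standard formulation, not the backtracking method of the paper). The split of each residual into positive and negative parts is the usual LP trick.

import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """LAD regression via LP: min sum(u + v) s.t. X b + u - v = y, u, v >= 0."""
    n, p = X.shape
    # Variables: [b (free, length p), u (length n), v (length n)]
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# Hypothetical example with heavy-tailed noise:
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=1, size=200)
print(lad_fit(X, y))  # close to [1, 2] despite outliers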

9.
An algorithm is developed for obtaining the spreadsheet regression measures used in computing out-of-sample statistics; it alleviates the computational complexity and memory problems of the naive leave-one-out simulation approach to these statistics. The purpose of this article is to describe this computationally enhanced algorithm, which gives spreadsheet users advanced regression capabilities and adds a new dimension to spreadsheet regression operations. The statistics include the diagonals of the hat matrix, legitimate forecasting intervals, and PRESS residuals. These computational innovations promote learning while eliminating spreadsheet inadequacies, making spreadsheet regression useful to academics in teaching and to practitioners in building application competence.
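The key identity behind avoiding the leave-one-out loop is standard: the PRESS residual follows from the ordinary residual and the hat-matrix diagonal, e_press_i = e_i / (1 - h_ii). A minimal numpy sketch (generic linear regression, not the spreadsheet implementation) follows.

import numpy as np

def press_residuals(X, y):
    """PRESS residuals without refitting n times:
    e_press_i = e_i / (1 - h_ii), where h_ii = diag(X (X'X)^{-1} X')."""
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # hat-matrix diagonal
    e = y - X @ XtX_inv @ X.T @ y                # ordinary residuals
    return e / (1 - h)

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=50)
print(np.sum(press_residuals(X, y) ** 2))  # the PRESS statistic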

10.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The approach consists of penalized maximum likelihood estimation carried out by a robust expectation-maximization (EM)-like algorithm, which we derive for polynomial, spline, and B-spline regression mixtures. The learning approach is unsupervised in two senses: (i) it infers the model parameters and the optimal number of mixture components from the data simultaneously as learning proceeds, rather than in a two-step scheme that fits candidate models and then applies model selection criteria, as in standard model-based clustering; and (ii) it does not require accurate initialization, unlike the standard EM algorithm for regression mixtures. The approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, confirming its benefit for practical applications.

11.
Q. F. Xu, C. Cai & X. Huang, Statistics, 2019, 53(1): 26-42
In recent decades, quantile regression has received increasing attention from academics and practitioners. However, most existing computational algorithms are only effective for problems of small or moderate size and cannot solve quantile regression with large-scale data reliably and efficiently. To this end, we propose a new algorithm that implements quantile regression on large-scale data using the sparse exponential transform (SET) method. The algorithm constructs a well-conditioned basis and a sampling matrix to reduce the number of observations, then solves the quantile regression problem on the reduced matrix to obtain an approximate solution. Through simulation studies and an empirical analysis of a 5% sample of the US 2000 Census data, we demonstrate the efficiency of the SET-based algorithm. Numerical results indicate that the new algorithm is effective in terms of computation time and performs well for large-scale quantile regression.
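The SET construction itself is not described in the abstract; to fix ideas, the sketch below states the quantile regression (pinball) loss and a naive uniform row-subsampling baseline of the kind a good sampling matrix improves upon. Everything here is a generic illustration, not the SET method.

import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def pinball_loss(r, tau):
    """Quantile regression check loss on residuals r at level tau."""
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))

rng = np.random.default_rng(6)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Naive baseline: fit on a uniform row subsample instead of all n rows.
idx = rng.choice(n, size=2000, replace=False)
fit = QuantReg(y[idx], X[idx]).fit(q=0.5)
print(fit.params)                             # near [1, 2]
print(pinball_loss(y - X @ fit.params, 0.5))  # full-data loss of the
                                              # subsample solution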

12.

Sufficient dimension reduction (SDR) provides a framework for reducing the predictor space dimension in statistical regression problems. We consider SDR in the context of dimension reduction for deterministic functions of several variables such as those arising in computer experiments. In this context, SDR can reveal low-dimensional ridge structure in functions. Two algorithms for SDR—sliced inverse regression (SIR) and sliced average variance estimation (SAVE)—approximate matrices of integrals using a sliced mapping of the response. We interpret this sliced approach as a Riemann sum approximation of the particular integrals arising in each algorithm. We employ the well-known tools from numerical analysis—namely, multivariate numerical integration and orthogonal polynomials—to produce new algorithms that improve upon the Riemann sum-based numerical integration in SIR and SAVE. We call the new algorithms Lanczos–Stieltjes inverse regression (LSIR) and Lanczos–Stieltjes average variance estimation (LSAVE) due to their connection with Stieltjes’ method—and Lanczos’ related discretization—for generating a sequence of polynomials that are orthogonal with respect to a given measure. We show that this approach approximates the desired integrals, and we study the behavior of LSIR and LSAVE with two numerical examples. The quadrature-based LSIR and LSAVE eliminate the first-order algebraic convergence rate bottleneck resulting from the Riemann sum approximation, thus enabling high-order numerical approximations of the integrals when appropriate. Moreover, LSIR and LSAVE perform as well as the best-case SIR and SAVE implementations (e.g., adaptive partitioning of the response space) when low-order numerical integration methods (e.g., simple Monte Carlo) are used.
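The replacement of the Riemann-sum slicing by quadrature can be illustrated in one dimension: the sketch below compares a Riemann (midpoint) sum with Gauss-Legendre quadrature for an integral under a uniform measure. It is a generic numerical-integration illustration, not the LSIR/LSAVE algorithms themselves.

import numpy as np

f = lambda x: np.exp(x)          # integrand; exact integral on [-1, 1]
exact = np.exp(1) - np.exp(-1)

# Riemann sum with k "slices" (low-order, like the sliced estimators).
k = 10
edges = np.linspace(-1, 1, k + 1)
mid = 0.5 * (edges[:-1] + edges[1:])
riemann = np.sum(f(mid) * np.diff(edges))

# Gauss-Legendre quadrature with the same number of function evaluations.
nodes, weights = np.polynomial.legendre.leggauss(k)
gauss = np.sum(weights * f(nodes))

print(abs(riemann - exact))  # noticeably larger error
print(abs(gauss - exact))    # near machine precision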


13.
This paper studies a fast computational algorithm for variable selection on high-dimensional recurrent event data. Based on the lasso penalized partial likelihood function for the response process of recurrent event data, a coordinate descent algorithm is used to accelerate the estimation of regression coefficients. This algorithm is capable of selecting important predictors for underdetermined problems where the number of predictors far exceeds the number of cases. The selection strength is controlled by a tuning constant that is determined by a generalized cross-validation method. Our numerical experiments on simulated and real data demonstrate the good performance of penalized regression in model building for recurrent event data in high-dimensional settings.
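The coordinate descent updates are not given in the abstract; for the simpler linear-model lasso (not the recurrent-event partial likelihood of the paper), one cycle of the standard algorithm looks like the following sketch, with standardized predictors assumed.

import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for the lasso with columns scaled to mean zero,
    unit variance (assumed); minimizes ||y - Xb||^2 / (2n) + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]     # remove coordinate j's contribution
            rho = X[:, j] @ r / n      # inner product with partial residual
            beta[j] = soft_threshold(rho, lam)
            r -= X[:, j] * beta[j]     # restore with the updated beta_j
    return beta

rng = np.random.default_rng(7)
n, p = 200, 50
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)
y = X[:, 0] * 3 - X[:, 1] * 2 + rng.normal(size=n)
print(np.nonzero(lasso_cd(X, y, lam=0.2))[0])  # mostly {0, 1}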

14.
L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling, and various algorithms have been proposed to solve the resulting optimization problems. The coordinate descent algorithm, in particular, has been shown to be effective in sparse regression modeling. Although it shows remarkable performance on L1-type regularization problems, it suffers from outliers, since the procedure is based on inner products of predictor variables and partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing on high-dimensional regression modeling in the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed algorithm, and we observe that it performs effectively for high-dimensional regression modeling even in the presence of outliers.

15.
An EM algorithm for the regression function in a nonparametric heteroscedastic model is studied, and an estimator of the conditional regression function is obtained on the basis of this algorithm. An empirical analysis of the relationship between rural residents' food consumption expenditure and net income shows that the EM-based estimator fits the data better than the least squares estimator; the Engel coefficient is also fitted and its trend analyzed.

16.
Many areas of statistical modeling are plagued by the "curse of dimensionality," in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
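The first two stages described above (wavelet decomposition, then a correlation filter) can be sketched generically with PyWavelets; the wavelet choice, decomposition level, threshold, and data are assumptions, and the genetic-algorithm search stage is omitted.

import numpy as np
import pywt

rng = np.random.default_rng(8)
n, m = 40, 256                       # 40 spectra, 256 wavelengths each
spectra = rng.normal(size=(n, m))
y = spectra[:, 100] + rng.normal(scale=0.1, size=n)   # toy response

# Stage 1: discrete wavelet transform of each spectrum (assumed: db4, level 3).
coeffs = np.array([np.concatenate(pywt.wavedec(s, "db4", level=3))
                   for s in spectra])

# Stage 2: drop coefficients with low absolute correlation to the response.
corr = np.array([abs(np.corrcoef(coeffs[:, j], y)[0, 1])
                 for j in range(coeffs.shape[1])])
keep = corr > 0.3                    # assumed screening threshold
print(coeffs[:, keep].shape)         # reduced design for the subset search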

17.
Handling data with a nonignorable missingness mechanism is still a challenging problem in statistics. In this paper, we develop a fully Bayesian adaptive lasso approach for quantile regression models with nonignorably missing response data, where the nonignorable missingness mechanism is specified by a logistic regression model. The proposed method extends the Bayesian lasso by allowing different penalization parameters for different regression coefficients. Furthermore, a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is implemented to simulate the parameters from their posterior distributions, including the regression coefficients, the shrinkage coefficients, and the parameters of the nonignorable missingness model. Finally, simulation studies and a real example illustrate the proposed methodology.

18.
An efficient optimization algorithm is proposed for identifying the best least squares regression model under the constraint of non-negative coefficients. The algorithm derives its solution from the unrestricted least squares fit and builds on the regression tree and branch-and-bound techniques used for computing best subset regressions, filling a gap in computationally tractable solutions to the combined non-negative least squares and model selection problem. The proposed method is illustrated with a real dataset. Experimental results on real and artificial random datasets confirm the computational efficacy of the new strategy and demonstrate its ability to solve large model selection problems subject to non-negativity constraints.
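For reference, the plain non-negative least squares subproblem (without the subset selection discussed above) has a classic solver in SciPy; the minimal sketch below is illustrative and is not the branch-and-bound procedure of the paper.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 5))
beta_true = np.array([2.0, 0.0, 1.0, 0.0, 3.0])   # non-negative by design
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Lawson-Hanson active-set NNLS: min ||X b - y||_2 subject to b >= 0.
beta_hat, resid_norm = nnls(X, y)
print(beta_hat)   # zeros where the constraint binds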

19.
A new technique is devised to mitigate errors-in-variables bias in linear regression. The procedure mimics two-stage least squares: an auxiliary regression is derived that generates a better-behaved predictor variable, and the generated variable is then substituted for the error-prone variable in the first-stage model. The performance of the algorithm is tested by simulation and regression analyses. The simulations suggest that the algorithm efficiently captures the additive error term used to contaminate the artificial variables, and the regressions lend further support, clearly showing that the compact genetic algorithm-based estimate of the true but unobserved regressor yields considerably better results. These conclusions are robust across different sample sizes and across different variance structures imposed on both the measurement error and the regression disturbances.
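The compact genetic algorithm itself is not described in the abstract; the two-stage logic it mimics can be sketched generically with an instrumental variable. The instrument and data below are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(10)
n = 2000
x_true = rng.normal(size=n)
z = x_true + rng.normal(scale=0.5, size=n)  # instrument: correlated with
                                            # x_true, not with the errors
x_obs = x_true + rng.normal(size=n)         # error-prone measurement
y = 2.0 * x_true + rng.normal(size=n)

def ols_slope(x, y):
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]

# Stage 1: regress the noisy x on the instrument; use the fitted values.
x_hat = ols_slope(z, x_obs) * (z - z.mean()) + x_obs.mean()
# Stage 2: regress y on the generated (better-behaved) predictor.
print(ols_slope(x_obs, y))  # attenuated, well below 2
print(ols_slope(x_hat, y))  # close to the true slope 2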

20.
In this paper, we propose a lower-bound-based smoothed quasi-Newton algorithm for computing the solution paths of the group bridge estimator in linear regression models. Our method combines a quasi-Newton algorithm applied to a smoothed group bridge penalty with a novel data-driven thresholding rule for the regression coefficients. This rule is derived from a necessary KKT condition of the group bridge optimization problem; it is easy to implement, can be used to eliminate groups with zero coefficients, and thus reduces the dimension of the optimization problem. The proposed algorithm removes the groupwise orthogonality condition required by the coordinate descent and LARS algorithms for group variable selection. Numerical results show that the proposed algorithm outperforms the coordinate descent based algorithms in both efficiency and accuracy.
