
1.

Variable Selection via Regression Trees in the Presence of Irrelevant Variables

Youngjae Chang 《Communications in Statistics - Simulation and Computation》2013,42(8):1703-1726

Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy when there are many irrelevant variables and the number of predictors exceeds the number of observations. We propose the multistep regression tree with adaptive variable selection to handle this problem. The variable selection step and the fitting step comprise the multistep method.

The multistep generalized unbiased interaction detection and estimation (GUIDE) with adaptive forward selection (fg) algorithm, as a variable selection tool, performs better than some of the well-known variable selection algorithms such as efficacy adaptive regression tube hunting (EARTH), FSR (false selection rate), LSCV (least squares cross-validation), and LASSO (least absolute shrinkage and selection operator) for the regression problem. The results of a simulation study show that fg outperforms the other algorithms in terms of selection result and computation time. It generally selects the important variables correctly along with relatively few irrelevant variables, which gives good prediction accuracy with less computation time.

2.

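Regression Imputation with Preferential Selection of Highly Correlated Auxiliary Variables

Yang Guijun, Cai Juan, Zhao Xiaoyun 《Statistics & Information Forum》2012,27(6):8-13

Nonresponse occurs frequently in survey sampling. Imputation of the missing items is one of the principal methods for handling nonresponse, and auxiliary variables are very important for improving the accuracy of the imputed values. This article therefore studies a regression imputation method for item nonresponse that preferentially selects highly correlated auxiliary variables: the auxiliary variables with high correlation coefficients with the target variable are screened first, and a regression imputation model is then built. The auxiliary variable selection process is simple and the imputed values are accurate. A simulated example demonstrates the good performance of the method.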
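The screen-then-regress procedure described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single auxiliary variable is kept (the one with the largest absolute Pearson correlation among respondents) and that imputation uses a simple linear regression. The function names `pearson` and `impute_with_best_auxiliary` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Assumption: neither variable is constant, so the denominator is nonzero.
    return sxy / math.sqrt(sxx * syy)


def impute_with_best_auxiliary(target, auxiliaries):
    """Impute missing target values (None) from the best-correlated auxiliary.

    target      -- list of floats, with None marking item nonresponse
    auxiliaries -- dict mapping a name to a fully observed list of floats
    Returns (name of the chosen auxiliary, completed target list).
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    y = [target[i] for i in observed]

    # Screening step: keep the auxiliary with the largest |r| among respondents.
    best = max(
        auxiliaries,
        key=lambda name: abs(pearson([auxiliaries[name][i] for i in observed], y)),
    )

    # Fitting step: simple least-squares line of the target on that auxiliary.
    x = [auxiliaries[best][i] for i in observed]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx

    # Imputation step: replace each missing value with its regression prediction.
    completed = [
        v if v is not None else intercept + slope * auxiliaries[best][i]
        for i, v in enumerate(target)
    ]
    return best, completed
```

Keeping several auxiliary variables above a correlation threshold and fitting a multiple regression would be the natural extension; the single-variable case is used here only to keep the sketch short.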

3.

Non-iterative Estimation and Variable Selection in the Single-index Quantile Regression Model

C. N. Kuruwita 《Communications in Statistics - Simulation and Computation》2016,45(10):3615-3628

A new estimation procedure is proposed for the single-index quantile regression model. Compared to existing work, this approach is non-iterative and hence computationally efficient. The proposed method not only estimates the index parameter and the link function but also selects variables simultaneously. The performance of the variable selection is enhanced by a fully adaptive penalty function motivated by the sliced inverse regression technique. Finite sample performance is studied through a simulation study that compares the proposed method with existing work under several criteria. A data analysis is given that highlights the usefulness of the proposed methodology.

4.

Subset Selection in Linear Regression using Sequentially Normalized Least Squares: Asymptotic Theory

Jussi Määttä Daniel F. Schmidt Teemu Roos 《Scandinavian Journal of Statistics》2016,43(2):382-395

This article examines the recently proposed sequentially normalized least squares criterion for the linear regression subset selection problem. A simplified formula for computation of the criterion is presented, and an expression for its asymptotic form is derived without the assumption of normally distributed errors. Asymptotic consistency is proved in two senses: (i) in the usual sense, where the sample size tends to infinity, and (ii) in a non‐standard sense, where the sample size is fixed and the noise variance tends to zero.

5.

Regression with outlier shrinkage

Shifeng Xiong V. Roshan Joseph 《Journal of Statistical Planning and Inference》2013

We propose a robust regression method called regression with outlier shrinkage (ROS) for the traditional n > p cases. It improves over the other robust regression methods such as least trimmed squares (LTS) in the sense that it can achieve maximum breakdown value and full asymptotic efficiency simultaneously. Moreover, its computational complexity is no more than that of LTS. We also propose a sparse estimator, called sparse regression with outlier shrinkage (SROS), for robust variable selection and estimation. It is proven that SROS can not only give consistent selection but also estimate the nonzero coefficients with full asymptotic efficiency under the normal model. In addition, we introduce a concept of nearly regression equivariant estimator for understanding the breakdown properties of sparse estimators, and prove that SROS achieves the maximum breakdown value of nearly regression equivariant estimators. Numerical examples are presented to illustrate our methods.

6.

Akaike Information Criterion for Selecting Variables in the Nested Error Regression Model

Tatsuya Kubokawa Muni S. Srivastava 《Communications in Statistics - Theory and Methods》2013,42(15):2626-2642

The Akaike Information Criterion (AIC) is developed for selecting the variables of the nested error regression model where an unobservable random effect is present. Using the idea of decomposing the likelihood into two parts of “within” and “between” analysis of variance, we derive the AIC when the number of groups is large and the ratio of the variances of the random effects and the random errors is an unknown parameter. The proposed AIC is compared, using simulation, with Mallows' C_p, Akaike's AIC, and Sugiura's exact AIC. Based on the rates of selecting the true model, it is shown that the proposed AIC performs better.

7.

Variable Selection for Semiparametric Partially Linear Covariate-Adjusted Regression Models

Jiang Du Gaorong Li 《Communications in Statistics - Theory and Methods》2013,42(13):2809-2826

In this article, the partially linear covariate-adjusted regression models are considered, and the penalized least-squares procedure is proposed to simultaneously select variables and estimate the parametric components. The rate of convergence and the asymptotic normality of the resulting estimators are established under some regularization conditions. With the proper choices of the penalty functions and tuning parameters, it is shown that the proposed procedure can be as efficient as the oracle estimators. Some Monte Carlo simulation studies and a real data application are carried out to assess the finite sample performances for the proposed method.

8.

A Simulation Study of Ridge Regression Estimators with Autocorrelated Errors

Luis L. Firinguetti 《Communications in Statistics - Simulation and Computation》2013,42(2):673-702

In this paper we run a large number of simulations to study the effects of collinearity and autocorrelated disturbances on the performance of several Ridge Regression estimators. The results suggest that with a fair amount of multicollinearity and of autocorrelation the Ridge Regression estimators which take the autocorrelation into account can perform better than the other methods. Also, if the error term is only moderately autocorrelated, the Ridge Regression estimators built upon ignoring the autocorrelation can outperform the other estimators.

9.

A Semi-parametric Regression Model with Errors in Variables (Total citations: 4; self-citations: 0; citations by others: 4)

Lixing Zhu Hengjian Cui 《Scandinavian Journal of Statistics》2003,30(2):429-442

In this paper, we consider a partial linear regression model with measurement errors in possibly all the variables. We use a method of moments and deconvolution to construct a new class of parametric estimators together with a non-parametric kernel estimator. Strong convergence, optimal rate of weak convergence and asymptotic normality of the estimators are investigated.

10.

Logistic Regression and Discriminant Analysis by Ordinary Least Squares

Gus W. Haggstrom 《Journal of Business & Economic Statistics》2013,31(3):229-238

If the observations for fitting a polytomous logistic regression model satisfy certain normality assumptions, the maximum likelihood estimates of the regression coefficients are the discriminant function estimates. This article shows that these estimates, their unbiased counterparts, and associated test statistics for variable selection can be calculated using ordinary least squares regression techniques, thereby providing a convenient method for fitting logistic regression models in the normal case. Evidence is given indicating that the discriminant function estimates and test statistics merit wider use in nonnormal cases, especially in exploratory work on large data sets.

11.

Selecting the best predictor variate

John S. Ramberg 《Communications in Statistics - Theory and Methods》2013,42(11):1133-1147

The Bechhofer indifference-zone approach is used to determine the sample size for selecting the best predictor variate from a set of k variates. A multivariate normal model is assumed and the best predictor variate is defined to be that variate for which the predictand has the smallest population conditional variance. Asymptotic distribution theory and probability bounds are employed to obtain sample-size approximations, which are compared with (numerical) exact results.

12.

Confidence Sets Based on Thresholding Estimators in High-Dimensional Gaussian Regression Models (Total citations: 1; self-citations: 1; citations by others: 0)

Ulrike Schneider 《Econometric Reviews》2016,35(8-10):1412-1455

We study confidence intervals based on hard-thresholding, soft-thresholding, and adaptive soft-thresholding in a linear regression model where the number of regressors k may depend on and diverge with sample size n. In addition to the case of known error variance, we define and study versions of the estimators when the error variance is unknown. In the known-variance case, we provide an exact analysis of the coverage properties of such intervals in finite samples. We show that these intervals are always larger than the standard interval based on the least-squares estimator. Asymptotically, the intervals based on the thresholding estimators are larger even by an order of magnitude when the estimators are tuned to perform consistent variable selection. For the unknown-variance case, we provide nontrivial lower bounds and a small numerical study for the coverage probabilities in finite samples. We also conduct an asymptotic analysis where the results from the known-variance case can be shown to carry over asymptotically if the number of degrees of freedom n − k tends to infinity fast enough in relation to the thresholding parameter.
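For orientation, the three thresholding rules named in the abstract above can be sketched for a single coefficient. These are the common textbook forms, applied to one least-squares coordinate `z` with a threshold `lam`. The article's own definitions (for instance, how the threshold is scaled by the error variance and the sample size) may differ, and the function names are hypothetical. The sketch covers only the point estimators, not the confidence intervals that the article studies.

```python
import math


def hard_threshold(z, lam):
    """Keep z unchanged if it exceeds the threshold in magnitude, else return 0."""
    return z if abs(z) > lam else 0.0


def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, and set it to 0 once |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def adaptive_soft_threshold(z, lam):
    """Shrink z by the factor (1 - lam^2 / z^2), truncated at 0.

    Large |z| is shrunk very little, small |z| is set exactly to 0.
    """
    if z == 0:
        return 0.0
    return z * max(1.0 - (lam / z) ** 2, 0.0)
```

All three rules set a coefficient to exactly zero when it is small relative to the threshold, which is what makes them variable selection procedures. They differ in how much they shrink the coefficients that survive.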

13.

Prediction in Multilevel Logistic Regression

Karin Ayumi Tamura Viviana Giampaoli 《Communications in Statistics - Simulation and Computation》2013,42(6):1083-1096

The purpose of this article is to present a new method to predict the response variable of an observation in a new cluster for a multilevel logistic regression. The central idea is based on the empirical best estimator for the random effect. Two estimation methods for the multilevel model are compared: penalized quasi-likelihood and Gauss–Hermite quadrature. The performance measures for the prediction of the probability for a new cluster observation of the multilevel logistic model in comparison with the usual logistic model are examined through simulations and an application.

14.

Simplifying Regression Models Using Dimensional Analysis (Total citations: 1; self-citations: 0; citations by others: 1)

V.A. Vignaux & J.L. Scott 《Australian & New Zealand Journal of Statistics》1999,41(1):31-41

Dimensional analysis can make a contribution to model formulation when some of the measurements in the problem are of physical factors. The analysis constructs a set of independent dimensionless factors that should be used as the variables of the regression in place of the original measurements. There are fewer of these than the originals and they often have a more appropriate interpretation. The technique is described briefly and its proposed role in regression discussed and illustrated with examples. We conclude that dimensional analysis can be effective in the preliminary stages of regression analysis when developing formulations involving continuous variables with several dimensions.

15.

A Variable Selection Criterion for Two Sets of Principal Component Scores in Principal Canonical Correlation Analysis

Toru Ogura Yasunori Fujikoshi Takakazu Sugiyama 《Communications in Statistics - Theory and Methods》2013,42(12):2118-2135

Canonical correlation analysis (CCA) is often used to analyze the correlation between two random vectors. However, the results of CCA can be hard to interpret. In an attempt to address these difficulties, principal canonical correlation analysis (PCCA) was proposed. PCCA is CCA between two sets of principal component (PC) scores. We consider the problem of selecting useful PC scores in CCA. A variable selection criterion for one set of PC scores was proposed by Ogura (2010); here, we propose a variable selection criterion for two sets of PC scores in PCCA. Furthermore, we demonstrate the effectiveness of this criterion.

16.

A New Regression Model: Modal Linear Regression

Weixin Yao Longhai Li 《Scandinavian Journal of Statistics》2014,41(3):656-671

The mode of a distribution provides an important summary of data and is often estimated on the basis of some non‐parametric kernel density estimator. This article develops a new data analysis tool called modal linear regression in order to explore high‐dimensional data. Modal linear regression models the conditional mode of a response Y given a set of predictors x as a linear function of x. Modal linear regression differs from standard linear regression in that standard linear regression models the conditional mean (as opposed to mode) of Y as a linear function of x. We propose an expectation–maximization algorithm in order to estimate the regression coefficients of modal linear regression. We also provide asymptotic properties for the proposed estimator without the symmetric assumption of the error density. Our empirical studies with simulated data and real data demonstrate that the proposed modal regression gives shorter predictive intervals than mean linear regression, median linear regression and MM‐estimators.
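The expectation–maximization idea mentioned in the abstract above can be sketched as follows. The sketch assumes a Gaussian kernel with a user-supplied bandwidth, a single predictor plus an intercept, and ordinary least squares as the starting value; none of these choices is taken from the article, and the function names are hypothetical. Each iteration weights every observation by the kernel evaluated at its current residual (E-step) and then refits the line by weighted least squares (M-step).

```python
import math


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def modal_linear_regression(x, y, bandwidth, n_iter=200, tol=1e-10):
    """EM-type iteration for the conditional mode line (Gaussian kernel).

    Assumption: the bandwidth is large enough that the kernel weights do not
    all underflow to zero at the starting value.
    """
    # Start from the ordinary least-squares line (all weights equal).
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(n_iter):
        # E-step: kernel weight of each observation at the current residual.
        # Normalizing the weights is unnecessary because the M-step is
        # invariant to a common scale factor.
        w = [
            math.exp(-0.5 * ((yi - intercept - slope * xi) / bandwidth) ** 2)
            for xi, yi in zip(x, y)
        ]
        # M-step: weighted least squares with those weights.
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope
```

Because observations far from the current line receive weights close to zero, the fitted line tracks the densest band of the data rather than its average, which is the sense in which the method targets the conditional mode. Like other EM-type iterations it can converge to a local maximum, so the result may depend on the starting value and the bandwidth.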

17.

A Simulation Study of Some Ridge Regression Estimators under Different Distributional Assumptions

Kristofer Månsson Ghazi Shukur 《Communications in Statistics - Simulation and Computation》2013,42(8):1639-1670

Based on the work of Khalaf and Shukur (2005), Alkhamisi et al. (2006), and Muniz et al. (2010), this article considers several estimators for estimating the ridge parameter k. This article differs from the aforementioned articles in three ways: (1) data are generated from Normal, Student's t, and F distributions with appropriate degrees of freedom; (2) the number of regressors considered ranges from 4 to 12 instead of the usual 2 to 4; (3) both mean square error (MSE) and prediction sum of squares (PRESS) are considered as performance criteria. A simulation study has been conducted to compare the performance of the estimators. Based on the simulation study, we found that increasing the correlation between the independent variables has a negative effect on the MSE and PRESS. However, increasing the number of regressors has a positive effect on the MSE and PRESS. When the sample size increases, the MSE decreases even when the correlation between the independent variables is large. It is interesting to note that the dominance pictures of the estimators remain the same under both the MSE and PRESS criteria. However, the performance of the estimators depends on the choice of the assumption of the error distribution of the regression model.

18.

A Bayesian predictive approach to the selection of variables in multiple regression

Ramona L. Trader 《Communications in Statistics - Theory and Methods》2013,42(13):1553-1557

The squared error loss function applied to Bayesian predictive distributions is investigated as a variable selection criterion in linear regression equations. It is illustrated that “cost-free” variables may be eliminated if they are poor predictors. Regression models where the predictors are fixed and where they are stochastic are both considered. An empirical examination of the criterion and a comparison with other techniques are presented.

19.

Subset selection in multiple linear regression in the presence of outlier and multicollinearity

《Statistical Methodology》2014

Various subset selection methods are based on the least squares parameter estimation method. These methods do not perform reasonably well in the presence of outliers, multicollinearity, or both. Few subset selection methods based on the M-estimator are available in the literature for data with outliers. Very few subset selection methods account for the problem of multicollinearity with the ridge regression estimator. In this article, we develop a generalized version of the S_p statistic based on the jackknifed ridge M-estimator for subset selection in the presence of outliers and multicollinearity. We establish the equivalence of this statistic with the existing C_p, S_p and R_p statistics. The performance of the proposed method is illustrated through some numerical examples and the correct model selection ability is evaluated using a simulation study.

20.

Ridge Regression: Degrees of Freedom in the Analysis of Variance

Arthur E. Hoerl Robert W. Kennard 《Communications in Statistics - Simulation and Computation》2013,42(4):1485-1495

It appears to be common practice with ridge regression to obtain a decomposition of the total sum of squares, and assign degrees of freedom, according to established least squares theory. This discussion notes the obvious fallacies of such an approach, and introduces a decomposition based on orthogonality, and degrees of freedom based on expected mean squares, for non-stochastic k.