
1.

Variable selection in additive quantile regression using nonconcave penalty

Kaifeng Zhao 《Statistics》2016,50(6):1276-1289

This paper considers variable selection in additive quantile regression based on group smoothly clipped absolute deviation (gSCAD) penalty. Although shrinkage variable selection in additive models with least-squares loss has been well studied, quantile regression is sufficiently different from mean regression to deserve a separate treatment. It is shown that the gSCAD estimator can correctly identify the significant components and at the same time maintain the usual convergence rates in estimation. Simulation studies are used to illustrate our method.
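The two ingredients named in this abstract are the quantile check loss and a SCAD penalty applied group-wise to the Euclidean norm of each additive component's coefficient vector. The following is a minimal sketch, assuming the standard SCAD form with the commonly used default a = 3.7 and an unweighted group norm; the function names and the toy objective are illustrative, not the paper's estimator or its basis expansion.

```python
import math
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def scad_penalty(t: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lambda(|t|); requires lam > 0 and a > 2."""
    t = abs(t)
    if t <= lam:
        return lam * t                                                  # lasso-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))    # quadratic taper
    return lam * lam * (a + 1) / 2                                      # constant, so large effects are not shrunk


def group_scad_objective(residuals: Sequence[float],
                         groups: Sequence[Sequence[float]],
                         tau: float, lam: float, a: float = 3.7) -> float:
    """sum_i rho_tau(r_i) + n * sum_j p_lambda(||beta_j||_2).

    Each entry of `groups` is the coefficient vector of one additive
    component (e.g. hypothetical spline-basis coefficients)."""
    n = len(residuals)
    loss = sum(check_loss(r, tau) for r in residuals)
    penalty = sum(scad_penalty(math.sqrt(sum(b * b for b in g)), lam, a)
                  for g in groups)
    return loss + n * penalty
```

Because the penalty acts on the whole group norm, a component is either dropped entirely (all its coefficients zero) or kept, which is what "identifying the significant components" refers to.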

2.

Evaluating modified generalized information criterion in presence of multicollinearity

Ali Hussein Al-Marshadi, Abdullah Hamoud Alharby 《Communications in Statistics - Simulation and Computation》2017,46(8):6298-6307

When a regression model has many explanatory variables, some of them may be intercorrelated. This gives rise to multicollinearity, which impairs the precision and accuracy of the coefficient estimates and makes the search for the best model tedious. To tackle such situations, model selection criteria are applied to select the model that best fits the data. The current study evaluates four unmodified and four modified versions of generalized information criteria: the Akaike Information Criterion, Schwarz's Bayesian Information Criterion, the Hannan-Quinn Information Criterion, and the Akaike Information Criterion corrected for small samples. A simulation study using SAS software was carried out to compare the unmodified and modified versions of the generalized information criteria and to identify which of the four modified criteria best selects the correct model when the collinearity assumption is violated. In the simulation, samples of size 50 and 100 are drawn from a normal distribution for three explanatory variables X₁, X₂, and X₃. Two levels of collinearity between X₁ and X₂ are examined: ρ = 0.6 and ρ = 0.8. The simulation results are presented in tables along with graphical displays. The results reveal that the modified versions of the generalized information criteria are more sensitive than the unmodified versions in identifying models affected by high multicollinearity.
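For reference, the four unmodified criteria can be computed for a Gaussian linear model from the residual sum of squares RSS, the sample size n and the number of estimated parameters k. The sketch below uses the common textbook forms, written up to an additive constant shared by all candidate models; the modified versions studied in the paper are not reproduced here, and the function name is hypothetical.

```python
import math


def information_criteria(rss: float, n: int, k: int) -> dict:
    """Unmodified AIC, BIC, HQ and AICc for a Gaussian linear model with k
    estimated parameters, up to a constant common to all candidate models
    fitted to the same n observations. Smaller is better."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    base = n * math.log(rss / n)   # -2 * maximised log-likelihood, up to a constant
    aic = base + 2 * k
    return {
        "AIC": aic,
        "BIC": base + k * math.log(n),
        "HQ": base + 2 * k * math.log(math.log(n)),
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
    }
```

The criteria differ only in how hard they penalize k, which is why they can rank the same collinear candidate models differently.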

3.

Model selection with data-oriented penalty

《Journal of Statistical Planning and Inference》1999,77(1):103-117

We consider the problem of model (or variable) selection in the classical regression model using the GIC (general information criterion). In this method, the maximized likelihood is penalized by a term denoted by $C_n$, which depends on the sample size $n$ and is chosen to ensure consistency in selecting the true model. Various choices of $C_n$ have been suggested in the model selection literature. In this paper, we show that a particular choice of $C_n$ based on the observed data, which makes it random, preserves the consistency property and performs better than a fixed choice of $C_n$.
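To make the role of $C_n$ concrete: for a Gaussian linear model, a GIC-type criterion can be written as n·log(RSS/n) + C_n·k, so C_n = 2 gives AIC and C_n = log n gives BIC. The sketch below selects, for a fixed C_n, the subset with the smallest criterion. The RSS values in the test are hypothetical, and the paper's data-oriented (random) choice of C_n is not reproduced.

```python
import math
from typing import Dict, FrozenSet


def gic(rss: float, n: int, k: int, c_n: float) -> float:
    """GIC for a Gaussian linear model: n * log(RSS / n) + C_n * k (up to a constant)."""
    return n * math.log(rss / n) + c_n * k


def select_model(rss_by_subset: Dict[FrozenSet[str], float], n: int,
                 c_n: float) -> FrozenSet[str]:
    """Return the candidate subset minimising GIC for a fixed penalty C_n.
    The parameter count k is the subset size plus one for the intercept."""
    return min(rss_by_subset,
               key=lambda s: gic(rss_by_subset[s], n, len(s) + 1, c_n))
```

With the same hypothetical RSS table, a small C_n can favour a larger subset while C_n = log n favours a smaller one. This sensitivity is the motivation for choosing C_n carefully.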

4.

Missing regressor values under conditions of multicollinearity

R. Carter Hill, Rod F. Ziemer 《Communications in Statistics - Theory and Methods》2013,42(22):2557-2573

The performance of some common procedures for replacing missing regressor values is investigated under varying conditions of multicollinearity. Analytical and numerical results indicate that the popular method of substituting sample means for missing values appears preferable to other, more sophisticated procedures when the design is ill-conditioned. In addition, the results indicate that incomplete sample observations should not be discarded under conditions of extreme multicollinearity.
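Mean substitution, the simple procedure the abstract finds preferable under ill-conditioned designs, fills each missing value of a regressor with the sample mean of that regressor's observed values. A minimal sketch, assuming missing entries are encoded as None. It keeps incomplete rows rather than discarding them, in line with the abstract's recommendation.

```python
from statistics import mean
from typing import List, Optional


def mean_impute(column: List[Optional[float]]) -> List[float]:
    """Replace missing regressor values (None) by the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    m = mean(observed)
    return [m if v is None else v for v in column]
```

Apply it column by column to the design matrix before fitting. The response is left untouched.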

5.

Model selection with distributed SCAD penalty

Puyu Wang, Yong Liang 《Journal of Applied Statistics》2018,45(11):1938-1955

In this paper, we focus on feature extraction and variable selection for massive data that are divided and stored across different linked computers. Specifically, we study distributed model selection with the Smoothly Clipped Absolute Deviation (SCAD) penalty. Based on the Alternating Direction Method of Multipliers (ADMM) algorithm, we propose a distributed SCAD algorithm and prove its convergence. The variable selection results of the distributed approach are the same as those of the non-distributed approach. Numerical studies show that our method is both effective and efficient and performs well in distributed data analysis.
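In ADMM-type algorithms for SCAD-penalized problems, the update of the penalized variable typically reduces to a univariate SCAD thresholding step. The sketch below is the standard SCAD thresholding rule, which minimizes 0.5·(z − θ)² + p_λ(|θ|). How the paper scales λ by the ADMM step size or the number of machines is not given in the abstract, so that part is deliberately left out.

```python
def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """Minimiser over theta of 0.5 * (z - theta)**2 + p_lambda(|theta|), where
    p_lambda is the SCAD penalty with lam > 0 and a > 2."""
    az = abs(z)
    sign = 1.0 if z >= 0 else -1.0
    if az <= 2 * lam:
        return sign * max(az - lam, 0.0)                 # soft-thresholding zone
    if az <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # interpolation zone
    return z                                             # large inputs left unshrunk
```

Unlike soft thresholding, the rule returns z itself once |z| exceeds aλ. This reduced bias on large coefficients is the main reason SCAD is preferred over the lasso here.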

6.

Variable selection for model-based clustering using the integrated complete-data likelihood

Matthieu Marbac, Mohammed Sedki 《Statistics and Computing》2017,27(4):1049-1063

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between clustering accuracy and the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion that involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g., a stepwise method). Second, the algorithms are often computationally expensive because they need multiple calls of the EM algorithm. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require the maximum likelihood estimate, and its maximization appears to be simple and computationally efficient. The original contribution of our approach is to perform model selection without requiring any parameter estimation; parameter inference is then needed only for the single selected model. This approach is used for variable selection in a Gaussian mixture model under the assumption of conditional independence. Numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches for variable selection. The proposed approach is implemented in the R package VarSelLCM, available on CRAN.

7.

A modified ridge m-estimator for linear regression model with multicollinearity and outliers

Hasan Ertaş 《Communications in Statistics - Simulation and Computation》2018,47(4):1240-1250

Ordinary least-squares estimation for linear regression with multicollinearity and outliers leads to unfavorable results. In this article, we propose a new robust modified ridge M-estimator (MRME), based on the M-estimator (ME), to deal with the combined problem of multicollinearity and outliers in the y-direction. According to the mean squared error criterion, MRME outperforms the modified ridge estimator, the robust ridge estimator, and ME. Furthermore, a numerical example and a Monte Carlo simulation experiment are given to illustrate some of the theoretical results.

8.

Variable selection for semiparametric proportional hazards model under progressive Type-II censoring

Xuejing Zhao, Jinxia Su 《Communications in Statistics - Simulation and Computation》2017,46(6):4367-4376

Variable selection is an effective methodology for dealing with models with numerous covariates. We consider variable selection methods for the semiparametric Cox proportional hazards model under the progressive Type-II censoring scheme. The Cox proportional hazards model is used to model the effects of the environmental covariates. By applying Breslow’s “least information” idea, we obtain a profile likelihood function to estimate the coefficients. Lasso-type penalized profile likelihood estimation and stepwise variable selection are explored as means of finding the important covariates. Numerical simulations are conducted, and the Veteran’s Administration Lung Cancer data are used to evaluate the performance of the proposed method.
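In the standard case, the profile likelihood obtained from Breslow's argument coincides with the Cox partial likelihood with Breslow's handling of tied failure times. The sketch below evaluates that log partial likelihood and a lasso-penalized objective. It assumes that units removed under progressive Type-II censoring enter simply as right-censored observations at their removal times, which is an illustrative simplification, not necessarily the paper's construction. The function names are hypothetical.

```python
import math
from typing import Sequence


def cox_log_partial_likelihood(times: Sequence[float], events: Sequence[int],
                               covariates: Sequence[Sequence[float]],
                               beta: Sequence[float]) -> float:
    """Cox log partial likelihood with Breslow's treatment of ties:
    sum over observed failures i of
        x_i'beta - log( sum_{j : t_j >= t_i} exp(x_j'beta) ).
    events[i] is 1 for an observed failure and 0 for a unit censored
    (e.g. removed) at time times[i]."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    ll = 0.0
    for t_i, d_i, eta_i in zip(times, events, eta):
        if d_i:
            risk = sum(math.exp(e) for t_j, e in zip(times, eta) if t_j >= t_i)
            ll += eta_i - math.log(risk)
    return ll


def lasso_cox_objective(times, events, covariates, beta, lam: float) -> float:
    """Negative log partial likelihood plus the lasso penalty lam * sum_k |beta_k|."""
    return (-cox_log_partial_likelihood(times, events, covariates, beta)
            + lam * sum(abs(b) for b in beta))
```

Minimizing `lasso_cox_objective` over beta for a grid of `lam` values, for example by coordinate descent, gives the lasso path from which important covariates are read off.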

9.

Subset selection in multiple linear regression in the presence of outlier and multicollinearity

《Statistical Methodology》2014

Various subset selection methods are based on least squares parameter estimation. These methods do not perform well in the presence of outliers, multicollinearity, or both. Only a few subset selection methods based on the $M$-estimator are available in the literature for data with outliers, and very few subset selection methods address multicollinearity using the ridge regression estimator. In this article, we develop a generalized version of the $S_p$ statistic based on the jackknifed ridge $M$-estimator for subset selection in the presence of outliers and multicollinearity. We establish the equivalence of this statistic with the existing $C_p$, $S_p$ and $R_p$ statistics. The performance of the proposed method is illustrated through numerical examples, and its ability to select the correct model is evaluated in a simulation study.
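The classical Mallows $C_p$, one of the statistics the proposed measure is related to, can be computed from least-squares residual sums of squares as C_p = RSS_p / σ̂² − n + 2p, where σ̂² comes from the full model. This is a sketch of the ordinary least-squares $C_p$ only. The jackknifed ridge $M$-estimator version developed in the paper is not reproduced, and the function names are hypothetical.

```python
def sigma2_full(rss_full: float, n: int, p_full: int) -> float:
    """Error-variance estimate from the full model with p_full parameters."""
    return rss_full / (n - p_full)


def mallows_cp(rss_p: float, sigma2: float, n: int, p: int) -> float:
    """Mallows' C_p for a subset model with p parameters (intercept included):
    C_p = RSS_p / sigma2 - n + 2p. Subsets with C_p close to p are preferred."""
    return rss_p / sigma2 - n + 2 * p
```

By construction the full model always has C_p equal to its own parameter count. Under outliers or multicollinearity the least-squares RSS inputs become unreliable, which is what motivates robust and ridge-based variants.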

10.

An m-estimation-based model selection criterion with a data-oriented penalty

《Journal of Statistical Computation and Simulation》2012,82(1):71-87

In Wu and Zen (1999), a linear model selection procedure based on M-estimation is proposed that includes many classical model selection criteria as special cases, and the procedure is shown to be strongly consistent for a variety of penalty functions. In this paper, we investigate its small-sample performance for some choices of fixed penalty functions. The performance is seen to vary with the choice of penalty. Hence, a randomized penalty based on the observed data is proposed, which preserves the consistency property and performs better than a fixed choice of penalty function.

11.

Variable selection in finite mixture of regression models using the skew-normal distribution

Junhui Yin, Liucang Wu, Lin Dai 《Journal of Applied Statistics》2020,47(16):2941

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. Most applications of variable selection in FMR models assume a normal distribution for the regression error. This assumption is unsuitable for data containing one or more groups of observations with asymmetric behavior. In this paper, we introduce a variable selection procedure for FMR models using the skew-normal distribution. With an appropriate choice of the tuning parameters, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. To estimate the model parameters, a modified EM algorithm is developed. The methodology is illustrated through numerical experiments and a real data example.
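The building block of such a model is the skew-normal density f(y) = (2/ω)·φ(z)·Φ(αz) with z = (y − ξ)/ω, which reduces to the normal density when the shape α is 0. The sketch below evaluates the unpenalized log-likelihood of a mixture of linear regressions with skew-normal errors. It assumes component k has location x'β_k (a location parametrization, so the errors do not have mean zero). The penalty and the modified EM algorithm from the paper are not shown, and the function names are hypothetical.

```python
import math
from typing import Sequence


def skew_normal_pdf(y: float, loc: float, scale: float, shape: float) -> float:
    """Skew-normal density 2/scale * phi(z) * Phi(shape * z), z = (y - loc) / scale."""
    z = (y - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)    # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2)))   # standard normal cdf at shape * z
    return 2.0 / scale * phi * cdf


def fmr_sn_loglik(ys: Sequence[float], xs: Sequence[Sequence[float]],
                  weights: Sequence[float], betas: Sequence[Sequence[float]],
                  scales: Sequence[float], shapes: Sequence[float]) -> float:
    """Log-likelihood of a finite mixture of linear regressions with
    skew-normal errors; component k has mixing weight weights[k] and
    location x_i'betas[k]."""
    total = 0.0
    for y, x in zip(ys, xs):
        dens = sum(w * skew_normal_pdf(y, sum(b * v for b, v in zip(beta, x)), s, a)
                   for w, beta, s, a in zip(weights, betas, scales, shapes))
        total += math.log(dens)
    return total
```

A penalized version would subtract a sparsity penalty on the betas (weighted by the mixing proportions or not, depending on the method) from this log-likelihood before maximizing.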

12.

Variable selection for mode regression

Yingzhen Chen, Xuejun Ma 《Journal of Applied Statistics》2018,45(6):1077-1084

From the prediction viewpoint, mode regression is more attractive because it focuses on the most probable value of the response variable given the regressors. At the same time, high-dimensional data are increasingly prevalent as the technology for collecting and storing data advances. Variable selection is an important strategy for high-dimensional regression problems. This paper proposes a variable selection procedure for high-dimensional mode regression that combines nonparametric kernel estimation with a sparsity penalty. We also establish the asymptotic properties under certain technical conditions. The effectiveness and flexibility of the proposed methods are further illustrated by numerical studies and a real data application.
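Kernel-based mode regression typically maximizes (1/n)·Σ φ_h(y_i − x_i'β) for a kernel φ_h with bandwidth h. The sketch below assumes a Gaussian kernel and shows that objective together with the intercept-only special case. In that case a fixed-point (mean-shift) iteration, in which each step is a kernel-weighted mean of the responses, climbs to a local mode. The sparsity penalty from the paper is omitted, and the function names are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float, h: float) -> float:
    """Gaussian kernel phi_h(u) = phi(u / h) / h with bandwidth h > 0."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))


def mode_objective(ys: Sequence[float], xs: Sequence[Sequence[float]],
                   beta: Sequence[float], h: float) -> float:
    """Kernel mode-regression objective (1/n) * sum_i phi_h(y_i - x_i'beta),
    to be maximised (not minimised) over beta."""
    fitted = (sum(b * v for b, v in zip(beta, x)) for x in xs)
    return sum(gaussian_kernel(y - f, h) for y, f in zip(ys, fitted)) / len(ys)


def intercept_only_mode(ys: Sequence[float], h: float, start: float,
                        max_iter: int = 500, tol: float = 1e-10) -> float:
    """Intercept-only case: mean-shift iterations, each a kernel-weighted
    mean of the responses, climb the objective towards a local mode."""
    m = start
    for _ in range(max_iter):
        w = [gaussian_kernel(y - m, h) for y in ys]
        new_m = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m
```

For responses such as [0.0, 0.1, -0.1, 5.0], the mean is pulled to 1.25 by the outlying value, while the mode iteration started at 0 stays near 0. This illustrates why the "most probable value" can be a more useful prediction target.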

13.

Variable selection in expectile regression

Jun Zhao 《Communications in Statistics - Theory and Methods》2018,47(7):1731-1746

14.

Generalized autoregressive and moving average models: multicollinearity, interpretation and a new modified model

Orlando Yesid Esparza Albarracin, Airlane Pereira Alencar, Linda Lee Ho 《Journal of Statistical Computation and Simulation》2019,89(10):1819-1840

In this paper, we call attention to two features observed in practical applications of the Generalized Autoregressive Moving Average (GARMA) model that arise from the structure of its linear predictor. The first is multicollinearity, which may lead to non-convergence of maximum likelihood estimation by iteratively reweighted least squares and to inflation of the estimator's variance. The second is that including the same lagged observations in both the autoregressive and moving average components confounds the interpretation of the parameters. A modified model, GAR-M, is presented to reduce the multicollinearity and to improve the interpretation of the parameters. The expectation and variance under stationarity conditions are presented for the identity and logarithm link functions. In general, simulation studies show that the maximum likelihood estimators based on the GARMA and GAR-M models are equivalent, but the GAR-M estimators have slightly lower standard errors, and some restrictions on the parameter space are imposed to guarantee stationarity of the process. A real data analysis also illustrates the GAR-M fit for daily hospitalization rates of elderly people due to respiratory diseases from October 2012 to April 2015 in São Paulo city, Brazil.

15.

Variable selection via a multi-stage strategy

Jing Chang, Herbert K.H. Lee 《Journal of Applied Statistics》2015,42(4):762-774

Variable selection for nonlinear regression is a complex problem, made even more difficult when there are a large number of potential covariates and a limited number of data points. We propose a multi-stage method that combines state-of-the-art techniques at each stage to best discover the relevant variables. At the first stage, an extension of Bayesian Additive Regression Trees is adopted to reduce the total number of variables to around 30. At the second stage, sensitivity analysis in the treed Gaussian process is used to reduce the number of variables further. Two stopping rules are designed, and sequential design is adopted to make the best use of previous information. We demonstrate our approach on two simulated examples and one real data set.

16.

Variable selection for functional density trees

Shu-Fu Kuo 《Journal of Applied Statistics》2012,39(7):1387-1395

In this paper, the exhaustive search principle used in functional trees for classifying densities is shown to be biased toward selecting variables with more split points. A new variable selection scheme is proposed to correct this bias. Pearson chi-squared tests for the associated two-way contingency tables are used to select the variables. Through simulation, we show that the new method controls the bias and is more powerful in selecting the split variable.
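The test underlying this scheme is Pearson's chi-squared test of independence on an r × c contingency table. The sketch below computes the statistic and its degrees of freedom. How the tables are formed from candidate split variables is not given in the abstract, so rows and columns here are generic. Because candidate variables can yield tables of different sizes, comparisons are usually made on p-values (for example via `scipy.stats.chi2.sf(stat, df)`) rather than on raw statistics.

```python
from typing import Sequence, Tuple


def pearson_chi2(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c
    contingency table of observed counts (all row and column totals > 0)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    if n <= 0 or min(row_tot) <= 0 or min(col_tot) <= 0:
        raise ValueError("every row and column needs a positive total")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df
```

Comparing p-values instead of the number of possible split points is what lets a test-based scheme avoid the bias that exhaustive search has toward variables with many candidate splits.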

17.

Block thresholding wavelet regression using SCAD penalty

Cheolwoo Park 《Journal of Statistical Planning and Inference》2010

This paper concerns wavelet regression using a block thresholding procedure. Block thresholding methods use information from neighboring wavelet coefficients to increase estimation accuracy. We propose a data-driven block thresholding procedure using the smoothly clipped absolute deviation (SCAD) penalty. A simulation study demonstrates the competitive finite-sample performance of the proposed estimator compared with existing methods. We also show that the proposed estimator achieves optimal convergence rates in Besov spaces.

18.

Variable selection of linear programming discriminant estimator

Dong Xia 《Communications in Statistics - Theory and Methods》2017,46(7):3321-3341

19.

Variable selection for discrimination among several populations

Sisir Kumar Samanta, Shoutir Kishore Chatterjee 《Communications in Statistics - Theory and Methods》2013,42(11):2565-2582

20.

Variable selection in heteroscedastic single-index quantile regression

Eliana Christou, Michael G. Akritas 《Communications in Statistics - Theory and Methods》2018,47(24):6019-6033

We propose a new algorithm for simultaneous variable selection and parameter estimation in the single-index quantile regression (SIQR) model. The proposed algorithm, which is non-iterative, consists of two steps. Step 1 performs an initial variable selection. Step 2 uses the results of Step 1 to obtain better estimates of the conditional quantiles and uses them to perform simultaneous variable selection and estimation of the parametric component of the SIQR model. It is shown that the initial variable selection method consistently estimates the relevant variables and that the estimated parametric component derived in Step 2 satisfies the oracle property.