首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
In this article, a robust variable selection procedure based on the weighted composite quantile regression (WCQR) is proposed. Compared with the composite quantile regression (CQR), WCQR is robust to heavy-tailed errors and outliers in the explanatory variables. For the choice of the weights in the WCQR, we employ a weighting scheme based on the principal component method. To select variables with grouping effect, we consider WCQR with SCAD-L2 penalization. Furthermore, under some suitable assumptions, the theoretical properties, including the consistency and oracle property of the estimator, are established with a diverging number of parameters. In addition, we study the numerical performance of the proposed method in the case of ultrahigh-dimensional data. Simulation studies and real examples are provided to demonstrate the superiority of our method over the CQR method when there are outliers in the explanatory variables and/or the random error is from a heavy-tailed distribution.  相似文献   

2.
The L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve optimization problems for L1-type regularization. Especially the coordinate descent algorithm has been shown to be effective in sparse regression modeling. Although the algorithm shows a remarkable performance to solve optimization problems for L1-type regularization, it suffers from outliers, since the procedure is based on the inner product of predictor variables and partial residuals obtained from a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, especially focusing on the high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm effectively performs for the high-dimensional regression modeling even in the presence of outliers.  相似文献   

3.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

4.
When the data contain outliers or come from population with heavy-tailed distributions, which appear very often in spatiotemporal data, the estimation methods based on least-squares (L2) method will not perform well. More robust estimation methods are required. In this article, we propose the local linear estimation for spatiotemporal models based on least absolute deviation (L1) and drive the asymptotic distributions of the L1-estimators under some mild conditions imposed on the spatiotemporal process. The simulation results for two examples, with outliers and heavy-tailed distribution, respectively, show that the L1-estimators perform better than the L2-estimators.  相似文献   

5.
The resistance of least absolute values (L1) estimators to outliers and their robustness to heavy-tailed distributions make these estimators useful alternatives to the usual least squares estimators. The recent development of efficient algorithms for L1 estimation in linear models has permitted their use in practical data analysis. Although in general the L1 estimators are not unique, there are a number of properties they all share. The set of all L1 estimators for a given model and data set can be characterized as the convex hull of some extreme estimators. Properties of the extreme estimators and of the L1-estimate set are considered.  相似文献   

6.
In healthcare studies, count data sets measured with covariates often exhibit heterogeneity and contain extreme values. To analyse such count data sets, we use a finite mixture of regression model framework and investigate a robust estimation approach, called the L2E [D.W. Scott, On fitting and adapting of density estimates, Comput. Sci. Stat. 30 (1998), pp. 124–133], to estimate the parameters. The L2E is based on an integrated L2 distance between parametric conditional and true conditional mass functions. In addition to studying the theoretical properties of the L2E estimator, we compare the performance of L2E with the maximum likelihood (ML) estimator and a minimum Hellinger distance (MHD) estimator via Monte Carlo simulations for correctly specified and gross-error contaminated mixture of Poisson regression models. These show that the L2E is a viable robust alternative to the ML and MHD estimators. More importantly, we use the L2E to perform a comprehensive analysis of a Western Australia hospital inpatient obstetrical length of stay (LOS) (in days) data that contains extreme values. It is shown that the L2E provides a two-component Poisson mixture regression fit to the LOS data which is better than those based on the ML and MHD estimators. The L2E fit identifies admission type as a significant covariate that profiles the predominant subpopulation of normal-stayers as planned patients and the small subpopulation of long-stayers as emergency patients.  相似文献   

7.
We developed robust estimators that minimize a weighted L1 norm for the first-order bifurcating autoregressive model. When all of the weights are fixed, our estimate is an L1 estimate that is robust against outlying points in the response space and more efficient than the least squares estimate for heavy-tailed error distributions. When the weights are random and depend on the points in the factor space, the weighted L1 estimate is robust against outlying points in the factor space. Simulated and artificial examples are presented. The behavior of the proposed estimate is modeled through a Monte Carlo study.  相似文献   

8.
The robust principal components analysis (RPCA) introduced by Campbell (Applied Statistics 1980, 29, 231–237) provides in addition to robust versions of the usual output of a principal components analysis, weights for the contribution of each point to the robust estimation of each component. Low weights may thus be used to indicate outliers. The present simulation study provides critical values for testing the kth smallest weight in the RPCA of a sample of n p-dimensional vectors, under the null hypothesis of a multivariate normal distribution. The cases p=2(2)10, 15, 20 for n=20, 30, 40, 50, 75, 100 subject to n≥p/2, are examined, with k≤√n.  相似文献   

9.
In this article, we introduce restricted principal components regression (RPCR) estimator by combining the approaches followed in obtaining the restricted least squares estimator and the principal components regression estimator. The performance of the RPCR estimator with respect to the matrix and the generalized mean square error are examined. We also suggest a testing procedure for linear restrictions in principal components regression by using singly and doubly non-central F distribution.  相似文献   

10.
This work studies outlier detection and robust estimation with data that are naturally distributed into groups and which follow approximately a linear regression model with fixed group effects. For this, several methods are considered. First, the robust fitting method of Peña and Yohai [A fast procedure for outlier diagnostics in large regression problems. J Am Stat Assoc. 1999;94:434–445], called principal sensitivity components (PSC) method, is adapted to the grouped data structure and the mentioned model. The robust methods RDL1 of Hubert and Rousseeuw [Robust regression with both continuous and binary regressors. J Stat Plan Inference. 1997;57:153–163] and M-S of Maronna and Yohai [Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 2000;89:197–214] are also considered. These three methods are compared in terms of their effectiveness in outlier detection and their robustness through simulations, considering several contamination scenarios and growing contamination levels. Results indicate that the adapted PSC procedure is able to detect a high percentage of true outliers and a small number of false outliers. It is appropriate when the contamination is in the error term or in the covariates, detecting also possibly masked high leverage points. Moreover, in simulations the final robust regression estimator preserved good efficiency under Normality while keeping good robustness properties.  相似文献   

11.
The effect of nonstationarity in time series columns of input data in principal components analysis is examined. Nonstationarity are very common among economic indicators collected over time. They are subsequently summarized into fewer indices for purposes of monitoring. Due to the simultaneous drifting of the nonstationary time series usually caused by the trend, the first component averages all the variables without necessarily reducing dimensionality. Sparse principal components analysis can be used, but attainment of sparsity among the loadings (hence, dimension-reduction is achieved) is influenced by the choice of parameter(s) (λ 1,i ). Simulated data with more variables than the number of observations and with different patterns of cross-correlations and autocorrelations were used to illustrate the advantages of sparse principal components analysis over ordinary principal components analysis. Sparse component loadings for nonstationary time series data can be achieved provided that appropriate values of λ 1,j are used. We provide the range of values of λ 1,j that will ensure convergence of the sparse principal components algorithm and consequently achieve sparsity of component loadings.  相似文献   

12.
A number of efficient computer codes are available for the simple linear L 1 regression problem. However, a number of these codes can be made more efficient by utilizing the least squares solution. In fact, a couple of available computer programs already do so.

We report the results of a computational study comparing several openly available computer programs for solving the simple linear L 1 regression problem with and without computing and utilizing a least squares solution.  相似文献   

13.
In regression analysis, to deal with the problem of multicollinearity, the restricted principal components regression estimator is proposed. In this paper, we compared the restricted principal components regression estimator, the principal components regression estimator, and the ordinary least-squares estimator with each other under the Pitman's closeness criterion. We showed that the restricted principal components regression estimator is always superior to the principal components regression estimator, under certain conditions the restricted principal components regression estimator is superior to the ordinary least-squares estimator under the Pitman's closeness criterion and under certain conditions the principal components regression estimator is superior to the ordinary least-squares estimator under the Pitman's closeness criterion.  相似文献   

14.
As is the case of many studies, the data collected are limited and an exact value is recorded only if it falls within an interval range. Hence, the responses can be either left, interval or right censored. Linear (and nonlinear) regression models are routinely used to analyze these types of data and are based on normality assumptions for the errors terms. However, those analyzes might not provide robust inference when the normality assumptions are questionable. In this article, we develop a Bayesian framework for censored linear regression models by replacing the Gaussian assumptions for the random errors with scale mixtures of normal (SMN) distributions. The SMN is an attractive class of symmetric heavy-tailed densities that includes the normal, Student-t, Pearson type VII, slash and the contaminated normal distributions, as special cases. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo algorithm is introduced to carry out posterior inference. A new hierarchical prior distribution is suggested for the degrees of freedom parameter in the Student-t distribution. The likelihood function is utilized to compute not only some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measure. The proposed Bayesian methods are implemented in the R package BayesCR. The newly developed procedures are illustrated with applications using real and simulated data.  相似文献   

15.
The performance of nine different nonparametric regression estimates is empirically compared on ten different real datasets. The number of data points in the real datasets varies between 7, 900 and 18, 000, where each real dataset contains between 5 and 20 variables. The nonparametric regression estimates include kernel, partitioning, nearest neighbor, additive spline, neural network, penalized smoothing splines, local linear kernel, regression trees, and random forests estimates. The main result is a table containing the empirical L2 risks of all nine nonparametric regression estimates on the evaluation part of the different datasets. The neural networks and random forests are the two estimates performing best. The datasets are publicly available, so that any new regression estimate can be easily compared with all nine estimates considered in this article by just applying it to the publicly available data and by computing its empirical L2 risks on the evaluation part of the datasets.  相似文献   

16.
In linear regression, robust methods are at the beginning of their use in practice. In the small sample case, such robust methods provide a necessary measure of protection against deviations from the assumed error distribution. This paper studies through simulation the deficiencies of bioptimal estimators and compares them with more common methods like Huber's estimator or Tukey's estimator. Polyoptimal estimators are convex combinations of Pitman estimators and are optimally robust for a confrontation containing several shapes. The word confrontation is due to J.W. Tukey. It expresses the situation when compromising two or several error distributions. The paper uses the confrontation containing the Gaussian distribution along with a symmetric heavy-tailed distribution having a tail of order 0(t-2) as t→ ±∞.  相似文献   

17.
Qingguo Tang 《Statistics》2013,47(5):389-404
The varying coefficient model is a useful extension of linear models and has many advantages in practical use. To estimate the unknown functions in the model, the kernel type with local linear least-squares (L 2) estimation methods has been proposed by several authors. When the data contain outliers or come from population with heavy-tailed distributions, L 1-estimation should yield better estimators. In this article, we present the local linear L 1-estimation method and derive the asymptotic distributions of the L 1-estimators. The simulation results for two examples, with outliers and heavy-tailed distribution, respectively, show that the L 1-estimators outperform the L 2-estimators.  相似文献   

18.
Several methods have been suggested to calculate robust M- and G-M -estimators of the regression parameter β and of the error scale parameter σ in a linear model. This paper shows that, for some data sets well known in robust statistics, the nonlinear systems of equations for the simultaneous estimation of β, with an M-estimate with a redescending ψ-function, and σ, with the residual median absolute deviation (MAD), have many solutions. This multiplicity is not caused by the possible lack of uniqueness, for redescending ψ-functions, of the solutions of the system defining β with known σ; rather, the simultaneous estimation of β and σ together creates the problem. A way to avoid these multiple solutions is to proceed in two steps. First take σ as the median absolute deviation of the residuals for a uniquely defined robust M-estimate such as Huber's Proposal 2 or the L1-estimate. Then solve the nonlinear system for the M-estimate with σ equal to the value obtained at the first step to get the estimate of β. Analytical conditions for the uniqueness of M and G-M-estimates are also given.  相似文献   

19.
主成分回归方法已得到广泛应用,但该方法是否能减小参数估计的误差,理论上并没有明确的结论。以3个假设模型为例,运用模拟计算的方法对主成分回归方法进行了研究,发现主成分回归估计的误差可能比普通最小二乘估计更小,也可能更大,依赖于实际的模型。  相似文献   

20.
In multiple linear regression analysis each lower-dimensional subspace L of a known linear subspace M of ? n corresponds to a non empty subset of the columns of the regressor matrix. For a fixed subspace L, the C p statistic is an unbiased estimator of the mean square error if the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the C p statistic that can also be used to estimate this mean square error. The C p statistic and its truncated versions are compared in two example data sets, illustrating that use of the truncated versions may result in models different from those selected by standard C p .  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号