Similar literature
Found 20 similar articles (search time: 15 ms)
1.
In this article, we consider the problem of selecting functional variables using L1 regularization in a functional linear regression model with a scalar response and functional predictors, in the presence of outliers. Since the LASSO is a special case of penalized least-squares regression with an L1 penalty function, it is sensitive to heavy-tailed errors and/or outliers in the data. Recently, the Least Absolute Deviation (LAD) and LASSO methods have been combined (the LAD-LASSO regression method) to carry out robust parameter estimation and variable selection simultaneously for a multiple linear regression model. However, selecting functional predictors with the LASSO fails because each functional predictor is associated with multiple parameters. The group LASSO is therefore used for selecting functional predictors, since it selects grouped variables rather than individual variables. In this study, we propose a robust functional predictor selection method, the LAD-group LASSO, for a functional linear regression model with a scalar response and functional predictors. We illustrate the performance of the LAD-group LASSO on both simulated and real data.

2.
In this paper, a penalized weighted composite quantile regression estimation procedure is proposed to estimate unknown regression parameters and autoregression coefficients in the linear regression model with heavy-tailed autoregressive errors. Under some conditions, we show that the proposed estimator possesses the oracle properties. In addition, we introduce an iterative algorithm to solve the proposed optimization problem, and use a data-driven method to choose the tuning parameters. Simulation studies demonstrate that the proposed estimation method is robust and works much better than the least-squares-based method when there are outliers in the dataset or the autoregressive errors follow a heavy-tailed distribution. Moreover, the proposed estimator works comparably to the least-squares-based estimator when there are no outliers and the error is normal. Finally, we apply the proposed methodology to analyze the electricity demand dataset.

3.
One advantage of quantile regression, relative to ordinary least-squares (OLS) regression, is that the quantile regression estimates are more robust against outliers and non-normal errors in the response measurements. However, the relative efficiency of the quantile regression estimator with respect to the OLS estimator can be arbitrarily small. To overcome this problem, composite quantile regression methods have been proposed in the literature; they are resistant to heavy-tailed errors or outliers in the response and at the same time are more efficient than the traditional single-quantile regression method. This paper studies composite quantile regression from a Bayesian perspective. The advantage of the Bayesian hierarchical framework is that the weight of each component in the composite model can be treated as an open parameter and automatically estimated through a Markov chain Monte Carlo sampling procedure. Moreover, lasso regularization can be naturally incorporated into the model to perform variable selection. The performance of the proposed method relative to the single-quantile method is demonstrated via extensive simulations and real data analysis.

4.
Response surfaces express the behavior of responses and can be used for both single- and multi-response problems. A common approach to estimating a response surface from experimental results is the ordinary least squares (OLS) method. Since OLS is very sensitive to outliers, some robust approaches have been discussed in the literature. Although many methods are available in the literature for multiple-response optimization, there are few studies on model building, especially on robust models. Assuming correlated responses, this paper proposes a robust coefficient estimation method for the multi-response problem based on M-estimators. To illustrate the performance of the proposed procedure, a contaminated experimental design is used, based on a numerical example from the literature with some modifications. Both the classical multivariate least squares method and the proposed robust multivariate approach are used to estimate the regression coefficients of the multi-response surfaces in this example. Moreover, a comparison of the proposed robust multi-response surface (RMRS) approach with separate robust estimation of each single response shows that the proposed approach is more efficient.

5.
In this article, we investigate a new estimation approach for the partially linear single-index model based on the modal regression method, where the nonparametric function is estimated by the penalized spline method. Moreover, we develop an expectation-maximization (EM)-type algorithm and establish the large-sample properties of the proposed estimation method. A distinguishing characteristic of the newly proposed estimator is that it is robust against outliers, which is achieved by introducing an additional tuning parameter that can be automatically selected using the observed data. Simulation studies and a real data example are used to evaluate the finite-sample performance, and the results show that the newly proposed method works very well.

6.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered as a robust alternative. Missing data are inevitable in many situations, and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

7.
Mixture regression models are used to investigate the relationship between variables that come from unknown latent groups and to model heterogeneous datasets. In general, the error terms in the mixture regression model are assumed to be normal. However, estimators obtained under the normality assumption are sensitive to outliers. In this article, we introduce a robust mixture regression procedure based on the LTS-estimation method to cope with outliers in the data. We give a simulation study and a real data example to illustrate the performance of the proposed estimators relative to their counterparts in dealing with outliers.

8.
Longitudinal data are commonly modeled with normal mixed-effects models. Most modeling methods are based on traditional mean regression, which yields non-robust estimates in the presence of extreme values or outliers. Median regression is also not the best choice for estimation, especially with non-normal errors. Compared with conventional modeling methods, composite quantile regression can provide robust estimation results even for non-normal errors. In this paper, based on a so-called pseudo composite asymmetric Laplace distribution (PCALD), we develop a Bayesian treatment of composite quantile regression for mixed-effects models. Furthermore, using the location-scale mixture representation of the PCALD, we establish a Bayesian hierarchical model and carry out posterior inference for all unknown parameters and latent variables using the Markov chain Monte Carlo (MCMC) method. Finally, the newly developed procedure is illustrated by Monte Carlo simulations and a case analysis of an HIV/AIDS clinical data set.

9.
Ordinary least-squares (OLS) estimators for a linear model are very sensitive to unusual values in the design space or outliers among the y values. Even a single atypical value may have a large effect on the parameter estimates. This article reviews and describes some available and popular robust techniques, including some recently developed ones, and compares them in terms of breakdown point and efficiency. In addition, we use a simulation study and a real data application to compare the performance of existing robust methods under different scenarios.

10.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several well-known data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

11.
We introduce a multivariate heteroscedastic measurement error model for replications under scale mixtures of normal distributions. The model can provide a robust analysis and can be viewed as a generalization of multiple linear regression in both model structure and distributional assumptions. An efficient method based on Markov chain Monte Carlo is developed for parameter estimation. The deviance information criterion and the conditional predictive ordinates are used as model selection criteria. Simulation studies show that inference under the model is robust against both misspecification of distributions and outliers. We work out an illustrative example with a real data set on measurements of plant root decomposition.

12.
The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data might increase the chance of one or more observations being outlying. Noting that the detection of outliers is both necessary and important in high-dimensional regression analysis, we propose in this paper a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and the least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple-outlier detection approach based on multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset from the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well in detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the application of the proposed method through an empirical analysis of a real-life protein and gene expression data set.

13.
Estimation of mixtures of regression models is usually based on the assumption of normal components, and maximum likelihood estimation of the normal components is sensitive to noise, outliers, or high-leverage points. Missing values are inevitable in many situations, and parameter estimates could be biased if the missing values are not handled properly. In this article, we propose mixtures of regression models for contaminated, incomplete, heterogeneous data. The proposed models provide robust estimates of regression coefficients that vary across latent subgroups, even in the presence of missing values. The methodology is illustrated through simulation studies and a real data analysis.

14.
This paper considers a nonlinear quantile model with change-points. The quantile estimation method, which includes the median model as a particular case, is more robust than other traditional methods when the model errors contain outliers. Under relatively weak assumptions, the convergence rate and asymptotic distribution of the change-point and regression parameter estimators are obtained. A numerical study by Monte Carlo simulations shows the performance of the proposed method for the nonlinear model with change-points.

15.
M-estimation is a widely used technique for robust statistical inference. In this paper, we study a robust partially functional linear regression model in which a scalar response variable is explained by a function-valued variable and a finite number of real-valued variables. For the estimation of the regression parameters, which include the infinite-dimensional function as well as the slope parameters for the real-valued variables, we use polynomial splines to approximate the slope parameter. The estimation procedure is easy to implement, and it is resistant to heavy-tailed errors or outliers in the response. The asymptotic properties of the proposed estimators are established. Finally, we assess the finite-sample performance of the proposed method by Monte Carlo simulation studies.

16.
Fast and robust bootstrap   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper we review recent developments in a bootstrap method for robust estimators which is computationally faster and more resistant to outliers than the classical bootstrap. This fast and robust bootstrap method is, under reasonable regularity conditions, asymptotically consistent. We describe the method in general and then consider its application to perform inference based on robust estimators for the linear regression and multivariate location-scatter models. In particular, we study confidence and prediction intervals and tests of hypotheses for linear regression models, inference for location-scatter parameters and principal components, and classification error estimation for discriminant analysis.

17.
18.
A cluster methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is an agglomeration of procedures that begins with clustering the n observations, proceeds through a test of the 'no-outlier hypothesis' (TONH), and ends with a weighted least-squares regression estimation. The cluster phase partitions the n observations into a set of size h, called the main cluster, and a minor cluster of size n − h. A robust distance emerges from the main cluster, on which the TONH is conducted. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. Until convergence, a re-weighted least-squares (RLS) regression estimate is updated with weights based on the normalized residuals. The proposed procedure blends an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, with the re-weighted regression estimation phase. Hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping. The performance of CBRR is further examined through a simulation experiment. The results obtained from the data set illustrations and the Monte Carlo study show that the CBRR is effective in detecting multivariate outliers in settings where other methods are susceptible to masking and swamping. The CBRR is not computationally demanding and is largely unaffected by masking and swamping.

19.
Based on B-spline basis functions and the smoothly clipped absolute deviation (SCAD) penalty, we present a new estimation and variable selection procedure based on modal regression for partially linear additive models. The outstanding merit of the new method is that it is robust against outliers or heavy-tailed error distributions and performs no worse than least-squares-based estimation in the normal-error case. The main difference is that the standard quadratic loss is replaced by a kernel function depending on a bandwidth that can be automatically selected based on the observed data. With appropriate selection of the regularization parameters, the new method possesses consistency in variable selection and the oracle property in estimation. Finally, both a simulation study and a real data analysis are performed to examine the performance of our approach.

20.
In this article, we present a new efficient iterative estimation approach based on local modal regression for single-index varying-coefficient models. The resulting estimators are shown to be robust regardless of outliers and error distributions. The asymptotic properties of the estimators are established under some regularity conditions, and a practical modified EM algorithm is proposed for the new method. Moreover, to achieve a sparse estimator when there are irrelevant variables in the index parameters, a variable selection procedure based on the SCAD penalty is developed to select significant parametric covariates, and the well-known oracle properties are also derived. Finally, numerical examples with variously distributed errors and a real data analysis are conducted to illustrate the validity and feasibility of our proposed method.
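
Several of the abstracts above (for example entries 4, 9 and 15) contrast ordinary least squares with M-estimation when the response contains outliers. The sketch below is only a generic illustration of that contrast and is not taken from any of the listed papers: it fits a Huber M-estimator by iteratively re-weighted least squares (IRLS) on synthetic data. The function names `huber_irls` and `make_data`, the tuning constant `c = 1.345`, the MAD-based scale estimate, and the 10% contamination level are all assumptions chosen for the example. Note that a Huber fit of this kind guards against outliers in the response only, not against high-leverage points.

```python
import numpy as np


def huber_irls(X, y, c=1.345, max_iter=100, tol=1e-8):
    """Huber M-estimate of linear regression coefficients via IRLS.

    Illustrative sketch: the scale is re-estimated at every iteration
    from the normalized median absolute deviation (MAD) of the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(max_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale < 1e-12:  # residuals are essentially zero; nothing to reweight
            break
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, c/|u| outside it
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        # Weighted least squares step, solved as OLS on rescaled rows
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def make_data(n=200, seed=0):
    """Synthetic line y = 1 + 2x with noise; 10% of responses are shifted."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
    y[:20] += 50.0  # contaminate the first 20 responses
    X = np.column_stack([np.ones(n), x])
    return X, y


X, y = make_data()
beta_true = np.array([1.0, 2.0])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_irls(X, y)
print("OLS estimate:  ", beta_ols)
print("Huber estimate:", beta_huber)
```

On data contaminated this way, the Huber estimate should land much closer to the generating coefficients than the OLS estimate, because each outlying response has bounded influence on the fit. Methods with a higher breakdown point (such as the LTS and MM estimators mentioned in entries 6, 7 and 12) would be the natural choice when leverage points are also a concern.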
