Similar articles
 Found 20 similar articles; search took 15 ms
1.
A tutorial on support vector regression   Total citations: 78, self-citations: 0, citations by others: 78
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
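As a rough illustration of the idea behind SV function estimation, the standard ε-insensitive loss ignores residuals smaller than ε and penalizes larger ones linearly. The sketch below is not taken from the tutorial itself; the function name `eps_insensitive_loss` and the sample values are assumptions chosen for illustration.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """epsilon-insensitive loss: max(0, |y_true - y_pred| - eps).

    Hypothetical helper for illustration; residuals inside the
    eps-tube cost nothing, larger residuals are penalized linearly.
    """
    return max(0.0, abs(y_true - y_pred) - eps)


# Illustrative (made-up) pairs of observed value and prediction.
pairs = [(1.0, 1.05), (1.0, 1.5), (1.0, 0.2)]
losses = [eps_insensitive_loss(y, f, eps=0.1) for y, f in pairs]
print(losses)
```

The first pair falls inside the tube and contributes zero loss, which is what tends to make SV regression solutions sparse in the training points.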

2.
Summary. Many geophysical regression problems require the analysis of large (more than 10^4 values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator, which downweights data corresponding to extreme regression residuals, with removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this, the exact distribution of the hat matrix diagonal elements p_ii for complex multivariate Gaussian predictor data is shown to be β(p_ii, m, N − m), where N is the number of data and m is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study which exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of looking at both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.
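To make the role of the hat matrix diagonal concrete, here is a minimal real-valued sketch for simple linear regression with an intercept, where the leverages have the closed form h_ii = 1/N + (x_i − x̄)² / Σ_j (x_j − x̄)². This is not the estimator from the abstract: the function names are hypothetical, and the 2m/N cutoff is a common rule of thumb assumed here in place of the β-distribution criterion the abstract describes.

```python
def leverages_simple_regression(x):
    """Hat matrix diagonal for y = a + b*x (intercept plus one slope)."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def flag_leverage_points(x, factor=2.0):
    """Indices whose leverage exceeds factor * m / N (assumed rule of thumb)."""
    m = 2  # number of parameters: intercept and slope
    h = leverages_simple_regression(x)
    cutoff = factor * m / len(x)
    return [i for i, hi in enumerate(h) if hi > cutoff]


# Made-up predictor values with one point far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages_simple_regression(x)
flagged = flag_leverage_points(x)
print(h, flagged)
```

The leverages always sum to the number of parameters m, so a single point with h_ii close to 1 dominates the fit almost by itself.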

3.
Quantile regression (QR) models have received a great deal of attention in both the theoretical and applied statistical literature. In this paper we propose support vector quantile regression (SVQR) with monotonicity restriction, which is easily obtained via the dual formulation of the optimization problem. We also provide the generalized approximate cross validation method for choosing the hyperparameters which affect the performance of the proposed SVQR. The experimental results for the synthetic and real data sets confirm the successful performance of the proposed model.

4.
5.
The financial stress index (FSI) is considered to be an important risk management tool for quantifying financial vulnerabilities. This paper proposes a new framework based on a hybrid classifier model that integrates rough set theory (RST), FSI, support vector regression (SVR) and a control chart to identify stressed periods. First, the RST method is applied to select variables. The outputs are used as input data for FSI–SVR computation. Empirical analysis is conducted based on the monthly FSI of the Federal Reserve Bank of Saint Louis from January 1992 to June 2011. A comparison study is performed between FSI based on principal component analysis and FSI–SVR. A control chart based on FSI–SVR and extreme value theory is proposed to identify the extremely stressed periods. Our approach identified different stressed periods, including the internet bubble, the subprime crisis and actual financial stress episodes, along with the calmest periods, agreeing with those given by Federal Reserve System reports.

6.
Support Vector Regression (SVR) is gaining in popularity for outlier detection and classification problems in high-dimensional data (HDD), as this technique does not require the data to be of full rank. In real applications, most data are high-dimensional. Classification of high-dimensional data is needed in the applied sciences, in particular where it is important to discriminate cancerous cells from non-cancerous cells. It is also imperative that outliers are identified before constructing a model of the relationship between the dependent and independent variables, to avoid misleading interpretations of the model fit. The standard SVR and the μ-ε-SVR are able to detect outliers; however, they are computationally expensive. The fixed parameters support vector regression (FP-ε-SVR) was put forward to remedy this issue. However, the FP-ε-SVR using ε-SVR is not very successful in identifying outliers. In this article, we propose an alternative method to detect outliers, namely by employing nu-SVR. The merit of our proposed method is confirmed by three real examples and a Monte Carlo simulation. The results show that our proposed nu-SVR method is very successful in identifying outliers under a variety of situations, and with less computational running time.

7.
For quantile regression and support vector regression, we consider a kernel-based online learning algorithm associated with a sequence of insensitive pinball loss functions. Our error analysis and derived learning rates show quantitatively that the statistical performance of the learning algorithm may vary with the quantile parameter τ. In our analysis we overcome the technical difficulty caused by the varying insensitive parameter, which is introduced with sparsity as the motivation.
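For orientation, the standard pinball loss for a quantile parameter τ weights positive residuals by τ and negative ones by 1 − τ. The sketch below also includes one plausible insensitive variant that is zero on [−ε, ε]; that exact form is an assumption for illustration and may differ from the definition used in the paper, and both function names are hypothetical.

```python
def pinball_loss(u, tau):
    """Standard pinball loss for residual u = y - f(x) and quantile tau."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def insensitive_pinball_loss(u, tau, eps):
    """Assumed insensitive variant: zero on [-eps, eps], pinball-like outside."""
    if u > eps:
        return tau * (u - eps)
    if u < -eps:
        return (1.0 - tau) * (-u - eps)
    return 0.0


# With eps = 0 the assumed variant reduces to the standard pinball loss.
print(pinball_loss(2.0, 0.25), insensitive_pinball_loss(2.0, 0.25, 0.0))
```

With τ = 0.5 the assumed variant is half the ε-insensitive loss used in support vector regression, which is one way to see why the two problems can be treated in a common framework.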

8.
ABSTRACT

In the last few years, applications of the Support Vector Machine (SVM) to classification and regression problems have been increasing, owing to its high performance and its ability to transform non-linear relationships among variables into linear form by means of the kernel idea (kernel function). In this work, we develop a semi-parametric approach to fitting single-index models in order to deal with high-dimensional problems. To achieve this goal, we use support vector regression (SVR) to estimate the unknown nonparametric link function, while the single index is determined using the semi-parametric least squares method (Ichimura 1993). This development enhances the ability of SVR to solve high-dimensional problems. We design three simulation examples with high-dimensional problems (linear and nonlinear). The simulations demonstrate the superior performance of the proposed method over the standard SVR method. This is further illustrated by an application to real data.

9.
In the context of genetics and genomic medicine, gene-environment (G×E) interactions have a great impact on the risk of human diseases. Some existing methods for identifying G×E interactions are considered to be limited, since they analyze only one or a few G factors at a time, assume linear effects of E factors, and use inefficient selection methods. In this paper, we propose a new method to identify significant main effects and G×E interactions. This is based on a semivarying coefficient least-squares support vector regression (LS-SVR) technique, which is devised by utilizing a flexible semiparametric LS-SVR approach for censored survival data. This semivarying coefficient model is used to deal with the nonlinear effects of E factors. We also derive a generalized cross validation (GCV) function for determining the optimal values of the hyperparameters of the proposed method. This GCV function is also used to identify significant main effects and G×E interactions. The proposed method is evaluated through numerical studies.

10.
In this paper a new robust estimator, modified median estimator, is introduced and studied for the logistic regression model. This estimator is based on the median estimator considered in Hobza et al. [Robust median estimator in logistic regression. J Stat Plan Inference. 2008;138:3822–3840]. Its asymptotic distribution is obtained. Using the modified median estimator, we also consider a Wald-type test statistic for testing linear hypotheses in the logistic regression model and we obtain its asymptotic distribution under the assumption of random regressors. An extensive simulation study is presented in order to analyse the efficiency as well as the robustness of the modified median estimator and Wald-type test based on it.

11.
Hierarchical study designs occur often in many areas such as epidemiology, psychology, sociology, public health, engineering, and agriculture. Such designs impose correlation on the data structure that needs to be accounted for in the modelling process. In this study, a three-level mixed-effects least squares support vector regression (MLS-SVR) model is proposed to extend the standard least squares support vector regression (LS-SVR) model for handling cluster correlated data. The MLS-SVR model incorporates multiple random effects, which allow it to handle unequal numbers of observations for each case at non-fixed time points (a very unbalanced situation) and correlation between subjects simultaneously. The methodology consists of a regression modelling step that is performed straightforwardly by solving a linear system. The proposed model is illustrated through numerical studies on simulated data sets and a real data example on human Brucellosis frequency. The generalization performance of the proposed MLS-SVR is evaluated by comparison with ordinary LS-SVR and some other parametric models.

12.
Selection of relevant predictor variables for building a model is an important problem in multiple linear regression. Variable selection methods based on the ordinary least squares estimator fail to select the set of relevant variables for building a model in the presence of outliers and leverage points. In this article, we propose a new robust variable selection criterion for selecting the relevant variables in the model and establish its consistency property. The performance of the proposed method is evaluated through a simulation study and real data.

13.
In multiple linear regression analysis, the ridge regression estimator and the Liu estimator are often used to address multicollinearity. Besides multicollinearity, outliers are also a problem in multiple linear regression analysis. We propose new biased estimators based on the least trimmed squares (LTS) ridge estimator and the LTS Liu estimator for the case in which both outliers and multicollinearity are present. For this purpose, a simulation study is conducted to compare the robust ridge estimator and the robust Liu estimator in terms of their effectiveness, as measured by the mean square error. In our simulations, the behavior of the new biased estimators is examined for the following types of outliers: X-space outliers, Y-space outliers, and X- and Y-space outliers. The results for a number of different illustrative cases are presented. This paper also provides the results for the robust ridge regression and robust Liu estimators based on a real-life data set that combines the problems of multicollinearity and outliers.

14.
To seek the nonlinear structure hidden in high-dimensional data points, a transformation related to the projection pursuit method and a projection index were proposed by Li (1989, 1990). In this paper, we present a consistent estimator of the supremum of the projection index based on the sliced inverse regression technique. This estimator also suggests a method to obtain approximately the most interesting projection in the general case.

15.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several well-known data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

16.
ABSTRACT

In this paper, we propose a new efficient and robust penalized estimating procedure for varying-coefficient single-index models based on modal regression and basis function approximations. The proposed procedure simultaneously solves two types of problems: separation of varying and constant effects, and selection of variables with nonzero coefficients for both nonparametric and index components, using three smoothly clipped absolute deviation (SCAD) penalties. With appropriate selection of the tuning parameters, the new method possesses consistency in variable selection and in the separation of varying and constant coefficients. In addition, the estimators of varying coefficients possess the optimal convergence rate, and the estimators of constant coefficients and index parameters have the oracle property. Finally, we investigate the finite sample performance of the proposed method through a simulation study and real data analysis.

17.
ABSTRACT

Statistical methods are effectively used in the evaluation of pharmaceutical formulations instead of laborious liquid chromatography. However, signal overlapping, nonlinearity, multicollinearity and the presence of outliers degrade the performance of statistical methods. Partial Least Squares Regression (PLSR) is a very popular method for the quantification of high-dimensional, spectrally overlapped drug formulations. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality for a PLSR model. We study the performance of the proposed diagnostics on two highly overlapping drug systems in everyday use: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

18.
We introduce a log-linear regression model based on the odd log-logistic generalized half-normal distribution [G.M. Cordeiro, M. Alizadeh, R.R. Pescim, and E.M.M. Ortega, The odd log-logistic generalized half-normal lifetime distribution: Properties and applications, Comm. Statist. Theory Methods (2015), accepted for publication]. Some of its structural properties, including explicit expressions for the density function, quantile and generating functions and ordinary moments, are derived. We estimate the model parameters by the maximum likelihood method. For different parameter settings, proportions of censoring and sample sizes, some simulations are performed to investigate the behavior of the estimators. We derive the appropriate matrices for assessing local influence diagnostics on the parameter estimates under different perturbation schemes. We also define the martingale and modified deviance residuals to detect outliers and evaluate the model assumptions. In addition, we demonstrate that the extended regression model can be very useful in the analysis of real data and can provide more realistic fits than other special regression models. The potential of the new regression model is illustrated by means of a real data set.

19.
The standard approach to non-parametric bivariate density estimation is to use a kernel density estimator. Practical performance of this estimator is hindered by the fact that the estimator is not adaptive (in the sense that the level of smoothing is not sensitive to local properties of the density). In this paper a simple, automatic and adaptive bivariate density estimator is proposed based on the estimation of marginal and conditional densities. Asymptotic properties of the estimator are examined, and guidance to practical application of the method is given. Application to two examples illustrates the usefulness of the estimator as an exploratory tool, particularly in situations where the local behaviour of the density varies widely. The proposed estimator is also appropriate for use as a pilot estimate for an adaptive kernel estimate, since it is relatively inexpensive to calculate.
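For reference, the non-adaptive baseline that the abstract contrasts against can be sketched as a fixed-bandwidth bivariate kernel density estimator with a product Gaussian kernel. This is only the standard estimator, not the adaptive method proposed in the paper; the function names, bandwidths and data are assumptions for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde2d(x, y, data, hx, hy):
    """Fixed-bandwidth bivariate KDE with a product Gaussian kernel.

    data is a list of (xi, yi) pairs; hx and hy are global bandwidths,
    so the amount of smoothing is the same everywhere (non-adaptive).
    """
    n = len(data)
    total = sum(
        gaussian_kernel((x - xi) / hx) * gaussian_kernel((y - yi) / hy)
        for xi, yi in data
    )
    return total / (n * hx * hy)


# Made-up sample; the density estimate is higher near the cluster of points.
sample = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1), (3.0, 3.0)]
near = kde2d(0.0, 0.0, sample, hx=0.5, hy=0.5)
far = kde2d(1.5, 1.5, sample, hx=0.5, hy=0.5)
print(near, far)
```

Because hx and hy are the same at every evaluation point, a bandwidth that suits the dense cluster may oversmooth or undersmooth elsewhere, which is the limitation the abstract refers to.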

20.
The shrinkage preliminary test ridge regression estimators (SPTRRE) based on the Wald (W), the likelihood ratio (LR) and the Lagrangian multiplier (LM) tests are considered in this paper. The bias and the risk functions of the proposed estimators are derived. The regions of optimality of the estimators are determined under the quadratic risk function. Under the null hypothesis, the SPTRRE based on LM test has the smallest risk, followed by the estimators based on LR and W tests. However, the SPTRRE based on W test performs the best followed by the LR and LM based estimators when the parameter moves away from the subspace of the restrictions. The conditions of superiority of the proposed estimator for both ridge and departure parameters are discussed. The optimum choice of the level of significance becomes the traditional choice by using the W test for all non-negative ridge parameters.
