Similar articles
 20 similar articles found (search time: 31 ms)
1.
It is important to identify outliers since their inclusion, especially when using parametric methods, can cause distortion in the analysis and lead to erroneous conclusions. One of the easiest and most useful methods is based on the boxplot. This method is particularly appealing since it does not use any outliers in computing spread. Two methods, one by Carling and another by Schwertman and de Silva, adjust the boxplot method for sample size and skewness. In this paper, the two procedures are compared both theoretically and by Monte Carlo simulations. Simulations using both a symmetric distribution and an asymmetric distribution were performed on data sets with none, one, and several outliers. Based on the simulations, the Carling approach is superior in avoiding masking outliers, that is, the Carling method is less likely to overlook an outlier, while the Schwertman and de Silva procedure is much better at reducing swamping, that is, misclassifying an observation as an outlier. Carling’s method is to the Schwertman and de Silva procedure as the comparisonwise error rate is to the experimentwise error rate in multiple comparisons. The two methods, rather than being competitors, appear to complement each other. Used in tandem, they provide the data analyst with a more complete perspective for identifying possible outliers.
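
The boxplot rule that both adjustments start from can be sketched as follows. This is a minimal illustration of the standard Tukey fences (1.5 times the interquartile range beyond the quartiles), not the Carling or the Schwertman and de Silva procedure. The function name `boxplot_outliers`, the multiplier `k`, and the quartile convention are assumptions made for the example.

```python
import statistics


def boxplot_outliers(data, k=1.5):
    """Return the values outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR.

    Assumption: quartiles follow the 'inclusive' convention of
    statistics.quantiles; other conventions shift the fences slightly.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # spread taken from the middle half of the data only
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = boxplot_outliers(sample)  # values beyond either fence
```

Because the fences depend only on the quartiles, the extreme value in `sample` does not inflate the spread estimate that is used to flag it.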

2.
In this paper, we propose a robust test of exogeneity. The test statistic is constructed from quantile regression estimators, which are robust to heavy tails of errors. We derive the asymptotic distribution of the test statistic under the null hypothesis of exogeneity at a given quantile. The finite sample properties of the test are investigated through Monte Carlo simulations that exhibit not only good size and power properties, but also good robustness to outliers.

3.
Summary.  Editing in surveys of economic populations is often complicated by the fact that outliers due to errors in the data are mixed in with correct, but extreme, data values. We describe and evaluate two automatic techniques for the identification of errors in such long-tailed data distributions. The first is a forward search procedure based on finding a sequence of error-free subsets of the error-contaminated data and then using regression modelling within these subsets to identify errors. The second uses a robust regression tree modelling procedure to identify errors. Both approaches can be implemented on a univariate basis or on a multivariate basis. An application to a business survey data set that contains a mix of extreme errors and true outliers is described.

4.
This paper considers a nonlinear quantile model with change-points. The quantile estimation method, which includes the median model as a particular case, is more robust than traditional methods when model errors contain outliers. Under relatively weak assumptions, the convergence rate and asymptotic distribution of the change-point and regression parameter estimators are obtained. A numerical study by Monte Carlo simulations shows the performance of the proposed method for nonlinear models with change-points.
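
The robustness of quantile estimation mentioned above comes from the check loss it minimises. The sketch below is a minimal illustration for a constant-only model, not the nonlinear change-point estimator of the paper. The names `check_loss` and `best_constant` and the toy data are assumptions made for the example.

```python
def check_loss(u, tau):
    """Quantile (check) loss: tau*u for u >= 0 and (tau - 1)*u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(data, tau):
    """Constant c that minimises the summed check loss over the data.

    A minimiser always exists among the observed values, so searching
    over the data points is sufficient for this toy case.
    """
    return min(data, key=lambda c: sum(check_loss(x - c, tau) for x in data))


y = [1, 2, 3, 4, 100]            # one gross outlier
median_fit = best_constant(y, 0.5)
mean_fit = sum(y) / len(y)       # least-squares fit of a constant
```

With `tau = 0.5` the loss is proportional to the absolute error, so the fit is the sample median and stays near the bulk of the data, whereas the least-squares fit (the mean) is pulled toward the outlier.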

5.
Longitudinal data are commonly modeled with normal mixed-effects models. Most modeling methods are based on traditional mean regression, which results in non-robust estimation in the presence of extreme values or outliers. Median regression is also not the best choice for estimation, especially for non-normal errors. Compared to conventional modeling methods, composite quantile regression can provide robust estimation results even for non-normal errors. In this paper, based on a so-called pseudo composite asymmetric Laplace distribution (PCALD), we develop a Bayesian treatment of composite quantile regression for mixed-effects models. Furthermore, with the location-scale mixture representation of the PCALD, we establish a Bayesian hierarchical model and achieve posterior inference for all unknown parameters and latent variables using a Markov chain Monte Carlo (MCMC) method. Finally, this newly developed procedure is illustrated by some Monte Carlo simulations and a case analysis of an HIV/AIDS clinical data set.

6.
One of the standard variable selection procedures in multiple linear regression is to use a penalisation technique in least‐squares (LS) analysis. In this setting, many different types of penalties have been introduced to achieve variable selection. It is well known that LS analysis is sensitive to outliers, and consequently outliers can present serious problems for the classical variable selection procedures. Since rank‐based procedures have desirable robustness properties compared to LS procedures, we propose a rank‐based adaptive lasso‐type penalised regression estimator and a corresponding variable selection procedure for linear regression models. The proposed estimator and variable selection procedure are robust against outliers in both response and predictor space. Furthermore, since rank regression can yield unstable estimators in the presence of multicollinearity, in order to provide inference that is robust against multicollinearity, we adjust the penalty term in the adaptive lasso function by incorporating the standard errors of the rank estimator. The theoretical properties of the proposed procedures are established and their performances are investigated by means of simulations. Finally, the estimator and variable selection procedure are applied to the Plasma Beta‐Carotene Level data set.

7.
The impacts of outliers and Berkson-type uncertainties with additive and multiplicative errors in linear regression are investigated. The work is motivated by a common biological phenomenon in which outlying observations and Berkson-type uncertainties may lie partly in the data, causing incorrect estimations and inferences. In this article, we use a Wald-type estimator to combat these uncertainties because of its merits, including its large sample properties, especially for asymmetric errors, as well as its simplicity without nuisance parameters. The severity of neglecting uncertainty effects is examined by Monte Carlo simulations and real data examples through comparison of residual-based methods with the proposed estimator.

8.
In this article, we present an M-estimator to estimate the parameters of the extended three-parameter Burr Type III distribution for complete data with outliers. The confidence intervals for all parameters can be obtained by the M-estimator's normal approximation. The simulation results show that the M-estimator generally outperforms the maximum likelihood and least squares methods in terms of bias and root mean square errors. We also investigate the M-estimator's impact on different quantiles and the mean for the Weibull and normal distributions with outliers. Two numerical examples are used to demonstrate the performance of our proposed method.

9.
In this article, we consider a linear model in which the covariates are measured with errors. We propose a t-type corrected-loss estimation of the covariate effect when the measurement error follows the Laplace distribution. The proposed estimator is asymptotically normal. In practical studies, outliers occur that diminish the robustness of the estimation. Simulation studies show that the estimators are resistant to vertical outliers, and an application to a 6-minute walk test is presented to show that the proposed method performs well.

10.
The empirical distribution function (EDF) is a commonly used estimator of the population cumulative distribution function. The survival function is estimated as the complement of the EDF. However, clinical diagnosis of an event is often subject to misclassification, by which the outcome is given with some uncertainty. In the presence of such errors, the true distribution of the time to first event is unknown. We develop a method to estimate the true survival distribution by incorporating negative predictive values and positive predictive values of the prediction process into a product-limit style construction. This will allow us to quantify the bias of the EDF estimates due to the presence of misclassified events in the observed data. We present an unbiased estimator of the true survival rates and its variance. Asymptotic properties of the proposed estimators are provided and these properties are examined through simulations. We evaluate our methods using data from the VIRAHEP-C study.
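
The naive estimator that the abstract takes as its starting point can be sketched in a few lines. This is a minimal illustration for fully observed event times. It ignores censoring and does not implement the misclassification-adjusted, product-limit style estimator the paper proposes. The function names and the toy times are assumptions made for the example.

```python
def edf(sample, t):
    """Empirical distribution function: share of observations <= t."""
    return sum(1 for x in sample if x <= t) / len(sample)


def survival(sample, t):
    """Survival estimate as the complement of the EDF."""
    return 1.0 - edf(sample, t)


times = [1, 2, 3, 4]             # hypothetical event times
s_at_2 = survival(times, 2)      # share of subjects still event-free after t = 2
```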

11.
In some experiments, such as destructive stress testing and industrial quality control experiments, only values smaller than all previous ones are observed. Here, for such record-breaking data, kernel estimation of the cumulative distribution function and smooth density estimation are considered. For a single record-breaking sample, consistent estimation is not possible, and replication is required for global results. For m independent record-breaking samples, the proposed distribution function and density estimators are shown to be strongly consistent and asymptotically normal as m → ∞. Also, for small m, the mean squared errors and biases of the estimators and their smoothing parameters are investigated through computer simulations.
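
The sampling scheme described above, in which only values smaller than all previous ones are observed, can be sketched as follows. This only illustrates how a record-breaking sample arises from an underlying sequence, not the kernel estimators studied in the paper. The function name `lower_records` and the toy sequence are assumptions made for the example.

```python
def lower_records(observations):
    """Keep each value that is strictly smaller than every earlier value."""
    records = []
    for x in observations:
        # The last stored record is the running minimum so far.
        if not records or x < records[-1]:
            records.append(x)
    return records


stream = [5, 7, 3, 4, 1, 2]        # hypothetical underlying measurements
observed = lower_records(stream)   # what the experimenter actually sees
```

A single run keeps only a few of the underlying values, which gives some intuition for why the abstract calls for replication across m independent samples.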

12.
Nonparametric models with jump points have been considered by many researchers. However, most existing methods based on least squares or likelihood are sensitive when there are outliers or the error distribution is heavy-tailed. In this article, a local piecewise-modal method is proposed to estimate the regression function with jump points in nonparametric models, and a piecewise-modal EM algorithm is introduced to compute the proposed estimator. Under some regularity conditions, the large-sample theory is established for the proposed estimators. Several simulations are presented to evaluate the performance of the proposed method, which show that the proposed estimator is more efficient than the local piecewise-polynomial regression estimator in the presence of outliers or a heavy-tailed error distribution. Moreover, the proposed procedure is asymptotically equivalent to the local piecewise-polynomial regression estimator under the assumption that the error distribution is Gaussian. The proposed method is further illustrated using sea-level pressure data.

13.
CD4 and viral load play important roles in HIV/AIDS studies, and the study of their relationship has received much attention with well-known results. However, AIDS datasets are often highly complex in the sense that they typically contain outliers, measurement errors, and missing data. These data complications can greatly affect statistical analysis results, but much of the literature fails to address these issues in data analysis. In this paper, we revisit the important relationship between CD4 and viral load and propose methods which simultaneously address outliers, measurement errors, and missing data. We find that the strength of the relationship may be severely mis-estimated if measurement errors and outliers are ignored. The proposed methods are general and can be used in other settings where jointly modelling several different types of longitudinal data is required in the presence of data complications.

14.
In many situations, the quality of a process or product may be better characterized and summarized by a relationship between the response variable and one or more explanatory variables. Parameter estimation is the first step in constructing control charts. Outliers may distort classical estimators and lead to incorrect conclusions. To remedy the problem of outliers, robust methods have been developed recently. In this article, a robust method is introduced for estimating the parameters of simple linear profiles. Two weight functions, Huber and Bisquare, are applied in the estimation algorithm. In addition, a method for robust estimation of the error-term variance is proposed. Simulation studies are done to investigate and evaluate the performance of the proposed estimator, as well as the classical one, in the presence and absence of outliers under different scenarios by means of the MSE criterion. The results reveal that the robust estimators proposed in this research perform as well as classical estimators in the absence of outliers and considerably better when outliers exist. In one scenario, the maximum variance estimate obtained from the classical estimator is 10.9, whereas the proposed robust estimators give 1.66 and 1.27; the actual value is 1.
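
The two weight functions named above have standard textbook forms, sketched below for standardized residuals. This is not the authors' estimation algorithm. The tuning constants 1.345 (Huber) and 4.685 (bisquare) are commonly used defaults and are assumptions here, since the abstract does not state which values were used.

```python
def huber_weight(r, k=1.345):
    """Huber weight: 1 inside [-k, k], then decaying as k/|r|."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def bisquare_weight(r, c=4.685):
    """Tukey bisquare weight: (1 - (r/c)^2)^2 inside [-c, c], else 0."""
    if abs(r) > c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


residuals = [0.2, -1.0, 3.0, 8.0]   # hypothetical standardized residuals
weights = [(huber_weight(r), bisquare_weight(r)) for r in residuals]
```

The Huber weight only downweights large residuals, while the bisquare weight drops to exactly zero beyond `c`, so gross outliers are ignored entirely in a weighted fit.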

15.
This study investigates the influences of additive outliers on financial durations. An outlier test statistic and an outlier detection procedure are proposed to detect and estimate outlier effects for the logarithmic Autoregressive Conditional Duration (Log-ACD) model. The proposed test statistic has an exact sampling distribution and performs very well, in terms of size and power, in a series of Monte Carlo simulations. Furthermore, the test statistic is robust to several alternative distribution assumptions. An empirical application shows that parameter estimates without considering outliers tend to be biased.

16.
We introduce a two-step procedure, in the context of ultra-high dimensional additive models, which aims to reduce the size of the covariate vector and distinguish linear and nonlinear effects among nonzero components. Our proposed screening procedure, in the first step, is constructed based on the concept of the cumulative distribution function and the conditional expectation of the response in the framework of marginal correlation. B-splines and empirical distribution functions are used to estimate these two measures. The sure screening property of this procedure is also established. In the second step, a double-penalization-based procedure is applied to identify nonzero and linear components simultaneously. The performance of the designed method is examined on several test functions to show its capabilities against competitor methods when the distribution of errors is varied. Simulation studies indicate that the proposed screening procedure can be applied to ultra-high dimensional data and detects the influential covariates well. It also demonstrates superiority in comparison with the existing methods. This method is also applied to identify the most influential genes for overexpression of a G protein-coupled receptor in mice.

17.
The Burr XII distribution offers a more flexible alternative to the lognormal, log-logistic and Weibull distributions. Outliers can occur during reliability life testing. Thus, we need an efficient method to estimate the parameters of the Burr XII distribution for censored data with outliers. The objective of this paper is to present a robust regression (RR) method called the M-estimator to estimate the parameters of a two-parameter Burr XII distribution based on the probability plotting procedure for both complete and multiply-censored data with outliers. The simulation results show that the RR method outperforms the unweighted least squares and maximum likelihood methods in most cases in terms of bias and root mean square error.

18.
The Mahalanobis distance between pairs of multivariate observations is used as a measure of similarity between the observations. The theoretical distribution is derived, and the result is used for judging the degree of isolation of an observation. In the case of spatially dependent data where spatial coordinates are available, different exploratory tools are introduced for studying the degree of isolation of an observation from a fraction of its neighbors, and thus for identifying local multivariate outliers.
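
The pairwise distance used above is the square root of (x - y)' S^{-1} (x - y) for a covariance matrix S. The sketch below computes it for two-dimensional observations with a covariance matrix that is supplied directly. The function name `mahalanobis_2d` and the example values are assumptions made for the example, and the paper's distributional results and spatial tools are not reproduced.

```python
import math


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-D points x and y for covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - y[0], x[1] - y[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)


identity = ((1.0, 0.0), (0.0, 1.0))
euclid_case = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), identity)
scaled_case = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
```

With an identity covariance the distance reduces to the Euclidean distance, while a larger variance along one axis shrinks distances measured in that direction.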

19.
The Burr XII distribution offers a flexible alternative to the distributions that play an important role in modelling data in reliability, risk and process capability. However, estimating the shape parameters of the Burr XII distribution is a challenging problem. Classical estimation methods such as maximum likelihood and least squares are often used to estimate the parameters of the Burr XII distribution, but these methods are very sensitive to outliers in the data. Thus, a robust alternative to the classical methods is needed to find estimators that are less sensitive to outliers in the data. The purpose of this paper is to use the optimal B-robust estimation method [Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. Robust statistics: the approach based on influence functions. New York: Wiley; 1986] to obtain robust estimators for the shape parameters of the Burr XII distribution. The simulation results show that the optimal B-robust estimators generally outperform the classical estimators in terms of bias and root mean square error when there are outliers in the data.

20.
This article compares eight estimators in terms of relative efficiencies with the univariate mean, some of which have not been compared previously. Four estimators, when testing hypotheses, are compared in terms of actual Type I errors. In terms of point estimation, the modified one-step M-estimator, one-step M-estimator, and rfch estimator are found to be the three best choices depending on the proportion of outliers. In terms of actual Type I errors, the levels of the modified one-step M-estimator and the rfch estimator were between .045 and .055 in 5 out of 7 situations when real data were used in simulations.
