Similar literature
 20 similar documents found (search time: 15 ms)
1.
Regression analysis is one of the methods most widely used in prediction problems. Although many methods exist for parameter estimation in regression analysis, ordinary least squares (OLS) is the most commonly used. However, OLS is highly sensitive to outlying observations, so robust techniques are recommended in the literature when a data set contains outliers. In a prediction problem, techniques that reduce the influence of outliers, and that use the median rather than the mean of the errors as the objective function, model such data more successfully. In this study, a new parameter estimation method based on particle swarm optimization is proposed. Its objective is the median of the absolute ratios obtained by dividing the difference between the observed and predicted values by the observed value. The performance of the proposed method was evaluated in a simulation study by comparing it with OLS and some other robust methods from the literature.
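
The objective described in this abstract, the median of |observed − predicted| / |observed|, can be sketched as follows. This is a minimal illustration only: the straight-line model, the toy data, and the function names `mdape` and `ols_line` are assumptions made for the example, and the particle swarm search itself is not shown (any derivative-free optimizer could minimize `mdape`).

```python
from statistics import median


def mdape(params, xs, ys):
    """Median of |y - y_hat| / |y| for the line y_hat = b0 + b1 * x."""
    b0, b1 = params
    return median(abs(y - (b0 + b1 * x)) / abs(y) for x, y in zip(xs, ys))


def ols_line(xs, ys):
    """Closed-form OLS intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data (made up): an exact line y = 1 + 2x with one gross outlier.
xs = list(range(1, 11))
ys = [1 + 2 * x for x in xs]
ys[-1] = 100

ols_params = ols_line(xs, ys)
print("OLS fit:", ols_params, "objective:", mdape(ols_params, xs, ys))
print("True line objective:", mdape((1.0, 2.0), xs, ys))
```

On this toy data the single outlier pulls the OLS slope well above 2, while the median-based objective is still minimized by the line that fits the nine clean points.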

2.
Summary.  We compared measures of hospital performance by using both administrative and clinical data sources. Hospital-specific mortality outcomes on 10086 patients who had been admitted to 102 hospitals with a diagnosis of acute myocardial infarction in Ontario, Canada, were used as a test-case. Four and six hospitals were identified as having mortality that was statistically significantly higher than expected by using administrative and clinical data respectively, when model-based indirect standardization was used. When using random-effects models, zero and two hospitals were identified as having significantly higher mortality by using administrative and clinical data respectively. Approximately one in four hospitals changed at least two decile rankings when clinical data were used compared with when administrative data were used.

3.
In this article, we present an alternative test of discordancy in samples of univariate circular data. The new technique is based on the effect that an outlier has on the sum of the circular distances from the point of interest to all other points. The percentage points are calculated and the performance is examined. We compare the performance of the test in detecting an outlier with that of other tests and show that the new approach performs relatively better than other known tests. As an illustration, a practical example is presented.
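
The quantity this abstract builds on, the sum of circular distances from one point to all the others, can be sketched as below. This is an illustration under assumptions: the angles are made up, the function names are hypothetical, and the paper's actual test statistic and percentage points are not reproduced here.

```python
import math


def circular_distance(a, b):
    """Shortest arc length between two angles in radians, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def distance_sums(angles):
    """For each angle, the sum of circular distances to all other angles."""
    return [
        sum(circular_distance(a, b) for j, b in enumerate(angles) if j != i)
        for i, a in enumerate(angles)
    ]


# Made-up sample: five angles clustered near 0 and one near pi.
angles = [0.1, 0.2, 0.15, 0.05, 6.2, 3.0]
sums = distance_sums(angles)

# The point with the largest sum is the natural candidate for a discordancy test.
candidate = max(range(len(angles)), key=sums.__getitem__)
print("distance sums:", [round(s, 3) for s in sums])
print("candidate index:", candidate)
```

Note that 6.2 radians is close to 0 on the circle, so only the angle 3.0 stands apart from the cluster.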

4.
Potency bioassays are used to measure biological activity. Consequently, potency is considered a critical quality attribute in manufacturing. Relative potency is measured by comparing the concentration-response curve of a manufactured test batch with that of a reference standard. If the curve shapes are deemed similar, the test batch is said to exhibit constant relative potency with the reference standard, a critical requirement for calibrating the potency of the final drug product. Outliers in bioassay potency data may result in the false acceptance/rejection of a bad/good sample and, if accepted, may yield a biased relative potency estimate. To avoid these issues, the USP<1032> recommends the screening of bioassay data for outliers prior to performing a relative potency analysis. In a recently published work, the effects of one or more outliers, outlier size, and outlier type on similarity testing and estimation of relative potency were thoroughly examined, confirming the USP<1032> outlier guidance. As a follow-up, several outlier detection methods, including those proposed by the USP<1010>, are evaluated and compared in this work through computer simulation. Two novel outlier detection methods are also proposed. The effects of outlier removal on similarity testing and estimation of relative potency were evaluated, resulting in recommendations for best practice.

5.
One approach to robustness in linear models is to assume a mixture of standard and outlier observations, with a different error variance for each class. For generalised linear models (GLMs) the mixture model approach is more difficult, as the error variance for many distributions has a fixed relationship to the mean. This model is extended to GLMs by redefining the classes so that the standard class is a standard GLM and the outlier class is an overdispersed GLM, achieved by including a random effect term in the linear predictor. The advantages of this method are that it can be extended to any model with a linear predictor and that outlier observations can be easily identified. Using simulation, the model is compared to an M-estimator and found to have improved bias and coverage. The method is demonstrated on three examples.

6.
In this paper, we propose a method for outlier detection and removal in electromyographic gait-related patterns (EMG-GRPs). The goal was to detect and remove EMG-GRPs that reduce the quality of gait data while preserving natural biological variations in EMG-GRPs. The proposed procedure consists of general statistical tests and is simple to use. The Friedman test with multiple comparisons was used to find particular EMG-GRPs that are extremely different from others. Next, outlying observations were calculated for each suspected stride waveform by applying the generalized extreme studentized deviate test. To complete the analysis, we applied different outlier criteria. The results suggest that an EMG-GRP is an outlier if it differs from at least 50% of the other stride waveforms and contains at least 20% of the outlying observations. The EMG signal remains a realistic representation of muscle activity and demonstrates step-by-step variability once the outliers, as defined here, are removed.
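
The final decision rule reported in this abstract can be written as a small predicate. This is only a sketch: the function and argument names are hypothetical, and it assumes that the 20% criterion refers to the share of a waveform's samples flagged as outlying, which is one reading of the abstract. The Friedman and generalized ESD tests that would produce these counts are not shown.

```python
def is_outlier_pattern(n_strides_differing, n_other_strides,
                       n_outlying_samples, n_samples):
    """Apply the two thresholds from the abstract to one stride waveform.

    n_strides_differing: other strides this waveform differs from
    n_other_strides:     total number of other strides compared
    n_outlying_samples:  samples in this waveform flagged as outlying
    n_samples:           total samples in this waveform (assumed denominator)
    """
    stride_share = n_strides_differing / n_other_strides
    sample_share = n_outlying_samples / n_samples
    return stride_share >= 0.5 and sample_share >= 0.2


# Made-up counts: differs from 6 of 10 strides, 25 of 100 samples outlying.
print(is_outlier_pattern(6, 10, 25, 100))
```

Both conditions must hold, so a waveform that differs from most strides but has few outlying samples is kept.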

7.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values, making the outlier less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a newly proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, this study compares not only imputation methods but also outlier detection methods. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as on the size and dimension of the data and the fraction of missing components. By taking these features into account, a more suitable MI method can be selected for a given data set.

8.
Outlier detection plays an important role in the pre-treatment of sequential datasets to obtain clean, valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are employed to measure the similarity an object shares with its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled a suspicious outlier, which is examined subsequently to judge whether it really is an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS). The expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS, are then employed, and the ratio of the length of the expected path to that of the measured path indicates whether outliers exist in the SOS. Third, if outliers exist in the SOS and the SOS contains N suspicious outliers, then 2^N − 2 remaining paths are generated by removing k (0 < k < N) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) is employed to represent the ratio of the length of the measured path of the considered remaining path to that of the expected path of the corresponding SOS, and the DSOF indicates the degree to which the objects removed in a remaining path are outliers. The proposed outlier detection scheme works from a dynamical perspective and breaks the tight relation between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the experimental results verify that it achieves higher detection quality for sequential datasets. In addition, the proposed outlier detection scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.
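
The number of remaining paths in the third step follows from counting the ways to remove k of the N suspicious outliers with 0 < k < N, that is, every subset except the empty one and the full set, which gives 2^N − 2. The enumeration can be sketched as follows; the function name `remaining_paths` and the placeholder labels are assumptions for illustration, and the path-length and DSOF computations are not shown.

```python
from itertools import combinations


def remaining_paths(sos):
    """List every path left after removing k objects from the series, 0 < k < N.

    The order of the surviving objects is preserved, matching the idea of
    sequentially connecting the remaining suspicious outliers.
    """
    n = len(sos)
    paths = []
    for k in range(1, n):  # k = 1 .. N-1
        for removed in combinations(range(n), k):
            removed_set = set(removed)
            paths.append([obj for i, obj in enumerate(sos) if i not in removed_set])
    return paths


# Hypothetical series of N = 4 suspicious outliers.
sos = ["a", "b", "c", "d"]
paths = remaining_paths(sos)
print(len(paths), "remaining paths; 2**N - 2 =", 2 ** len(sos) - 2)
```

Because the count grows exponentially in N, an exhaustive enumeration like this is practical only for short suspicious outlier series.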

9.
This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in estimating both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlier units at any level. Detection of outlier units related to random parts is also studied. Analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.

10.
Motivated by the national evaluation of readmission rates among kidney dialysis facilities in the United States, we evaluate the impact of including discharging hospitals on the estimation of facility-level standardized readmission ratios (SRRs). The estimation of SRRs consists of two steps. First, we model the dependence of readmission events on facilities and patient-level characteristics, with or without an adjustment for discharging hospitals. Second, using results from the models, standardization is achieved by computing the ratio of the number of observed events to the number of expected events assuming a population norm and given the case-mix in that facility. A challenging aspect of our motivating example is that the number of parameters is very large and estimation of high-dimensional parameters is troublesome. To solve this problem, we propose a structured Newton-Raphson algorithm for a logistic fixed effects model and an approximate EM algorithm for the logistic mixed effects model. We consider a re-sampling and simulation technique to obtain p-values for the proposed measures. Finally, our method of identifying outlier facilities involves converting the observed p-values to Z-statistics and using the empirical null distribution, which accounts for overdispersion in the data. The finite-sample properties of the proposed measures are examined through simulation studies. The methods developed are applied to national dialysis data. It is our great pleasure to present this paper in honor of Ross Prentice, who has been instrumental in the development of modern methods of modeling and analyzing life history and failure time data, and in the inventive applications of these methods to important national data problems.
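
Two of the steps named in this abstract, the observed-to-expected ratio and the conversion of p-values to Z-statistics, can be sketched as below. This is a simplified illustration: the function names and numbers are made up, the expected count is assumed to be a sum of per-discharge probabilities already predicted under the population norm, and the one-sided conversion Z = Φ⁻¹(1 − p) is an assumed convention. The model fitting and the empirical null adjustment are not shown.

```python
from statistics import NormalDist


def standardized_ratio(observed_events, expected_probs):
    """Observed events divided by expected events for one facility.

    expected_probs holds one predicted readmission probability per discharge,
    assumed to come from a model fitted under the population norm.
    """
    expected = sum(expected_probs)
    return observed_events / expected


def p_to_z(p_value):
    """Map a one-sided p-value in (0, 1) to a Z-statistic."""
    return NormalDist().inv_cdf(1.0 - p_value)


# Made-up facility: 12 readmissions among 100 discharges, each with
# a predicted probability of 0.1 (so 10 expected events).
ratio = standardized_ratio(12, [0.1] * 100)
print("SRR:", round(ratio, 3))
print("Z for p = 0.025:", round(p_to_z(0.025), 3))
```

A ratio above 1 means more readmissions were observed than the case-mix would predict; whether that excess is flagged depends on the Z-statistic relative to the null distribution used.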

11.
The problem of outliers in statistical data has attracted many researchers for a long time. Consequently, numerous outlier detection methods have been proposed in the statistical literature. However, no consensus has emerged as to which method is uniformly better than the others or which one is recommended for use in practical situations. In this article, we perform an extensive comparative Monte Carlo simulation study to assess the performance of the multiple outlier detection methods that are either recently proposed or frequently cited in the outlier detection literature. Our simulation experiments include a wide variety of realistic and challenging regression scenarios. We give recommendations on which method is superior to others under what conditions.

12.
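Based on the characteristics of air quality data, and building on curves fitted with a B-spline basis, this paper brings information about the curve itself and about its variation into the analysis, constructs a weighted curve depth index, and explores a method for detecting anomalous curves. Compared with existing methods that consider only discrete-point information and the information of the curve itself, the detection method better suits the characteristics of air quality data and can handle missing values and identify both global and local anomalies. The method is applied to the analysis of anomalies in nitrogen dioxide level curves at air quality monitoring sites in Lanzhou, and the results show that it identifies anomalies more effectively.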

13.
Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that are the result of a shifted mean of the error process under the regression model. A diagnostic technique based on the sample influence function of the autocorrelations also exists for the detection of outliers that are the result of shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on the effects of model misspecification on outlier detection are also provided.

14.
In this paper we present a "model-free" method of outlier detection for Gaussian time series by using the autocorrelation structure of the time series. We also present a graphic diagnostic method in order to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contains either one type of outlier or both additive and innovation outliers, and that it has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.

15.
Sets of relatively short time series arise in many situations. One aspect of their analysis may be the detection of outlying series. We examine the performance of standard normal outlier tests applied to the means, or to simple functions of the means, of AR(1) series, not necessarily of equal lengths. Although unequal series lengths imply that the means have unequal variances that are known only approximately, it is shown that nominal significance levels hold good under most circumstances. Thus a standard outlier test can usefully be applied, avoiding the complication of estimating the time series' parameters. The test's power is affected by unequal lengths, being higher when the slippage occurs in one of the longer series.

16.
Spatial outliers are spatially referenced objects whose nonspatial attribute values are significantly different from the corresponding values in their spatial neighborhoods. In other words, a spatial outlier is a local instability or an extreme observation that deviates significantly within its spatial neighborhood, but possibly not within the entire dataset. In this article, we propose a novel spatial outlier detection algorithm, location quotient (LQ), for multiple-attribute spatial datasets, and compare its performance with the well-known mean and median algorithms for multiple-attribute spatial datasets in the literature. In particular, we apply the mean, median, and LQ algorithms to a real dataset and to simulated spatial datasets of 13 different sizes to compare their performance. In addition, we calculate area under the curve values in all cases, which show that our proposed algorithm is more powerful than the mean and median algorithms in almost all the cases considered, and we plot receiver operating characteristic curves in some cases.

17.
The support vector machine (SVM) is sparse in that its classifier is expressed as a linear combination of only a few support vectors (SVs). Whenever an outlier is included as an SV in the classifier, the outlier may have a serious impact on the estimated decision function. In this article, we propose a robust loss function that is convex. Our learning algorithm is more robust to outliers than SVM. Also, the convexity of our loss function permits an efficient solution path algorithm. Through simulated and real data analysis, we illustrate that our method can be useful in the presence of labeling errors.

18.
Outlier detection is fundamental to statistical modelling. When there are multiple outliers, many traditional approaches in use are stepwise detection procedures, which can be computationally expensive and ignore stochastic error in the outlier detection process. Outlier detection can be performed by a heteroskedasticity test. In this article, a rapid outlier detection method via multiple heteroskedasticity tests based on penalized likelihood approaches is proposed to handle these kinds of problems. The proposed method detects the heteroskedasticity of all data in a single step and estimates the coefficients simultaneously. The proposed approach is distinguished from others in that it uses a weighted least squares formulation coupled with nonconvex sparsity-inducing penalization. Furthermore, the proposed approach does not need to construct test statistics and calculate their distributions. A new algorithm is proposed for optimizing penalized likelihood functions. Favourable theoretical properties of the proposed approach are obtained. Our simulation studies and real data analysis show that the newly proposed methods compare favourably with other traditional outlier detection techniques.

19.
Multivariate outlier detection requires computation of robust distances to be compared with appropriate cut-off points. In this paper we propose a new calibration method for obtaining reliable cut-off points of distances derived from the MCD estimator of scatter. These cut-off points are based on a more accurate estimate of the extreme tail of the distribution of robust distances. We show that our procedure gives reliable tests of outlyingness in almost all situations of practical interest, provided that the sample size is not much smaller than 50. Therefore, it is a considerable improvement over all the available MCD procedures, which are unable to provide good control over the size of multiple outlier tests for the data structures considered in this paper.

20.
Bayesian analysis of outlier problems using the Gibbs sampler   Total citations: 6, self-citations: 0, citations by others: 6
We consider the Bayesian analysis of outlier models. We show that the Gibbs sampler brings considerable conceptual and computational simplicity to the problem of calculating posterior marginals. Although other techniques for finding posterior marginals are available, the Gibbs sampling approach is notable for its ease of implementation. Allowing the probability of an outlier to be unknown introduces an extra parameter into the model, but this turns out to involve only minor modification to the algorithm. We illustrate these ideas using a contaminated Gaussian distribution, a t-distribution, a contaminated binomial model and logistic regression.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号