Similar literature
 Found 20 similar documents; search took 636 ms
1.
Diagnostic measures for detecting outliers in data from block designs of experiments with correlated errors are considered. Influence is often assessed by deleting suspected outlying observations. First-order autocorrelation is used to model the correlation within each block. A Cook statistic is developed for detecting the effect of a single outlier, and the results are illustrated with an example.
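
As a rough illustration of the case-deletion idea behind a Cook statistic, the sketch below computes the classical Cook's distance for a simple linear regression with independent errors. It is not the block-design, autocorrelated-error statistic developed in the paper; the function name `cooks_distance` and the toy data are assumptions made for illustration only.

```python
def cooks_distance(x, y):
    """Classical Cook's distance for y = a + b*x with independent errors."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance estimate
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage of this observation
        distances.append(e ** 2 * h / (p * s2 * (1.0 - h) ** 2))
    return distances


# Hypothetical data: the last response is a planted outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
```

Here `most_influential` is the index of the observation whose deletion would change the fitted line the most; in this toy data it is the planted outlier.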

2.
Identification of outlier vectors in block designs for incomplete multiresponse experiments is considered. The design is composed of two sets of experimental units, and different numbers of response variables are observed from the two sets. A Cook statistic is developed for identifying outliers and is illustrated with a real-life data set. It is shown that the presence of outliers can distort the overall conclusion drawn from an experiment.

3.
In this paper, we propose a method for outlier detection and removal in electromyographic gait-related patterns (EMG-GRPs). The goal was to detect and remove EMG-GRPs that reduce the quality of gait data while preserving natural biological variations in EMG-GRPs. The proposed procedure consists of general statistical tests and is simple to use. The Friedman test with multiple comparisons was used to find particular EMG-GRPs that are extremely different from others. Next, outlying observations were calculated for each suspected stride waveform by applying the generalized extreme studentized deviate test. To complete the analysis, we applied different outlier criteria. The results suggest that an EMG-GRP is an outlier if it differs from at least 50% of the other stride waveforms and contains at least 20% of the outlying observations. The EMG signal remains a realistic representation of muscle activity and demonstrates step-by-step variability once the outliers, as defined here, are removed.
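
The two-part criterion stated in this abstract can be written as a small decision rule. The sketch below is only one reading of it, taking the counts as already produced by the Friedman comparisons and the generalized extreme studentized deviate test; it interprets the 20% threshold as the share of a stride's own sample points that are flagged as outlying, which is an assumption. The function and argument names are hypothetical.

```python
def is_outlying_stride(n_differing, n_other_strides, n_outlying_points, n_points):
    """Apply the 50% / 20% rule to one stride waveform.

    n_differing       -- other strides this stride differs from significantly
    n_other_strides   -- number of other strides it was compared with
    n_outlying_points -- sample points of this stride flagged as outlying
    n_points          -- total sample points in the stride waveform
    """
    differs_enough = n_differing / n_other_strides >= 0.50
    enough_outlying = n_outlying_points / n_points >= 0.20
    return differs_enough and enough_outlying


# Hypothetical counts for two strides, each compared with 20 others
# and sampled at 100 points.
flag_a = is_outlying_stride(12, 20, 25, 100)  # meets both thresholds
flag_b = is_outlying_stride(12, 20, 5, 100)   # too few outlying points
```

Requiring both conditions keeps strides that merely look different, but contain few extreme points, in the data set.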

4.
Outliers in multilevel data   (total citations: 2; self-citations: 0; citations by others: 2)
This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for data exploration for outliers and outlier analysis and then applies these to the detailed analysis of outliers in two large scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.

5.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

6.
The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data might amplify the chance of one or more observations being outlying. Noting that the detection of outliers is not only necessary but also important in high-dimensional regression analysis, we propose in this paper a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple-outlier detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well in detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the application of the proposed method by an empirical analysis of real-life protein and gene expression data.

7.
This paper investigates the robustness of designed experiments for estimating linear functions of a subset of parameters in a general linear model against the loss of any t (≥ 1) observations. Necessary and sufficient conditions for robustness of a design under a homoscedastic model are derived. It is shown that a design robust under a homoscedastic model is also robust under a general heteroscedastic model with correlated observations. As a particular case, necessary and sufficient conditions are obtained for the robustness of block designs against the loss of data. Simple sufficient conditions are also provided for binary block designs to be robust against the loss of data. Some classes of designs, robust up to three missing observations, are identified. A-efficiency of the residual design is evaluated for certain block designs for several patterns of two missing observations. The efficiency of the residual design has also been worked out when all the observations in any two blocks, not necessarily disjoint, are lost. The lower bound to A-efficiency has also been obtained for the loss of t observations. Finally, a general expression is obtained for the efficiency of the residual design when all the observations of m (≥ 1) disjoint blocks are lost.

8.
In this article, utilizing a scale mixture of skew-normal distributions in which the mixing random variable is assumed to follow a mixture model with varying weights for each observation, we introduce a generalization of the skew-normal linear regression model with the aim of providing resistant results. This model, which also includes the skew-slash distribution as a particular case, allows us to accommodate and detect outlying observations under the skew-normal linear regression model. Inferences about the model are carried out through the empirical Bayes approach. The conditions for propriety of the posterior and for existence of posterior moments are given under the standard noninformative priors for the regression and scale parameters as well as a proper prior for the skewness parameter. Then, for Bayesian inference, a Markov chain Monte Carlo method is described. Since posterior results depend on the prior hyperparameters, we estimate them by adopting the empirical Bayes method as well as by using a Monte Carlo EM algorithm. Furthermore, to identify possible outliers, we also apply the Bayes factor obtained through the generalized Savage-Dickey density ratio. When examined on simulated and real data, the proposed approach is found not only to provide satisfactory parameter estimates but also to identify outliers well.

9.
Parameter estimation is the first step in constructing control charts. One of these parameters is the process mean. The classical estimators of the process mean are sensitive to the presence of outlying data and subgroups, which contaminate the whole data set. Existing robust estimators of the process mean consider only the effects of individual outliers, whereas this paper proposes a robust estimator that reduces the effect of outlying subgroups as well as of individual outliers within a subgroup. The proposed estimator was compared with some classical and robust estimators of the process mean. Although its relative efficiency ranks fourth among the estimators tested, its robustness and efficiency are high when outlying subgroups are present. Evaluation of the results indicated that the proposed estimator is less sensitive to the presence of outliers and still estimates the process mean well when there are no individual outliers or outlying subgroups.
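
The sensitivity described here can be seen in a small example. The sketch below contrasts the classical grand mean with a median of subgroup medians on data containing one individual outlier and one outlying subgroup. The median-of-medians estimator is only a generic robust stand-in, since the abstract does not specify the proposed estimator; the function names and the data are assumptions.

```python
from statistics import mean, median


def grand_mean(subgroups):
    """Classical estimator: mean of the subgroup means."""
    return mean(mean(g) for g in subgroups)


def median_of_medians(subgroups):
    """Generic robust alternative: median of the subgroup medians."""
    return median(median(g) for g in subgroups)


# Hypothetical subgroups from a process centred at 10.
subgroups = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.0, 10.3, 9.7, 10.1, 9.9],
    [9.9, 10.0, 10.1, 25.0, 10.0],   # one individual outlier
    [14.9, 15.2, 15.0, 15.1, 14.8],  # an entire outlying subgroup
    [10.2, 9.8, 10.0, 9.9, 10.1],
]
classical = grand_mean(subgroups)
robust = median_of_medians(subgroups)
```

The grand mean is pulled upward by both kinds of contamination, while the median-based estimate stays at the process centre for this data set.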

10.
The general problem of outlier detection and the five recursive outlier detection procedures considered in the study are defined. Methods are given for computing the powers, the probabilities of detecting ≥1 outliers, and the probabilities of declaring >1 observations, including at least one inlier, as outliers, and the results are discussed. Results show that no single procedure is most powerful across the cases in which the actual number of outliers present in the sample is exactly specified, underestimated, and overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur on only one side of the sample.

11.
In this paper we introduce a new method for detecting outliers in a set of proportions. It is based on the construction of a suitable two-way contingency table and on the application of an algorithm for the detection of outlying cells in such a table. We exploit the special structure of the relevant contingency table to increase the efficiency of the method. The main properties of our algorithm, together with a guide for the choice of the parameters, are investigated through simulations, and in simple cases some theoretical justifications are provided. Several examples on synthetic data and an example based on pseudo-real data from biological experiments demonstrate the good performance of our algorithm.

12.
The paper addresses the problem of estimating missing observations in an infinite realization of a linear, possibly nonstationary, stochastic process when the model is known. The general case of any possible distribution of missing observations in the time series is considered, and analytical expressions for the optimal estimators and their associated mean squared errors are obtained. These expressions involve solely the elements of the inverse or dual autocorrelation function of the series.

This optimal estimator (the conditional expectation of the missing observations given the available ones) is equal to the estimator that results from filling the missing values in the series with arbitrary numbers, treating these numbers as additive outliers, and removing with intervention analysis the outlier effects from the invented numbers.
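
For the simplest special case, a single missing interior value in a zero-mean stationary AR(1) series with known coefficient, the estimator depends only on the lag-1 inverse autocorrelation, -phi / (1 + phi^2). The sketch below covers that special case only and not the general result of the paper; the function name `interpolate_ar1` and the numbers are assumptions.

```python
def interpolate_ar1(series, t, phi):
    """Estimate one missing interior value of a zero-mean AR(1) series.

    The estimate is minus the lag-1 inverse autocorrelation times the sum
    of the two neighbouring observations.
    """
    rho_inv_1 = -phi / (1.0 + phi ** 2)  # lag-1 inverse autocorrelation
    return -rho_inv_1 * (series[t - 1] + series[t + 1])


# Hypothetical series with the middle value missing.
series = [1.0, None, 2.0]
estimate = interpolate_ar1(series, 1, phi=0.5)
```

With phi = 0.5 the weight on each neighbour is 0.4, so the estimate shrinks the neighbours' sum towards the zero mean.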

13.
Income- or expenditure-related data sets are often nonlinear, heteroscedastic, skewed even after transformation, and contain numerous outliers. We propose a class of robust nonlinear models that treat outlying observations effectively without removing them. For this purpose, case-specific parameters and a related penalty are employed to detect and modify the outliers systematically. We show how existing nonlinear models such as smoothing splines and generalized additive models can be robustified by the case-specific parameters. Next, we extend the proposed methods to heterogeneous models by incorporating unequal weights. The details of estimating the weights are provided. Two real data sets and simulated data sets show the potential of the proposed methods when the nature of the data is nonlinear with outlying observations.

14.
For data from multivariate t distributions, it is very hard to make an influence analysis based on the probability density function since its expression is intractable. In this paper, we present a technique for influence analysis based on the mixture distribution and the EM algorithm. In fact, the multivariate t distribution can be considered as a particular Gaussian mixture by introducing weights from the Gamma distribution. We treat the weights as missing data and develop the influence analysis for data from multivariate t distributions based on the conditional expectation of the complete-data log-likelihood function in the EM algorithm. Several case-deletion measures are proposed for detecting influential observations from multivariate t distributions. Two numerical examples are given to illustrate our methodology.
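
The mixture representation mentioned here can be sketched in one dimension: a Student-t observation is a normal observation whose variance is divided by a Gamma-distributed weight, and the EM algorithm treats those weights as missing data. The code below fits only location and scale with the degrees of freedom held fixed, and does not compute the case-deletion measures of the paper; the function name `t_em`, the choice `nu=4.0`, and the data are assumptions.

```python
def t_em(data, nu=4.0, iterations=50):
    """EM for the location and scale of a univariate Student-t, nu fixed."""
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n
    weights = [1.0] * n
    for _ in range(iterations):
        # E-step: expected Gamma weight of each point given current mu, sigma2
        weights = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in data]
        # M-step: weighted updates of location and scale
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        sigma2 = sum(w * (x - mu) ** 2 for w, x in zip(weights, data)) / n
    return mu, sigma2, weights


# Hypothetical sample: eight values near 5 and one gross outlier.
data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 12.0]
mu, sigma2, weights = t_em(data)
```

The fitted weights act as a built-in diagnostic: the gross outlier receives the smallest weight, so it has little pull on the location estimate.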

15.
Birnbaum-Saunders (BS) models have largely been applied in material fatigue studies and reliability analyses to relate the total time until failure with some type of cumulative damage. In many problems related to the medical field, such as chronic cardiac diseases and different types of cancer, cumulative damage caused by several risk factors might cause some degradation that leads to a fatigue process. In these cases, BS models can be suitable for describing the propagation lifetime. However, since the cumulative damage is assumed to be normally distributed in the BS distribution, the parameter estimates from this model can be sensitive to outlying observations. In order to attenuate this influence, we present in this paper BS models in which a Student-t distribution is assumed to explain the cumulative damage. In particular, we show that the maximum likelihood estimates of the Student-t log-BS models attribute smaller weights to outlying observations, which produces robust parameter estimates. Also, some inferential results are presented. In addition, based on local influence and on deviance component and martingale-type residuals, a diagnostic analysis is derived. Finally, a motivating example from the medical field is analyzed using log-BS regression models. Since the parameter estimates appear to be very sensitive to outlying and influential observations, the Student-t log-BS regression model should attenuate such influences. The model checking methodologies developed in this paper are used to compare the fitted models.

16.
Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that result from a shifted mean of the error process under the regression model. A diagnostic technique based on the sample influence function of the autocorrelations also exists for detecting outliers that result from shifted autocorrelations. The latter technique, however, is not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on the effects of model misspecification on outlier detection are also provided.

17.
In geostatistics, detecting atypical observations is of special interest due to the changes they can cause in environmental and geological patterns. Several methods for detecting them have already been suggested for the univariate spatial case. However, the problem is more complicated when various variables are observed simultaneously and the spatial correlation among them must be taken into account. The aim of this paper is to detect outliers and influential observations in multivariate spatial linear models. For this purpose, we derive and explore two different methods. First, a multivariate version of the forward search algorithm is given, where locations with outliers are detected in the last steps of the procedure. Next, we derive influence measures to assess the impact of the observations on the multivariate spatial linear model. The procedures are easy to compute and to interpret by means of graphical representations. Finally, an example and a Monte Carlo study illustrate the performance of these methods for identification of outliers in multivariate spatial linear models.

18.
The detection of outliers and influential observations has received a great deal of attention in the statistical literature in the context of least-squares (LS) regression. However, the explanatory variables can be correlated with each other, and alternatives to LS have emerged that address outliers/influential observations and multicollinearity simultaneously. This paper proposes new influence measures based on the affine combination type regression for the detection of influential observations in the linear regression model when multicollinearity exists. Approximate influence measures are also proposed for the affine combination type regression. Since the affine combination type regression includes the ridge, the Liu and the shrunken regressions as special cases, influence measures under the ridge, the Liu and the shrunken regressions are also examined to see the possible effect that multicollinearity can have on the influence of an observation. The Longley data set is used to illustrate the influence measures in affine combination type regression and also in ridge, Liu and shrunken regressions, so that the performance of different biased regressions in detecting and assessing influential observations can be examined.

19.
This paper studies outlier detection and accommodation in general spatial models, which include spatial autoregressive models and the spatial error model as special cases. Using mean-shift and variance-weight models respectively, test statistics for multiple outliers are derived and detection procedures are proposed. In addition, several key diagnostic measures, such as standardized residuals and leverage measures, are defined for general spatial models. Outlier-modified models are proposed to accommodate outliers in the data set. The performance of the test statistics, including size and power, is examined via simulation studies. Three real examples are analyzed, and the results show that the proposed methodology is useful for identifying and accommodating outliers in general spatial models.

20.
Traditionally, sphericity (i.e., independence and homoscedasticity for raw data) is put forward as the condition to be satisfied by the variance–covariance matrix of at least one of the two observation vectors analyzed for correlation, for the unmodified t test of significance to be valid under the Gaussian and constant population mean assumptions. In this article, the author proves that the sphericity condition is too strong and a weaker (i.e., more general) sufficient condition for valid unmodified t testing in correlation analysis is circularity (i.e., independence and homoscedasticity after linear transformation by orthonormal contrasts), to be satisfied by the variance–covariance matrix of one of the two observation vectors. Two other conditions (i.e., compound symmetry for one of the two observation vectors; absence of correlation between the components of one observation vector, combined with a particular pattern of joint heteroscedasticity in the two observation vectors) are also considered and discussed. When both observation vectors possess the same variance–covariance matrix up to a positive multiplicative constant, the circularity condition is shown to be necessary and sufficient. “Observation vectors” may designate partial realizations of temporal or spatial stochastic processes as well as profile vectors of repeated measures. From the proof, it follows that an effective sample size appropriately defined can measure the discrepancy from the more general sufficient condition for valid unmodified t testing in correlation analysis with autocorrelated and heteroscedastic sample data. The proof is complemented by a simulation study. 
Finally, the differences between the role of the circularity condition in correlation analysis and its role in repeated measures ANOVA (i.e., where it was first introduced) are scrutinized, and the link between the circular variance–covariance structure and the centering of observations with respect to the sample mean is emphasized.
