Similar literature
Found 20 similar documents; search took 15 ms
1.
Income- or expenditure-related data sets are often nonlinear, heteroscedastic, and skewed even after transformation, and they contain numerous outliers. We propose a class of robust nonlinear models that treat outlying observations effectively without removing them. For this purpose, case-specific parameters and a related penalty are employed to detect and modify the outliers systematically. We show how existing nonlinear models such as smoothing splines and generalized additive models can be robustified by the case-specific parameters. Next, we extend the proposed methods to heterogeneous models by incorporating unequal weights. The details of estimating the weights are provided. Two real data sets and simulated data sets show the potential of the proposed methods when the data are nonlinear and contain outlying observations.

2.
This article develops a procedure for detecting patches of additive outliers in autoregressive time series models. The procedure improves on the existing detection methods via Gibbs sampling. We combine the Bayesian method and the Kalman smoother to present some candidate models of outlier patches, and the model with the minimum Bayesian information criterion (BIC) is selected as the best among them. We propose that this combined Bayesian and Kalman method (CBK) can reduce the masking and swamping effects in detecting patches of additive outliers. The correctness of the method is illustrated with simulated data and then with an analysis of a real set of observations.

3.
The Bayesian analysis of outliers using a non-informative prior for the parameters is non-trivial because models with different numbers of outliers have different dimensions. A quasi-Bayesian approach based on Akaike's predictive likelihood is proposed for the analysis of regression outliers. It overcomes the dimensionality problem in Bayesian outlier analysis by compensating the likelihood of the outlier model with a correction factor adjusted for the number of outliers. The stack loss data set is analysed with satisfactory results.

4.
The least squares estimates of the parameters in the multistage dose-response model are unduly affected by outliers in a data set, whereas the minimum sum of absolute errors (MSAE) estimates are more resistant to outliers. Algorithms to compute the MSAE estimates can be tedious and computationally burdensome. We propose a linear approximation for the dose-response model that can be used to find the MSAE estimates by a simple and computationally less intensive algorithm. A few illustrative examples and a Monte Carlo study show that we get comparable values of the MSAE estimates of the parameters in a dose-response model using the exact model and the linear approximation.

5.
The aim of this article is to analyse the effect of the level shift and temporary change outliers on the estimation of a model with conditional heteroscedasticity, a concept rarely dealt with up to now, the literature focusing more on additive outliers. To do this, we have conducted various Monte Carlo experiments in which the bias produced by these outliers is analysed.

6.
In this paper we present a "model-free" method of outlier detection for Gaussian time series by using the autocorrelation structure of the time series. We also present a graphic diagnostic method in order to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contains either one type of outlier or both additive and innovation outliers, and this method has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.

7.
In this paper, we consider a regression model with non-spherical covariance structure and outliers in the response. The generalized least squares estimator obtained from the full data set is generally not used in the presence of outliers and an estimator based on only the non-outlying observations is preferred. Here we propose as an estimator a convex combination of the full set and the deleted set estimators and compare its performance with the other two.

8.
Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that are the result of a shifted mean of the error process under the regression model. A diagnostic technique based on the sample influence function of autocorrelations also exists for the detection of outliers that are the result of shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing the additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on model misspecification effects on outlier detection are also provided.

9.
Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

10.
For longitudinal data, the within-subject dependence structure and covariance parameters may be of practical and theoretical interest. The estimation of covariance parameters has received much attention and has been studied mainly in the framework of generalized estimating equations (GEEs). The GEE method, however, is sensitive to outliers. In this paper, an alternative set of robust generalized estimating equations for both the mean and covariance parameters is proposed in the partial linear model for longitudinal data. The asymptotic properties of the proposed estimators of the regression parameters, the non-parametric function and the covariance parameters are obtained. Simulation studies are conducted to evaluate the performance of the proposed estimators under different contaminations. The proposed method is illustrated with a real data analysis.

11.
12.
Tail probabilities are calculated by saddle-point approximation in a probabilistic-statistical model for the accumulated splice loss that results from a number of fusion splices in the installation of fibre-optic networks. When these probabilities, representing the risk of exceeding a specified total loss, can be controlled and kept low, the requirements on the individual losses can be substantially relaxed from their customary settings. As a consequence, it should be possible to save considerable installation time and cost. The probabilistic model, which can be theoretically motivated, states that the individual loss is basically exponentially distributed, but with a Gaussian contribution added and truncated at a set value, and that the loss is additive over splices. An extensive set of installation data fitted well with this model, except for occasional high losses. Therefore, the model described was extended to allow for a frequency of unspecified high losses of this sort. It is also indicated how the model parameters can be estimated from data.

13.
This paper examines Bayesian posterior probabilities as a function of selected elements within the set of data, x, when the prior distribution is assumed fixed. The posterior probabilities considered here are those of the parameter vector lying in a subset of the total parameter space. The theorems of this paper provide insight into the effect of elements within x on this posterior probability. These results have applications, for example, in the study of the impact of outliers within the data and in the isolation of misspecified parameters in a model.

14.
Compositional data are characterized by values containing relative information, and thus the ratios between the data values are of interest for the analysis. Due to specific features of compositional data, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. It is discussed how three-way compositional data can be analyzed with the Parafac model. When data are contaminated by outliers, robust estimates for the Parafac model parameters should be employed. It is demonstrated how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real data example from macroeconomics underlines the usefulness of this approach.

15.
In this paper, we introduce an alternative semiparametric estimator of the fractional differencing parameter in ARFIMA models which is robust against additive outliers. The proposed estimator is a variant of the GPH estimator [Geweke, J., Porter-Hudak, S., 1983. The estimation and application of long memory time series model. Journal of Time Series Analysis 4, 221–238]. In particular, we use the robust sample autocorrelations of Ma, Y. and Genton, M. [2000. Highly robust estimation of the autocovariance function. Journal of Time Series Analysis 21, 663–684] to obtain an estimator for the spectral density of the process. Numerical results show that the estimator we propose for the differencing parameter is robust when the data contain additive outliers.

16.
Contamination, often in the form of outliers, is a very common attribute of data. Outliers are one cause of heteroscedasticity: they can make an otherwise homoscedastic model heteroscedastic. Moreover, outliers distort diagnostic tools for heteroscedasticity such that it may not be correctly identified. In this article, we show how outliers affect heteroscedasticity diagnostics. We then propose a robust procedure for detecting heteroscedasticity in the presence of outliers by robustifying the non-robust component of the Goldfeld–Quandt (GQ) test. The performance of the proposed procedure is examined using simulation experiments and real data sets. The proposed procedure offers great improvement where the conventional GQ and other procedures fail.

17.
Outliers in multilevel data   Total citations: 2, self-citations: 0, citations by others: 2
This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for data exploration for outliers and outlier analysis and then applies these to the detailed analysis of outliers in two large scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.

18.
Detection of outliers or influential observations is an important task in statistical modeling, especially for correlated time series data. In this paper we propose a new procedure to detect patches of influential observations in the generalized autoregressive conditional heteroskedasticity (GARCH) model. First, we compare the performance of the innovative perturbation scheme, the additive perturbation scheme and the data perturbation scheme in local influence analysis. We find that the innovative perturbation scheme gives better results than the other two schemes, although this perturbation scheme may suffer from masking effects. Then we use the stepwise local influence method under the innovative perturbation scheme to detect patches of influential observations and uncover the masking effects. The simulation studies show that the new technique can successfully detect a patch of influential observations or outliers under the innovative perturbation scheme. The analysis based on simulation studies and two real data sets shows that the stepwise local influence method under the innovative perturbation scheme is efficient for detecting multiple influential observations and dealing with masking effects in the GARCH model.

19.
Principal component analysis (PCA) is a popular technique that is useful for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has led to the development of new approaches. The effects of replacing outliers with estimates obtained by expectation–maximization (EM) and multiple imputation (MI) were examined on artificial and real data sets. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on EM estimates used in place of outliers and PCA based on MI estimates used in place of outliers were compared with the results of CPCA. In this study, we tried to show the effects of replacing outliers with estimates obtained by MI and EM, depending on the ratio of outliers in the data set. Finally, when the ratio of outliers exceeds 20%, we suggest replacing outliers with estimates obtained by MI and EM as an alternative approach.

20.
In this paper the estimation of high return period quantiles of the flood peak and volume in the Kolubara River basin is carried out. Estimation of flood frequencies is carried out on a data set containing high outliers, which are identified by Rosner's test. Simultaneously, low outliers are determined by the multiple Grubbs–Beck test. The next step involves the use of mixed distribution functions applied to a data set from three populations: floods with low outliers, normal floods and floods with high outliers. The contribution of the data set with low outliers is neglected, since it would underestimate the flood quantiles with large return periods. Consequently, the best fitted mixed distribution from the applied types (EV1, GEV, P3 and LP3) was determined by using the minimum standard error of fit.
