1.

Nonlinear regression models for heterogeneous data with massive outliers

Yoonsuh Jung 《Journal of applied statistics》2019,46(8):1456-1477

Income- or expenditure-related data sets are often nonlinear, heteroscedastic, skewed even after transformation, and contain numerous outliers. We propose a class of robust nonlinear models that treat outlying observations effectively without removing them. For this purpose, case-specific parameters and a related penalty are employed to detect and modify the outliers systematically. We show how existing nonlinear models such as smoothing splines and generalized additive models can be robustified by the case-specific parameters. Next, we extend the proposed methods to heterogeneous models by incorporating unequal weights. The details of estimating the weights are provided. Two real data sets and simulated data sets show the potential of the proposed methods when the data are nonlinear and contain outlying observations.

2.

Combining Bayesian method and Kalman smoother for detection additive outlier patches in autoregressive time series

Farideh Mohammadinia Rahim Chinipardaz 《Communications in Statistics - Simulation and Computation》2013,42(7):2191-2209

This article proposes an improved procedure for detecting patches of additive outliers in autoregressive time series models. The procedure improves on the existing detection methods based on Gibbs sampling. We combine the Bayesian method and the Kalman smoother to generate candidate models of outlier patches, and the model with the minimum Bayesian information criterion (BIC) is selected among them. We propose that this combined Bayesian and Kalman method (CBK) can reduce the masking and swamping effects in detecting patches of additive outliers. The correctness of the method is illustrated by simulated data and then by analyzing a real set of observations.

3.

A quasi-Bayesian analysis of regression outliers using Akaike's predictive likelihood

W. K. Fung 《Statistical Papers》1993,34(1):133-141

The Bayesian analysis of outliers using a non-informative prior for the parameters is non-trivial because models with different numbers of outliers have different dimensions. A quasi-Bayesian approach based on Akaike's predictive likelihood is proposed for the analysis of regression outliers. It overcomes the dimensionality problem in Bayesian outlier analysis: the likelihood of the outlier model is compensated by a correction factor adjusted for the number of outliers. The stack loss data set is analysed with satisfactory results.

4.

An iterative procedure for the estimation of parameters in a dose-response model

Carmen D.S. André Clóvis A. Peres Subhash C. Narula 《Communications in Statistics - Simulation and Computation》2013,42(2-3):763-775

The least squares estimates of the parameters in the multistage dose-response model are unduly affected by outliers in a data set, whereas the minimum sum of absolute errors (MSAE) estimates are more resistant to outliers. Algorithms to compute the MSAE estimates can be tedious and computationally burdensome. We propose a linear approximation for the dose-response model that can be used to find the MSAE estimates by a simple and computationally less intensive algorithm. A few illustrative examples and a Monte Carlo study show that we get comparable values of the MSAE estimates of the parameters in a dose-response model using the exact model and the linear approximation.
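
To make the contrast between least squares and MSAE estimates concrete, the following minimal sketch fits a straight line to data with one gross outlier. It is not the authors' algorithm and does not use the multistage dose-response model: the straight-line model, the data, and the helper names `ls_line` and `msae_line` are assumptions made purely for illustration. The MSAE fit relies on the fact that, for a two-parameter linear model, some optimal line passes through at least two data points, so a brute-force search over point pairs is enough for small samples.

```python
from itertools import combinations


def ls_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def msae_line(x, y):
    """Minimum sum of absolute errors fit of y = a + b*x; returns (a, b).

    Brute force (illustrative only): try every line through two data
    points and keep the one with the smallest sum of absolute errors.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid candidate
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sae = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or sae < best[0]:
            best = (sae, a, b)
    return best[1], best[2]


# Hypothetical data: the exact line y = 2 + 3x with one gross outlier.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[-1] = 100.0

ls_a, ls_b = ls_line(x, y)
msae_a, msae_b = msae_line(x, y)
print("LS fit:   intercept=%.3f slope=%.3f" % (ls_a, ls_b))
print("MSAE fit: intercept=%.3f slope=%.3f" % (msae_a, msae_b))
```

On this toy data the MSAE fit recovers the line through the nine clean points, while the least squares slope is pulled strongly toward the outlier.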

5.

Effects of level shifts and temporary changes on the estimation of GARCH models

《Journal of Statistical Computation and Simulation》2012,82(6):667-688

The aim of this article is to analyse the effect of the level shift and temporary change outliers on the estimation of a model with conditional heteroscedasticity, a concept rarely dealt with up to now, the literature focusing more on additive outliers. To do this, we have conducted various Monte Carlo experiments in which the bias produced by these outliers is analysed.

6.

A simple diagnostic method of outlier detection for stationary Gaussian time series (cited 1 time: 0 self-citations, 1 citation by others)

Yuzhi Cai Neville Davies 《Journal of applied statistics》2003,30(2):205-223

In this paper we present a "model free" method of outlier detection for Gaussian time series by using the autocorrelation structure of the time series. We also present a graphic diagnostic method in order to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contain either one type of the outliers or both additive and innovation type outliers, and this method has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.

7.

Estimation of regression parameters in the presence of outliers in the response

Sugata Sen Roy Sibnarayan Guria 《Statistics》2013,47(6):531-539

In this paper, we consider a regression model with non-spherical covariance structure and outliers in the response. The generalized least squares estimator obtained from the full data set is generally not used in the presence of outliers and an estimator based on only the non-outlying observations is preferred. Here we propose as an estimator a convex combination of the full set and the deleted set estimators and compare its performance with the other two.
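
A convex combination of two estimators has the form λ·β_full + (1 − λ)·β_deleted with 0 ≤ λ ≤ 1. The sketch below illustrates only this combination step, under loudly simplifying assumptions that are not taken from the paper: a straight-line model, ordinary least squares in place of generalized least squares (that is, a spherical covariance), outlier positions that are already known, and a weight `lam` picked by hand. The abstract does not state how the weight is chosen, and the names `ols_line` and `convex_combination` are hypothetical.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def convex_combination(beta_full, beta_deleted, lam):
    """Return lam * beta_full + (1 - lam) * beta_deleted, coordinate-wise."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return tuple(lam * f + (1.0 - lam) * d
                 for f, d in zip(beta_full, beta_deleted))


# Hypothetical data: y = 1 + 2x with one outlying response at index 3.
x = [float(i) for i in range(1, 9)]
y = [1.0 + 2.0 * xi for xi in x]
y[3] = 30.0
outlier_idx = {3}  # assumed known here; in practice it must be detected

# Full-set estimator uses every observation.
beta_full = ols_line(x, y)

# Deleted-set estimator uses only the non-outlying observations.
x_del = [xi for k, xi in enumerate(x) if k not in outlier_idx]
y_del = [yi for k, yi in enumerate(y) if k not in outlier_idx]
beta_deleted = ols_line(x_del, y_del)

# Equal weighting, chosen arbitrarily for the illustration.
beta_half = convex_combination(beta_full, beta_deleted, 0.5)
print("full:    ", beta_full)
print("deleted: ", beta_deleted)
print("combined:", beta_half)
```

Setting `lam = 1` reproduces the full-set estimator and `lam = 0` the deleted-set estimator; intermediate values trade the efficiency of the full data against the robustness of deletion.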

8.

Maximum studentized score tests for the detection of outliers in time series regression models

《Journal of Statistical Computation and Simulation》2012,82(12):1355-1372

Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that are the result of the shifted mean of the error process under the regression model. A sample influence function of autocorrelation-based diagnostic technique also exists for the detection of outliers that are the result of the shifted autocorrelations. The latter diagnostic technique is however not useful if the outlying observation does not affect the autocorrelation structure but is generated due to an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing the additive and innovative outliers as well as variance shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on model misspecification effects on outlier detection are also provided.

9.

Robust variable selection in finite mixture of regression models using the t distribution

Lin Dai Junhui Yin Zhengfen Xie 《Communications in Statistics - Theory and Methods》2013,42(21):5370-5386

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

10.

Robust estimation of covariance parameters in partial linear model for longitudinal data

Guoyou Qin Zhongyi Zhu Wing K. Fung 《Journal of statistical planning and inference》2009

For longitudinal data, the within-subject dependence structure and covariance parameters may be of practical and theoretical interest. The estimation of covariance parameters has received much attention and been studied mainly in the framework of generalized estimating equations (GEEs). The GEE method, however, is sensitive to outliers. In this paper, an alternative set of robust generalized estimating equations for both the mean and covariance parameters is proposed in the partial linear model for longitudinal data. The asymptotic properties of the proposed estimators of regression parameters, non-parametric function and covariance parameters are obtained. Simulation studies are conducted to evaluate the performance of the proposed estimators under different contaminations. The proposed method is illustrated with a real data analysis.

11.

Specification error caused by level shifts and temporary changes in ARMA–GARCH models

《Journal of Statistical Computation and Simulation》2012,82(9):853-868

12.

Statistical modelling and saddle-point approximation of tail probabilities for accumulated splice loss in fibre-optic networks

J. Tyrcha P. Sundberg B. Lindskog Sundstrom 《Journal of applied statistics》2000,27(2):245-256

Tail probabilities are calculated by saddle-point approximation in a probabilistic-statistical model for the accumulated splice loss that results from a number of fusion splices in the installation of fibre-optic networks. When these probabilities, representing the risk of exceeding a specified total loss, can be controlled and kept low, the requirements on the individual losses can be substantially relaxed from their customary settings. As a consequence, it should be possible to save considerable installation time and cost. The probabilistic model, which can be theoretically motivated, states that the individual loss is basically exponentially distributed, but with a Gaussian contribution added and truncated at a set value, and that the loss is additive over splices. An extensive set of installation data fitted well with this model, except for occasional high losses. Therefore, the model described was extended to allow for a frequency of unspecified high losses of this sort. It is also indicated how the model parameters can be estimated from data.

13.

Effect of the sample on the posterior probability in Bayesian analysis

James C. Spall 《Communications in Statistics - Theory and Methods》2013,42(6):1811-1827

This paper examines Bayesian posterior probabilities as a function of selected elements within the set of data, x, when the prior distribution is assumed fixed. The posterior probabilities considered here are those of the parameter vector lying in a subset of the total parameter space. The theorems of this paper provide insight into the effect of elements within x on this posterior probability. These results have applications, for example, in the study of the impact of outliers within the data and in the isolation of misspecified parameters in a model.

14.

A robust Parafac model for compositional data

M. A. Di Palma P. Filzmoser M. Gallo K. Hron 《Journal of applied statistics》2018,45(8):1347-1369

Compositional data are characterized by values containing relative information, and thus the ratios between the data values are of interest for the analysis. Due to specific features of compositional data, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. It is discussed how three-way compositional data can be analyzed with the Parafac model. When data are contaminated by outliers, robust estimates for the Parafac model parameters should be employed. It is demonstrated how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real data example from macroeconomics underlines the usefulness of this approach.

15.

Robust estimation in long-memory processes under additive outliers

Fabio Fajardo Molinares Valdério Anselmo Reisen Francisco Cribari-Neto 《Journal of statistical planning and inference》2009

In this paper, we introduce an alternative semiparametric estimator of the fractional differencing parameter in ARFIMA models which is robust against additive outliers. The proposed estimator is a variant of the GPH estimator [Geweke, J., Porter-Hudak, S., 1983. The estimation and application of long memory time series model. Journal of Time Series Analysis 4, 221–238]. In particular, we use the robust sample autocorrelations of Ma, Y. and Genton, M. [2000. Highly robust estimation of the autocovariance function. Journal of Time Series Analysis 21, 663–684] to obtain an estimator for the spectral density of the process. Numerical results show that the estimator we propose for the differencing parameter is robust when the data contain additive outliers.

16.

An outlier-resistant test for heteroscedasticity in linear models

Ekele Alih 《Journal of applied statistics》2015,42(8):1617-1634

Contamination, often in the form of outliers, is very common in data. Among other causes, outliers in a homoscedastic model make the model heteroscedastic. Moreover, outliers distort diagnostic tools for heteroscedasticity such that it may not be correctly identified. In this article, we show how outliers affect heteroscedasticity diagnostics. We then propose a robust procedure for detecting heteroscedasticity in the presence of outliers by robustifying the non-robust component of the Goldfeld–Quandt (GQ) test. The performance of the proposed procedure is examined using simulation experiments and real data sets. The proposed procedure offers great improvement where the conventional GQ and other procedures fail.
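
For orientation, the classical (non-robust) Goldfeld–Quandt statistic can be sketched as follows: order the observations by the variable suspected of driving the variance, drop some central observations, fit separate regressions to the low and high groups, and take the ratio of the two residual variances. This is the textbook test, not the robust procedure proposed in the article. The straight-line model, the number of dropped observations, the toy data, and the names `rss_line` and `goldfeld_quandt` are illustrative assumptions.

```python
def rss_line(x, y):
    """Residual sum of squares from an OLS fit of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, drop):
    """Classical GQ statistic and its per-group degrees of freedom.

    Returns (F, df), where F is the high-x residual variance divided by
    the low-x residual variance after dropping `drop` central points.
    """
    pairs = sorted(zip(x, y))          # order by the explanatory variable
    n_group = (len(pairs) - drop) // 2
    low, high = pairs[:n_group], pairs[-n_group:]
    df = n_group - 2                   # two fitted parameters per group
    rss_low = rss_line(*zip(*low))
    rss_high = rss_line(*zip(*high))
    return (rss_high / df) / (rss_low / df), df


x = [float(i) for i in range(1, 21)]
signs = [(-1.0) ** i for i in range(1, 21)]

# Hypothetical heteroscedastic data: error size grows with x.
y_het = [1.0 + 2.0 * xi + 0.1 * xi * s for xi, s in zip(x, signs)]
gq_het, df = goldfeld_quandt(x, y_het, drop=4)

# Hypothetical homoscedastic data, then the same data with one outlier.
y_hom = [1.0 + 2.0 * xi + 0.5 * s for xi, s in zip(x, signs)]
gq_hom, _ = goldfeld_quandt(x, y_hom, drop=4)
y_out = list(y_hom)
y_out[18] += 10.0                      # single outlier in the high-x group
gq_out, _ = goldfeld_quandt(x, y_out, drop=4)

print("heteroscedastic:", gq_het, "df:", df)
print("homoscedastic:  ", gq_hom)
print("with outlier:   ", gq_out)
```

Under homoscedastic normal errors the statistic would be compared with an F quantile on (`df`, `df`) degrees of freedom. The last case shows the distortion the abstract describes: a single outlier in otherwise homoscedastic data inflates the classical statistic.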

17.

Outliers in multilevel data (cited 2 times: 0 self-citations, 2 citations by others)

I. H. Langford & T. Lewis 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1998,161(2):121-160

This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for data exploration for outliers and outlier analysis and then applies these to the detailed analysis of outliers in two large scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.

18.

Stepwise local influence in generalized autoregressive conditional heteroskedasticity models

Lei Shi Md. Mostafizur Rahman Wen Gan Jianhua Zhao 《Journal of applied statistics》2015,42(2):428-444

Detection of outliers or influential observations is an important task in statistical modeling, especially for correlated time series data. In this paper we propose a new procedure to detect a patch of influential observations in the generalized autoregressive conditional heteroskedasticity (GARCH) model. First, we compare the performance of the innovative perturbation scheme, the additive perturbation scheme and the data perturbation scheme in local influence analysis. We find that the innovative perturbation scheme gives better results than the other two schemes, although this perturbation scheme may suffer from masking effects. Then we use the stepwise local influence method under the innovative perturbation scheme to detect a patch of influential observations and uncover the masking effects. The simulation studies show that the new technique can successfully detect a patch of influential observations or outliers under the innovative perturbation scheme. The analysis based on simulation studies and two real data sets shows that the stepwise local influence method under the innovative perturbation scheme is efficient for detecting multiple influential observations and dealing with masking effects in the GARCH model.

19.

A comparison of different procedures for principal component analysis in the presence of outliers

B. Bariş Alkan Cemal Atakan Nesrin Alkan 《Journal of applied statistics》2015,42(8):1716-1722

Principal component analysis (PCA) is a popular technique that is useful for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has led to the development of new approaches. The effects of replacing outliers with estimates obtained by expectation–maximization (EM) and multiple imputation (MI) were examined on artificial data and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on EM estimates in place of outliers and PCA based on MI estimates in place of outliers were compared with the results of CPCA. In this study, we tried to show the effects of replacing outliers with estimates obtained by MI and EM, depending on the ratio of outliers in the data set. Finally, when the ratio of outliers exceeds 20%, we suggest replacing outliers with estimates obtained by MI and EM as an alternative approach.

20.

Estimation of flood frequencies from data sets with outliers using mixed distribution functions

Milan Stojković Stevan Prohaska Nikola Zlatanović 《Journal of applied statistics》2017,44(11):2017-2035

In this paper the estimation of high return period quantiles of the flood peak and volume in the Kolubara River basin is carried out. Estimation of flood frequencies is carried out on a data set containing high outliers, which are identified by Rosner's test. Simultaneously, low outliers are determined by the multiple Grubbs–Beck test. The next step involved the use of mixed distribution functions applied to a data set from three populations: floods with low outliers, normal floods and floods with high outliers. The contribution of the data set with low outliers is neglected, since including it would underestimate the flood quantiles with large return periods. Consequently, the best fitted mixed distribution from the applied types (EV1, GEV, P3 and LP3) was determined by using the minimum standard error of fit.