This article builds on the test proposed by Lyhagen [The seasonal KPSS statistic, Econom. Bull. 3 (2006), pp. 1–9] for seasonal time series, which has the null hypothesis of level stationarity against the alternative of unit root behaviour at some or all of the zero and seasonal frequencies. This test is referred to as the seasonal-frequency Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, and it was not originally supported by a regression framework.

The purpose of this paper is twofold. Firstly, we propose a model-based regression method, provide a clear illustration of Lyhagen's test and establish its asymptotic theory in the time domain. Secondly, we use the Monte Carlo method to study the finite-sample performance of the seasonal KPSS test in the presence of additive outliers. Our simulation analysis shows that this test is robust to the magnitude and the number of outliers, and the statistical results obtained indicate good overall finite-sample performance of the test.

Consider the linear regression model $y = \beta_0 \mathbf{1} + X\beta + \varepsilon$ in the usual notation. It is argued that the class of ordinary ridge estimators obtained by shrinking the least squares estimator by the matrix $(X'X + kI)^{-1}X'X$ is sensitive to outliers in the y-variable. To overcome this problem, we propose a new class of ridge-type M-estimators, obtained by shrinking an M-estimator (instead of the least squares estimator) by the same matrix. Since the optimal value of the ridge parameter k is unknown, we suggest a procedure for choosing it adaptively. In a reasonably large-scale simulation study with a particular M-estimator, we found that if the conditions are such that the M-estimator is more efficient than the least squares estimator, then the corresponding ridge-type M-estimator proposed here is better, in terms of a mean squared error criterion, than the ordinary ridge estimator with k chosen suitably. An example illustrates that the estimators proposed here are less sensitive to outliers in the y-variable than ordinary ridge estimators.
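The shrinkage described in this abstract can be written out as follows. This is only a restatement of the abstract's wording; the subscripts LS and M and the notation $\hat{\beta}(k)$ are assumptions introduced here for illustration, not necessarily the authors' notation.

```latex
\hat{\beta}_{\mathrm{LS}}(k) = (X'X + kI)^{-1} X'X \, \hat{\beta}_{\mathrm{LS}},
\qquad
\hat{\beta}_{\mathrm{M}}(k) = (X'X + kI)^{-1} X'X \, \hat{\beta}_{\mathrm{M}},
\qquad k \ge 0 .
```

Setting k = 0 returns the unshrunk estimator in both cases, so any difference in outlier sensitivity between the two classes comes from the starting estimator rather than from the shrinkage matrix.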

Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that are the result of the shifted mean of the error process under the regression model. A sample influence function of autocorrelation-based diagnostic technique also exists for the detection of outliers that are the result of the shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated due to an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing the additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on model misspecification effects on outlier detection are also provided.

This article discusses the estimation of the parameter function for a functional linear regression model under heavy-tailed error distributions and in the presence of outliers. Standard approaches to reducing the high dimensionality, which is inherent in functional data, are considered. After reducing the functional model to a standard multiple linear regression model, a weighted rank-based procedure is carried out to estimate the regression parameters. A Monte Carlo simulation and a real-world example are used to show the performance of the proposed estimator, and a comparison is made with the least-squares and least absolute deviation estimators.

In this paper, we consider the problem of robust estimation of the fractional parameter, d, in long memory autoregressive fractionally integrated moving average processes, when two types of outliers, i.e. additive and innovation, are taken into account without knowing their number, position or intensity. The proposed method is a weighted likelihood estimation (WLE) approach, for which the necessary definitions and algorithm are given. By an extensive Monte Carlo simulation study, we compare the performance of the WLE method with the performance of both the approximated maximum likelihood estimation (MLE) and the robust M-estimator proposed by Beran (Statistics for Long-Memory Processes, Chapman & Hall, London, 1994). We find that robustness against the two types of considered outliers can be achieved without loss of efficiency. Moreover, as a byproduct of the procedure, we can classify the suspicious observations into different kinds of outliers. Finally, we apply the proposed methodology to the Nile River annual minima time series.

The presence of extreme outliers in the upper tail data of income distribution affects the Pareto tail modeling. A simulation study is carried out to compare the performance of three types of boxplot in the detection of extreme outliers for Pareto data, including the standard boxplot, the adjusted boxplot and the generalized boxplot. It is found that the generalized boxplot is the best method for determining extreme outliers for Pareto distributed data. For the application, the generalized boxplot is utilized for determining the extreme outliers in the upper tail of the Malaysian income distribution. In addition, for this data set, the confidence interval method is applied for examining the presence of dragon-kings, extreme outliers which are beyond the Pareto or power-law distribution.
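As a point of reference for the comparison in this abstract, the standard boxplot rule can be sketched in a few lines. This is a minimal sketch of the usual Tukey upper fence only; it does not implement the adjusted or generalized boxplots, and the function name, the whisker constant 1.5 and the Pareto shape 2.0 are assumptions chosen for illustration, not values from the study.

```python
import random
import statistics

def tukey_upper_outliers(data, whisker=1.5):
    """Return the points above Q3 + whisker * IQR (standard boxplot upper fence)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    upper_fence = q3 + whisker * (q3 - q1)
    return [x for x in data if x > upper_fence]

rng = random.Random(0)
# Hypothetical Pareto sample; the shape parameter 2.0 is an assumed value.
sample = [rng.paretovariate(2.0) for _ in range(500)]
flagged = tukey_upper_outliers(sample)
print(len(flagged), "of", len(sample), "points lie above the upper fence")
```

On heavy-tailed data such as this, the standard fence tends to flag a sizeable share of ordinary tail observations, which is one reason alternatives to the standard boxplot are considered for Pareto data.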

Outliers can occur as readily in samples from finite populations (e.g. in sample surveys) as in samples from infinite populations. However, in the vast literature on outliers there is almost no mention of outlier tests for data from sample surveys. We examine the behaviour of some standard outlier test statistics for infinite populations when these are applied to finite populations, examining their properties by extensive simulation studies. Some anomalous results are obtained, suggesting a fundamental difficulty in testing outliers for the finite population case.

The presence of outliers in data sets affects the structure of multicollinearity, which arises from a high degree of correlation between explanatory variables in a linear regression analysis. This effect could be seen as an increase or decrease in the diagnostics used to determine multicollinearity. Thus, outliers reduce the reliability of diagnostics such as variance inflation factors, condition numbers and variance decomposition proportions. In this study, we propose to use a robust estimation of the correlation matrix obtained by the minimum covariance determinant method to determine the diagnostics of multicollinearity in the presence of outliers. As a result, the present paper demonstrates that the diagnostics of multicollinearity obtained by the robust estimation of the correlation matrix are more reliable in the presence of outliers.

The presence of outliers would inevitably lead to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data might amplify the chance of an observation or multiple observations being outlying. Noting that the detection of outliers is not only necessary but also important in high-dimensional regression analysis, we propose in this paper a feasible outlier detection approach for sparse high-dimensional linear regression models. Firstly, we search for a clean subset by use of the sure independence screening method and the least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. By comparison studies based on Monte Carlo simulation, it is shown that the proposed method performs well for detecting multiple outliers in sparse high-dimensional linear regression models. We further illustrate the application of the proposed method by empirical analysis of real-life protein and gene expression data.

Principal component analysis (PCA) is a popular technique that is useful for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has led to the development of new approaches. The effects of using estimates obtained by expectation–maximization (EM) and multiple imputation (MI) instead of outliers were examined on an artificial and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on estimates obtained by EM instead of outliers and PCA based on estimates obtained by MI instead of outliers were compared with the results of CPCA. In this study, we tried to show the effects of using estimates obtained by MI and EM instead of outliers, depending on the ratio of outliers in the data set. Finally, when the ratio of outliers exceeds 20%, we suggest the use of estimates obtained by MI and EM instead of outliers as an alternative approach.

It is well known that in a traditional outlier-free situation, the generalized quasi-likelihood (GQL) approach [B.C. Sutradhar, On exact quasilikelihood inference in generalized linear mixed models, Sankhya: Indian J. Statist. 66 (2004), pp. 261–289] performs very well in obtaining consistent as well as efficient estimates for the parameters involved in generalized linear mixed models (GLMMs). In this paper, we first examine the effect of the presence of one or more outliers on the GQL estimation for the parameters in such GLMMs, especially in two important models, namely count and binary mixed models. The outliers appear to cause serious biases and hence inconsistency in the estimation. As a remedy, we then propose a robust GQL (RGQL) approach in order to obtain consistent estimates for the parameters in the GLMMs in the presence of one or more outliers. An extensive simulation study is conducted to examine the consistency performance of the proposed RGQL approach.

This paper studies methods for simple estimation of the exponential mean parameter in small samples in the presence of outliers. Existing estimation methods are discussed. Adaptation of these methods to allow for Type I censoring is investigated. New robust procedures are proposed. A series of simulation experiments indicate that trimming provides significant protection against outliers, while the premium is usually small when trimming uncontaminated samples. A linearly weighted mean is recommended for uncontaminated samples, both censored and complete. In larger samples, (n - 20), the proposed Huber-type estimator performs quite well in all situations of censoring and contamination.

In this paper, we suggest a least squares procedure for the determination of the number of upper outliers in an exponential sample by minimizing sample mean squared error. Moreover, the method can reduce the masking or “swamping” effects. In addition, we have also found that the least squares procedure is easier and simpler to compute than the test procedure $T_k$ suggested by Zhang (1998) for determining the number of upper outliers, since Zhang (1998) needs to use the complicated null distribution of $T_k$. Moreover, we give three practical examples and a simulated example to illustrate the procedures. Further, simulation studies are given to show the advantages of the proposed method. Finally, the proposed least squares procedure can also determine the number of upper outliers in other continuous univariate distributions (for example, Pareto, Gumbel, Weibull, etc.). Received: May 10, 1999; revised version: June 5, 2000

Support Vector Regression (SVR) is gaining in popularity in the detection of outliers and classification problems in high-dimensional data (HDD), as this technique does not require the data to be of full rank. In real applications, most data are high-dimensional. Classification of high-dimensional data is needed in applied sciences, in particular, as it is important to discriminate cancerous cells from non-cancerous cells. It is also imperative that outliers are identified before constructing a model on the relationship between the dependent and independent variables, to avoid misleading interpretations about the fitting of a model. The standard SVR and the μ-ε-SVR are able to detect outliers; however, they are computationally expensive. The fixed parameters support vector regression (FP-ε-SVR) was put forward to remedy this issue. However, the FP-ε-SVR using ε-SVR is not very successful in identifying outliers. In this article, we propose an alternative method to detect outliers, namely by employing nu-SVR. The merit of our proposed method is confirmed by three real examples and a Monte Carlo simulation. The results show that our proposed nu-SVR method is very successful in identifying outliers under a variety of situations, and with less computational running time.

The robust principal components analysis (RPCA) introduced by Campbell (Applied Statistics 1980, 29, 231–237) provides, in addition to robust versions of the usual output of a principal components analysis, weights for the contribution of each point to the robust estimation of each component. Low weights may thus be used to indicate outliers. The present simulation study provides critical values for testing the kth smallest weight in the RPCA of a sample of n p-dimensional vectors, under the null hypothesis of a multivariate normal distribution. The cases p=2(2)10, 15, 20 for n=20, 30, 40, 50, 75, 100 subject to n≥p/2, are examined, with k≤√n.

The power of some rank tests, used for testing the hypothesis of shift, is found when the underlying distributions contain outliers. The outliers are assumed to occur as the result of mixing two normal distributions with common variance. A small sample case shows how the scores for the rank tests are found, and the exact power is computed for each of these rank tests. A Monte Carlo study provides an estimate of the power of the usual two-sample t-test.
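A rough idea of how power under a contaminated normal model can be estimated by simulation is sketched below. This is an illustrative Monte Carlo sketch for the Wilcoxon rank-sum test with a normal approximation, not the exact small-sample power computation the abstract refers to; the contamination rate `eps`, the outlier mean offset, the sample size and all function names are assumptions introduced here.

```python
import math
import random

def contaminated_normal(rng, n, shift=0.0, eps=0.1, outlier_mean=3.0):
    """Draw n values from (1 - eps) * N(shift, 1) + eps * N(shift + outlier_mean, 1)."""
    return [rng.gauss(shift + (outlier_mean if rng.random() < eps else 0.0), 1.0)
            for _ in range(n)]

def rank_sum_z(x, y):
    """Normal-approximation z statistic for the rank sum of sample y."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, label) in enumerate(pooled, start=1) if label == 1)
    m, n = len(x), len(y)
    mean = n * (m + n + 1) / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    return (w - mean) / sd

def estimated_power(shift, reps=2000, n=20, seed=1):
    """Share of simulated samples in which the one-sided test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = contaminated_normal(rng, n)
        y = contaminated_normal(rng, n, shift=shift)
        if rank_sum_z(x, y) > 1.645:  # one-sided test at a nominal 5% level
            rejections += 1
    return rejections / reps

print("rejection rate at shift 0:", estimated_power(0.0))
print("rejection rate at shift 1:", estimated_power(1.0))
```

The rejection rate at shift 0 estimates the size of the test, and the rate at a positive shift estimates its power; both components of the mixture share unit variance, matching the common-variance assumption stated in the abstract.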

We provide a method for simultaneous variable selection and outlier identification using the mean-shift outlier model. The procedure consists of two steps: the first step is to identify potential outliers, and the second step is to perform all possible subset regressions for the mean-shift outlier model containing the potential outliers identified in step 1. This procedure is helpful for model selection while simultaneously considering outlier identification, and can be used to identify multiple outliers. In addition, we can evaluate the impact on the regression model of simultaneous omission of variables and interesting observations. In an example, we provide detailed output from the R system, and compare the results with those using posterior model probabilities as proposed by Hoeting et al. [Comput. Stat. Data Anal. 22 (1996), pp. 252-270] for simultaneous variable selection and outlier identification.
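For readers unfamiliar with the term, the mean-shift outlier model is commonly written as below. The symbols D, γ and m are assumptions introduced here for illustration and may differ from the notation of the paper.

```latex
y = X\beta + D\gamma + \varepsilon ,
\qquad
D = \left[\, e_{i_1} \;\; e_{i_2} \;\; \cdots \;\; e_{i_m} \,\right],
```

where $e_{i_j}$ is the unit vector with a one in the position of the j-th potential outlier, so each component of γ shifts the mean of one suspected observation. Under this formulation, choosing which columns of X and which columns of D to keep becomes a single subset-selection problem, which is what allows variables and outliers to be selected simultaneously.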

A robust test of a parameter in the presence of nuisance parameters was proposed by Wang (1981). The test procedure is a robust extension of the optimal C(α) tests. A numerical method for computing the solution of the orthogonality condition that is required by the test procedure is provided. An example on the testing of normal scale in the presence of outliers is worked out to illustrate the construction of the robust test.

Many methods have been developed for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be groups in the data. We propose a method of simultaneously determining groups (as in cluster analysis) and detecting outliers, which are points that are distant from every group. Our method is an adaptation of the BACON algorithm proposed by Billor, Hadi and Velleman for the robust detection of multiple outliers in a single group of multivariate data. There are two versions of our method, depending on whether or not the groups can be assumed to have equal covariance matrices. The effectiveness of the method is illustrated by its application to two real data sets and further shown by a simulation study for different sample sizes and dimensions for 2 and 3 groups, with and without planted outliers in the data. When the number of groups is not known in advance, the algorithm could be used as a robust method of cluster analysis, by running it for various numbers of groups and choosing the best solution.

The usefulness of an extra sum of squares statistic $Q_K$ for detecting K outliers has been discussed previously in the context of two-way tables (see Gentleman and Wilk, 1975a, 1975b; John and Draper, 1978; and Draper and John, 1980). That work is extended here to straight line regression situations arising from, and motivated by, a specific set of research data. Percentage points for the appropriate test statistics are obtained by simulation, and approximations for these percentage points are suggested. Power calculations made for various designs and outlier situations are briefly summarized.
