
1.

A simple diagnostic method of outlier detection for stationary Gaussian time series (Cited by: 1; self-citations: 0; citations by others: 1)

Yuzhi Cai Neville Davies 《Journal of applied statistics》2003,30(2):205-223

In this paper we present a "model-free" method of outlier detection for Gaussian time series that uses the autocorrelation structure of the series. We also present a graphical diagnostic method for distinguishing an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that this method works well when the time series contains either one type of outlier or both additive and innovation outliers, and that it has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.
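As a rough illustration of how a statistic with a χ² distribution with one degree of freedom could be compared against a significance level, the sketch below computes the upper-tail p-value with the standard library only. The function names and the default level of 0.05 are hypothetical choices for this example; the abstract does not give the formula for the statistic, so it is treated here as an input.

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 degree of freedom.

    Uses the identity X = Z**2 with Z standard normal, so that
    P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))


def flag_outlier(stat: float, alpha: float = 0.05) -> bool:
    """Return True when the statistic is significant at level alpha (assumed level)."""
    return chi2_1df_pvalue(stat) < alpha
```

When many time points are tested in one series, the level would normally need a multiplicity adjustment; that choice is left open here.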

2.

A note on estimating regression coefficients with missing data

Donald Lien David Rearden 《Econometric Reviews》1992,11(1):119-122

In this note, we consider the problem of estimating regression coefficients when there are missing observations of some explanatory variables. Following Dagenais (1973), Gourieroux and Monfort (1981), and Conniffe (1983a, 1983b), we assume auxiliary relationships exist among explanatory variables. Several estimators and their interrelationships are discussed. We begin with the model of Gourieroux and Monfort (1981)

3.

Analytical approach for determining the cost optimum raw material quality level

R. Chen O. Hawaleshka D. R. Strong 《Journal of applied statistics》1995,22(3):375-388

In this paper, the expected cost of a raw material quality characteristic is determined and the cost optimum quality level is found. A series of piecewise linear functions is used to represent a general cost function. Examples are given in which the distributions of quality characteristics are treated as being either uniform or normal. The relationship between a raw material characteristic and manufacturing cost is assumed to be known.

Determining cost optimum quality level Manitoba

4.

Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model

Sung-Soo Kim Sung H. Park W. J. Krzanowski 《Journal of applied statistics》2008,35(3):283-291

We provide a method for simultaneous variable selection and outlier identification using the mean-shift outlier model. The procedure consists of two steps: the first step is to identify potential outliers, and the second step is to perform all possible subset regressions for the mean-shift outlier model containing the potential outliers identified in step 1. This procedure is helpful for model selection while simultaneously considering outlier identification, and can be used to identify multiple outliers. In addition, we can evaluate the impact on the regression model of simultaneous omission of variables and interesting observations. In an example, we provide detailed output from the R system, and compare the results with those using posterior model probabilities as proposed by Hoeting et al. [Comput. Stat. Data Anal. 22 (1996), pp. 252-270] for simultaneous variable selection and outlier identification.

5.

Detecting outliers: power and some other considerations

Ram B. Jain 《Communications in Statistics - Theory and Methods》2013,42(22):2299-2314

The general problem of outlier detection and the five recursive outlier detection procedures considered in the study are defined. Methods for computing powers, probabilities of detecting ≥1 outliers, and probabilities of declaring >1 observations, including at least one inlier, as outliers are presented, and results are discussed. Results show that no single procedure is most powerful across the cases in which the actual number of outliers present in the sample is exactly estimated, underestimated, and overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur only on one side of the sample.

6.

A note on contamination models and outliers

Jürgen Wellmann Ursula Gather 《Communications in Statistics - Theory and Methods》2013,42(8):1793-1802

In order to describe or generate so-called outliers in univariate statistical data, contamination models are often used. These models assume that k out of n independent random variables are shifted or multiplied by some constant, whereas the other observations still come i.i.d. from some common target distribution. Of course, these contaminants do not necessarily stick out as the extremes in the sample. Moreover, it is the amount and magnitude of 'contamination' which determines the number of obvious outliers. Using the concept of Davies and Gather (1993) to formalize the outlier notion, we quantify the amount of contamination needed to produce a prespecified expected number of 'genuine' outliers. In particular, we demonstrate that for samples of moderate size from a normal target distribution a rather large shift of the contaminants is necessary to yield a certain expected number of outliers. Such an insight is of interest when designing simulation studies where outliers should occur as well as in theoretical investigations on outliers.
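The shift-contamination setup described above can be sketched in a few lines. This is a minimal simulation under stated assumptions: the target distribution is taken to be standard normal, the first k observations are the shifted ones, and a fixed cutoff on the absolute value stands in for a formal outlier region. The function names, the cutoff rule, and the parameter values are hypothetical and are not taken from the paper.

```python
import random


def contaminated_sample(n, k, shift, seed=None):
    """Draw n standard-normal values and add `shift` to the first k of them.

    Returns (values, flags), where flags[i] is True for a contaminant.
    """
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    flags = [i < k for i in range(n)]
    values = [v + shift if f else v for v, f in zip(values, flags)]
    return values, flags


def count_beyond(values, flags, cutoff):
    """Count contaminants and regular observations with |value| > cutoff."""
    contaminants = sum(1 for v, f in zip(values, flags) if f and abs(v) > cutoff)
    regular = sum(1 for v, f in zip(values, flags) if not f and abs(v) > cutoff)
    return contaminants, regular
```

Repeating `count_beyond(*contaminated_sample(n, k, shift), cutoff)` over many seeds for several shift sizes gives a Monte Carlo estimate of how many contaminants actually fall beyond the cutoff.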

7.

Robust ridge and robust Liu estimator for regression based on the LTS estimator (Cited by: 1; self-citations: 0; citations by others: 1)

Betül Kan Özlem Alpu Berna Yazıcı 《Journal of applied statistics》2013,40(3):644-655

In multiple linear regression analysis, the ridge regression estimator and the Liu estimator are often used to address multicollinearity. Besides multicollinearity, outliers are also a problem in multiple linear regression analysis. We propose new biased estimators based on the least trimmed squares (LTS) ridge estimator and the LTS Liu estimator for the case in which both outliers and multicollinearity are present. For this purpose, a simulation study is conducted to compare the robust ridge estimator and the robust Liu estimator in terms of their effectiveness, measured by the mean square error. In our simulations, the behavior of the new biased estimators is examined for three types of outliers: X-space outliers, Y-space outliers, and X- and Y-space outliers. The results for a number of different illustrative cases are presented. This paper also provides the results for the robust ridge regression and robust Liu estimators based on a real-life data set combining the problems of multicollinearity and outliers.

8.

Outlier detection and robust variable selection via the penalized weighted LAD-LASSO method

Yunlu Jiang Yan Wang Jiantao Zhang Baojian Xie Jibiao Liao Wenhui Liao 《Journal of applied statistics》2021,48(2):234

This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to simultaneously achieve outlier detection and robust variable selection. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods perform better in finite samples than the existing methods when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.

9.

Multiple outliers detection in sparse high-dimensional regression

Tao Wang Qun Li Bin Chen 《Journal of Statistical Computation and Simulation》2018,88(1):89-107

The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data might amplify the chance of an observation or multiple observations being outlying. Noting that the detection of outliers is not only necessary but also important in high-dimensional regression analysis, we propose in this paper a feasible outlier detection approach for sparse high-dimensional linear regression models. Firstly, we search for a clean subset by use of the sure independence screening method and the least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well for detecting multiple outliers in sparse high-dimensional linear regression models. We further illustrate the application of the proposed method by empirical analysis of real-life protein and gene expression data.

10.

Using a mixture model for multiple imputation in the presence of outliers: the 'Healthy for life' project

Michael R. Elliott Nicolas Stettler 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(1):63-78

Summary. We consider the problem of obtaining population-based inference in the presence of missing data and outliers in the context of estimating the prevalence of obesity and body mass index measures from the 'Healthy for life' study. Identifying multiple outliers in a multivariate setting is problematic because of problems such as masking, in which groups of outliers inflate the covariance matrix in a fashion that prevents their identification when included, and swamping, in which outliers skew covariances in a fashion that makes non-outlying observations appear to be outliers. We develop a latent class model that assumes that each observation belongs to one of K unobserved latent classes, with each latent class having a distinct covariance matrix. We consider the latent class covariance matrix with the largest determinant to form an 'outlier class'. By separating the covariance matrix for the outliers from the covariance matrices for the remainder of the data, we avoid the problems of masking and swamping. As did Ghosh-Dastidar and Schafer, we use a multiple-imputation approach, which allows us simultaneously to conduct inference after removing cases that appear to be outliers and to promulgate uncertainty in the outlier status through the model inference. We extend the work of Ghosh-Dastidar and Schafer by embedding the outlier class in a larger mixture model, consider penalized likelihood and posterior predictive distributions to assess model choice and model fit, and develop the model in a fashion to account for the complex sample design. We also consider the repeated sampling properties of the multiple imputation removal of outliers.

11.

Specification of household Engel curves by nonparametric regression (Cited by: 1; self-citations: 0; citations by others: 1)

Herman J. Bierens Hettie A. Pott-Buter 《Econometric Reviews》1990,9(2):123-184

This paper demonstrates the usefulness of nonparametric regression analysis for functional specification of household Engel curves.

After a brief review in section 2 of the literature on demand functions and equivalence scales and the functional specifications used, we first discuss in section 3 the issues of using income versus total expenditure, the origin and nature of the error terms in the light of utility theory, and the interpretation of empirical demand functions. We shall reach the unorthodox view that household demand functions should be interpreted as conditional expectations relative to prices, household composition and either income or the conditional expectation of total expenditure (rather than total expenditure itself), where the latter conditional expectation is taken relative to income, prices and household composition. These two forms appear to be equivalent. This result also solves the simultaneity problem: the error variance matrix is no longer singular. Moreover, the errors are in general heteroskedastic.

In section 4 we discuss the model and the data, and in section 5 we review the nonparametric kernel regression approach.

In section 6 we derive the functional form of our household Engel curves from nonparametric regression results, using the 1980 budget survey for the Netherlands, in order to avoid model misspecification. Thus the model is derived directly from the data, without restricting its functional form. The nonparametric regression results are then translated to suitable parametric functional specifications, i.e., we choose parametric functional forms in accordance with the nonparametric regression results. These parametric specifications are estimated by least squares, and various parameter restrictions are tested in order to simplify the models. This yields very simple final specifications of the household Engel curves involved, namely linear functions of income and the number of children in two age groups.

12.

An outlier detection scheme for dynamical sequential datasets

Shiliang Zhang Zonglin Ye Yanbin Zhang Xiali Hei 《Communications in Statistics - Simulation and Computation》2019,48(5):1450-1502

Outlier detection plays an important role in the pre-treatment of sequential datasets to obtain pure, valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are employed to measure the similarity an object shares with its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled as a suspicious outlier, which is treated subsequently to judge whether it is really an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS); then the expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS, are employed, and the ratio of the length of the expected path to that of the measured path indicates whether there exist outliers in the SOS. Third, in the case that there exist outliers in the SOS, if there are N suspicious outliers in the SOS, then 2^N − 2 remaining paths are generated by removing k (0 < k < N) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) is employed to represent the ratio of the length of the measured path of the considered remaining path to that of the expected path of the corresponding SOS, and the degree to which the objects removed in a remaining path are outliers is indicated by the DSOF. The proposed outlier detection scheme is conducted from a dynamical perspective, and breaks the tight relation between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the experimental results verify that the proposed scheme has higher detection quality for sequential datasets.
In addition, the proposed outlier detection scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.
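The count of remaining paths follows from choosing which k of the N suspicious outliers to remove, for every k with 0 < k < N, which gives 2^N − 2 subsets in total. The sketch below enumerates them and measures each remaining path. It is only an illustration: the function names are hypothetical, objects are assumed to be points in Euclidean space, and the DSOF itself is not computed because the abstract does not say how the expected path is constructed.

```python
from itertools import combinations
import math


def remaining_paths(points):
    """Yield (removed_indices, remaining_points) for every way of removing
    k points, 0 < k < N, from a sequence of N points (order is preserved)."""
    n = len(points)
    for k in range(1, n):
        for removed in combinations(range(n), k):
            removed_set = set(removed)
            remaining = [p for i, p in enumerate(points) if i not in removed_set]
            yield removed, remaining


def path_length(points):
    """Length of the polyline that connects the points in order
    (Euclidean distance is an assumption of this sketch)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

Because the number of candidates grows as 2^N, an exhaustive enumeration like this is only practical for short suspicious outlier series.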

13.

Outlier identification and robust parameter estimation in a zero-inflated Poisson model

Jun Yang Min Xie Thong Ngee Goh 《Journal of applied statistics》2011,38(2):421-430

The zero-inflated Poisson distribution has been used in the modeling of count data in different contexts. This model tends to be influenced by outliers because of the excessive occurrence of zeroes; thus outlier identification and robust parameter estimation are important for such a distribution. Some outlier identification methods are studied in this paper, and their applications and results are also presented with an example. To eliminate the effect of outliers, two robust parameter estimates are proposed based on the trimmed mean and the Winsorized mean. Simulation results show the robustness of our proposed parameter estimates.
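For readers unfamiliar with the two location estimators named above, the sketch below shows the textbook symmetric trimmed mean and Winsorized mean. It is not the paper's estimator of the zero-inflated Poisson parameters, which the abstract does not spell out; the function names, the symmetric trimming rule, and the sample data are assumptions made for illustration.

```python
def trimmed_mean(data, proportion):
    """Mean after dropping int(proportion * n) values from each end of the sorted data."""
    xs = sorted(data)
    n = len(xs)
    g = int(proportion * n)
    if 2 * g >= n:
        raise ValueError("proportion leaves no observations")
    kept = xs[g:n - g]
    return sum(kept) / len(kept)


def winsorized_mean(data, proportion):
    """Mean after replacing the int(proportion * n) smallest and largest values
    with the nearest retained value, so the sample size stays n."""
    xs = sorted(data)
    n = len(xs)
    g = int(proportion * n)
    if 2 * g >= n:
        raise ValueError("proportion leaves no observations")
    low, high = xs[g], xs[n - g - 1]
    clipped = [min(max(x, low), high) for x in xs]
    return sum(clipped) / n


# Hypothetical counts with one extreme value (50).
counts = [0, 0, 0, 1, 1, 2, 2, 3, 3, 50]
print(sum(counts) / len(counts))     # 6.2
print(trimmed_mean(counts, 0.1))     # 1.5
print(winsorized_mean(counts, 0.1))  # 1.5
```

Both estimators pull the estimate back toward the bulk of the data; with zero-inflated counts the lower tail is mostly zeroes, so symmetric trimming may not be the most natural choice.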

14.

Estimating individual response coefficients in varying coefficients regression models

K. P. Kalirajan M. B. Obwona 《Journal of applied statistics》1995,22(4):477-484

The objective of this paper is to suggest a method of estimating the individual response coefficients in varying coefficients regression models. An empirical application of the method is demonstrated, using farm-level micro data from the Philippines.

15.

A Comparison of Multiple Outlier Detection Methods for Regression Data

Nedret Billor Gulsen Kiral 《Communications in Statistics - Simulation and Computation》2013,42(3):521-545

The problem of outliers in statistical data has attracted many researchers for a long time. Consequently, numerous outlier detection methods have been proposed in the statistical literature. However, no consensus has emerged as to which method is uniformly better than the others or which one is recommended for use in practical situations. In this article, we perform an extensive comparative Monte Carlo simulation study to assess the performance of the multiple outlier detection methods that are either recently proposed or frequently cited in the outlier detection literature. Our simulation experiments include a wide variety of realistic and challenging regression scenarios. We give recommendations on which method is superior to others under what conditions.

16.

Maximum studentized score tests for the detection of outliers in time series regression models

《Journal of Statistical Computation and Simulation》2012,82(12):1355-1372

Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that are the result of a shifted mean of the error process under the regression model. A sample influence function of autocorrelation-based diagnostic technique also exists for the detection of outliers that are the result of shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on model misspecification effects on outlier detection are also provided.

17.

Adaptive wavelet empirical Bayes estimation of a location or a scale parameter

Marianna Pensky 《Journal of statistical planning and inference》2000,90(2):275-292

Assume that in independent two-dimensional random vectors (X₁,θ₁),…,(X_n,θ_n), each θ_i is distributed according to some unknown prior density function g. Also, given θ_i=θ, X_i has the conditional density function q(x−θ), x,θ∈(−∞,∞) (a location parameter case), or θ⁻¹q(x/θ), x,θ∈(0,∞) (a scale parameter case). In each pair the first component is observable, but the second is not. After the (n+1)th pair (X_{n+1},θ_{n+1}) is obtained, the objective is to construct an empirical Bayes (EB) estimator of θ. In this paper we derive the EB estimators of θ based on a wavelet approximation with Meyer-type wavelets. We show that these estimators provide adaptation not only in the case when g belongs to the Sobolev space H^α with an unknown α, but also when g is supersmooth.

18.

Discordancy Tests Based on the Likelihood Ratio for the Bipolar Watson Distribution on the Hypersphere

Adelaide Figueiredo 《Communications in Statistics - Simulation and Computation》2013,42(2):413-421

As the Watson distribution is frequently used for modeling axial data, it is important to investigate the existence of possible outliers in samples from this distribution. We therefore develop, for the bipolar Watson distribution defined on the hypersphere, some tests of discordancy of an outlier or of several outliers en bloc based on the likelihood ratio, supposing an alternative model of contamination of slippage type. We evaluate the performance of these tests of discordancy of an outlier and we also compare some tests of discordancy of an outlier available for this distribution.

19.

Bayesian outlier testing using the predictive distribution for a linear model of constant intraclass form

Barry K. Noser Virgil R. Marco 《Communications in Statistics - Theory and Methods》2013,42(3):849-860

The problem of testing suspected outliers from a linear model with constant intraclass correlation is considered from a Bayesian viewpoint. The main objective of this paper is to develop an outlier test procedure based on the predictive distribution of suspected outlier observations given a set of existing inlier observations. The test procedure is easily performed with the usual F and t distributions.

20.

Multiple chain-deferred inspection plans and their compatibility with the multiple plans in MIL-STD-105D and equivalent schemes

P. A. Osanaiye 《Journal of applied statistics》1985,12(1):71-81

This paper proposes a multiple sample extension of the chain-deferred plans by the author (1983).

These plans are similar to the traditional multiple plans except that they use information from, at most, four surrounding lots when sentencing the current lot. When compared with existing plans, the new proposals either reduce the cost of the decision procedure or reduce the possible length of the queue of unsentenced lots and at the same time give an equivalent overall protection.

The average sample number before a decision and the average delay time are both examined and compared with existing multiple deferred state sampling plans and with the traditional multiple plans.

The compatibility of these plans with the traditional multiple plans is illustrated.