
1.

Modified ratio estimators using robust regression methods

Tolga Zaman, Hasan Bulut. Communications in Statistics - Theory and Methods, 2019, 48(8): 2039-2048.

When a data set contains an outlier, the efficiency of traditional methods decreases. To address this problem, Kadilar et al. (2007) adapted the Huber-M method, which is only one of several robust regression methods, to ratio-type estimators and reduced the effect of the outlier. In this study, new ratio-type estimators are proposed by applying the Tukey-M, Hampel-M, Huber-MM, LTS, LMS and LAD robust methods within the framework of Kadilar et al. (2007). We obtain the mean square error (MSE) of these estimators theoretically and compare the MSE values of the proposed estimators with those of the estimators based on the Huber-M and OLS methods. These comparisons show that our proposed estimators are more efficient than both the Huber-M approach proposed by Kadilar et al. (2007) and the OLS approach. Moreover, under all conditions, all of the proposed estimators except the LAD-based one are more efficient than the robust estimators proposed by Kadilar et al. (2007). These theoretical results are supported by a numerical example and a simulation based on data that include an outlier.
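
To show the mechanics, the sketch below assumes one common regression-adjusted ratio-type form, (ȳ + b(X̄ − x̄))/x̄ · X̄, where X̄ is the known population mean of the auxiliary variable x. It shows how replacing the OLS slope b with a robust slope changes the estimate. The Huber-M slope is fitted by iteratively reweighted least squares with a MAD-based scale. The helper names and the tuning constant c = 1.345 are illustrative assumptions, not the estimators or MSE expressions derived in the paper.

```python
import statistics


def ratio_type_estimate(ybar, xbar, Xbar, b):
    """Regression-adjusted ratio-type estimate of the population mean of y.

    Assumed form: (ybar + b * (Xbar - xbar)) / xbar * Xbar, with Xbar the known
    population mean of the auxiliary variable x.
    """
    return (ybar + b * (Xbar - xbar)) / xbar * Xbar


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    xm, ym = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    return sxy / sxx


def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


def huber_slope(x, y, c=1.345, max_iter=100, tol=1e-10):
    """Slope of a Huber M-regression of y on x, fitted by IRLS."""
    a, b = weighted_line(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(max_iter):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        s = statistics.median([abs(ri) for ri in r]) / 0.6745  # MAD-type scale
        if s == 0:
            break  # at least half of the points are fitted exactly
        # Huber weights: 1 for small residuals, c*s/|r| for large ones
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a_new, b_new = weighted_line(x, y, w)
        converged = abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return b
```

In use, the same estimator is evaluated twice, once as ratio_type_estimate(ybar, xbar, Xbar, ols_slope(x, y)) and once with huber_slope(x, y), so that any difference comes only from how the slope reacts to the outlier.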

2.

A novel spatial outlier detection technique

Alok Kumar Singh, S. Lalitha. Communications in Statistics - Theory and Methods, 2018, 47(1): 247-257.

Spatial outliers are spatially referenced objects whose non-spatial attribute values differ significantly from the corresponding values in their spatial neighborhoods. In other words, a spatial outlier is a local instability: an extreme observation that deviates significantly within its spatial neighborhood but possibly not within the entire dataset. In this article, we propose a novel spatial outlier detection algorithm, the location quotient (LQ), for spatial datasets with multiple attributes. We compare its performance with the well-known mean and median algorithms for such datasets from the literature. In particular, we apply the mean, median, and LQ algorithms to a real dataset and to simulated spatial datasets of 13 different sizes to compare their performances. We also compute the area under the curve (AUC) in all cases and plot receiver operating characteristic (ROC) curves in some cases. The AUC values show that the proposed algorithm is more powerful than the mean and median algorithms in almost all the cases considered.
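
The abstract reports AUC values. As a generic illustration (not the LQ algorithm itself), the AUC of any outlier score can be computed from its Mann–Whitney form. The AUC is the probability that a randomly chosen true outlier scores higher than a randomly chosen inlier, with ties counted as one half. The function name and the boolean-label convention are assumptions.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve for outlier scores (Mann-Whitney form).

    labels[i] is True when observation i is a true outlier; a higher score
    means "more outlying". Ties between an outlier and an inlier count 1/2.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, and a random score gives about 0.5 on average.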

3.

Outliers and Patterns of Outliers in Contingency Tables with Algebraic Statistics

Fabio Rapallo. Scandinavian Journal of Statistics, 2012, 39(4): 784-797.

In this paper, we provide a definition of a pattern of outliers in contingency tables within a model-based framework. In particular, we use log-linear models and exact goodness-of-fit tests to specify the notions of an outlier and a pattern of outliers. The language and some techniques of Algebraic Statistics are essential tools for making the definitions clear and easily applicable. We also analyse several numerical examples to show how to use our definitions.
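
A much simpler diagnostic than the exact, algebraic approach described above is to fit the independence log-linear model to a two-way table and inspect Pearson residuals. The sketch below does that. The threshold of 2 and the helper names are illustrative assumptions. A single outlying cell also inflates its own row and column totals, so such residuals can mask outliers. This is one reason a more careful model-based treatment can help.

```python
import math


def independence_fit(table):
    """Fitted counts under the independence log-linear model for a two-way table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    return [[ri * cj / n for cj in col] for ri in row]


def pearson_residuals(table):
    """(observed - fitted) / sqrt(fitted) for every cell."""
    fit = independence_fit(table)
    return [[(o - e) / math.sqrt(e) for o, e in zip(orow, erow)]
            for orow, erow in zip(table, fit)]


def flag_cells(table, threshold=2.0):
    """Cells whose absolute Pearson residual exceeds the (assumed) threshold."""
    res = pearson_residuals(table)
    return [(i, j) for i, r in enumerate(res)
            for j, v in enumerate(r) if abs(v) > threshold]
```

For the 3 × 3 table [[10, 10, 10], [10, 10, 10], [10, 10, 60]], only the bottom-right cell is flagged.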

4.

Detecting possibly non-consecutive outliers in industrial time series

Alberto Luceño. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1998, 60(2): 295-310.

A method for robust estimation and multiple outlier detection in time series generated by autoregressive integrated moving average (ARIMA) processes in industrial environments is developed. The procedure is based on reweighted maximum likelihood estimation using Huber or redescending weights and therefore generalizes the well-established robust M-estimation procedures used in the regression framework. When the scalar process is non-stationary, the required computations can be performed equally well using either the original undifferenced series or auxiliary differenced series. The latter alternative may be preferred for scalar series. The former, however, might be extended to vector partially non-stationary time series without differencing the series, thus avoiding the non-invertibility and parameter identifiability problems caused by overdifferencing. The overall strategy is applied to two real industrial data sets.
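
The Huber and redescending weights mentioned above can be written as simple functions of a standardized residual u. The sketch uses Tukey's bisquare as the redescending example, together with the customary tuning constants c = 1.345 (Huber) and c = 4.685 (bisquare). The paper may use other redescending functions or constants.

```python
def huber_weight(u, c=1.345):
    """Huber weight: 1 for |u| <= c, then c/|u| (large residuals downweighted, never zero)."""
    au = abs(u)
    return 1.0 if au <= c else c / au


def bisquare_weight(u, c=4.685):
    """Tukey bisquare (redescending) weight: smoothly falls to exactly 0 at |u| >= c."""
    au = abs(u)
    if au >= c:
        return 0.0
    t = 1.0 - (u / c) ** 2
    return t * t
```

The design difference is visible at u = 10. A Huber weight remains positive (about 0.13), so a gross outlier still pulls on the fit, while the bisquare weight is exactly zero and removes it from the reweighted estimation step.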

5.

Comparisons of outlier tests for potency bioassays

Perceval Sondag, Lingmin Zeng, Binbing Yu, Harry Yang, Steven Novick. Pharmaceutical Statistics, 2020, 19(3): 230-242.

Potency bioassays are used to measure biological activity. Consequently, potency is considered a critical quality attribute in manufacturing. Relative potency is measured by comparing the concentration‐response curves of a manufactured test batch with that of a reference standard. If the curve shapes are deemed similar, the test batch is said to exhibit constant relative potency with the reference standard, a critical requirement for calibrating the potency of the final drug product. Outliers in bioassay potency data may result in the false acceptance/rejection of a bad/good sample and, if accepted, may yield a biased relative potency estimate. To avoid these issues, the USP<1032> recommends the screening of bioassay data for outliers prior to performing a relative potency analysis. In a recently published work, the effects of one or more outliers, outlier size, and outlier type on similarity testing and estimation of relative potency were thoroughly examined, confirming the USP<1032> outlier guidance. As a follow‐up, several outlier detection methods, including those proposed by the USP<1010>, are evaluated and compared in this work through computer simulation. Two novel outlier detection methods are also proposed. The effects of outlier removal on similarity testing and estimation of relative potency were evaluated, resulting in recommendations for best practice.

6.

Outlier detection using difference-based variance estimators in multiple regression

Chun Gun Park. Communications in Statistics - Theory and Methods, 2018, 47(24): 5986-6001.

In this article, we propose an outlier detection approach for a multiple regression model that uses the properties of a difference-based variance estimator. This type of estimator was originally used to estimate the error variance in a nonparametric regression model without estimating the nonparametric function itself. This article is the first to employ a difference-based error variance estimator for the outlier detection problem in a multiple regression model. Our approach is a leave-one-out type method based on the difference-based error variance. Existing leave-one-out outlier detection approaches are strongly affected by other outliers. Ours is not, because it does not use the regression coefficient estimator. A simulation study comparing our approach with several existing methods suggests that our approach outperforms them. Its advantages are also demonstrated with a real data application. The approach can be extended to outlier detection in the nonparametric regression model.
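
To make the leave-one-out idea concrete, the sketch below uses a first-order (Rice-type) difference-based variance estimate and reports how much the estimate falls when each observation is removed. The estimate assumes the responses are ordered by a single covariate. This is a simplified, hypothetical illustration of the general idea, not the multiple-regression statistic or the decision rule developed in the article.

```python
def rice_variance(y):
    """First-order difference-based error-variance estimate.

    Assumes y is ordered by a single covariate, so consecutive differences
    largely cancel the smooth mean function: sum (y[i+1]-y[i])^2 / (2(n-1)).
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))


def leave_one_out_drops(y):
    """Decrease in the variance estimate when each observation is left out."""
    full = rice_variance(y)
    return [full - rice_variance(y[:i] + y[i + 1:]) for i in range(len(y))]
```

For y = [1, 2, 30, 4, 5], the full estimate is 182.75. Leaving out the third value reduces it to 1.0, whereas leaving out any other value increases it, so the largest drop points at the outlying observation.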

7.

Multiple outliers detection in sparse high-dimensional regression

Tao Wang, Qun Li, Bin Chen. Journal of Statistical Computation and Simulation, 2018, 88(1): 89-107.

The presence of outliers inevitably leads to distorted analysis and inappropriate prediction. This is especially true for multiple outliers in high-dimensional regression, where the high dimensionality of the data may increase the chance that one or more observations are outlying. Because outlier detection is both necessary and important in high-dimensional regression analysis, in this paper we propose a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and least trimmed squares (LTS) regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach based on multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset from the initial detection approach. Monte Carlo comparison studies show that the proposed method performs well in detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the method with an empirical analysis of real-life protein and gene expression data.
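
The least trimmed squares step can be illustrated for simple regression. LTS minimizes the sum of the h smallest squared residuals. The sketch below assumes h = ⌊(n + p + 1)/2⌋ with p = 2. It starts from every pair of points and applies concentration steps, refitting OLS on the h best-fitting points each time. Exhaustive pair starts are only practical for small n, and the screening and multiple-testing stages of the paper are not shown.

```python
import itertools


def _ols(points):
    """OLS fit of y = a + b*x on a list of (x, y) points; returns (a, b)."""
    n = len(points)
    xm = sum(p[0] for p in points) / n
    ym = sum(p[1] for p in points) / n
    sxx = sum((p[0] - xm) ** 2 for p in points)
    sxy = sum((p[0] - xm) * (p[1] - ym) for p in points)
    b = sxy / sxx
    return ym - b * xm, b


def lts_line(x, y, h=None, c_steps=10):
    """Least trimmed squares fit of y = a + b*x via pair starts and C-steps."""
    pts = list(zip(x, y))
    n = len(pts)
    if h is None:
        h = (n + 2 + 1) // 2  # p = 2 parameters (intercept and slope)
    best = None
    for (x1, y1), (x2, y2) in itertools.combinations(pts, 2):
        if x1 == x2:
            continue  # a vertical pair does not define a regression line
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        for _ in range(c_steps):
            # Concentration step: keep the h points with smallest squared residuals.
            subset = sorted(pts, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
            if len({p[0] for p in subset}) < 2:
                break
            a, b = _ols(subset)
        obj = sum(sorted((py - a - b * px) ** 2 for px, py in pts)[:h])
        if best is None or obj < best[0]:
            best = (obj, a, b)
    return best[1], best[2]
```

With ten points on y = 2x + 1 and two of them moved far above the line, the trimmed objective reaches zero only on the true line, so lts_line recovers intercept 1 and slope 2. OLS on the same data would be pulled upward.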

8.

A Comparison of Multiple Outlier Detection Methods for Regression Data

Nedret Billor, Gulsen Kiral. Communications in Statistics - Simulation and Computation, 2013, 42(3): 521-545.

The problem of outliers in statistical data has attracted researchers for a long time. Consequently, numerous outlier detection methods have been proposed in the statistical literature. However, no consensus has emerged as to which method is uniformly better than the others or which one should be recommended in practice. In this article, we perform an extensive comparative Monte Carlo simulation study to assess the performance of multiple outlier detection methods that are either recently proposed or frequently cited in the outlier detection literature. Our simulation experiments include a wide variety of realistic and challenging regression scenarios. We give recommendations on which method is superior to the others under which conditions.

9.

Detecting outliers: power and some other considerations

Ram B. Jain. Communications in Statistics - Theory and Methods, 2013, 42(22): 2299-2314.

The general problem of outlier detection is defined, together with the five recursive outlier detection procedures considered in the study. Methods are given to compute the power, the probability of detecting at least one outlier, and the probability of declaring more than one observation, including at least one inlier, as outliers; the results are discussed. The results show that no procedure is the most powerful in all three cases: when the actual number of outliers present in the sample is estimated exactly, underestimated, or overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur on only one side of the sample.

10.

Detection of outliers in multilevel models

Lei Shi, Gemai Chen. Journal of Statistical Planning and Inference, 2008.

This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in the estimation of both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlying units at any level. The detection of outlying units related to the random parts is also studied. The analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.

11.

Model‐based Prediction In Ecological Surveys Including Those with Incomplete Detection

Gavin J. Melville, Alan H. Welsh. Australian & New Zealand Journal of Statistics, 2014, 56(3): 257-281.

This paper explores and develops model-based predictors for surveys of plants and wildlife, including those with incomplete detection. The methodology allows a detection function to be estimated to account for objects that were not detected at the time of the survey. The model-based theory uses generalized linear models (GLMs) and is either new or adapted from other areas of sampling. A simulation study is used to validate the estimators, and comparisons are made with an integrated likelihood approach. An aerial survey of kangaroos in western New South Wales is used to illustrate the theory. The area within 50 m of the aircraft is treated as a strip transect, and mark-recapture methods are used to estimate the detection function.
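
One standard way to use mark-recapture within a strip is the double-observer (Lincoln–Petersen) calculation. If two observers search independently and m objects are seen by both, each observer's detection probability is estimated from the other observer's sightings. The sketch below assumes two independent observers and equal detectability of all objects in the strip. It is not the GLM-based predictor or the detection-function model of the paper.

```python
def double_observer(n1, n2, m):
    """Lincoln-Petersen style estimates for two independent observers.

    n1, n2: objects seen by observer 1 and by observer 2; m: objects seen by both.
    Returns (estimated abundance, p1, p2, probability of detection by either).
    """
    if m == 0:
        raise ValueError("no duplicate detections; estimator undefined")
    p1 = m / n2  # share of observer 2's sightings that observer 1 also made
    p2 = m / n1  # share of observer 1's sightings that observer 2 also made
    p_either = 1 - (1 - p1) * (1 - p2)
    n_hat = n1 * n2 / m
    return n_hat, p1, p2, p_either
```

With n1 = 50, n2 = 40 and m = 20, the estimated abundance is 100 and the combined detection probability is 0.7. As a consistency check, 100 × 0.7 equals the 70 distinct objects actually seen (50 + 40 − 20).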

12.

Outlier detection in contingency tables using decomposable graphical models

Mads Lindskou, Poul Svante Eriksen, Torben Tvedebrink. Scandinavian Journal of Statistics, 2020, 47(2): 347-360.

For high-dimensional data, identifying anomalies such as outliers is a tedious task. We present a novel outlier detection method for high-dimensional contingency tables. We use the class of decomposable graphical models to model the relationships among the variables of interest; these relationships can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed-form expression for the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the LRT outlier detection framework on genetic data modeled by Chow–Liu trees.
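
Chow–Liu trees, mentioned above, are maximum-weight spanning trees over the variables, with pairwise empirical mutual information as edge weights. A tree is a decomposable graphical model, so it can serve as the interaction graph. The sketch below builds such a tree with Prim's algorithm for discrete columns. It does not implement the paper's LRT outlier statistic, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over observed pairs of p(u,v) * log(p(u,v) / (p(u) p(v)))
    return sum(c / n * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())


def chow_liu_tree(columns):
    """Edges (i, j) of a maximum mutual-information spanning tree (Prim's algorithm)."""
    k = len(columns)
    w = {(i, j): mutual_information(columns[i], columns[j])
         for i in range(k) for j in range(i + 1, k)}
    in_tree, edges = {0}, []
    while len(in_tree) < k:
        # Best edge with exactly one endpoint already in the tree.
        best = max(((i, j) for (i, j) in w if (i in in_tree) != (j in in_tree)),
                   key=lambda e: w[e])
        edges.append(best)
        in_tree.update(best)
    return edges
```

If the first two columns are identical and the third is independent of both, the tree always contains the edge (0, 1), whose weight is log 2 for a balanced binary variable.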

13.

Local likelihood density estimation for interval censored data

John Braun, Thierry Duchesne, James E. Stafford. Revue canadienne de statistique, 2005, 33(1): 39-60.

The authors propose a class of procedures for local likelihood estimation from data that are either interval-censored or aggregated into bins. One such procedure relies on an algorithm that generalizes existing self-consistency algorithms by introducing kernel smoothing at each step of the iteration. Every procedure in the class yields estimates that are solutions of fixed-point equations. By discretizing and applying numerical integration, the authors use fixed-point theory to study the convergence of the algorithms in the class. Rapid convergence is achieved by implementing a local EM algorithm as a global Newton iteration. The latter requires an explicit solution of the local likelihood equations, which can be found, if necessary, by using the symbolic Newton-Raphson algorithm.
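
The self-consistency iteration that the authors generalize can be sketched on a fixed grid of support points. At each step, every censoring interval spreads its unit of probability over the grid points it covers, in proportion to the current estimate. The optional smoothing step below (a fixed [1/4, 1/2, 1/4] kernel followed by renormalization) is an assumption in the spirit of adding kernel smoothing at each iteration. It is not the authors' local likelihood procedure.

```python
def self_consistency(intervals, grid, n_iter=200, smooth=False):
    """Self-consistency (EM) estimate of probability masses on a fixed grid.

    intervals: list of (left, right) censoring intervals, closed at both ends.
    grid: support points; an interval covers g when left <= g <= right.
    smooth=True applies an assumed [1/4, 1/2, 1/4] kernel after each step.
    """
    m = len(grid)
    cover = [[1.0 if left <= g <= right else 0.0 for g in grid]
             for (left, right) in intervals]
    if any(sum(row) == 0 for row in cover):
        raise ValueError("every interval must cover at least one grid point")
    p = [1.0 / m] * m
    for _ in range(n_iter):
        new = [0.0] * m
        for row in cover:
            # Current probability of this observation's interval.
            denom = sum(a * pj for a, pj in zip(row, p))
            for j in range(m):
                new[j] += row[j] * p[j] / denom
        p = [v / len(intervals) for v in new]
        if smooth and m > 1:
            p = [0.25 * p[max(j - 1, 0)] + 0.5 * p[j] + 0.25 * p[min(j + 1, m - 1)]
                 for j in range(m)]
            total = sum(p)
            p = [v / total for v in p]
    return p
```

When every interval covers a single grid point, the unsmoothed iteration returns the empirical proportions after one step. Wider intervals share their mass across the points they cover.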

14.

Wilcoxon's signed‐rank statistic: what null hypothesis and why it matters

Heng Li, Terri Johnson. Pharmaceutical Statistics, 2014, 13(5): 281-285.

In the statistical literature, the term ‘signed‐rank test’ (or ‘Wilcoxon signed‐rank test’) has been used to refer to two distinct tests that share a common test statistic: a test for symmetry of a distribution and a test for the median of a symmetric distribution. To avoid potential ambiguity, we propose referring to these two tests by different names, ‘test for symmetry based on signed‐rank statistic’ and ‘test for median based on signed‐rank statistic’, respectively. The utility of this terminological distinction should become evident from our discussion of how these tests connect and contrast with the sign test and the one‐sample t‐test. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
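
Both tests described above share the statistic W+, the sum of the ranks of |d_i| over the positive differences, and the same exact null distribution. Under that null, each rank is independently positive or negative with probability 1/2. The sketch below computes W+ and the exact upper-tail probability by counting subsets of ranks. It assumes no zero differences and no ties, and the function names are illustrative.

```python
def signed_rank_statistic(d):
    """W+ = sum of ranks of |d_i| over positive d_i (assumes no zeros and no tied |d_i|)."""
    if any(v == 0 for v in d) or len({abs(v) for v in d}) < len(d):
        raise ValueError("sketch assumes no zero differences and no tied |d|")
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return sum(r for r, v in zip(ranks, d) if v > 0)


def exact_upper_p(w, n):
    """P(W+ >= w) when each of the ranks 1..n is positive with probability 1/2."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s] = number of rank subsets summing to s
    for k in range(1, n + 1):
        for s in range(max_sum, k - 1, -1):
            counts[s] += counts[s - k]
    return sum(counts[w:]) / 2 ** n
```

For d = [1.5, -0.5, 2.0, 3.0], W+ = 2 + 3 + 4 = 9. Only 2 of the 16 equally likely sign patterns give 9 or more, so the one-sided p-value is 0.125. Which hypothesis that p-value addresses depends on what is assumed, as the abstract emphasizes.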

15.

Robust linear discriminant analysis using S‐estimators

Christophe Croux, Catherine Dehon. Revue canadienne de statistique, 2001, 29(3): 473-493.

The authors consider a robust linear discriminant function based on high-breakdown location and covariance matrix estimators. They derive influence functions for the estimators of the parameters of the discriminant function and for the associated classification error. The most B-robust estimator is determined within the class of multivariate S-estimators. This estimator, which minimizes the maximal influence that an outlier can have on the classification error, is also the most B-robust location S-estimator. The most B-robust estimator is also compared with the more familiar biweight S-estimator.
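
The discriminant function itself is simple once location and scatter estimates are available. Its robustness comes entirely from the estimators plugged in. The two-dimensional sketch below takes the two group locations and a pooled scatter matrix as inputs and assumes equal prior probabilities. Computing the S-estimates is not shown.

```python
def linear_discriminant(mu1, mu2, sigma):
    """Two-group linear discriminant score for 2-D data.

    mu1, mu2: location estimates; sigma: 2x2 pooled scatter estimate (symmetric).
    Returns d(x) = (mu1 - mu2)^T sigma^{-1} (x - (mu1 + mu2)/2);
    d(x) > 0 assigns x to group 1 under equal priors.
    """
    (a, b), (c, e) = sigma
    det = a * e - b * c
    if det == 0:
        raise ValueError("scatter matrix is singular")
    inv = [[e / det, -b / det], [-c / det, a / det]]
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2]
    return lambda x: w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1])
```

Plugging in classical means and the pooled sample covariance gives Fisher's rule. Plugging in robust estimates gives a robust rule of the kind studied in the paper.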

16.

Outlier detection in high-dimensional regression model

Tao Wang. Communications in Statistics - Theory and Methods, 2017, 46(14): 6947-6958.

An outlier is defined as an observation that differs significantly from the others in its dataset. In high-dimensional regression analysis, datasets often contain some outliers, and it is important to identify and eliminate them when fitting a model. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is used to construct a novel outlier detection measure based on distance correlation, and an outlier detection procedure is then proposed. The proposed method enjoys several advantages. First, the outlier detection measure is simple to calculate, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can handle general regression, as it does not require the specification of a linear regression model. Finally, simulation studies show that the proposed method performs well in detecting outliers in high-dimensional regression models and outperforms some competing methods.
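
Distance correlation, the ingredient of the measure described above, can be computed from double-centred pairwise distance matrices. The sketch below handles scalar x and y and recomputes the correlation with each observation left out. The paper's actual high-dimensional detection measure and thresholds are not reproduced, and the helper names are hypothetical.

```python
import math


def _centered_distances(v):
    """Double-centred matrix of pairwise distances |v_i - v_j|."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]  # row means (= column means, d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation for scalar x and y."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    dcov = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvx = sum(v * v for r in a for v in r) / n ** 2
    dvy = sum(v * v for r in b for v in r) / n ** 2
    if dvx * dvy == 0:
        return 0.0
    return math.sqrt(max(dcov, 0.0) / math.sqrt(dvx * dvy))


def leave_one_out_dcor(x, y):
    """Distance correlation recomputed with each observation removed in turn."""
    return [distance_correlation(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]
```

For data that are exactly linear except for one point, removing that point raises the distance correlation to 1. This is the kind of leave-one-out contrast such a measure can exploit.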

17.

Robust permutation tests for one sample

Journal of Statistical Planning and Inference, 2003, 116(2): 475-487.

We consider robust permutation tests based on an estimating equation, and compare test statistics based on the score function with those based on the M-estimate. First, we obtain a form for the tests so that the exact tests can be carried out with the same algorithms as those used for permutation tests based on the mean. Then we compare the efficiencies of the tests in two cases, equivalent to the sign test and to a test based on Huber scores. We show that they are equivalent in the Pitman sense but have different Bahadur slopes, with neither exceeding the other over the whole range.
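
For an odd score function ψ, the exact test can reuse the sign-flipping enumeration used for the mean. Under symmetry about θ0, each term ψ(x_i − θ0) is equally likely to carry either sign. The sketch below enumerates all 2^n sign patterns, so it is only practical for small n. It supports the identity (mean), the sign score and a Huber score with the assumed constant c = 1.345.

```python
import itertools


def huber_psi(u, c=1.345):
    """Huber score: u clipped to [-c, c]."""
    return max(-c, min(c, u))


def sign_psi(u):
    """Sign score: +1, 0 or -1."""
    return (u > 0) - (u < 0)


def sign_flip_p_value(x, psi=lambda u: u, theta0=0.0):
    """Exact one-sided p-value P(T* >= T) for T = sum psi(x_i - theta0).

    psi is assumed odd, so flipping the sign of (x_i - theta0) flips psi;
    under H0 (symmetry about theta0) all 2^n sign patterns are equally likely.
    """
    scores = [psi(v - theta0) for v in x]
    t_obs = sum(scores)
    count = 0
    for signs in itertools.product((1, -1), repeat=len(scores)):
        t = sum(s * abs(v) for s, v in zip(signs, scores))
        if t >= t_obs - 1e-12:  # small tolerance for floating-point ties
            count += 1
    return count / 2 ** len(scores)
```

Changing only psi turns the mean-based permutation test into a sign-test or Huber-score version without altering the enumeration.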

18.

A simple diagnostic method of outlier detection for stationary Gaussian time series

Yuzhi Cai, Neville Davies. Journal of Applied Statistics, 2003, 30(2): 205-223.

In this paper we present a "model-free" method of outlier detection for Gaussian time series that uses the autocorrelation structure of the series. We also present a graphical diagnostic method to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ² distribution with one degree of freedom. We show that the method works well when the time series contains only one type of outlier or both additive and innovation outliers. It also has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that the different types of outliers can be distinguished graphically using the proposed techniques.
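
A χ² variable with one degree of freedom is the square of a standard normal variable. The p-value of such a test statistic therefore needs only the complementary error function. The helper below is a minimal sketch. Computing the statistic itself from the autocorrelation structure is not shown.

```python
import math


def chi2_1df_sf(t):
    """Upper-tail probability P(X > t) for X ~ chi-squared with 1 degree of freedom.

    With X = Z**2 and Z standard normal, P(X > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    """
    if t < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(t / 2.0))
```

The familiar 5% critical value 3.8415 (that is, 1.96²) gives a p-value of 0.05.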

19.

Sample size calculation for recurrent events data in one‐arm studies

Paola Rebora, Stefania Galimberti. Pharmaceutical Statistics, 2012, 11(6): 494-502.

In some exceptional circumstances, as in very rare diseases, nonrandomized one‐arm trials are the sole source of evidence for the efficacy and safety of a new treatment. The design of such studies needs a sound methodological approach to provide reliable information, and determining the appropriate sample size remains a critical step of this planning process. As, to our knowledge, no method exists for sample size calculation in one‐arm trials with a recurrent event endpoint, we propose a closed-form sample size formula. It is derived assuming a mixed Poisson process and is based on the asymptotic distribution of the one‐sample robust nonparametric test recently developed for the analysis of recurrent events data. Extensive simulation studies demonstrated the validity of this formula in situations with heterogeneity of event rates, both over time and between patients, and with a time‐varying treatment effect. Moreover, although the method requires the specification of a process for event generation, it appears robust to misspecification of this process, provided that the number of events at the end of the study is similar to the number assumed in the planning phase. The motivating clinical context is a nonrandomized one‐arm study of gene therapy in a very rare immunodeficiency in children (ADA‐SCID), where a major endpoint is the recurrence of severe infections. Copyright © 2012 John Wiley & Sons, Ltd.

20.

Estimating the number of classes in multiple populations: A geometric analysis

Chang Xuan Mao, Bruce G. Lindsay. Revue canadienne de statistique, 2004, 32(3): 303-314.

The authors study the estimation of the total number of classes present in multiple overlapping populations. They show that the number of classes is identifiable in a nonparametric mixture model of multivariate Poisson densities. Unusual phenomena occur in both point estimation and confidence inference for the parameter defined as the odds of a class being unidentified in the data. Consequently, only one‐sided confidence intervals are available for the number of classes.