Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The influence function introduced by Hampel (1968, 1973, 1974) is a tool that can be used for outlier detection. Campbell (1978) obtained the influence function for Mahalanobis’s distance between two populations, which can be used for detecting outliers in discriminant analysis. In this paper, influence functions for a variety of parametric functions in multivariate analysis are obtained. These include the generalized variance, the matrix of regression coefficients, the noncentrality matrix Σ⁻¹δ in multivariate analysis of variance and its eigenvalues, the matrix L, which is a generalization of 1 − R², canonical correlations, principal components, and parameters that correspond to Pillai’s statistic (1955), Hotelling’s (1951) generalized T₀² and Wilks’s Λ (1932), all of which can be used for outlier detection in multivariate analysis. Devlin, Gnanadesikan and Kettenring (1975) obtained the influence function for the population correlation coefficient in the bivariate case. It is shown in this paper that influence functions for parameters corresponding to r², R², and Mahalanobis D² can be obtained as particular cases.
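
For orientation, the standard definition of Hampel's influence function is reproduced below. The symbols T (a statistical functional), F (the underlying distribution), and δ_x (a point mass at x) are notation chosen here and do not come from the abstract.

```latex
\mathrm{IF}(x;\,T,\,F)
  \;=\;
  \lim_{\varepsilon \downarrow 0}
  \frac{T\bigl((1-\varepsilon)F + \varepsilon\,\delta_x\bigr) - T(F)}{\varepsilon}
```

A large value of |IF(x; T, F)| indicates that an observation at x would shift the parameter T(F) strongly, which is why the function is useful for flagging outliers.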

2.
A new type of procedure for estimating the number of outliers in a sample is presented and compared with existing procedures. The probabilities of exact, under-, and overestimation with the different procedures are examined for two different contamination schemes.

3.
Tartakovsky et al. provide us with, and should be thanked for, an illuminating introduction to the problems of detecting intrusions and other denial-of-service attacks, and a thorough discussion and analysis of the relevance of CUSUM-based change detection algorithms for this purpose. This discussion mainly addresses three issues: introducing a minimum change magnitude, adaptation and tuning of CUSUM algorithms, and processing binary quantized data. The influence of the adaptation in the NP-CUSUM algorithm on its performance is questioned.

4.
Exact conditional p-values based on the likelihood-ratio statistic in logistic regression require accurate computation of the supremum of the likelihood function, particularly for outcomes in the sample space that represent completely-separated or quasi-completely-separated data sets. Current software does not always handle these cases well. Three simple solutions are proposed.

5.
Sequential multi-chart detection procedures for detecting changes in multichannel sensor systems are developed. In the case of complete information on pre-change and post-change distributions, the detection algorithm represents a likelihood ratio-based multichannel generalization of Page’s cumulative sum (CUSUM) test that is applied to general stochastic models that may include correlated and nonstationary observations. There are many potential application areas where it is necessary to consider multichannel generalizations and general statistical models. In this paper our main motivation for doing so is network security: rapid anomaly detection for an early detection of attacks in computer networks that lead to changes in network traffic. Moreover, this kind of application encourages the development of a nonparametric multichannel detection test that does not use exact pre-change (legitimate) and post-change (attack) traffic models. The proposed nonparametric method can be effectively applied to detect a wide variety of attacks such as denial-of-service attacks, worm-based attacks, port-scanning, and man-in-the-middle attacks. In addition, we propose a multichannel CUSUM procedure that is based on binary quantized data; this procedure turns out to be more efficient than the previous two algorithms in certain scenarios. All proposed detection algorithms are based on the change-point detection theory. They utilize the thresholding of test statistics to achieve a fixed rate of false alarms, while allowing changes in statistical models to be detected “as soon as possible”. Theoretical frameworks for the performance analysis of detection procedures, as well as results of Monte Carlo simulations for a Poisson example and results of detecting real flooding attacks, are presented.
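
The likelihood-ratio multichart CUSUM idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes independent Poisson counts in every channel, a known pre-change rate `lam0` and post-change rate `lam1` shared by all channels, and a single common threshold. All function and parameter names are hypothetical.

```python
import math


def poisson_llr(x, lam0, lam1):
    """Log-likelihood ratio of one Poisson count x: post-change rate lam1 vs pre-change rate lam0."""
    return x * math.log(lam1 / lam0) - (lam1 - lam0)


def multichart_cusum(observations, lam0, lam1, threshold):
    """Run one Page CUSUM statistic per channel and stop when any of them crosses the threshold.

    observations: sequence of rows, one row per time step, one count per channel.
    Returns (time_index, channel_index) of the first alarm, or None if no alarm is raised.
    Assumption (not from the abstract): all channels share lam0, lam1 and the threshold.
    """
    n_channels = len(observations[0])
    stats = [0.0] * n_channels
    for t, row in enumerate(observations):
        for i, x in enumerate(row):
            # Page's recursion: accumulate the log-likelihood ratio, reflected at zero.
            stats[i] = max(0.0, stats[i] + poisson_llr(x, lam0, lam1))
        worst = max(range(n_channels), key=lambda i: stats[i])
        if stats[worst] >= threshold:
            return t, worst
    return None
```

For example, with `lam0=2.0`, `lam1=6.0` and `threshold=5.0`, a channel whose counts jump from 2 to 7 raises an alarm on the second post-change observation, while channels that stay at 2 keep their statistic at zero. Raising the threshold lowers the false alarm rate at the cost of a longer detection delay.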

6.
Distributions of a response y (height, for example) differ with values of a factor t (such as age). Given a response y* for a subject of unknown t*, the objective of inverse prediction is to infer the value of t* and to provide a defensible confidence set for it. Training data provide values of y observed on subjects at known values of t. Models relating the mean and variance of y to t can be formulated as mixed (fixed and random) models in terms of sets of functions of t, such as polynomial spline functions. A confidence set on t* can then be had as those hypothetical values of t for which y* is not detected as an outlier when compared to the model fit to the training data. With nonconstant variance, the p-values for these tests are approximate. This article describes how versatile models for this problem can be formulated in such a way that the computations can be accomplished with widely available software for mixed models, such as SAS PROC MIXED. Coverage probabilities of confidence sets on t* are illustrated in an example.

7.
We propose a new approach for outlier detection, based on a ranking measure that focuses on the question of whether a point is ‘central’ for its nearest neighbours. In our notation, a low cumulative rank implies that the point is central. For instance, a point centrally located in a cluster has a relatively low cumulative sum of ranks because it is among the nearest neighbours of its own nearest neighbours, but a point at the periphery of a cluster has a high cumulative sum of ranks because its nearest neighbours are closer to each other than to the point. The use of ranks eliminates the problem of density calculation in the neighbourhood of the point, and this improves performance. Our method performs better than several density-based methods on some synthetic data sets as well as on some real data sets.
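
The ranking idea in this abstract can be illustrated with a short sketch. It is an assumption-laden reading, not the paper's exact statistic: for each point p it sums, over the k nearest neighbours q of p, the rank of p in q's own nearest-first neighbour list. The function name, the Euclidean metric, and the lack of any normalisation are choices made here for illustration.

```python
import numpy as np


def cumulative_neighbour_rank(points, k):
    """Score each point by the summed rank it holds among its k nearest neighbours' neighbours.

    A high score means the point's neighbours do not regard it as close, i.e. a likely outlier.
    """
    X = np.asarray(points, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # order[q] lists the other points nearest-first (q itself ends up last).
    order = np.argsort(dist, axis=1)
    # rank[q, p] = position (1 = nearest) of p in q's neighbour list.
    rank = np.empty((n, n), dtype=int)
    rows = np.arange(n)[:, None]
    rank[rows, order] = np.arange(1, n + 1)
    scores = np.empty(n)
    for p in range(n):
        neighbours = order[p, :k]
        scores[p] = rank[neighbours, p].sum()
    return scores
```

On a small example with four points on a unit square and one distant point, the distant point receives the largest score, because each of its nearest neighbours ranks it last among the other points. No density estimate is needed at any step, which is the property the abstract emphasises.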

8.
This paper considers a statistical model for the detection mechanism of qualitative microbiological test methods with a parameter for the detection proportion (the probability of detecting a single organism) and a parameter for the false positive rate. It is demonstrated that the detection proportion and the bacterial density cannot be estimated separately, not even in a multiple dilution experiment. Only the product can be estimated, changing the interpretation of the most probable number estimator. The asymptotic power of the likelihood ratio statistic for comparing an alternative method with the compendial method is optimal for a single dilution experiment. The bacterial density should either be close to two CFUs per test unit or equal to zero, depending on differences in the model parameters between the two test methods. The proposed strategy for method validation is to use these two dilutions and test for differences in the two model parameters, addressing the validation parameters specificity and accuracy. Robustness of these two parameters might still be required, but all other validation parameters can be omitted. A confidence interval-based approach for the ratio of the detection proportions for the two methods is recommended, since it is most informative and close to the power of the likelihood ratio test. Copyright © 2014 John Wiley & Sons, Ltd.

9.
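Sun Yifan et al. 《统计研究》 (Statistical Research), 2019, 36(3): 124-128
Identifying disease-causing genes among a large number of genes is an important high-dimensional statistical problem in the big-data setting. Because genes are linked by a network structure, the identification of disease-causing genes has expanded from single genes to gene modules. Mining gene modules from a gene network is the so-called community detection (or node clustering) problem. The vast majority of community detection methods use only the network structure and ignore information about the nodes themselves. In 2016, Newman and Clauset proposed a community detection method based on statistical inference that combines the two (referred to as the NC method). Taking the NC method as a case study, this paper describes the application of statistical methods to real gene networks and the results obtained, and proposes improvements from a statistical perspective. The analysis of the NC method shows that, for unstructured data typified by gene networks, statistical ideas and principles remain central to data analysis, while the corresponding statistical methods need to be adjusted and optimized to suit the characteristics of the data and the questions of interest.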

10.
Two approaches have been used for designing spatial surveys to detect a target. The classical approach controls the probability of missing a target that exists; a Bayesian approach controls the probability that a target exists given that none was seen. In both cases, information about the likely size of the target can reduce sampling requirements. In this paper, previous results are summarized and then used to assess the risk that Roman remains could be present at sites scheduled for development in Greater London.
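
The contrast between the two design criteria can be made concrete with Bayes' rule. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: a single prior probability that a target is present, a single probability that the survey detects a target if one exists, and no false positives. The function name is hypothetical.

```python
def posterior_target_present(prior, detect_prob):
    """Probability that a target exists given that the survey saw none (Bayes' rule).

    prior: prior probability that a target is present.
    detect_prob: probability the survey detects a target that exists, so the
        classical miss probability is 1 - detect_prob.
    Assumes the survey never reports a target that is not there.
    """
    if not (0.0 <= prior <= 1.0 and 0.0 <= detect_prob <= 1.0):
        raise ValueError("prior and detect_prob must lie in [0, 1]")
    missed = prior * (1.0 - detect_prob)   # target present but not seen
    absent = 1.0 - prior                   # target absent, so nothing to see
    if missed + absent == 0.0:
        raise ValueError("observing no target has probability zero under these inputs")
    return missed / (missed + absent)
```

With a prior of 0.5 and a detection probability of 0.8, the classical miss probability is 0.2, while the posterior probability that a target is still present after an empty survey is 1/6. The Bayesian criterion therefore depends on the prior as well as on the survey design.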

11.
This note shows that the results presented by Jabbari Nooghabi et al. (2010) do not hold in all expected cases. Consequently, the technique proposed by Kumar and Lalitha (2012) for detecting upper outliers in Gamma samples is also not valid.

12.
13.
The general problem of outlier detection and the five recursive outlier detection procedures considered in the study are defined. Powers, the probabilities of detecting at least one outlier, and the probabilities of declaring more than one observation, including at least one inlier, as outliers are computed, and the results are discussed. The results show that no procedure is most powerful across the cases in which the actual number of outliers present in the sample is exactly estimated, underestimated, and overestimated. The probabilities of inliers being detected as outliers are also substantial, particularly when outliers occur on only one side of the sample.

14.
Modern methods for detecting changes in the scale or covariance of multivariate distributions rely primarily on testing for the constancy of the covariance matrix. These depend on higher-order moment conditions, and also do not work well when the dimension of the data is large or even moderate relative to the sample size. In this paper, we propose a nonparametric change point test for multivariate data using rankings obtained from data depth measures. As the data depth of an observation measures its centrality relative to the sample, changes in data depth may signify a change of scale of the underlying distribution, and the proposed test is particularly responsive to detecting such changes. We provide a full asymptotic theory for the proposed test statistic under the null hypothesis that the observations are stable, and natural conditions under which the test is consistent. The finite sample properties are investigated by means of a Monte Carlo simulation, and these along with the theoretical results confirm that the test is robust to heavy tails, skewness and high dimensionality. The proposed methods are demonstrated with an application to structural break detection in the rate of change of pollutants linked to acid rain measured in Turkey Lake, a lake in central Ontario, Canada. Our test suggests a change in the rate of acid rain in the late 1980s/early 1990s, which coincides with clean air legislation in Canada and the US. The Canadian Journal of Statistics 48: 417–446; 2020 © 2020 Statistical Society of Canada

15.
The influence function introduced by Hampel (1968, 1973, 1974) is a tool that can be used for outlier detection. Campbell (1978) has derived the influence function for Mahalanobis's distance between two populations, which can be used for detecting outliers in discriminant analysis. Radhakrishnan and Kshirsagar (1981) have obtained influence functions for a variety of parametric functions in multivariate analysis. Radhakrishnan (1983) obtained influence functions for parameters corresponding to "residual" Wilks's Λ and its "direction" and "collinearity" factors in discriminant analysis when a single discriminant function is adequate while discriminating among several groups. In this paper, influence functions for parameters that correspond to "residual" Wilks's Λ and its "direction" and "coplanarity" factors used to test the goodness of fit of s (s > 1) assigned discriminant functions for discriminating among several groups are obtained. These influence functions can be used for outlier detection in multivariate data when a single discriminant function is not adequate.

16.
The problem of outliers in statistical data has attracted many researchers for a long time. Consequently, numerous outlier detection methods have been proposed in the statistical literature. However, no consensus has emerged as to which method is uniformly better than the others or which one is recommended for use in practical situations. In this article, we perform an extensive comparative Monte Carlo simulation study to assess the performance of the multiple outlier detection methods that are either recently proposed or frequently cited in the outlier detection literature. Our simulation experiments include a wide variety of realistic and challenging regression scenarios. We give recommendations on which method is superior to others under what conditions.

17.
This paper deals with the problem of estimating the Pearson correlation coefficient when one variable is subject to left or right censoring. In parallel to the classical results on the Pearson correlation coefficient, we derive a workable formula, through tedious computation and intensive simplification, of the asymptotic variances of the maximum likelihood estimators in two cases: (1) known means and variances and (2) unknown means and variances. We illustrate the usefulness of the asymptotic results in experimental designs.

18.
Robust estimation of parameters, and identification of specific data points that are discordant with an assumed model, are often treated as different statistical problems. The two aims are, however, closely inter-related and in many cases the two analyses are required simultaneously. We present a simple diagnostic plot that connects existing robust estimators with simultaneous outlier detection, and uses the concept of false discovery rates to allow for the multiple comparisons induced by considering each point as a potential outlier. It is straightforward to implement, and applicable in any situation for which robust estimation procedures exist. Several examples are given.

19.
The Institute of Mathematical Statistics has published a table of critical values for the multivariate extreme deviate test. However, the critical values, derived by a Monte Carlo simulation, are given for only the dimensions 2 through 5. We present new critical values for the dimensions 6 through 10, 12, 15, and 20. The results are presented in both table and graphical form. All critical values for the test statistic have been generated by a Monte Carlo simulation using 10,000 observations per case. An example is presented using the new critical values.
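
A Monte Carlo derivation of such critical values might look like the sketch below. It assumes the test statistic is the largest squared Mahalanobis distance from the sample mean, computed with the usual unbiased sample covariance, and that the null distribution is multivariate normal. Those assumptions, the function names, and the default of 10,000 replications are choices made here for illustration and may differ from the tabulated procedure.

```python
import numpy as np


def max_mahalanobis_sq(sample):
    """Largest squared Mahalanobis distance of any row from the sample mean."""
    centred = sample - sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)          # unbiased (n - 1) covariance estimate
    solved = np.linalg.solve(cov, centred.T)    # cov^{-1} (x_i - mean) for every row
    d2 = np.einsum("ij,ji->i", centred, solved)
    return d2.max()


def simulate_critical_value(n, p, alpha=0.05, reps=10_000, seed=0):
    """Estimate the upper-alpha critical value for sample size n and dimension p.

    The statistic is affine invariant, so standard normal data suffice under the
    multivariate normal null hypothesis.
    """
    rng = np.random.default_rng(seed)
    stats = np.array(
        [max_mahalanobis_sq(rng.standard_normal((n, p))) for _ in range(reps)]
    )
    return float(np.quantile(stats, 1.0 - alpha))
```

Calling `simulate_critical_value(n=20, p=6)` would give an estimate for dimension 6 and sample size 20. The estimate carries Monte Carlo error, so published tables should be preferred when they cover the case at hand.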

20.
In this article we study the theoretical properties of the simultaneous multiscale change point estimator (SMUCE) in piecewise-constant signal models with dependent error processes. Empirical studies suggest that in this case the change point estimate is inconsistent, but it is not known if alternatives suggested in the literature for correlated data are consistent. We propose a modification of SMUCE scaling the basic statistic by the long run variance of the error process, which is estimated by a difference-type variance estimator calculated from local means from different blocks. For this modification we prove model consistency for physical-dependent error processes and illustrate the finite sample performance by means of a simulation study.
