Similar literature
 20 similar articles found (search time: 0 ms)
1.
The usual chi-squared approximation to test statistics based on normal theory for testing covariance structures of multivariate populations is very sensitive to the normality assumption. Two general bootstrap procedures are developed in this paper to obtain approximately valid critical values for these test statistics when the data are not normally distributed. The first is based on separate sampling from individual samples, and the second is based on sampling from pooled samples. Although the second method requires more assumptions, its small sample properties are better.

2.
In this paper, we propose a new test statistic for testing the equality of high-dimensional covariance matrices for multiple populations. The proposed test statistic generalizes the test of the equality of two population covariance matrices proposed by Li and Chen (2012).

3.
This article proposes new regression-type estimators by considering the Tukey-M, Hampel M, Huber MM, LTS, LMS and LAD robust methods and the MCD and MVE robust covariance matrices in stratified sampling. Theoretically, we obtain the mean square error (MSE) for these estimators. Using the MSE equations, we compare the efficiencies of the proposed estimators with those of the traditional combined and separate regression estimators. These comparisons show that the proposed estimators are more efficient than the traditional approaches. The theoretical results are supported by numerical examples and simulations based on data sets that include outliers.

4.
5.
This paper describes a permutation procedure to test for the equality of selected elements of a covariance or correlation matrix across groups. It involves either centring or standardising each variable within each group before randomly permuting observations between groups. Since the assumption of exchangeability of observations between groups does not strictly hold following such transformations, Monte Carlo simulations were used to compare expected and empirical rejection levels as a function of group size, the number of groups and distribution type (Normal, mixtures of Normals and Gamma with various values of the shape parameter). The Monte Carlo study showed that the estimated probability levels are close to those that would be obtained with an exact test except at very small sample sizes (5 or 10 observations per group). The test appears robust against non-normal data, different numbers of groups or variables per group and unequal sample sizes per group. Power increased with increasing sample size, effect size and number of elements in the matrix, and decreased with increasingly unequal numbers of observations per group.
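The centre-then-permute idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's procedure: the function name `perm_test_cov_element`, the choice of test statistic (the range of the group-wise mean cross-product for one matrix element) and the default number of permutations are all hypothetical.

```python
import numpy as np

def perm_test_cov_element(groups, i, j, n_perm=999, seed=0):
    """Permutation p-value for equality of covariance element (i, j) across groups.

    groups: list of (n_g, p) arrays. Each variable is centred within its own
    group first; rows are then permuted between groups.
    """
    rng = np.random.default_rng(seed)
    centred = [g - g.mean(axis=0) for g in groups]
    pooled = np.vstack(centred)
    # Row indices at which the pooled array is cut back into groups.
    splits = np.cumsum([len(g) for g in centred])[:-1]

    def spread(data):
        # Assumed statistic: largest minus smallest group-wise mean of the
        # (i, j) cross-product of the already centred variables.
        vals = [np.mean(part[:, i] * part[:, j]) for part in np.split(data, splits)]
        return max(vals) - min(vals)

    observed = spread(pooled)
    # rng.permutation shuffles the rows (first axis) of a 2-D array.
    exceed = sum(spread(rng.permutation(pooled)) >= observed for _ in range(n_perm))
    # Count the observed arrangement as one of the permutations.
    return (exceed + 1) / (n_perm + 1)
```

Standardising instead of centring (to compare correlations rather than covariances) would only change the first transformation step.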

6.
Necessary and sufficient conditions on the observation covariance structure and on the set of linear transformations are given for which the distribution of the multivariate maximum squared-radii statistic for detecting a single multivariate outlier is invariant from the distribution assuming the usual independence covariance structure. Thus, we extend the work of Baksalary and Puntanen (1990), who have given necessary and sufficient conditions for an independence-distribution-preserving covariance structure for Grubbs' statistic for detecting a univariate outlier. We also extend the work of Marco, Young, and Turner (1987) and Pavur and Young (1991), who have given sufficient conditions for an independence-distribution-preserving dependency structure for the multivariate squared-radii statistic.

7.
In this note, the limiting spectral distribution for large sample covariance matrices with unbounded m-dependent structure is obtained under the third moment for the entries. This partially extends the results of Hui and Pan (Comm. Statist. Theory and Methods, 2010, 39: 935–941).

8.
To overcome the main flaw of the minimum covariance determinant (MCD) estimator, i.e. the difficulty of determining its main parameter h, a modified MCD (M-MCD) algorithm is proposed. In M-MCD, a self-adaptive iteration adjusts the parameter h of MCD to minimize the deviation between the standard deviation of the squared robust Mahalanobis distance, which is calculated by MCD with the parameter h from the sample, and the standard deviation of the theoretical squared Mahalanobis distance. The optimal parameter h of M-MCD is the value at which this deviation is smallest. A convergence analysis demonstrates that M-MCD has good convergence properties. Further, M-MCD and MCD were applied to detect outliers in two typical data sets and in chemical process data, respectively. The results show that M-MCD can find the optimal parameter h through the self-adaptive iteration, and thus that its outlier detection performance is better than that of MCD.

9.
10.
The sample distance functions between an observation and a population were deduced by the likelihood procedures for the discrimination problem in the case of several normal populations with unequal covariance matrices (1986). The present paper gives the exact MGFs of the distance functions for the case in which the observation and the sample come from the same population, and derives the limiting distributions of the distance functions by using the MGFs.

11.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

12.
The detection of outliers is an interesting and important aspect of data analysis, as outliers can affect inference. Various methods are available in the literature for detecting outliers in multivariate data [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994] using the Mahalanobis distance measure. An attempt is made to propose an alternative method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615–644]. The proposed method is computationally efficient, with a high breakdown value and low computation time. Further, important properties, namely success rates (SR) and false detection rates (FDR), are studied and compared with those of some well-known outlier detection methods through a simulation study. The Comedian method has high SR and low FDR for all combinations of parameters. By removing or down-weighting the detected outliers, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.
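For orientation, the comedian of Falk is the median-based analogue of the covariance, COM(X, Y) = med{(X − med X)(Y − med Y)}. The sketch below computes only this quantity and its pairwise matrix; it does not reproduce the detection method of the abstract, and the function names `comedian` and `comedian_matrix` are hypothetical.

```python
import numpy as np

def comedian(x, y):
    """COM(x, y) = med{(x - med x)(y - med y)}.

    COM(x, x) is the median squared deviation from the median, which is how
    the comedian generalises the MAD.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.median((x - np.median(x)) * (y - np.median(y)))

def comedian_matrix(data):
    """Symmetric p x p matrix of pairwise comedians for an (n, p) array."""
    data = np.asarray(data, dtype=float)
    dev = data - np.median(data, axis=0)  # deviations from column medians
    p = data.shape[1]
    return np.array([[np.median(dev[:, a] * dev[:, b]) for b in range(p)]
                     for a in range(p)])
```

Because only medians are involved, a single extreme value such as the 100 in `[1, 2, 3, 4, 100]` leaves `comedian(x, x)` unchanged from what the bulk of the data would give.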

13.
Robust statistics have slowly become familiar to all practitioners. Books entirely devoted to the subject (e.g. [R.A. Maronna, R.D. Martin, V.J. Yohai, Robust Statistics: Theory and Methods. John Wiley & Sons, New York, NY, USA, 2006; P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, New York, NY, USA, 1987], …) are without any doubt responsible for the increased practice of robust statistics in all fields of application. Even classical books often have at least one chapter (or parts of chapters) that develops robust methodology. The improvement of computing power has also contributed to the development of a wider and wider range of available robust procedures. However, this success story now threatens to backfire: non-specialists interested in applying robust methodology are faced with a large set of (assumed equivalent) methods and with the over-sophistication of some of them. Which method should one use? How should the (numerous) parameters be optimally tuned? These questions are not easy for non-specialists to answer. One could argue that default procedures are available in most statistical software (Splus, R, SAS, Matlab, …). However, using the detection of outliers in multivariate data as an illustration, it is shown that, on the one hand, it is not obvious that one would feel confident with the output of default procedures, and that, on the other hand, thoroughly understanding the tuning parameters involved in the procedures might require extensive research. This is not feasible when trying to compete with the classical methodology, which (while clearly unreliable) is so straightforward. The aim of the paper is to help practitioners who wish to detect outliers in a multivariate data set reliably. The chosen methodology is the Minimum Covariance Determinant estimator, which is widely available and intuitively appealing.

14.
The influence function introduced by Hampel (1968, 1973, 1974) is a tool that can be used for outlier detection. Campbell (1978) has obtained the influence function for Mahalanobis's distance between two populations, which can be used for detecting outliers in discriminant analysis. In this paper, influence functions for a variety of parametric functions in multivariate analysis are obtained. Influence functions for the generalized variance, the matrix of regression coefficients, the noncentrality matrix $\Sigma^{-1}\delta$ in multivariate analysis of variance and its eigenvalues, the matrix L, which is a generalization of $1 - R^2$, canonical correlations, principal components and parameters that correspond to Pillai's statistic (1955), Hotelling's (1951) generalized $T_0^2$ and Wilks' $\Lambda$ (1932), which can be used for outlier detection in multivariate analysis, are obtained. Devlin, Gnanadesikan and Kettenring (1975) have obtained the influence function for the population correlation coefficient in the bivariate case. It is shown in this paper that influence functions for parameters corresponding to $r^2$, $R^2$ and Mahalanobis $D^2$ can be obtained as particular cases.

15.
Covariance matrices, or in general matrices of sums of squares and cross-products, are used as input in many multivariate analysis techniques. The eigenvalues of these matrices play an important role in the statistical analysis of data, including estimation and hypothesis testing. It has been recognized that one or a few observations can exert an undue influence on the eigenvalues of a covariance matrix. The relationship between the eigenvalues of the covariance matrix computed from all data and the eigenvalues of the perturbed covariance matrix (a covariance matrix computed after a small subset of the observations has been deleted) cannot in general be written in closed form. Two methods for approximating the eigenvalues of a perturbed covariance matrix have been suggested by Hadi (1988) and Wang and Nyquist (1991) for the case of a perturbation by a single observation. In this paper we improve on these two methods and give some additional theoretical results that may give further insight into the problem. We also compare the two improved approximations in terms of their accuracy.

16.
Multivariate control charts are used to monitor stochastic processes for changes and unusual observations. Hotelling's $T^2$ statistic is calculated for each new observation and an out-of-control signal is issued if it goes beyond the control limits. However, this classical approach becomes unreliable as the number of variables p approaches the number of observations n, and impossible when p exceeds n. In this paper, we devise an improvement to the monitoring procedure in high-dimensional settings. We regularise the covariance matrix to estimate the baseline parameter and incorporate a leave-one-out re-sampling approach to estimate the empirical distribution of future observations. An extensive simulation study demonstrates that the new method outperforms the classical Hotelling $T^2$ approach in power, and maintains appropriate false positive rates. We demonstrate the utility of the method using a set of quality control samples collected to monitor a gas chromatography–mass spectrometry apparatus over a period of 67 days.
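A rough sketch of the two ingredients named in this abstract is given below. It rests on assumptions the abstract does not specify: the regulariser is taken to be a simple ridge term added to the sample covariance, the control limit is taken as an empirical quantile of leave-one-out scores, and the names `regularised_t2`, `loo_control_limit`, `ridge` and `level` are hypothetical.

```python
import numpy as np

def regularised_t2(baseline, new_obs, ridge=0.1):
    """T^2-type score of each new observation against a baseline sample.

    The covariance is regularised as S + ridge * I (an assumed choice), so
    the score stays defined even when p exceeds n.
    """
    baseline = np.asarray(baseline, dtype=float)
    new_obs = np.atleast_2d(np.asarray(new_obs, dtype=float))
    mean = baseline.mean(axis=0)
    cov = np.atleast_2d(np.cov(baseline, rowvar=False))
    cov_reg = cov + ridge * np.eye(cov.shape[0])
    diff = new_obs - mean
    solved = np.linalg.solve(cov_reg, diff.T)    # shape (p, m)
    return np.einsum("ij,ji->i", diff, solved)   # one quadratic form per row

def loo_control_limit(baseline, ridge=0.1, level=0.95):
    """Empirical control limit from leave-one-out scores (assumed scheme)."""
    baseline = np.asarray(baseline, dtype=float)
    scores = [
        regularised_t2(np.delete(baseline, k, axis=0), baseline[k], ridge)[0]
        for k in range(len(baseline))
    ]
    return np.quantile(scores, level)
```

A new observation would then be flagged when its score exceeds the value returned by `loo_control_limit`; the ridge strength would need tuning in practice.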

17.
Driven by network intrusion detection, we propose a MultiResolution Anomaly Detection (MRAD) method, which effectively utilizes the multiscale properties of Internet features and network anomalies. In this paper, several theoretical properties of the MRAD method are explored. A major new result is the mathematical formulation of the notion that a two-scaled MRAD method has larger power than the average power of the detection method based on the given two scales. A test threshold is also developed. Comparisons between the MRAD method and other classical outlier detectors in time series are reported as well.

18.
Outlier detection plays an important role in the pre-treatment of sequential datasets to obtain pure, valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are employed to measure the similarity an object shares with its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled as a suspicious outlier, which is then examined further to judge whether it really is an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS). The expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS, are then employed, and the ratio of the length of the expected path to that of the measured path indicates whether there are outliers in the SOS. Third, if there are outliers in the SOS and the SOS contains N suspicious outliers, then $2^N - 2$ remaining paths are generated by removing k (0 < k < N) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) is employed to represent the ratio of the length of the measured path of the considered remaining path to that of the expected path of the corresponding SOS, and the DSOF indicates the degree to which the objects removed in a remaining path are outliers. The proposed outlier detection scheme is conducted from a dynamical perspective, and breaks the tight relation between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the experimental results verify that the proposed scheme has higher detection quality for sequential datasets. In addition, the proposed outlier detection scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.

19.
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate the outliers when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is utilized to construct a novel outlier detection measure based on distance correlation, and then an outlier detection procedure is proposed. The proposed method enjoys several advantages. First, the outlier detection measure can be simply calculated, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, which does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well in detecting outliers in high-dimensional regression models and performs better than some competing methods.

20.
For the invariant unbiased level-α test of equality of two covariance matrices, the quantities b and B satisfying the equations P(b≤T≤B) = 1-α, E(T|b≤T≤B) = E(T), where T is the mean trace of a multivariate beta, are required. Five and one per cent values of B are tabulated for m = 2,3(2)11,16; b can be obtained from B. Upper five and one per cent values of T are also included, as these are required for the locally most powerful invariant test of nullity of any source of difference in several mean vectors and the locally most powerful invariant one-sided test of equality of two covariance matrices. Lower critical values may be obtained from upper critical values.
