Similar literature
 Found 20 similar articles; search time 390 ms
1.
The study of multivariate outliers raises many problems of definition, principle and manipulation. Well-authenticated tests of discordancy exist only for the multivariate normal distribution. Detection of outliers in non-normal distributions involves the adoption of appropriate criteria to represent 'extremeness' of observations in a sample; corresponding tests of discordancy usually require tedious, or even intractable, distributional and computational manipulations. A class of transformations of the data is considered with a view to transferring some of the familiar and desirable features of discordancy tests for normal samples to non-normal situations.

2.
As the Watson distribution is frequently used for modeling axial data, it is important to investigate the existence of possible outliers in samples from this distribution. We therefore develop, for the bipolar Watson distribution defined on the hypersphere, likelihood-ratio tests of discordancy for a single outlier or for several outliers en bloc, assuming a slippage-type contamination model as the alternative. We evaluate the performance of these tests for a single outlier and also compare some tests of discordancy for a single outlier that are available for this distribution.

3.
The linear structural model provides one way of modelling a linear relationship between two random variables. It is well known that problems of unidentifiability arise for unreplicated observations and normal error structure. As in all data sets, outliers can arise and methods are needed for detecting and testing them. An outlier-generating model of mean–slippage type can be used to characterise four different forms of outlier manifestation. It is interesting to find that the unidentifiability problem provides no obstacle for detecting or testing the outliers for three of the four forms. Detection principles, and specific discordancy tests, are derived and illustrated by application to some data on physical measurements of Pacific squid.

4.
In this article, we propose a new test of discordancy for circular data based on spacing theory. The test should provide a good alternative to existing tests of discordancy for detecting single or well-separated multiple outliers. In addition, the new method can be generalized to identify a patch of outliers in data. The percentage points are calculated and the performance is examined. We first investigate the performance of the test for detecting a single outlier and show that the new test performs well compared to other known tests. We then show that the generalized test works well in detecting a patch of outliers in the data. As an illustration, a practical example based on an eye dataset obtained from a glaucoma clinic at the University of Malaya Medical Center, Malaysia is presented.

5.
The union-intersection approach to multivariate test construction is used to develop an alternative to Wilks' likelihood ratio test statistic for testing for two or more outliers in multivariate normal data. It is shown that critical values of both statistics are poorly approximated by Bonferroni bounds. Simulated critical values are presented for both statistics for significance levels 1% and 5%, for sample sizes 10(5)30, 40, 50, 75 and 100 for 2, 3, 4 and 5 dimensions. A power comparison of the two tests in the slippage of the mean model for generating outliers indicates that the union-intersection test is the more powerful when the slippages are close to collinear. Although Wilks' test remains the preference for general use, the union-intersection test could be valuable when such special structure in the data is suspected.

6.
Repeating measurements of efficacy variables in clinical trials may be desirable when the measurement may be affected by ambient conditions. When such measurements are repeated at baseline and at the end of therapy, statistical questions relate to: (1) the best summary measurement to use for a subject when there is a possibility that some observations are contaminated and have increased variances; and (2) the effect of screening procedures which exclude outliers based on within- and between-subject contamination tests. We study these issues in two stages, each using a different set of models. The first stage deals only with the choice of the summary measure. The simulation results show that in some cases of contamination, the power achieved by the tests based on the median exceeds that achieved by the tests based on the mean of the replicates. However, even when we use the median, there are cases when contamination leads to a considerable loss in power. The combined issue of the best summary measurement and the effect of screening is studied in the second stage. The tests use either the observed data or the data after screening for outliers. The simulation results demonstrate that the power depends on the screening procedure as well as on the test statistic used in the study. We found that for the extent and magnitude of contamination considered, within-subject screening has a minimal effect on the power of the tests when there are at least three replicates; as a result, we found no advantage in the use of screening procedures for within-subject contamination. On the other hand, the use of a between-subject screening for outliers increases the power of the test procedures. However, even with the use of screening procedures, heterogeneity of variances can greatly reduce the power of the study.

7.
Outliers can as readily arise in sample survey (i.e. finite population) data as in samples from infinite populations. For infinite populations, an extensive methodology exists; very little has been written on the finite population case. We shall explore matters of definition and concept to formulate some basic principles for handling outliers in sample survey data. Some existing methods for outlier accommodation are reviewed and proposals are made for the dual problem of outlier tests of discordancy.

8.
Statistical tests for two independent samples under the assumption of normality are applied routinely by most practitioners of statistics. Likewise, presumably every introductory course in statistics treats some statistical procedures for two independent normal samples. Often, the classical two-sample model with equal variances is introduced, emphasizing that a test for equality of the expected values is a test for equality of both distributions as well, which is the actual goal. In a second step, the assumption of equal variances is usually discarded. The two-sample t test with Welch correction and the F test for equality of variances are introduced. The first test is treated solely as a test for the equality of central location, and the second as a test for the equality of scatter. Typically, there is no discussion of whether and to what extent testing for equality of the underlying normal distributions is possible, which is quite unsatisfactory given the motivation and treatment of the situation with equal variances. It is the aim of this article to investigate the problem of testing for equality of two normal distributions, and to do so using knowledge and methods accessible to statistical practitioners as well as to students in an introductory statistics course. The power of the different tests discussed in the article is examined empirically. Finally, we apply the tests to several real data sets to illustrate their performance. In particular, we consider several data sets arising from intelligence tests since there is a large body of research supporting the existence of sex differences in mean scores or in variability in specific cognitive abilities.
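
The two procedures named in this abstract, the Welch-corrected two-sample t test and the F test for equality of variances, can be sketched as follows. This is a minimal illustration using the textbook formulas, not the combined test the article investigates; the function names `welch_t` and `variance_ratio_f` are hypothetical, and p-values are left out because they would need a t or F distribution function from a library outside the standard one.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    # Estimated variance of each sample mean (sample variance divided by n).
    vx = variance(x) / nx
    vy = variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def variance_ratio_f(x, y):
    """Return the F statistic for equal variances and its two degrees of freedom."""
    return variance(x) / variance(y), len(x) - 1, len(y) - 1


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [2, 4, 6, 8, 10]
    print(welch_t(a, b))
    print(variance_ratio_f(a, b))
```

The t statistic targets a difference in location only and the F statistic a difference in scatter only, which is the gap the abstract points to: neither alone is a test of equality of the two normal distributions.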

9.
The discordancy test for multiple outliers is complicated by problems of masking and swamping. The key to resolving this lies in determining k, i.e. the number of 'contaminants' in a sample. Great efforts have been made to solve this problem in recent years, but no effective method has been developed. In this paper, we present two ways of determining k, free from the effects of masking and swamping, when testing upper (lower) outliers in normal samples. Examples are given to illustrate the methods.

10.
It is generally assumed that the likelihood ratio statistic for testing the null hypothesis that data arise from a homoscedastic normal mixture distribution versus the alternative hypothesis that data arise from a heteroscedastic normal mixture distribution has an asymptotic χ² reference distribution with degrees of freedom equal to the difference in the number of parameters being estimated under the alternative and null models under some regularity conditions. Simulations show that the χ² reference distribution will give a reasonable approximation for the likelihood ratio test only when the sample size is 2000 or more and the mixture components are well separated when the restrictions suggested by Hathaway (Ann. Stat. 13:795–800, 1985) are imposed on the component variances to ensure that the likelihood is bounded under the alternative distribution. For small and medium sample sizes, parametric bootstrap tests appear to work well for determining whether data arise from a normal mixture with equal variances or a normal mixture with unequal variances.

11.
Tests for mean equality proposed by Weerahandi (1995) and Chen and Chen (1998), tests that do not require equality of population variances, were examined when data were not only heterogeneous but also nonnormal in unbalanced completely randomized designs. Furthermore, these tests were compared to a test examined by Lix and Keselman (1998), a test that uses a heteroscedastic statistic (i.e., Welch, 1951) with robust estimators (20% trimmed means and Winsorized variances). Our findings confirmed previously published data that the tests are indeed robust to variance heterogeneity when the data are obtained from normal populations. However, the Weerahandi (1995) and Chen and Chen (1998) tests were not found to be robust when data were obtained from nonnormal populations. Indeed, rates of Type I error were typically in excess of 10% and, at times, exceeded 50%. On the other hand, the statistic presented by Lix and Keselman (1998) was generally robust to variance heterogeneity and nonnormality.
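
The robust estimators mentioned here, 20% trimmed means and Winsorized variances, can be illustrated with a short sketch. It assumes the usual symmetric definitions (drop or clamp g = floor(0.2 n) observations in each tail); the function names are hypothetical, and the code shows only the two estimators, not the full heteroscedastic test statistic examined by Lix and Keselman (1998).

```python
import math


def trimmed_mean(data, proportion=0.2):
    """Mean after dropping the g smallest and g largest values, g = floor(proportion * n)."""
    xs = sorted(data)
    g = math.floor(proportion * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_variance(data, proportion=0.2):
    """Sample variance after clamping each tail to its nearest retained value."""
    xs = sorted(data)
    n = len(xs)
    g = math.floor(proportion * n)
    if g > 0:
        # Replace the g lowest values by xs[g] and the g highest by xs[n - g - 1].
        xs = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    m = sum(xs) / n
    return sum((v - m) ** 2 for v in xs) / (n - 1)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one gross outlier
    print(trimmed_mean(sample), winsorized_variance(sample))
```

With the single extreme value 100 in the sample, both estimators are unaffected by how large that value is, which is the property that makes them attractive under nonnormality.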

12.
The only parametric model in current use for axial data from a rotationally symmetric bipolar or girdle distribution on the sphere is the Watson distribution. This paper develops methods for evaluating the model as a fit to data using graphical and formal goodness-of-fit tests, and tests of discordancy.

13.
When comparing the central values of two independent groups, should a t-test be performed, or should the observations be transformed into their ranks and a Wilcoxon-Mann-Whitney test performed? This paper argues that neither should automatically be chosen. Instead, provided that software for conducting randomisation tests is available, the chief concern should be with obtaining data values that are a good reflection of scientific reality and appropriate to the objective of the research; if necessary, the data values should be transformed so that this is so. The subsequent use of a randomisation (permutation) test will mean that failure of the transformed data values to satisfy assumptions such as normality and equality of variances will not be of concern.
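
A randomisation (permutation) test of the kind this abstract recommends can be sketched in a few lines. The version below is a Monte Carlo approximation using the absolute difference in means as the test statistic; the statistic, the function name `permutation_test`, and the parameters `n_resamples` and `seed` are assumptions made for illustration, since the abstract does not prescribe them.

```python
import random
from statistics import mean


def permutation_test(x, y, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # The add-one correction counts the observed labelling itself,
    # so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
    treated = [13.0, 12.9, 13.4, 12.8, 13.1, 13.3]
    print(permutation_test(control, treated))
```

Because the reference distribution is built from relabellings of the data actually observed, the validity of the p-value does not rest on normality or equal variances, only on exchangeability of the observations under the null hypothesis.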

14.
Approximations to the power functions of the likelihood ratio tests of homogeneity of normal means against the simple loop ordering at slippage alternatives are considered. If a researcher knows which mean is smallest and which is largest, but does not know how the other means are ordered, then a simple loop ordering is appropriate. The accuracy of the several moment approximations is studied for the case of known variances and it is found that for powers in the range typically of interest, the two-moment approximation seems quite adequate. Approximations based on mixtures of noncentral F variables are developed for the case of unknown variances. The critical values of the test statistics are also tabulated for selected levels of significance.

15.
A likelihood ratio test for discordancy in the sample is considered with slippage alternatives. It is shown for a wide class of univariate distributions that only the extreme observations in the sample need to be tested for discordancy. This result provides firmer support for many commonly used discordancy tests that take only extreme observations as candidates. The problem of testing multiple discordant observations is also discussed.

16.
In this paper a test for outliers based on externally studentized residuals is shown to be related to a test for predictive failure. The relationships between a test for outliers, a test for a correlated mean shift and a test for an intercept shift are developed. A sequential testing procedure for outliers and structural change is shown to be independent, so that the overall size of the joint test can be determined exactly. It is established that a joint test for outliers and constancy of variances cannot be performed.
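
Externally studentized residuals, on which the outlier test in this abstract is based, can be computed as sketched below. The sketch assumes the standard definition for simple linear regression with an intercept (each residual is scaled by a standard deviation estimated with that observation deleted); the abstract gives no formulas, so the setup and the function name are illustrative assumptions.

```python
import math


def externally_studentized_residuals(x, y):
    """Externally studentized residuals for the least-squares line y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    result = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this observation
        # Residual variance of the fit with this observation deleted.
        s2_deleted = (sse - e * e / (1 - h)) / (n - p - 1)
        result.append(e / math.sqrt(s2_deleted * (1 - h)))
    return result


if __name__ == "__main__":
    xs = list(range(10))
    ys = [1 + 2 * xi + (0.1 if xi % 2 == 0 else -0.1) for xi in xs]
    ys[5] += 10  # plant one outlier
    print(externally_studentized_residuals(xs, ys))
```

Under normal errors, each such residual is the t statistic for the prediction error of the deleted observation given the fit to the remaining data, which is the link to a predictive-failure test that the abstract refers to.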

17.
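Because traditional factor analysis is sensitive to outliers, its results can disagree with reality. To address this, this paper uses the FAST-MCD method to improve traditional factor analysis, constructing a robust factor analysis algorithm that overcomes the influence of outliers, and evaluates the method through simulation and empirical analysis. Both the simulation and the empirical results show that, before and after factor rotation, traditional and robust factor analysis give essentially the same results when the data contain no outliers; when the data contain outliers, the results of traditional factor analysis change substantially, while those of robust factor analysis remain essentially unchanged. This indicates that, compared with traditional factor analysis, robust factor analysis effectively resists the influence of outliers and has good resistance to interference and high robustness.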

18.
Population conditional quantiles, indexed by the percentage α, are useful as indices for identifying outliers. We propose a class of symmetric quantiles for estimating unknown nonlinear regression conditional quantiles. In large samples, symmetric quantiles are more efficient than regression quantiles considered by Koenker and Bassett (Econometrica 46 (1978) 33) for small or large values of α, when the underlying distribution is symmetric, in the sense that they have smaller asymptotic variances. Symmetric quantiles play a useful role in identifying outliers. In estimating nonlinear regression parameters by symmetric trimmed means constructed by symmetric quantiles, we show that their asymptotic variances can be very close to (or can even attain) the Cramér–Rao lower bound under symmetric heavy-tailed error distributions, whereas the usual robust and nonrobust estimators cannot.

19.
Traditional statistical modeling of continuous outcome variables relies heavily on the assumption of a normal distribution. However, in some applications, such as analysis of microRNA (miRNA) data, normality may not hold. Skewed distributions play an important role in such studies and might lead to robust results in the presence of extreme outliers. We apply a skew-normal (SN) distribution, which is indexed by three parameters (location, scale and shape), in the context of miRNA studies. We developed a test statistic for comparing the means of two conditions, replacing the normal assumption with an SN distribution. We compared the performance of the statistic with other Wald-type statistics through simulations. Two real miRNA datasets are analyzed to illustrate the methods. Our simulation findings showed that the use of an SN distribution can result in improved identification of differentially expressed miRNAs, especially with markedly skewed data and when the two groups have different variances. It also appeared that the statistic with the SN assumption performs comparably with other Wald-type statistics irrespective of the sample size or distribution. Moreover, the real dataset analyses suggest that the statistic with the SN assumption can be used effectively for identification of important miRNAs. Overall, the statistic with the SN distribution is useful when data are asymmetric and when the samples have different variances for the two groups.

20.
Statistical models are often based on normal distributions and procedures for testing this distributional assumption are needed. Many goodness-of-fit tests suffer from the presence of outliers, in the sense that they may reject the null hypothesis even in the case of a single extreme observation. We show a possible extension of the Shapiro-Wilk test that is not affected by such a problem. The presented method is inspired by the forward search (FS), a new, recently proposed, diagnostic tool. An application to univariate observations shows how the procedure is able to capture the structure of the data, even in the presence of outliers. Other properties are also investigated.
