
1.

Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations

Cédric Béguin Beat Hulliger 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2004,167(2):275-294

Summary. As a part of the EUREDIT project new methods to detect multivariate outliers in incomplete survey data have been developed. These methods are the first to work with sampling weights and to be able to cope with missing values. Two of these methods are presented here. The epidemic algorithm simulates the propagation of a disease through a population and uses extreme infection times to find outlying observations. Transformed rank correlations are robust estimates of the centre and the scatter of the data. They use a geometric transformation that is based on the rank correlation matrix. The estimates are used to define a Mahalanobis distance that reveals outliers. The two methods are applied to a small data set and to one of the evaluation data sets of the EUREDIT project.
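The idea of turning robust centre and scatter estimates into a Mahalanobis distance can be sketched for two variables. This is a simplified illustration, not the authors' transformed rank correlations estimator: it assumes complete data with no sampling weights, uses the coordinate-wise median and MAD as centre and scale, and plugs the Spearman rank correlation in as the off-diagonal scatter term. All function names and the toy data are hypothetical.

```python
import math
from statistics import median

def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def mad(values):
    """Median absolute deviation, scaled to match the normal standard deviation."""
    centre = median(values)
    return 1.4826 * median([abs(v - centre) for v in values])

def robust_distances(xs, ys):
    """Mahalanobis-type distances from median, MAD and Spearman correlation.

    Assumes both MADs are non-zero and |rho| < 1.
    """
    cx, cy = median(xs), median(ys)
    sx, sy = mad(xs), mad(ys)
    rho = pearson(ranks(xs), ranks(ys))  # Spearman rank correlation
    distances = []
    for x, y in zip(xs, ys):
        u, v = (x - cx) / sx, (y - cy) / sy
        # Quadratic form with the inverse of [[1, rho], [rho, 1]].
        d2 = (u * u - 2 * rho * u * v + v * v) / (1 - rho ** 2)
        distances.append(math.sqrt(d2))
    return distances

# Toy data: the last point breaks the increasing trend of the others.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.9, 1.0]
distances = robust_distances(xs, ys)
most_outlying = max(range(len(distances)), key=distances.__getitem__)
print(most_outlying, round(distances[most_outlying], 3))
```

In this toy set the last observation receives the largest distance. With so few points the outlier still pulls the rank correlation down noticeably, which is one reason a more careful estimator is needed in practice.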

2.

An outlier detection scheme for dynamical sequential datasets

Shiliang Zhang Zonglin Ye Yanbin Zhang Xiali Hei 《Communications in Statistics - Simulation and Computation》2019,48(5):1450-1502

Outlier detection plays an important role in the pre-treatment of sequential datasets to obtain clean, valuable data. This paper proposes an outlier detection scheme for dynamical sequential datasets. First, the concepts of forward outlier factor (FOF) and backward outlier factor (BOF) are employed to measure the similarity an object shares with its sequentially adjacent objects. An object that shows no similarity with its sequential neighbors is labeled a suspicious outlier, which is examined subsequently to judge whether it really is an outlier in the dataset. Second, sequentially adjacent suspicious outliers are defined as a suspicious outlier series (SOS). The expected path, representing the ideal transition path through the suspicious outliers in the SOS, and the measured path, representing the real path through all the objects in the SOS, are then employed, and the ratio of the length of the expected path to that of the measured path indicates whether there are outliers in the SOS. Third, if there are outliers in the SOS and the SOS contains N suspicious outliers, then 2^N - 2 remaining paths are generated by removing k (0 < k < N) suspicious outliers and sequentially connecting the remaining ones. The dynamical sequential outlier factor (DSOF) is the ratio of the length of the measured path of the considered remaining path to that of the expected path of the corresponding SOS, and it indicates the degree to which the objects removed in a remaining path are outliers. The proposed outlier detection scheme works from a dynamical perspective and breaks the tight relation between being an outlier and being dissimilar to adjacent objects. Experiments are conducted to evaluate the effectiveness of the proposed scheme, and the results verify that it has higher detection quality for sequential datasets. In addition, the proposed outlier detection scheme does not depend on the size of the dataset and needs no prior information about the distribution of the data.
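The count of 2^N - 2 remaining paths follows from removing every non-empty proper subset of the N suspicious outliers. The sketch below enumerates those subsets and computes the Euclidean length of each remaining path. It illustrates the counting step only: the abstract does not define the expected path, so the DSOF itself is not computed, and the function names and sample points are hypothetical.

```python
import math
from itertools import combinations

def remaining_paths(suspects):
    """Yield (removed, kept) pairs for every k with 0 < k < N removed suspects."""
    n = len(suspects)
    for k in range(1, n):
        for removed_idx in combinations(range(n), k):
            removed = set(removed_idx)
            kept = [p for i, p in enumerate(suspects) if i not in removed]
            yield [suspects[i] for i in removed_idx], kept

def path_length(points):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# Four hypothetical suspicious outliers in a 2-D sequential dataset.
sos = [(0.0, 0.0), (1.0, 5.0), (2.0, 0.5), (3.0, 4.0)]
paths = list(remaining_paths(sos))
print(len(paths), 2 ** len(sos) - 2)
for removed, kept in paths[:3]:
    print(removed, round(path_length(kept), 3))
```

Because the number of remaining paths grows exponentially in N, this brute-force enumeration is only practical for short suspicious outlier series.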

3.

Likelihood testing populations modeled by autoregressive process subject to the limit of detection in applications to longitudinal biomedical data

Albert Vexler Alan D. Hutson 《Journal of applied statistics》2011,38(7):1333-1346

Dependent and often incomplete outcomes are commonly found in longitudinal biomedical studies. We develop a likelihood function that implements the autoregressive process of outcomes, incorporating the limit of detection problem and the probability of drop-out. The proposed approach incorporates the characteristics of the longitudinal data in biomedical research, allowing us to carry out powerful tests to detect a difference between study populations in terms of the growth rate and drop-out rate. The formal notation of the likelihood function is developed, making it possible to adapt the proposed method easily to various scenarios in terms of the number of groups to compare and a variety of growth trend patterns. Useful inferential properties for the proposed method are established, which take advantage of many well-developed theorems regarding the likelihood approach. A broad Monte-Carlo study both confirms the asymptotic results and illustrates good power properties of the proposed method. We apply the proposed method to three data sets obtained from mouse tumor experiments.

4.

Application of trajectories from growth curve in identification of longitudinal biomarker for the multivariate survival data

Feng-shou Ko 《Journal of applied statistics》2017,44(3):416-426

In clinical studies, researchers measure the patients' response longitudinally. In recent studies, mixed models are used to determine effects at the individual level. On the other hand, Henderson et al. [3,4] developed a joint likelihood function which combines likelihood functions of longitudinal biomarkers and survival times. They put random effects in the longitudinal component to determine if a longitudinal biomarker is associated with time to an event. In this paper, we treat a longitudinal biomarker as a growth curve and extend Henderson's method to determine if a longitudinal biomarker is associated with time to an event for multivariate survival data.

5.

A method for sample size calculation via E-value in the planning of observational studies

Yixin Fang Weili He Xiaofei Hu Hongwei Wang 《Pharmaceutical statistics》2021,20(1):163-174

Confounding adjustment plays a key role in designing observational studies such as cross-sectional studies, case-control studies, and cohort studies. In this article, we propose a simple method for sample size calculation in observational research in the presence of confounding. The method is motivated by the notion of E-value, using a bounding factor to quantify the impact of confounders on the effect size. The method can be applied to calculate the needed sample size in observational research when the outcome variable is binary, continuous, or time-to-event. The method can be implemented straightforwardly using existing commercial software such as the PASS software. We demonstrate the performance of the proposed method through numerical examples, simulation studies, and a real application, which show that the proposed method is conservative in that it provides a slightly larger sample size than is needed to achieve a given power.
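For orientation, the E-value and bounding factor mentioned above can be computed in a few lines. The abstract gives no formulas, so this sketch assumes the commonly used risk-ratio definitions: E = RR + sqrt(RR × (RR - 1)) for RR ≥ 1, and bounding factor B = RR_EU × RR_UD / (RR_EU + RR_UD - 1). It does not reproduce the paper's sample size procedure, and the function and parameter names are hypothetical.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (assumed standard definition)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects are handled by inverting the ratio
    return rr + math.sqrt(rr * (rr - 1))

def bounding_factor(rr_eu, rr_ud):
    """Largest factor by which confounding of the given strength can bias a risk ratio.

    rr_eu: exposure-confounder association; rr_ud: confounder-outcome association.
    """
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)

observed_rr = 2.0
e = e_value(observed_rr)
print(round(e, 3))
# A confounder associated with both exposure and outcome at strength E
# yields a bounding factor equal to the observed risk ratio.
print(round(bounding_factor(e, e), 3))
```

The second printed value equals the observed risk ratio, which is the defining property of the E-value: confounding of exactly that strength could fully explain away the observed association.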

6.

Analysis of the performance of test statistics for detection of outliers (additive,innovative, transient,and level shift) in AR (1) processes

Amena Urooj Zahid Asghar 《Communications in Statistics - Simulation and Computation》2017,46(2):948-979

Outlier detection has always been of interest to researchers and data miners. It has been well researched in different knowledge and application domains. This study explores how correctly outliers are identified using the most commonly applied test statistics. We evaluate the performance of the AO, IO, LS, and TC statistics in terms of vulnerability to spurious outliers, measured by the empirical level of significance (ELS); the power of the test, which indicates the sensitivity of the statistical tests in detecting changes; and vulnerability to masking of outliers, measured by misspecification frequencies. We have observed that the sampling distribution of the test statistic η_tp (tp = AO, IO, LS, TC) in the case of the AR(1) model depends on the values of n and φ. The sampling distribution of η_TC is less concentrated than the sampling distributions of η_AO, η_IO, and η_LS. In the AR(1) process, empirical critical values for the 1%, 5%, and 10% upper percentiles are found to be higher than those generally used. We have also found evidence that the test statistic for transient change (TC) needs to be revisited, as η_TC is found to be eclipsed by η_AO, η_LS, and η_IO at different δ values. TC is repeatedly confused with IO and AO, and at extreme δ values it coincides with AO and LS.
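To make two of the four outlier types concrete, the sketch below simulates an AR(1) series and contaminates it with an additive outlier (AO) and an innovational outlier (IO) of size δ. It assumes the usual textbook definitions, in which an AO shifts a single observation while an IO shifts a single innovation and therefore decays through the series as δ·φ^k. The test statistics studied in the paper are not implemented, and all names and parameter values are hypothetical.

```python
import random

def simulate_ar1(phi, innovations):
    """Generate y_t = phi * y_{t-1} + e_t, starting from y_0 = e_0."""
    series, previous = [], 0.0
    for e in innovations:
        previous = phi * previous + e
        series.append(previous)
    return series

def add_additive_outlier(series, t, delta):
    """AO: only the observation at time t is shifted by delta."""
    contaminated = list(series)
    contaminated[t] += delta
    return contaminated

def add_innovational_outlier(phi, innovations, t, delta):
    """IO: the innovation at time t is shifted, so the effect propagates."""
    shocked = list(innovations)
    shocked[t] += delta
    return simulate_ar1(phi, shocked)

rng = random.Random(42)
phi, delta, t0 = 0.6, 5.0, 20
innovations = [rng.gauss(0.0, 1.0) for _ in range(50)]

clean = simulate_ar1(phi, innovations)
ao = add_additive_outlier(clean, t0, delta)
io = add_innovational_outlier(phi, innovations, t0, delta)

# Effect of each outlier type for the first few lags after t0.
for k in range(4):
    print(k, round(ao[t0 + k] - clean[t0 + k], 4), round(io[t0 + k] - clean[t0 + k], 4))
```

The printed differences show why the two types can be hard to tell apart when φ is small: the IO effect then dies out almost immediately and resembles an isolated spike.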