1.

Outlier Detection in a Circular Regression Model Using COVRATIO Statistic

S. Ibrahim A. Rambli A. G. Hussin I. Mohamed 《Communications in Statistics - Simulation and Computation》2013,42(10):2272-2280

In this article, we model the relationship between two circular variables using the circular regression model proposed by Jammalamadaka and Sarma (1993), referred to here as the JS circular regression model. The model has many interesting properties and is sensitive enough to detect the occurrence of outliers. We focus our attention on the problem of identifying outliers in this model. In particular, we extend the use of the COVRATIO statistic, which has been successfully used in the linear case for the same purpose, to the JS circular regression model via a row deletion approach. Through simulation studies, the cut-off points for the new procedure are obtained and its power of performance is investigated. It is found that the performance improves when the resulting residuals have small variance and when the sample size gets larger. An example of the application of the procedure is presented using a real dataset.
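
For orientation, the linear-regression COVRATIO that the abstract mentions can be sketched as below. This is a minimal sketch of the classical row-deletion definition only (the ratio of the determinants of the estimated coefficient covariance matrix without and with observation i). It does not implement the JS circular regression model or the paper's cut-off points, and the function name `covratio` and the simulated data are illustrative assumptions.

```python
import numpy as np

def covratio(X, y):
    """COVRATIO for every row of an ordinary least squares fit, by row deletion."""
    n, p = X.shape

    def cov_det(Xs, ys):
        # Determinant of the estimated covariance matrix s^2 (X'X)^{-1}.
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ beta
        s2 = resid @ resid / (len(ys) - p)
        return np.linalg.det(s2 * np.linalg.inv(Xs.T @ Xs))

    full = cov_det(X, y)
    keep = np.ones(n, dtype=bool)
    out = np.empty(n)
    for i in range(n):
        keep[i] = False              # drop observation i
        out[i] = cov_det(X[keep], y[keep]) / full
        keep[i] = True               # restore it for the next iteration
    return out

# Simulated data with one planted outlier (illustrative assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=30)
X = np.column_stack([np.ones(30), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
y[0] += 10.0
ratios = covratio(X, y)
```

Values of `ratios` close to 1 indicate that deleting the observation barely changes the estimated covariance; observations whose ratio departs strongly from 1 are candidates for closer inspection.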

2.

A Simple Diagnostic Plot Connecting Robust Estimation, Outlier Detection, and False Discovery Rates

Kenneth Rice David Spiegelhalter 《Journal of applied statistics》2006,33(10):1131-1147

Robust estimation of parameters, and identification of specific data points that are discordant with an assumed model, are often treated as different statistical problems. The two aims are, however, closely inter-related and in many cases the two analyses are required simultaneously. We present a simple diagnostic plot that connects existing robust estimators with simultaneous outlier detection, and uses the concept of false discovery rates to allow for the multiple comparisons induced by considering each point as a potential outlier. It is straightforward to implement, and applicable in any situation for which robust estimation procedures exist. Several examples are given.

3.

Regression Methods for Combining Multiple Classifiers

T. Górecki M. Krzyśko 《Communications in Statistics - Simulation and Computation》2015,44(3):739-755

As no single classification method outperforms other classification methods under all circumstances, decision-makers may solve a classification problem using several classification methods and examine their performance for classification purposes in the learning set. Based on this performance, better classification methods might be adopted and poor methods might be avoided. However, which single classification method is the best to predict the classification of new observations is still not clear, especially when some methods offer similar classification performance in the learning set. In this article we present various regression and classical methods, which combine several classification methods to predict the classification of new observations. The quality of the combined classifiers is examined on some real data. Nonparametric regression is the best method of combining classifiers.

4.

A Review Of Some Robust Data Analysis And Multiple Outlier Detection Procedures

Dr P Prescott 《Journal of applied statistics》1980,7(2):141-158

Some recent contributions to robust data analysis and multiple outlier detection are discussed. Two methods of analysis producing robust estimates and sets of weights which may be inspected for outliers are described and compared. Some examples of their application are given to support the recommendation that both ordinary least squares and a robust method of analysis should be part of routine data analysis.

5.

Boxplot-Based Outlier Detection for the Location-Scale Family

Y. H. Dovoedo 《Communications in Statistics - Simulation and Computation》2015,44(6):1492-1513

Boxplots are among the most widely used exploratory data analysis (EDA) tools in statistical practice. Typical applications of boxplots include eliciting information about the underlying distribution (shape, location, etc.) as well as identifying possible outliers. This article focuses on a modification using a type of lower and upper fences similar in concept to those used in a traditional boxplot; however, instead of constructing the upper and lower fences using the upper and lower quartiles, respectively, and a multiple of the interquartile range (IQR), multiples of the upper and the lower semi-interquartile ranges (SIQR), respectively, measured from the sample median, are used. Any observation beyond the proposed fences is labeled a potential outlier. An exact expression for the probability that at least one sample observation is wrongly classified as an outlier, the so-called “some-outside rate per sample” (Hoaglin et al., 1986), is derived for the family of location-scale distributions and is used in the determination of the fence constants. Tables for the fence constants are provided for a number of well-known location-scale distributions along with some illustrations with data; the performance of the outlier detection rule is explored in a simulation study.
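
The fence construction described in this abstract can be sketched as follows. This is a minimal sketch that assumes the lower and upper SIQR are `median - Q1` and `Q3 - median`. The fence constants `k_lower` and `k_upper` are hypothetical inputs: the article tabulates distribution-specific constants, which are not reproduced here, and the value 4.0 below is for illustration only.

```python
import numpy as np

def siqr_fences(x, k_lower, k_upper):
    """Fences measured from the median in multiples of the semi-interquartile ranges."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    lower = med - k_lower * (med - q1)   # lower SIQR = median - Q1 (assumed definition)
    upper = med + k_upper * (q3 - med)   # upper SIQR = Q3 - median (assumed definition)
    return lower, upper

def flag_outliers(x, k_lower, k_upper):
    """Return the observations that fall outside the fences."""
    x = np.asarray(x, dtype=float)
    lower, upper = siqr_fences(x, k_lower, k_upper)
    return x[(x < lower) | (x > upper)]

# Illustrative data and constants (assumptions, not taken from the article).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
lower, upper = siqr_fences(data, 4.0, 4.0)
flagged = flag_outliers(data, 4.0, 4.0)
```

Because each fence uses its own half of the interquartile range, a skewed sample gets asymmetric fences, which the traditional IQR-based rule cannot provide.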

6.

A Semiparametric Regression Method for Interval-Censored Data

Seungbong Han Adin-Cristian Andrei Kam-Wah Tsui 《Communications in Statistics - Simulation and Computation》2013,42(1):18-30

In many medical studies, event times are recorded in an interval-censored (IC) format. For example, in numerous cancer trials, time to disease relapse is only known to have occurred between two consecutive clinic visits. Many existing modeling methods in the IC context are computationally intensive and usually require numerous assumptions that could be unrealistic or difficult to verify in practice. We propose a flexible and computationally efficient modeling strategy based on jackknife pseudo-observations (POs). The POs obtained based on nonparametric estimators of the survival function are employed as outcomes in an equivalent, yet simpler regression model that produces consistent covariate effect estimates. Hence, instead of operating in the IC context, the problem is translated into the realm of generalized linear models, where numerous options are available. Outcome transformations via appropriate link functions lead to familiar modeling contexts such as the proportional hazards and proportional odds. Moreover, the methods developed are not limited to these settings and have broader applicability. Simulation studies show that the proposed methods produce virtually unbiased covariate effect estimates, even for moderate sample sizes. An example from the International Breast Cancer Study Group (IBCSG) Trial VI further illustrates the practical advantages of this new approach.

7.

Bayesian Elastic Net Quantile Regression for Panel Data and Its Applications

Tang Lizhi (唐礼智), Li Yujia (李雨佳), Zhao Lijing (赵力静) 《Statistical Research (统计研究)》2020,(3):94-113

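This paper is the first to apply the Elastic Net, a penalty method designed for highly correlated variables, to Bayesian quantile regression for panel data. Based on the asymmetric Laplace prior distribution, it derives the posterior distributions of all parameters and then constructs a Gibbs sampler. To verify the effectiveness of the model, the Bayesian Elastic Net quantile regression method for panel data (BQR.EN) is compared comprehensively, across a variety of scenarios, with the Bayesian quantile regression method (BQR), the Bayesian Lasso quantile regression method (BLQR), and the Bayesian adaptive Lasso quantile regression method (BALQR) for panel data. The results show that BQR.EN is suited to data that are highly correlated, high-dimensional, and leptokurtic with heavy tails. Further simulations under different error-term assumptions and different sample sizes verify the robustness and small-sample properties of the new method. Finally, the economic value added (EVA) of listed internet finance companies is taken as the empirical subject to test the parameter estimation and variable selection ability of the new method in a practical problem; the empirical results are in line with expectations.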

8.

An Empirical Comparison of Multiple Imputation Methods for Categorical Data

Olanrewaju Akande Fan Li Jerome Reiter 《The American statistician》2017,71(2):162-170

Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.

9.

Asymptotic Theory of Outlier Detection Algorithms for Linear Time Series Regression Models

Søren Johansen Bent Nielsen 《Scandinavian Journal of Statistics》2016,43(2):321-348

Outlier detection algorithms are intimately connected with robust statistics that down‐weight some observations to zero. We define a number of outlier detection algorithms related to the Huber‐skip and least trimmed squares estimators, including the one‐step Huber‐skip estimator and the forward search. Next, we review a recently developed asymptotic theory of these. Finally, we analyse the gauge, the fraction of wrongly detected outliers, for a number of outlier detection algorithms and establish an asymptotic normal and a Poisson theory for the gauge.

10.

Inverse Probability Multiple Weighting Quantile Regression Estimation with Missing Data and Its Application

Tai Lingnan (邰凌楠) et al. 《Statistical Research (统计研究)》2018,35(9):115-128

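Missing data are pervasive in applied research. Under the missing-at-random assumption, and from a model-inference perspective, this paper proposes a new and efficient estimation method for linear quantile regression models with missing data: inverse probability multiple weighting (IPMW) estimation. Building on inverse probability weighting (IPW) estimation, the method combines propensity score matching with the idea of model averaging, and determines the final parameter estimates by weighting the results of multiple estimations. It applies when the response variable is independent and identically distributed or independent but not identically distributed, and it suits the vast majority of missing-data scenarios. Theoretical derivation and simulation studies show that the IPMW estimator inherits the advantages of the IPW estimator while being more robust. Finally, the method is applied to micro-level survey data containing missing values to study the factors that influence the consumption level of the middle-income group in economically developed quasi-first-tier cities. Comparing the estimates and confidence bands of the two methods shows that the IPMW estimator has a smaller standard deviation and gives more robust results.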

11.

Biased Bootstrap Methods for Reducing the Effects of Contamination

Peter Hall & Brett Presnell 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(3):661-680

Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback–Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε∈ (0, ½) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression.

12.

Quantile Regression Methods for Panel Data and a Simulation Study (total citations: 5; self-citations: 0; citations by others: 5)

Luo Youxi (罗幼喜), Tian Maozai (田茂再) 《Statistical Research (统计研究)》2010,27(10):81-87

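The article discusses panel data models with fixed effects and presents three quantile regression methods for estimating the unknown parameters. Monte Carlo simulation results show that these quantile regression methods are effective tools for handling panel data and that they outperform mean regression when the errors are non-normal. The article closes with a modeling case on real data, which yields reference information useful for decision-making.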
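
As background for the quantile regression methods this abstract compares with mean regression, the check (pinball) loss that quantile regression minimizes can be sketched as below. This is a minimal sketch for an intercept-only model, not any of the article's three panel-data estimators; the helper names `check_loss` and `best_constant` and the sample data are illustrative assumptions.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def best_constant(y, tau):
    """Constant that minimizes the total check loss.

    The minimum is attained at one of the observations, so it is enough
    to search over the data points themselves.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).sum() for c in y]
    return y[int(np.argmin(losses))]

# Illustrative sample (assumption).
y = [2.0, 7.0, 1.0, 9.0, 4.0]
median_fit = best_constant(y, 0.5)
upper_fit = best_constant(y, 0.9)
```

With `tau = 0.5` the loss is proportional to the absolute error, so the fit is the sample median rather than the mean, which is why quantile methods are less sensitive to non-normal errors.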

13.

The Detection of Influential Observations for Allocation, Separation, and the Determination of Probabilities in a Bayesian Framework

Wesley Johnson 《Journal of Business & Economic Statistics》2013,31(3):369-381

Normal theory separation and allocation problems are discussed from a predictive point of view. Influence statistics are defined and employed to ascertain the impact that particular observations will have on the inferential goals: allocation of future observations, separation between populations, and the determination of probabilities for future cases. Methods are illustrated on a collection of financial data taken from Johnson and Wichern (1982).

14.

Multiple Spatio-Temporal Cluster Detection for Case Event Data: An Ordering-Based Approach

C. Demattei L. Cucala 《Communications in Statistics - Theory and Methods》2013,42(2):358-372

This article introduces a spatio-temporal distance which allows the extension of the spatial cluster detection methods of Demattei et al. (2007) and Cucala (2009). A review of these methods is given before we define a spatio-temporal distance. Then this distance is used for detecting spatio-temporal clusters. These ordering-based methods are compared to the scan statistic by a simulation study. The scan procedure is more powerful but it detects fewer true positives due to its lack of flexibility. Those techniques are applied to a seismic data set. This article highlights two advantages of the ordering-based methods: their flexibility and their low computational demand.

15.

A Quantile Regression Method for Panel Data Based on the Gibbs Sampling Algorithm

Luo Youxi (罗幼喜), Li Hanfang (李翰芳), Tian Maozai (田茂再) 《Statistical Research (统计研究)》2011,28(7):98-103

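The article discusses panel data models with random effects. Using the relationship between the asymmetric Laplace distribution and quantile regression, it builds a Bayesian hierarchical quantile regression model. By decomposing the asymmetric Laplace distribution, it gives point and interval estimates of the model parameters under a Gibbs sampling algorithm. Simulation results show that, for panel data models with random effects, and especially when the errors are non-normal, the proposed method outperforms the traditional mean-model approach. Finally, the new method is applied in an empirical study of panel data on regional economies and employment in China, yielding information useful for macroeconomic regulation.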
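
The abstract does not state which decomposition of the asymmetric Laplace distribution is used. One widely used choice, shown here as an assumption rather than as the authors' exact formulation, is the normal-exponential mixture representation for an error term at quantile level p with unit scale:

```latex
\varepsilon \;=\; \theta z + \tau \sqrt{z}\, u,
\qquad z \sim \mathrm{Exp}(1), \quad u \sim N(0,1), \quad z \perp u,
\qquad
\theta = \frac{1 - 2p}{p(1-p)}, \quad \tau^{2} = \frac{2}{p(1-p)} .
```

Conditional on z, the error is normal with mean \theta z and variance \tau^{2} z, so the regression coefficients have normal full conditionals under a normal prior, which is what makes a Gibbs sampler convenient for this kind of model.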

16.

Rejoinder: Asymptotic Theory of Outlier Detection Algorithms for Linear Time Series Regression Models

Søren Johansen Bent Nielsen 《Scandinavian Journal of Statistics》2016,43(2):374-381

17.

The Relative Effectiveness of Procedures Commonly Used in Multiple Regression Analysis for Dealing with Missing Values

Allan Donner 《The American statistician》2013,67(4):378-381

Expressions are derived for the bias and variance associated with procedures frequently used to estimate partial regression coefficients in a linear model having the two explanatory variables x₁ and x₂, with missing values on x₂ only. The expressions are used to help gain insight into the relative effectiveness of these procedures for handling more complex patterns of missing data.

18.

A Comparison of Different Methods for Representing Categorical Data

C. M. Cuadras D. Cuadras M. J. Greenacre 《Communications in Statistics - Simulation and Computation》2013,42(2):447-459

We first compare correspondence analysis, which uses chi-square distance, and an alternative approach using Hellinger distance, for representing categorical data in a contingency table. We propose a coefficient which globally measures the similarity between these two approaches. This coefficient can be decomposed into several components, one component for each principal dimension, indicating the contribution of the dimensions to the difference between the two representations. We also make comparisons with the logratio approach based on compositional data. These three methods of representation can produce quite similar results. Two illustrative examples are given.

19.

A Comparison of Hierarchical Methods for Clustering Functional Data

Laura Ferreira 《Communications in Statistics - Simulation and Computation》2013,42(9):1925-1949

Functional data analysis (FDA), the analysis of data that can be considered a set of observed continuous functions, is an increasingly common class of statistical analysis. One of the most widely used FDA methods is the cluster analysis of functional data; however, little work has been done to compare the performance of clustering methods on functional data. In this article, a simulation study compares the performance of four major hierarchical methods for clustering functional data. The simulated data varied in three ways: the nature of the signal functions (periodic, nonperiodic, or mixed), the amount of noise added to the signal functions, and the pattern of the true cluster sizes. The Rand index was used to compare the performance of each clustering method. As a secondary goal, clustering methods were also compared when the number of clusters has been misspecified. To illustrate the results, a real set of functional data was clustered where the true clustering structure is believed to be known. Comparing the clustering methods for the real data set confirmed the findings of the simulation. This study yields concrete suggestions to future researchers to determine the best method for clustering their functional data.
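
The Rand index named in this abstract can be sketched as below. This is a minimal sketch of the plain (unadjusted) index, the share of observation pairs on which two partitions agree; the abstract does not say whether the study used the plain or an adjusted version, and the function name `rand_index` and the label vectors are illustrative assumptions.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of pairs that both partitions treat alike (together or apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Illustrative partitions of six observations (assumption).
truth = [0, 0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 2, 2]
score = rand_index(truth, found)
```

The index depends only on which observations are grouped together, so relabeling the clusters leaves it unchanged; a value of 1 means the two partitions are identical.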

20.

Additive Quantile Regression Models for Panel Data: Research and Application

Luo Youxi (罗幼喜), Zhang Min (张敏), Tian Maozai (田茂再) 《Statistical Research (统计研究)》2020,37(2):105-118

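This paper discusses additive-model quantile regression for panel data within a Bayesian framework. First, the nonparametric model is converted into a parametric one through a low-rank thin-plate penalized spline expansion and the introduction of dummy variables for individual effects. Then, assuming that the random error term follows an asymmetric Laplace distribution, a Bayesian hierarchical quantile regression model is built. By decomposing the asymmetric Laplace distribution, the paper gives the conditional posterior distributions of all parameters to be estimated and constructs a Gibbs sampling algorithm for estimating them. Computer simulation results show that the proposed method is clearly superior in estimation robustness to the traditional mean regression approach for additive models. Finally, taking consumption expenditure panel data as an example, the paper studies how the income structure of rural residents in China affects their consumption expenditure. It finds that, for rural residents in the high, middle, and low consumption groups alike, increases in wage income and net business income have a more pronounced positive effect on consumption expenditure. Furthermore, compared with high-consumption rural residents, the consumption expenditure of low-consumption rural residents rises more slowly as income increases.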