Similar literature
20 similar articles found (search time: 78 ms)
1.
The Forward Search is a powerful general method, incorporating flexible data-driven trimming, for the detection of outliers and unsuspected structure in data and so for building robust models. Starting from small subsets of data, observations that are close to the fitted model are added to the observations used in parameter estimation. As this subset grows we monitor parameter estimates, test statistics and measures of fit such as residuals. The paper surveys theoretical development in work on the Forward Search over the last decade. The main illustration is a regression example with 330 observations and 9 potential explanatory variables. Mention is also made of procedures for multivariate data, including clustering, time series analysis and fraud detection.
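
The growing-subset idea described in this abstract can be sketched for the simplest case. The code below is only an illustration under stated assumptions, not the authors' implementation: it assumes simple linear regression with one explanatory variable and distinct x values, starts from the line through two observations that minimises the median squared residual, and monitors the raw minimum absolute residual outside the subset rather than the scaled deletion residual used in the Forward Search literature. The names `fit_line`, `squared_residuals` and `forward_search` are hypothetical.

```python
from itertools import combinations
from statistics import median


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def squared_residuals(points, intercept, slope):
    """Squared residuals of every observation from the given line."""
    return [(y - intercept - slope * x) ** 2 for x, y in points]


def forward_search(points, start_size=3):
    """Return a list of (subset size, subset indices, monitored statistic)."""
    n = len(points)
    # Assumed robust start: least median of squares over all two-point lines.
    best = None
    for i, j in combinations(range(n), 2):
        if points[i][0] == points[j][0]:
            continue  # a vertical line cannot be fitted
        a, b = fit_line([points[i], points[j]])
        crit = median(squared_residuals(points, a, b))
        if best is None or crit < best[0]:
            best = (crit, a, b)
    _, a, b = best

    trace = []
    for m in range(start_size, n):
        # The next subset holds the m observations closest to the current fit.
        res2 = squared_residuals(points, a, b)
        subset = sorted(sorted(range(n), key=lambda k: res2[k])[:m])
        # Refit on the subset and monitor the closest observation outside it.
        a, b = fit_line([points[k] for k in subset])
        res2 = squared_residuals(points, a, b)
        outside = [res2[k] ** 0.5 for k in range(n) if k not in subset]
        trace.append((m, subset, min(outside)))
    return trace
```

In a plot of the monitored statistic against subset size, a sharp jump suggests that the observations still outside the subset are far from the model fitted to the rest, which is how outliers tend to reveal themselves near the end of the search.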

2.
In this paper, we consider the influence of individual observations on inferences about the Box–Cox power transformation parameter from a Bayesian point of view. We compare Bayesian diagnostic measures with the ‘forward’ method of analysis due to Riani and Atkinson. In particular, we look at the effect of omitting observations on the inference by comparing particular choices of transformation using the conditional predictive ordinate and the $k_d$ measure of Pettit and Young. We illustrate the methods using a designed experiment. We show that a group of masked outliers can be detected using these single deletion diagnostics. Also, we show that Bayesian diagnostic measures are simpler to use to investigate the effect of observations on transformations than the forward search method.

3.
Fitting multiplicative models by robust alternating regressions   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper a robust approach for fitting multiplicative models is presented. Focus is on the factor analysis model, where we will estimate factor loadings and scores by a robust alternating regression algorithm. The approach is highly robust, and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not predetermined by outliers, which can be retrieved from the residual plot. Also provided is an accompanying robust $R^2$-plot to determine the appropriate number of factors. The approach is illustrated by real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.

4.
Measurement error models constitute a wide class of models that include linear and nonlinear regression models. They are very useful to model many real-life phenomena, particularly in the medical and biological areas. The great advantage of these models is that, in some sense, they can be represented as mixed effects models, allowing us to implement well-known techniques, like the EM-algorithm for the parameter estimation. In this paper, we consider a class of multivariate measurement error models where the observed response and/or covariate are not fully observed, i.e., the observations are subject to certain threshold values below or above which the measurements are not quantifiable. Consequently, these observations are considered censored. We assume a Student-t distribution for the unobserved true values of the mismeasured covariate and the error term of the model, providing a robust alternative for parameter estimation. Our approach relies on a likelihood-based inference using an EM-type algorithm. The proposed method is illustrated through some simulation studies and the analysis of an AIDS clinical trial dataset.

5.
Birnbaum-Saunders (BS) models have largely been applied in material fatigue studies and reliability analyses to relate the total time until failure with some type of cumulative damage. In many problems related to the medical field, such as chronic cardiac diseases and different types of cancer, cumulative damage caused by several risk factors might cause some degradation that leads to a fatigue process. In these cases, BS models can be suitable for describing the propagation lifetime. However, since the cumulative damage is assumed to be normally distributed in the BS distribution, the parameter estimates from this model can be sensitive to outlying observations. In order to attenuate this influence, we present in this paper BS models in which a Student-t distribution is assumed to explain the cumulative damage. In particular, we show that the maximum likelihood estimates of the Student-t log-BS models attribute smaller weights to outlying observations, which produces robust parameter estimates. Also, some inferential results are presented. In addition, based on local influence and deviance component and martingale-type residuals, a diagnostics analysis is derived. Finally, a motivating example from the medical field is analyzed using log-BS regression models. Since the parameter estimates appear to be very sensitive to outlying and influential observations, the Student-t log-BS regression model should attenuate such influences. The model checking methodologies developed in this paper are used to compare the fitted models.
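
The downweighting effect mentioned in this abstract can be illustrated in a much simpler setting. The sketch below is not the Student-t log-BS model of the paper: it assumes a plain univariate location-scale Student-t model with fixed degrees of freedom, fitted by the standard EM iteration in which each observation receives the weight (ν + 1) / (ν + d²), where d² is its squared standardised distance from the current location. The function name `t_location_scale_em` and its parameters are hypothetical.

```python
def t_location_scale_em(data, dof=4.0, iterations=50):
    """EM estimates of location and squared scale under a Student-t model.

    Returns (location, squared scale, weights from the last E-step).
    The degrees of freedom are assumed known and held fixed.
    """
    n = len(data)
    # Start from the non-robust sample mean and variance.
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n
    weights = [1.0] * n
    for _ in range(iterations):
        # E-step: observations far from mu get small weights.
        weights = [(dof + 1.0) / (dof + (x - mu) ** 2 / sigma2) for x in data]
        # M-step: weighted mean and weighted squared scale.
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        sigma2 = sum(w * (x - mu) ** 2 for w, x in zip(weights, data)) / n
    return mu, sigma2, weights
```

With a sample such as `[9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 30.0]`, the last observation ends up with by far the smallest weight, so the location estimate stays close to the bulk of the data instead of being pulled towards the outlier as the sample mean is.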

6.
We contribute to the discussion of an article where Andrea Cerioli, Marco Riani, Anthony Atkinson and Aldo Corbellini review the advantages of analyzing multivariate data by monitoring how the estimated model parameters change as the estimation parameters vary. The focus is on robust methods and their sensitivity to the nominal efficiency and breakdown point. In congratulating the authors on the clear and stimulating exposition, we contribute to its discussion with an overview of our experience in applying monitoring in our application domain.

7.
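欧变玲 et al. 《统计研究》 (Statistical Research), 2015, 32(10): 98-105
The spatial weight matrix is an important tool for describing spatial relationships among individuals, and it is usually constructed from the geographic distances between individuals so that it does not change over time. However, when the spatial relationships among individuals arise from economic, social or trade distance, or from characteristics such as population mobility or climate, the spatial weight matrix may inherently vary over time. This study therefore proposes robust LM tests for panel data models with time-varying spatial weight matrices. Extensive Monte Carlo simulations show that, in terms of size and power, robust LM tests based on a misspecified time-invariant spatial weight matrix are substantially biased, whereas robust LM tests based on the time-varying spatial weight matrix effectively identify the type of spatial relationship in panel data. In particular, the robust LM tests with time-varying spatial weight matrices have higher power when the time dimension is longer and the number of individuals is larger.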

8.
In this paper, we combine empirical likelihood and estimating functions for censored data to obtain robust confidence regions for the parameters and more generally for functions of the parameters of distributions used in lifetime data analysis. The proposed method works with type I, type II or randomly censored data. It is illustrated by considering inference for log-location-scale models. In particular, we focus on the log-normal and the Weibull models and we tackle the problem of constructing robust confidence regions (or intervals) for the parameters of the model, as well as for quantiles and values of the survival function. The usefulness of the method is demonstrated through a Monte Carlo study and by examples on two lifetime data sets.

9.
This article introduces the robust indirect technique for the slightly contaminated stochastic logistic population models. Based on discrete sampled data with a fixed unit of time between two consecutive observations, we not only construct the robust indirect inference generalized method of moments (GMM) estimator for the model parameters, but also propose a likelihood-ratio-type indirect statistic and a robust indirect GMM saddle-point statistic for testing the parameters of interest. In addition, we develop the robust exponential tilting estimator and the robust exponential tilting test to improve their small sample performances. Finally, their finite-sample properties are studied through Monte Carlo experiments.

10.
Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

11.
A robust approach to the analysis of epidemic data is suggested. This method is based on a natural extension of M-estimation for i.i.d. observations where the distribution may be asymmetric. It is discussed initially in the context of a general discrete time stochastic process before being applied to previously studied epidemic models. In particular we consider a class of chain binomial models and models based on time dependent branching processes. Robustness and efficiency properties are studied through simulation and some previously analysed data sets are considered.

12.
Fast and robust bootstrap   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper we review recent developments on a bootstrap method for robust estimators which is computationally faster and more resistant to outliers than the classical bootstrap. This fast and robust bootstrap method is, under reasonable regularity conditions, asymptotically consistent. We describe the method in general and then consider its application to perform inference based on robust estimators for the linear regression and multivariate location-scatter models. In particular, we study confidence and prediction intervals and tests of hypotheses for linear regression models, inference for location-scatter parameters and principal components, and classification error estimation for discriminant analysis.

13.
F. Auert, H. Läuter. 《Statistics》, 2013, 47(2): 265-293
In this paper we give an approximation procedure for surfaces which are defined on a p-dimensional region and are observable (disturbed by some noise) according to an experimental design. In this procedure we combine clustering methods, discriminant analysis and smoothing techniques.

The second part of the paper investigates the statistical properties of linear smoothing procedures. We assume linear models and, for a broad class of models, we prove the consistency of the estimator of the expectation of the observations after smoothing.

In the last section we give some results on efficiency.

14.
Traditional multivariate control charts are based upon the assumption that the observations follow a multivariate normal distribution. In many practical applications, however, this supposition may be difficult to verify. In this paper, we use control charts based on robust estimators of location and scale to improve the capability of detecting out-of-control observations under non-normality in the presence of multiple outliers. Concretely, we use a simulation process to analyse the behaviour of the robust alternatives to Hotelling's $T^2$, which use the minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD) estimators, in the presence of observations with a Student's t-distribution. The results show that these robust control charts are good alternatives for small deviations from normality because the percentage of out-of-control observations detected by these charts in Phase II is higher.

15.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several widely referenced data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

16.
In geostatistics, detecting atypical observations is of special interest due to the changes they can cause in environmental and geological patterns. Several methods for detecting them have already been suggested for the univariate spatial case. However, the problem is more complicated when various variables are observed simultaneously and the spatial correlation among them must be taken into account. The aim of this paper is to detect outliers and influential observations in multivariate spatial linear models. For this purpose, we derive and explore two different methods. First, a multivariate version of the forward search algorithm is given, where locations with outliers are detected in the last steps of the procedure. Next, we derive influence measures to assess the impact of the observations on the multivariate spatial linear model. The procedures are easy to compute and to interpret by means of graphical representations. Finally, an example and a Monte Carlo study illustrate the performance of these methods for identification of outliers in multivariate spatial linear models.

17.
We complement the work of Cerioli, Riani, Atkinson and Corbellini by discussing monitoring in the context of robust clustering. This implies extending the approach to clustering, and possibly monitoring more than one parameter simultaneously. The cases of trimming and snipping are discussed separately, and special attention is given to recently proposed methods like double clustering, reweighting in robust clustering, and fuzzy regression clustering.

18.
Observations collected over time are often autocorrelated rather than independent, and sometimes include observations below or above detection limits (i.e. censored values reported as less or more than a level of detection) and/or missing data. Practitioners commonly disregard censored data cases or replace these observations with some function of the limit of detection, which often results in biased estimates. Moreover, parameter estimation can be greatly affected by the presence of influential observations in the data. In this paper we derive local influence diagnostic measures for censored regression models with autoregressive errors of order p (hereafter, AR(p)-CR models) on the basis of the Q-function under three useful perturbation schemes. In order to account for censoring in a likelihood-based estimation procedure for AR(p)-CR models, we used a stochastic approximation version of the expectation-maximisation algorithm. The accuracy of the local influence diagnostic measure in detecting influential observations is explored through the analysis of empirical studies. The proposed methods are illustrated using data, from a study of total phosphorus concentration, that contain left-censored observations. These methods are implemented in the R package ARCensReg.

19.
For experiments running in field plots or over time, the observations are often correlated due to spatial or serial correlation, which leads to correlated errors in a linear model analyzing the treatment means. Without knowing the exact correlation matrix of the errors, it is not possible to compute the generalized least-squares estimator for the treatment means and use it to construct optimal designs for the experiments. In this paper, we propose to use neighborhoods to model the covariance matrix of the errors, and apply a modified generalized least-squares estimator to construct robust designs for experiments with blocks. A minimax design criterion is investigated, and a simulated annealing algorithm is developed to find robust designs. We have derived several theoretical results, and representative examples are presented.

20.
Longitudinal data are commonly modeled with normal mixed-effects models. Most modeling methods are based on traditional mean regression, which results in non-robust estimation in the presence of extreme values or outliers. Median regression is also not the best choice for estimation, especially for non-normal errors. Compared to conventional modeling methods, composite quantile regression can provide robust estimation results even for non-normal errors. In this paper, based on a so-called pseudo composite asymmetric Laplace distribution (PCALD), we develop a Bayesian treatment of composite quantile regression for mixed-effects models. Furthermore, with the location-scale mixture representation of the PCALD, we establish a Bayesian hierarchical model and achieve the posterior inference of all unknown parameters and latent variables using the Markov chain Monte Carlo (MCMC) method. Finally, this newly developed procedure is illustrated by some Monte Carlo simulations and a case analysis of an HIV/AIDS clinical data set.
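
For orientation, composite quantile regression in its usual fixed-effects linear-model form (not the Bayesian mixed-effects formulation developed in the paper) estimates one slope vector shared across K quantile levels, each with its own intercept. The notation below is assumed for illustration only:

```latex
\[
(\hat b_1,\dots,\hat b_K,\hat\beta)
= \arg\min_{b_1,\dots,b_K,\,\beta}
\sum_{k=1}^{K}\sum_{i=1}^{n}
\rho_{\tau_k}\!\left(y_i - b_k - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\,\bigl(\tau - \mathbf{1}\{u<0\}\bigr),
\]
```

where the quantile levels are commonly taken as equally spaced, for example $\tau_k = k/(K+1)$. Because the check loss $\rho_\tau$ grows linearly rather than quadratically in the residual, a single extreme response has limited influence on the fit, which is the sense in which the approach is described as robust.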
