Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
ABSTRACT

This article studies the outlier detection problem in the mixed regressive-spatial autoregressive model. The formulae for testing outliers and their approximate distributions are derived under the mean-shift model and the variance-weight model, respectively. Simulation studies are conducted to examine the power and size of the test, as well as the detection of outliers when a simulated data set contains several outliers. A real data set is analyzed to illustrate the proposed method, and modified models based on the mean-shift and variance-weight models, in which detected outliers are taken into account, are suggested to deal with the outliers and confirm the conclusions.

2.
ABSTRACT

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers typically will render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they significantly lose power with increasing dimension and number of outliers. Although there have been recent advances in the methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs the best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners a tool for the model-building process to protect against the severe impact from multiple outliers.

3.
Asymmetric models have been extensively studied in recent years, in situations where the normality assumption is not satisfied due to lack of symmetry of the data. Techniques for assessing the quality of fit and diagnostic analysis are important for model validation. This paper presents a study of the mean-shift method for detecting outliers in asymmetric normal regression models. Analytical solutions for the estimators of the parameters are obtained using the algorithm. Simulation studies and application to real data are presented, showing the efficiency of the method in detecting outliers.

4.
Abstract

Binomial integer-valued AR processes have been well studied in the literature, but there is little progress in modeling bounded integer-valued time series with outliers. In this paper, we first review some basic properties of the binomial integer-valued AR(1) process and then we introduce binomial integer-valued AR(1) processes with two classes of innovational outliers. We focus on the joint conditional least squares (CLS) and the joint conditional maximum likelihood (CML) estimates of models’ parameters and the probability of occurrence of the outlier. Their large-sample properties are illustrated by simulation studies. Artificial and real data examples are used to demonstrate good performances of the proposed models.

5.
ABSTRACT

This article proposes a development in the detection of patches of additive outliers in autoregressive time series models. The procedure improves the existing detection methods via Gibbs sampling. We combine the Bayesian method and the Kalman smoother to present some candidate models of outlier patches, and the best model with the minimum Bayesian information criterion (BIC) is selected among them. We propose that this combined Bayesian and Kalman method (CBK) can reduce the masking and swamping effects in detecting patches of additive outliers. The correctness of the method is illustrated by simulated data and then by analyzing a real set of observations.

6.
ABSTRACT

Constrained general linear models (CGLMs) have wide applications in practice. As in other data analyses, the identification of influential observations that may be potential outliers is an important step in the CGLMs. We develop multiple case-deletion diagnostics for detecting influential observations in the CGLMs. The diagnostics are functions of basic building blocks: studentized residuals, error contrast matrix, and the inverse of the response variable covariance matrix. The basic building blocks are computed only once from the complete data analysis and provide information on the influence of the data on different aspects of the model fit. Computational formulas are given which make the procedures feasible. An illustrative example with a real data set is also reported.

7.
Abstract

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

8.
ABSTRACT

As there is an extensive body of research on diagnostics in regression models, various outlier detection methods have been developed. These methods have been extended to mixed effects models and generalized linear models, but there exist intrinsic drawbacks and limitations. This paper presents two-dimensional plots to identify discordant subjects and observations in generalized linear mixed effects models, displaying discordance in two directions. The sTudentized Residual Sum of Squares is not an extension of any regression tools but a new approach designed to efficiently reflect the characteristics of repeated measures. And this noteworthy clustering of outliers is identified in the plot. Applications to real-life examples are presented to illustrate the favorable/beneficial performance of the new tool.

9.
The most popular method for trying to detect an association between two random variables is to test H0: ρ = 0, the hypothesis that Pearson's correlation is equal to zero. It is well known, however, that Pearson's correlation is not robust, roughly meaning that small changes in any distribution, including any bivariate normal distribution as a special case, can alter its value. Moreover, the usual estimate of ρ, r, is sensitive to only a few outliers which can mask a true association. A simple alternative to testing H0: ρ = 0 is to switch to a measure of association that guards against outliers among the marginal distributions such as Kendall's tau, Spearman's rho, a Winsorized correlation, or a so-called percentage bend correlation. But it is known that these methods fail to take into account the overall structure of the data. Many measures of association that do take into account the overall structure of the data have been proposed, but it seems that nothing is known about how they might be used to detect dependence. One such measure of association is selected, which is designed so that under bivariate normality, its estimator gives a reasonably accurate estimate of ρ. Then methods for testing the hypothesis of a zero correlation are studied.
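The sensitivity of Pearson's r to a few outliers, as described in abstract 9, can be illustrated with a minimal sketch. The data and the helper names (`pearson`, `ranks`, `spearman`) are made up for illustration and are not taken from the article; Spearman's rho is computed here simply as Pearson's r applied to average ranks.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation r of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: a perfect linear relation, then one corrupted point.
x = list(range(1, 11))
y = [2 * v for v in x]
y_out = y[:-1] + [-100]  # the last observation becomes a gross outlier

print("clean   :", pearson(x, y), spearman(x, y))
print("outlier :", pearson(x, y_out), spearman(x, y_out))
```

On these made-up data a single corrupted point turns Pearson's r negative, while Spearman's rho stays positive (about 0.45). This only illustrates protection against marginal outliers; it does not address the abstract's further point that rank-based measures ignore the overall structure of the data.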

10.
Mixed effects models or random effects models are popular for the analysis of longitudinal data. In practice, longitudinal data are often complex since there may be outliers in both the response and the covariates and there may be measurement errors. The likelihood method is a common approach for these problems but it can be computationally very intensive and sometimes may even be computationally infeasible. In this article, we consider approximate robust methods for nonlinear mixed effects models to simultaneously address outliers and measurement errors. The approximate methods are computationally very efficient. We show the consistency and asymptotic normality of the approximate estimates. The methods can also be extended to missing data problems. An example is used to illustrate the methods and a simulation is conducted to evaluate the methods.

11.
Abstract

There are three main problems in the existing procedures for detecting outliers in ARIMA models. The first one is the biased estimation of the initial parameter values that may strongly affect the power to detect outliers. The second problem is the confusion between level shifts and innovative outliers when the series has a level shift. The third problem is masking. We propose a procedure that keeps the powerful features of previous methods but improves the initial parameter estimate, avoids the confusion between innovative outliers and level shifts and includes joint tests for sequences of additive outliers in order to solve the masking problem. A Monte Carlo study and one example of the performance of the proposed procedure are presented.

12.
This article is concerned with outliers in GARCH models. An iterative procedure is given for testing the presence of any of the four common types of outliers. Since the distribution of the test statistic cannot be obtained analytically, its distributional behavior is investigated via a simulation study. The simulation study is based on estimates of the residual standard deviation (σ_ν), which are obtained using two methods: the median absolute deviation (MAD) method and the omit-one method. The proposed procedure is employed for testing the presence of outliers in weekly light oil price indexes of Iran from 1997 to 2010.
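The MAD-based scale estimate mentioned in abstract 12 can be sketched as follows. The abstract does not give the exact formula used in the article, so the sketch assumes the usual version: the median absolute deviation from the median, multiplied by 1.4826 so that it is consistent for the standard deviation under normality. The function name `mad_sigma` and the residuals are hypothetical.

```python
import statistics

def mad_sigma(residuals):
    """Robust scale estimate: 1.4826 * median(|r - median(r)|).

    The factor 1.4826 is the standard normal-consistency constant; whether
    the article uses exactly this form is an assumption.
    """
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    return 1.4826 * mad

# Hypothetical residuals with one gross outlier at the end.
residuals = [-1.2, 0.4, -0.3, 0.9, 0.1, -0.6, 0.7, 15.0]

print("sample standard deviation:", statistics.stdev(residuals))
print("MAD-based estimate       :", mad_sigma(residuals))
```

The sample standard deviation is inflated by the single outlier, whereas the MAD-based estimate stays close to the spread of the remaining residuals, which is why such an estimate is a natural choice when standardizing an outlier test statistic.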

13.
ABSTRACT

Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, a Bayesian approach has been developed to model misclassified ordinal response data. Two regression models have been considered to incorporate misclassification in the categorical response. Specifically, probit and logit models have been developed. The computational difficulties have been avoided by using data augmentation. This idea is exploited to derive efficient Markov chain Monte Carlo methods. Although the method is proposed for ordered categories, it can also be implemented for unordered ones in a simple way. The model performance is shown through a simulation-based example and the analysis of the motivating study.

14.
ABSTRACT

In this paper, we study a novel robust procedure for simultaneous variable selection and parametric component identification in varying coefficient models. The proposed estimator is based on spline approximation and two smoothly clipped absolute deviation (SCAD) penalties through rank regression, which is robust with respect to heavy-tailed errors or outliers in the response. Furthermore, when the tuning parameter is chosen by a modified BIC criterion, we show that the proposed procedure is consistent both in variable selection and in the separation of varying and constant coefficients. In addition, the estimators of varying coefficients possess the optimal convergence rate under some assumptions, and the estimators of constant coefficients have the same asymptotic distribution as their counterparts obtained when the true model is known. Simulation studies and a real data example are undertaken to assess the finite sample performance of the proposed variable selection procedure.

15.
ABSTRACT

Many financial decisions such as portfolio allocation, risk management, option pricing and hedge strategies are based on the forecast of the conditional variances, covariances and correlations of financial returns. Although the decisions depend on the forecast covariance matrix, little is known about the effects of outliers on the uncertainty associated with these forecasts. In this paper we analyse these effects in the context of dynamic conditional correlation models when the uncertainty is measured using bootstrap methods. We also propose a bootstrap procedure to obtain forecast densities for returns, volatilities, conditional correlations and Value-at-Risk that is robust to outliers. The results are illustrated with simulated and real data.

16.
Abstract

We study alternative models for capturing abrupt structural changes (level shifts) in a time series. The problem is confounded by the presence of transient outliers. We compare the performance of non-Gaussian time-varying parameter models and multiprocess mixture models within a Monte Carlo experimental setup. Our findings suggest that once we incorporate shocks with thick-tailed probability distributions, the superiority of the multiprocess mixture models over the time-varying parameter models, reported in an earlier study, disappears. The behavior of the two models, both in adapting to level shifts and in reacting to transient outliers, is very similar.

17.
ABSTRACT

In biomedical and epidemiological studies, gene–environment (G–E) interactions have been shown to importantly contribute to the etiology and progression of many complex diseases. Most existing approaches for identifying G–E interactions are limited by the lack of robustness against outliers/contaminations in response and predictor spaces. In this study, we develop a novel robust G–E identification approach using the trimmed regression technique under joint modelling. A robust data-driven criterion and stability selection are adopted to determine the trimmed subset which is free from both vertical outliers and leverage points. An effective penalization approach is developed to identify important G–E interactions, respecting the ‘main effects, interactions’ hierarchical structure. Extensive simulations demonstrate the better performance of the proposed approach compared to multiple alternatives. Interesting findings with superior prediction accuracy and stability are observed in the analysis of The Cancer Genome Atlas data on cutaneous melanoma and breast invasive carcinoma.

18.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

19.
ABSTRACT

This article considers the problem of a mean change-point in heavy-tailed dependent observations. A method of change-point estimation by truncating the initial process is proposed, which can weaken the effect of outliers. In the infinite variance case, we obtain a generalized Hájek-Rényi type inequality. Consistency and the rate of convergence for the estimated change-point are also established. The results of a simulation study support the validity of our method.
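The idea in abstract 19 of truncating the observations before estimating a mean change-point can be sketched roughly as follows. The abstract specifies neither the truncation rule nor the estimator, so everything here is an assumption made for illustration: values are clipped to the interval [-m, m], the change-point is estimated by a plain CUSUM argmax, and the threshold m = 10 and the data are hypothetical.

```python
def truncate(xs, m):
    """Clip each observation to [-m, m] (assumed truncation rule)."""
    return [max(-m, min(m, x)) for x in xs]

def cusum_change_point(xs):
    """Return k in 1..n-1 maximising |S_k - (k/n) * S_n|.

    S_k is the sum of the first k observations; k is read as the number of
    observations before the estimated change in mean.
    """
    n = len(xs)
    total = sum(xs)
    best_k, best_val = 1, -1.0
    running = 0.0
    for k in range(1, n):
        running += xs[k - 1]
        val = abs(running - k * total / n)
        if val > best_val:
            best_k, best_val = k, val
    return best_k

# Hypothetical series: mean 0 for 10 points, mean 5 for the next 10,
# with one extreme value placed in the first segment.
data = [0.1, -0.2, 0.0, 1000.0, 0.2, -0.1, 0.1, 0.0, -0.2, 0.1,
        5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 5.2, 4.8]

print("raw estimate      :", cusum_change_point(data))
print("truncated estimate:", cusum_change_point(truncate(data, 10.0)))
```

On these made-up data the raw CUSUM estimate is drawn to the position of the extreme value (k = 4), while the estimate from the truncated series recovers the true change after the 10th observation. How the truncation level should be chosen, and the theoretical guarantees, are matters for the article itself.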

20.
National statistical agencies and other data custodians collect and hold a vast amount of survey and census data, containing information vital for research and policy analysis. However, the problem of allowing analysis of these data, while protecting respondent confidentiality, has proved challenging to address. In this paper we focus on the remote analysis approach, under which a confidential dataset is held in a secure environment under the direct control of the data custodian agency. A computer system within the secure environment accepts a query from an analyst, runs it on the data, then returns the results to the analyst. In particular, the analyst does not have direct access to the data at all, and cannot view any microdata records. We further focus on the fitting of linear regression models to confidential data in the presence of outliers and influential points, such as are often present in business data. We propose a new method for protecting confidentiality in linear regression via a remote analysis system that provides additional confidentiality protection for outliers and influential points in the data. The method we describe in this paper was designed for the prototype DataAnalyser system developed by the Australian Bureau of Statistics; however, the method would be suitable for similar remote analysis systems.
