Similar literature
 Found 20 similar documents; search took 31 ms
1.
This paper proposes a new robust Bayes factor for comparing two linear models. The factor is based on a pseudo‐model for outliers and is more robust to outliers than the Bayes factor based on the variance‐inflation model for outliers. If an observation is considered an outlier for both models, this new robust Bayes factor equals the Bayes factor calculated after removing the outlier. If an observation is considered an outlier for one model but not the other, then this new robust Bayes factor equals the Bayes factor calculated without the observation, but a penalty is applied to the model considering the observation as an outlier. For moderate outliers where the variance‐inflation model is suitable, the two Bayes factors are similar. The new Bayes factor uses a single robustness parameter to describe a priori belief in the likelihood of outliers. Real and synthetic data illustrate the properties of the new robust Bayes factor and highlight the inferior properties of Bayes factors based on the variance‐inflation model for outliers.

2.
For high-dimensional data, it is a tedious task to determine anomalies such as outliers. We present a novel outlier detection method for high-dimensional contingency tables. We use the class of decomposable graphical models to model the relationship among the variables of interest, which can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed-form expression of the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the use of the LRT outlier detection framework on genetic data modeled by Chow–Liu trees.

3.
Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow normal distributions, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using the multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data-adaptively to achieve a flexible trade-off between estimation robustness and efficiency. Simulation studies and an application to lung growth longitudinal data showcase the efficacy of the proposed approach.

4.
Efficient score tests exist, among others, for testing for the presence of additive and/or innovative outliers that result from a shifted mean of the error process under the regression model. A diagnostic technique based on the sample influence function of the autocorrelations also exists for the detection of outliers that result from shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing for additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on the effects of model misspecification on outlier detection are also provided.

5.
In mixed linear models, it is frequently of interest to test hypotheses on the variance components. The F-test and the likelihood ratio test (LRT) are commonly used for such purposes. Current LRTs available in the literature are based on limiting distribution theory. With the development of finite sample distribution theory, it becomes possible to derive the exact test for the likelihood ratio statistic. In this paper, we consider the problem of testing null hypotheses on the variance component in a one-way balanced random effects model. We use the exact test for the likelihood ratio statistic and compare the performance of the F-test and the LRT. Simulations provide strong support for the equivalence between these two tests. Furthermore, we prove the equivalence between these two tests mathematically.
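The equivalence described in this abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the LRT is built from the ordinary (not restricted) maximum likelihood fit with the variance component constrained to be non-negative, and the function names are hypothetical. Under that assumption the LRT statistic depends on the data only through F.

```python
import math


def sums_of_squares(groups):
    """Return (k, n, SSB, SSW) for k balanced groups of n observations each."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ssb = n * sum((m - grand) ** 2 for m in means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return k, n, ssb, ssw


def f_statistic(groups):
    """Classical ANOVA F statistic, MSB / MSW."""
    k, n, ssb, ssw = sums_of_squares(groups)
    return (ssb / (k - 1)) / (ssw / (k * (n - 1)))


def lrt_statistic(groups):
    """-2 log likelihood ratio (assumed: ML fit, variance component >= 0)."""
    k, n, ssb, ssw = sums_of_squares(groups)
    msw = ssw / (k * (n - 1))
    if ssb / k <= msw:
        # Boundary case: the ML estimate of the variance component is zero.
        return 0.0
    return (k * n * math.log((ssb + ssw) / (k * n))
            - k * (n - 1) * math.log(msw)
            - k * math.log(ssb / k))


def lrt_from_f(f, k, n):
    """The same LRT statistic written as a function of F alone."""
    if f <= k / (k - 1):
        return 0.0
    return (k * n * math.log((k * (n - 1) + f * (k - 1)) / (k * n))
            - k * math.log(f * (k - 1) / k))
```

In this formulation `lrt_from_f` is increasing in F once F exceeds k/(k-1), so rejecting for large LRT values and rejecting for large F values define the same kind of rejection region. That is one way to read the equivalence the abstract reports.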

6.
As the Watson distribution is frequently used for modeling axial data, it is important to investigate the existence of possible outliers in samples from this distribution. We therefore develop, for the bipolar Watson distribution defined on the hypersphere, some tests of discordancy for an outlier, or for several outliers en bloc, based on the likelihood ratio and assuming an alternative contamination model of slippage type. We evaluate the performance of these tests of discordancy of an outlier and also compare them with some tests of discordancy of an outlier already available for this distribution.

7.
The paper deals with generalized confidence intervals for the between-group variance in one-way heteroscedastic (unbalanced) ANOVA with random effects. The approach used mimics the standard one applied in mixed linear models with two variance components, where interval estimators are based on a minimal sufficient statistic derived after an initial reduction by the principle of invariance. A minimal sufficient statistic under heteroscedasticity is found to resemble its homoscedastic counterpart and further analogies between heteroscedastic and homoscedastic cases lead us to two classes of fiducial generalized pivots for the between-group variance. The procedures suggested formerly by Wimmer and Witkovský [Between group variance component interval estimation for the unbalanced heteroscedastic one-way random effects model, J. Stat. Comput. Simul. 73 (2003), pp. 333–346] and Li [Comparison of confidence intervals on between group variance in unbalanced heteroscedastic one-way random models, Comm. Statist. Simulation Comput. 36 (2007), pp. 381–390] are found to belong to these two classes. We comment briefly on some of their properties that were not mentioned in the original papers. In addition, properties of another particular generalized pivot are considered.

8.
This article provides a procedure for the detection and identification of outliers in the spectral domain where the Whittle maximum likelihood estimator of the panel data model proposed by Chen [W.D. Chen, Testing for spurious regression in a panel data model with the individual number and time length growing, J. Appl. Stat. 33(88) (2006b), pp. 759–772] is implemented. We extend the approach of Chang and co-workers [I. Chang, G.C. Tiao, and C. Chen, Estimation of time series parameters in the presence of outliers, Technometrics 30 (2) (1988), pp. 193–204] to the spectral domain and through the Whittle approach we can quickly detect and identify the type of outliers. A fixed effects panel data model is used, in which the remainder disturbance is assumed to be a fractional autoregressive integrated moving-average (ARFIMA) process and the likelihood ratio criterion is obtained directly through the modified inverse Fourier transform. This saves much time, especially when the estimated model implements a huge data-set.

Through Monte Carlo experiments, the consistency of the estimator is examined by growing the individual number N and time length T, in which the long memory remainder disturbances are contaminated with two types of outliers: additive outlier and innovation outlier. From the power tests, we see that the estimators are quite successful and powerful.

In the empirical study, we apply the model to Taiwan's computer motherboard industry. Weekly data from 1 January 2000 to 31 October 2006 for nine well-known companies are used. The proposed model has a smaller mean square error and shows more distinctive aggressive properties than the raw data model does.


9.
Summary. Semiparametric mixed models are useful in biometric and econometric applications, especially for longitudinal data. Maximum penalized likelihood estimators (MPLEs) have been shown to work well by Zhang and co-workers for both linear coefficients and nonparametric functions. This paper considers the role of influence diagnostics in the MPLE by extending the case deletion and subject deletion analysis of linear models to accommodate the inclusion of a nonparametric component. We focus on influence measures for the fixed effects and provide formulae that are analogous to those for simpler models and readily computable with the MPLE algorithm. We also establish an equivalence between the case or subject deletion model and a mean shift outlier model from which we derive tests for outliers. The influence diagnostics proposed are illustrated through a longitudinal hormone study on progesterone and a simulated example.

10.
We consider the problem of estimating and testing a general linear hypothesis in a general multivariate linear model, the so-called Growth Curve model, when the p × N observation matrix is normally distributed.

The maximum likelihood estimator (MLE) for the mean is a weighted estimator that uses the inverse of the sample covariance matrix, which is unstable for large p close to N and singular for p larger than N. We modify the MLE to an unweighted estimator and propose new tests, which we compare with the previous likelihood ratio test (LRT) based on the weighted estimator, i.e., the MLE. We show that the performance of these new tests based on the unweighted estimator is better than that of the LRT based on the MLE.


11.
We provide a method for simultaneous variable selection and outlier identification using the mean-shift outlier model. The procedure consists of two steps: the first step is to identify potential outliers, and the second step is to perform all possible subset regressions for the mean-shift outlier model containing the potential outliers identified in step 1. This procedure is helpful for model selection while simultaneously considering outlier identification, and can be used to identify multiple outliers. In addition, we can evaluate the impact on the regression model of simultaneous omission of variables and interesting observations. In an example, we provide detailed output from the R system, and compare the results with those using posterior model probabilities as proposed by Hoeting et al. [Comput. Stat. Data Anal. 22 (1996), pp. 252-270] for simultaneous variable selection and outlier identification.
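The two-step procedure in this abstract can be sketched as follows. The abstract does not say how potential outliers are flagged or how the subset regressions are compared, so both choices here are assumptions: step 1 flags least-squares residuals more than `cutoff` robust standard deviations (MAD-based) from the median, and step 2 scores every subset of variables and outlier indicator columns by BIC. All function names are hypothetical, and the abstract's own example uses R rather than Python.

```python
import itertools
import math
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [u - factor * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_residuals(X, y):
    """Least-squares residuals via the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]


def design(x_rows, cols, dummies):
    """Intercept, chosen variables, and one mean-shift indicator per dummy."""
    return [[1.0] + [row[c] for c in cols]
            + [1.0 if i == d else 0.0 for d in dummies]
            for i, row in enumerate(x_rows)]


def candidate_outliers(x_rows, y, cutoff=3.0):
    """Step 1 (assumed rule): flag residuals beyond cutoff robust SDs."""
    resid = ols_residuals(design(x_rows, range(len(x_rows[0])), []), y)
    med = statistics.median(resid)
    scale = 1.4826 * statistics.median([abs(r - med) for r in resid])
    return [i for i, r in enumerate(resid) if abs(r - med) > cutoff * scale]


def best_model(x_rows, y, candidates):
    """Step 2: all subsets of variables and candidate indicators, by BIC."""
    n, q = len(y), len(x_rows[0])
    terms = [("var", j) for j in range(q)] + [("out", i) for i in candidates]
    best = None
    for size in range(len(terms) + 1):
        for subset in itertools.combinations(terms, size):
            cols = [j for kind, j in subset if kind == "var"]
            dummies = [i for kind, i in subset if kind == "out"]
            resid = ols_residuals(design(x_rows, cols, dummies), y)
            rss = sum(e * e for e in resid)
            bic = n * math.log(rss / n) + (1 + size) * math.log(n)
            if best is None or bic < best[0]:
                best = (bic, cols, dummies)
    return best
```

Including an indicator column for observation i gives the same coefficient estimates as deleting that observation, which is why searching over indicator columns amounts to searching over sets of omitted observations. The search cost doubles with every extra variable or candidate, so the sketch is only practical for small problems.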

12.
Nonlinear mixed-effects models are very useful for analyzing repeated measures data and are used in a variety of applications. Normal distributions for random effects and residual errors are usually assumed, but such assumptions make inferences vulnerable to the presence of outliers. In this work, we introduce an extension of a normal nonlinear mixed-effects model considering a subclass of elliptically contoured distributions for both random effects and residual errors. This elliptical subclass, the scale mixtures of normal (SMN) distributions, includes heavy-tailed multivariate distributions, such as the Student-t, the contaminated normal and the slash, among others, and represents an interesting alternative for accommodating outliers while maintaining the elegance and simplicity of maximum likelihood theory. We propose an exact estimation procedure to obtain the maximum likelihood estimates of the fixed effects and variance components, using a stochastic approximation of the EM algorithm. We compare the performance of the normal and the SMN models with two real data sets.

13.
It is well known that in a traditional outlier-free situation, the generalized quasi-likelihood (GQL) approach [B.C. Sutradhar, On exact quasilikelihood inference in generalized linear mixed models, Sankhya: Indian J. Statist. 66 (2004), pp. 261–289] performs very well in obtaining consistent and efficient estimates for the parameters involved in generalized linear mixed models (GLMMs). In this paper, we first examine the effect of the presence of one or more outliers on the GQL estimation for the parameters in such GLMMs, especially in two important cases, the count and binary mixed models. The outliers appear to cause serious biases and hence inconsistency in the estimation. As a remedy, we then propose a robust GQL (RGQL) approach to obtain consistent estimates for the parameters in the GLMMs in the presence of one or more outliers. An extensive simulation study is conducted to examine the consistency performance of the proposed RGQL approach.

14.
Daniel Hohmann 《Statistics》2013,47(2):348-362
We consider a two-component location mixture model with symmetric components, one of which is assumed to be known and the other unknown. We show identifiability under assumptions on the tails of the characteristic function for the true underlying mixture, and also construct asymptotically normal estimates. The model is an extension of the contamination model in Bordes et al. [Semiparametric estimation of a two-component mixture model when a component is known, Scand. J. Statist. 33 (2006), pp. 733–752], and also related to a location mixture of one symmetric density as in Bordes et al. [Semiparametric estimation of a two component mixture model, Ann. Statist. 34 (2006), pp. 1204–1232]. We show by simulation that estimating the additional location parameter leads to a slight loss of efficiency as compared with the contamination model.

15.
This paper deals with a formal identification of outliers in regression based on tests of hypotheses. The hypothesis is not the standard one but is based on performance criteria that relate to the coefficient estimation and predictive capabilities of the model. The criteria include the trace of the mean square error matrix of the coefficients and the integrated mean square error of prediction. Both the mean shift outlier model and the variance inflation model are discussed.

16.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.

17.
Quantitative traits measured over pedigrees of individuals may be analysed using maximum likelihood estimation, assuming that the trait has a multivariate normal distribution. This approach is often used in the analysis of mixed linear models. In this paper a robust version of the log likelihood for multivariate normal data is used to construct M-estimators which are resistant to contamination by outliers. The robust estimators are found using a minimisation routine which retains the flexible parameterisations of the multivariate normal approach. Asymptotic properties of the estimators are derived, computation of the estimates and their use in outlier detection tests are discussed, and a small simulation study is conducted.

18.
In many areas of application mixed linear models serve as a popular tool for analyzing highly complex data sets. For inference about fixed effects and variance components, likelihood-based methods such as (restricted) maximum likelihood estimators, (RE)ML, are commonly pursued. However, it is well-known that these fully efficient estimators are extremely sensitive to small deviations from hypothesized normality of random components as well as to other violations of distributional assumptions. In this article, we propose a new class of robust-efficient estimators for inference in mixed linear models. The new three-step estimation procedure provides truncated generalized least squares and variance components' estimators with hard-rejection weights adaptively computed from the data. More specifically, our data re-weighting mechanism first detects and removes within-subject outliers, then identifies and discards between-subject outliers, and finally it employs maximum likelihood procedures on the “clean” data. Theoretical efficiency and robustness properties of this approach are established.

19.
One method for achieving robustness in linear models is to assume that there is a mixture of standard and outlier observations, with a different error variance for each class. For generalised linear models (GLMs) the mixture model approach is more difficult, as the error variance for many distributions has a fixed relationship to the mean. This model is extended to GLMs by redefining the classes: the standard class is a standard GLM, and the outlier class is an overdispersed GLM obtained by including a random effect term in the linear predictor. The advantages of this method are that it can be extended to any model with a linear predictor and that outlier observations can be easily identified. Using simulation, the model is compared with an M-estimator and found to have improved bias and coverage. The method is demonstrated on three examples.

20.
Maclean et al. (1976) applied a specific Box-Cox transformation to test for mixtures of distributions against a single distribution. Their null hypothesis is that a sample of n observations is from a normal distribution with unknown mean and variance after a restricted Box-Cox transformation. The alternative is that the sample is from a mixture of two normal distributions, each with unknown mean and unknown, but equal, variance after another restricted Box-Cox transformation. We developed a computer program that calculated the maximum likelihood estimates (MLEs) and likelihood ratio test (LRT) statistic for the above. Our algorithm for the calculation of the MLEs of the unknown parameters used multiple starting points to protect against convergence to a local rather than global maximum. We then simulated the distribution of the LRT for samples drawn from a normal distribution and five Box-Cox transformations of a normal distribution. The null distribution appeared to be the same for the Box-Cox transformations studied and appeared to be distributed as a chi-square random variable for samples of 25 or more. The degrees of freedom parameter appeared to be a monotonically decreasing function of the sample size. The null distribution of this LRT appeared to converge to a chi-square distribution with 2.5 degrees of freedom. We estimated the critical values for the 0.10, 0.05, and 0.01 levels of significance.
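The core of the computation described in this abstract, an LRT for two equal-variance normal components against a single normal with several starting points, can be sketched as below. This is an illustration under stated assumptions, not the authors' program: the Box-Cox step is omitted entirely, the MLEs are found with a plain EM iteration, and the quantile-based starting values and all function names are hypothetical.

```python
import math


def normal_loglik(x):
    """Maximised log-likelihood of a single normal (null hypothesis)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def log_components(v, p, mu1, mu2, var):
    """Log of each weighted component density at v."""
    const = -0.5 * math.log(2 * math.pi * var)
    a = math.log(p) + const - (v - mu1) ** 2 / (2 * var)
    b = math.log(1 - p) + const - (v - mu2) ** 2 / (2 * var)
    return a, b


def em_fit(x, p, mu1, mu2, var, iters=300):
    """EM for a two-component normal mixture with a common variance.

    Returns the log-likelihood reached from the given starting values.
    """
    n = len(x)
    for _ in range(iters):
        w = []  # responsibility of component 1 for each observation
        for v in x:
            a, b = log_components(v, p, mu1, mu2, var)
            w.append(1.0 / (1.0 + math.exp(min(b - a, 700.0))))
        s = min(max(sum(w), 1e-9), n - 1e-9)  # keep both components alive
        p = s / n
        mu1 = sum(wi * v for wi, v in zip(w, x)) / s
        mu2 = sum((1 - wi) * v for wi, v in zip(w, x)) / (n - s)
        var = sum(wi * (v - mu1) ** 2 + (1 - wi) * (v - mu2) ** 2
                  for wi, v in zip(w, x)) / n
        var = max(var, 1e-12)
    loglik = 0.0
    for v in x:
        a, b = log_components(v, p, mu1, mu2, var)
        top = max(a, b)
        loglik += top + math.log(math.exp(a - top) + math.exp(b - top))
    return loglik


def mixture_lrt(x, n_starts=5):
    """LRT statistic, keeping the best of several EM starting points."""
    xs = sorted(x)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    best = -math.inf
    for j in range(1, n_starts + 1):
        # Assumed start rule: component means at symmetric sample quantiles.
        offset = (j * n) // (2 * n_starts + 2)
        best = max(best, em_fit(xs, 0.5, xs[offset], xs[n - 1 - offset], var))
    return max(0.0, 2.0 * (best - normal_loglik(xs)))
```

Keeping the largest log-likelihood over the starts is what guards against a local maximum; more starts, or random ones, would be a natural extension. The statistic is truncated at zero because the single normal is a special case of the mixture, so a negative value can only mean that EM failed to reach the null fit.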
