Similar Documents
20 similar documents found (search time: 46 ms)
1.
In this article, we introduce a generalization of the skew-normal linear regression model, built on a scale mixture of skew-normal distributions in which the mixing random variable follows a mixture model with observation-specific weights, with the aim of providing robust results. This model, which includes the skew-slash distribution as a particular case, allows us to accommodate and detect outlying observations under the skew-normal linear regression model. Inference is carried out through the empirical Bayes approach. Conditions for propriety of the posterior and for the existence of posterior moments are given under standard noninformative priors for the regression and scale parameters and a proper prior for the skewness parameter. A Markov chain Monte Carlo method is then described for Bayesian inference. Since posterior results depend on the prior hyperparameters, we estimate them by the empirical Bayes method using a Monte Carlo EM algorithm. Furthermore, to identify possible outliers, we apply the Bayes factor obtained through the generalized Savage–Dickey density ratio. Applied to simulated and real data, the proposed approach is found not only to provide satisfactory parameter estimates but also to identify outliers effectively.
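For reference, the skew-normal density underlying the model in item 1 has the Azzalini form f(x) = (2/ω)φ(z)Φ(αz) with z = (x − ξ)/ω; α = 0 recovers the normal density. A minimal sketch of that base density (not the paper's mixture extension):

```python
import math

def skew_normal_pdf(x, alpha, loc=0.0, scale=1.0):
    """Azzalini skew-normal density: (2/omega) * phi(z) * Phi(alpha * z)."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))  # standard normal cdf
    return 2.0 * phi * big_phi / scale
```

Setting `alpha = 0` gives `big_phi = 0.5` and the expression collapses to the ordinary normal density.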

2.
Estimators derived from the expectation-maximization (EM) algorithm are not robust, since they are based on maximization of the likelihood function. We propose an iterative proximal-point algorithm, based on the EM algorithm, that minimizes a divergence criterion between a mixture model and the unknown distribution generating the data. In each iteration the algorithm estimates the proportions and the parameters of the mixture components in two separate steps. The resulting estimators are generally robust against outliers and misspecification of the model. Convergence properties of our algorithm are studied. The convergence of the introduced algorithm is illustrated on a two-component Weibull mixture, which reveals a condition on the initialization of the EM algorithm required for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences confirm the validity of our work and the robustness of the resulting estimators against outliers in comparison to the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada
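For contrast with the robustified procedure in item 2, the classical EM iteration for a two-component Gaussian mixture — the non-robust baseline the paper improves on — can be sketched as follows (a minimal numpy illustration, not the authors' proximal-point algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian components
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(10.0, 1.0, 300)])

# initialize from data quantiles
mu = np.array([np.quantile(x, 0.25), np.quantile(x, 0.75)])
sigma = np.array([x.std(), x.std()])
pi = np.array([0.5, 0.5])

for _ in range(100):
    # E-step: posterior responsibilities of each component for each point
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted updates of proportions, means, and standard deviations
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
```

A single gross outlier added to `x` would pull the nearest component's mean toward it, which is exactly the failure mode the divergence-based estimators above are designed to resist.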

3.
Regression procedures are hindered not only by large p and small n, but can also suffer when outliers are present or the data-generating mechanism is heavy tailed. Since penalized estimators such as the least absolute shrinkage and selection operator (LASSO) handle the large p, small n setting by encouraging sparsity, we combine a LASSO-type penalty with the absolute deviation loss function, instead of the standard least squares loss, to handle outliers and heavy tails. The model is cast in a Bayesian setting, and a Gibbs sampler is derived to sample efficiently from the posterior distribution. We compare our method to existing methods in a simulation study as well as on a prostate cancer data set and a base deficit data set from trauma patients.

4.
The EM algorithm is often used for finding the maximum likelihood estimates in generalized linear models with incomplete data. In this article, the author presents a robust method in the framework of the maximum likelihood estimation for fitting generalized linear models when nonignorable covariates are missing. His robust approach is useful for downweighting any influential observations when estimating the model parameters. To avoid computational problems involving irreducibly high-dimensional integrals, he adopts a Metropolis-Hastings algorithm based on a Markov chain sampling method. He carries out simulations to investigate the behaviour of the robust estimates in the presence of outliers and missing covariates; furthermore, he compares these estimates to the classical maximum likelihood estimates. Finally, he illustrates his approach using data on the occurrence of delirium in patients operated on for abdominal aortic aneurysm.

5.
In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular heavy tails and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of classical PCA. The suggested method has the distinct advantage of handling heavy-tailed data, whose covariances may not even exist, in addition to the usual outliers. The proposed approach is also a special case of kernel principal component analysis (KPCA), gaining its robustness and non-linearity through a bounded, non-linear kernel function. The merits of the new method are illustrated by some statistical properties, including an upper bound on the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of simulations, we demonstrate the benefits of our approach over classical PCA. Finally, using data on protein expression in mice of various genotypes from a biological study, we apply the novel robust PCA to categorise the mice and find that our approach is more effective at identifying abnormal mice than classical PCA.

6.
In this article, we present the performance of the maximum likelihood estimates of the Burr XII parameters for constant-stress partially accelerated life tests under multiple censored data. Two maximum likelihood estimation methods are considered. One is based on the observed-data likelihood function, with the maximum likelihood estimates obtained by the quasi-Newton algorithm. The other is based on the complete-data likelihood function, with the maximum likelihood estimates derived via the expectation-maximization (EM) algorithm. The variance–covariance matrices are derived to construct confidence intervals for the parameters. The performance of the two algorithms is compared through a simulation study. The simulation results show that maximum likelihood estimation via the EM algorithm outperforms the quasi-Newton algorithm in terms of absolute relative bias, bias, root mean square error, and coverage rate. Finally, a numerical example is given to illustrate the performance of the proposed methods.

7.
Toxicologists and pharmacologists often describe the toxicity of a chemical using the parameters of a nonlinear regression model, so estimation of those parameters is an important problem. The estimates of the parameters and their uncertainty estimates depend upon the underlying error variance structure in the model. Typically, the researcher would not know a priori whether the error variances are homoscedastic (i.e., constant across dose) or heteroscedastic (i.e., a function of dose). Motivated by this concern, in this paper we introduce an estimation procedure based on a preliminary test that selects an appropriate estimation procedure accounting for the underlying error variance structure. Since outliers and influential observations are common in toxicological data, the proposed methodology uses M-estimators. The asymptotic properties of the preliminary test estimator are investigated; in particular, its asymptotic covariance matrix is derived. The performance of the proposed estimator is compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using a data set obtained from the National Toxicology Program.
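Item 7 relies on M-estimators to guard against outliers. A common concrete instance is Huber M-estimation computed by iteratively reweighted least squares (IRLS); the sketch below is a generic illustration of that idea, not the paper's preliminary-test procedure:

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimation for linear regression via IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        # robust residual scale from the MAD
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        w = np.minimum(1.0, c / (np.abs(r / s) + 1e-12))  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[-1] += 50.0                                            # one gross outlier
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_hub = huber_irls(X, y)
```

On this toy data the outlier pulls the OLS slope well away from 2, while the Huber fit stays close to the clean-data line.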

8.
The least squares estimates of the parameters in the multistage dose-response model are unduly affected by outliers in a data set, whereas the minimum sum of absolute errors (MSAE) estimates are more resistant to outliers. Algorithms to compute the MSAE estimates can be tedious and computationally burdensome. We propose a linear approximation for the dose-response model that can be used to find the MSAE estimates by a simple and computationally less intensive algorithm. A few illustrative examples and a Monte Carlo study show that the MSAE estimates of the parameters obtained from the exact model and from the linear approximation are comparable.
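The MSAE (least absolute deviations) criterion in item 8 can itself be approximated by iteratively reweighted least squares with weights 1/|r_i|. A minimal sketch under that approximation (a generic illustration, not the authors' algorithm or their linear approximation of the dose-response model):

```python
import numpy as np

def lad_irls(X, y, n_iter=200, eps=1e-8):
    """Approximate minimum-sum-of-absolute-errors (LAD) fit via IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # start from least squares
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)             # 1/|residual| weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[-1] += 50.0                                            # one gross outlier
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_lad = lad_irls(X, y)
```

Because the L1 criterion pays for the outlier linearly rather than quadratically, the LAD fit essentially ignores it here.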

9.
In this paper, we consider a regression model with non-spherical covariance structure and outliers in the response. The generalized least squares estimator obtained from the full data set is generally not used in the presence of outliers; an estimator based only on the non-outlying observations is preferred. Here we propose as an estimator a convex combination of the full-set and deleted-set estimators and compare its performance with that of the other two.

10.
This paper addresses the problem of identifying groups that satisfy specific conditions on the means of feature variables. In this study, we refer to the identified groups as "target clusters" (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of means. We provide an expectation–maximization (EM) algorithm to fit the restricted NMM by the maximum likelihood method. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method provides several types of useful clusters that would be difficult to achieve with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach shows that the proposed method is promising for this identification task.

11.
Principal component analysis (PCA) is widely used to analyze high-dimensional data, but it is very sensitive to outliers. Robust PCA methods seek fits that are unaffected by the outliers and can therefore be trusted to reveal them. FastHCS (high-dimensional congruent subsets) is a robust PCA algorithm suitable for high-dimensional applications, including cases where the number of variables exceeds the number of observations. After detailing the FastHCS algorithm, we carry out an extensive simulation study and three real data applications, the results of which show that FastHCS is systematically more robust to outliers than state-of-the-art methods.

12.
Probabilistic Principal Component Analysis   (Total citations: 2; self-citations: 0; citations by others: 2)
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
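Besides the EM iteration mentioned in the abstract, the ML estimates in probabilistic PCA are available in closed form (Tipping & Bishop): with S the sample covariance and eigendecomposition S = UΛUᵀ, the noise variance σ² is the mean of the d − q discarded eigenvalues and W = U_q(Λ_q − σ²I)^{1/2}, up to rotation. A sketch:

```python
import numpy as np

def ppca_ml(Y, q):
    """Closed-form ML estimates for probabilistic PCA (Tipping & Bishop)."""
    S = np.cov(Y, rowvar=False)
    evals, evecs = np.linalg.eigh(S)                 # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]       # sort descending
    sigma2 = evals[q:].mean()                        # mean of discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(0)
# two-factor data in five dimensions plus small isotropic noise
Y = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
W, sigma2 = ppca_ml(Y, q=2)
```

By construction, the model covariance W Wᵀ + σ²I reproduces the top-q eigenvalues of the sample covariance exactly.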

13.
A novel application of the expectation-maximization (EM) algorithm is proposed for modeling right-censored multiple regression. Parameter estimates, variability assessment, and model selection are summarized in a multiple regression setting assuming a normal model. The performance of this method is assessed through a simulation study. New formulas for measuring model utility and diagnostics are derived based on the EM algorithm, including a reconstructed coefficient of determination and influence diagnostics based on a one-step deletion method. A real data set, provided by the North Dakota Department of Veterans Affairs, is modeled using the proposed methodology. The empirical findings should be of benefit to government policy-makers.
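The E-step idea for right-censored normal data replaces each censored response by its conditional mean, E[y | y > c] = μ + σ·φ(a)/(1 − Φ(a)) with a = (c − μ)/σ, and the M-step refits least squares on the completed data. The sketch below is a simplified illustration of that cycle with the error standard deviation assumed known (σ = 1); it is not the paper's full method, which also handles variability assessment and diagnostics:

```python
import math
import numpy as np

def npdf(z): return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
def ncdf(z): return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0.0, 4.0, n)
y_full = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
c = 6.0
cens = y_full > c
y = np.where(cens, c, y_full)                    # right-censored responses

X = np.column_stack([np.ones(n), x])
b_naive = np.linalg.lstsq(X, y, rcond=None)[0]   # ignores censoring (biased)
beta = b_naive.copy()
for _ in range(100):
    mu = X @ beta
    a = c - mu[cens]                             # sigma assumed known = 1
    lam = np.array([npdf(ai) / max(1.0 - ncdf(ai), 1e-12) for ai in a])
    y_imp = y.copy()
    y_imp[cens] = mu[cens] + lam                 # E-step: E[y | y > c, current fit]
    beta = np.linalg.lstsq(X, y_imp, rcond=None)[0]   # M-step: OLS on completed data
```

The naive fit attenuates the slope because the capped responses flatten the upper end of the data; the EM iteration corrects most of that bias.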

14.
In this article, we consider a competing cause scenario and assume the wider family of Conway–Maxwell–Poisson (COM–Poisson) distributions to model the number of competing causes. Assuming interval-censored data, the main contribution is the development of the steps of the expectation-maximization (EM) algorithm to determine the maximum likelihood estimates (MLEs) of the model parameters. A profile likelihood approach within the EM framework is proposed to estimate the COM–Poisson shape parameter. An extensive simulation study is conducted to evaluate the performance of the proposed EM algorithm. Model selection within the wider class of COM–Poisson distributions is carried out using the likelihood ratio test and information-based criteria. A study demonstrating the effect of model mis-specification is also carried out. Finally, the proposed estimation method is applied to data on smoking cessation, and a detailed analysis of the obtained results is presented.
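The COM–Poisson pmf referenced in item 14 is P(Y = y) ∝ λ^y/(y!)^ν, with a normalizing constant that has no closed form and is usually evaluated by truncating the series. A minimal sketch (log-sum-exp for numerical stability; `j_max` is an assumed truncation point, not from the paper):

```python
import math

def com_poisson_pmf(y, lam, nu, j_max=100):
    """COM-Poisson pmf: P(Y = y) proportional to lam**y / (y!)**nu."""
    # log of each series term, then a stable log-sum-exp for the normalizer
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(j_max)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)
```

Setting ν = 1 recovers the ordinary Poisson pmf; ν > 1 gives underdispersion and ν < 1 overdispersion, which is why the family is a useful wider class for counts of competing causes.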

15.
In this article, we investigate a new estimation approach for the partially linear single-index model based on the modal regression method, where the nonparametric function is estimated by penalized splines. Moreover, we develop an expectation-maximization (EM)-type algorithm and establish the large-sample properties of the proposed estimation method. A distinguishing characteristic of the newly proposed estimator is its robustness against outliers, achieved by introducing an additional tuning parameter that can be selected automatically from the observed data. Simulation studies and a real data example are used to evaluate the finite-sample performance, and the results show that the newly proposed method works very well.

16.
This article considers inference for the log-normal distribution based on progressive Type I interval censored data by both frequentist and Bayesian methods. First, the maximum likelihood estimates (MLEs) of the unknown model parameters are computed by the expectation-maximization (EM) algorithm. The asymptotic standard errors (ASEs) of the MLEs are obtained by applying the missing information principle. Next, the Bayes estimates of the model parameters are obtained by the Gibbs sampling method under both symmetric and asymmetric loss functions. The Gibbs sampling scheme is facilitated by adopting a data augmentation scheme similar to that of the EM algorithm. The performance of the MLEs and various Bayesian point estimates is judged via a simulation study. A real dataset is analyzed for the purpose of illustration.

17.

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. Most applications of variable selection in FMR models assume a normal distribution for the regression error. Such an assumption is unsuitable for data containing groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computation and a method for selecting the tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies, and its application is illustrated by analyzing a real data set.

18.
Ordinal data, such as students' grades or customer satisfaction surveys, are widely used in daily life. We can fit a probit or logistic regression model to ordinal data using software such as SAS and obtain estimates of the regression parameters. However, it is hard to define residuals and detect outliers, because the estimated probabilities of an observation falling in each category form a vector rather than a scalar. With the help of latent variables and latent residuals, a Bayesian perspective on detecting outliers is explored and several methods are proposed in this article. Several figures are also given.

19.
We propose here a robust multivariate extension of the bivariate Birnbaum–Saunders (BS) distribution derived by Kundu et al. [Bivariate Birnbaum–Saunders distribution and associated inference. J Multivariate Anal. 2010;101:113–125], based on scale mixtures of normal (SMN) distributions, which are used for modelling symmetric data. The resulting multivariate BS-type distribution is absolutely continuous, with marginal and conditional distributions of the BS type of Balakrishnan et al. [Estimation in the Birnbaum–Saunders distribution based on scale mixtures of normals and the EM algorithm. Stat Oper Res Trans. 2009;33:171–192]. Due to the complexity of the likelihood function, parameter estimation by direct maximization is very difficult. For this reason, we exploit the convenient hierarchical representation of the proposed distribution to develop a fast and accurate EM algorithm for computing the maximum likelihood (ML) estimates of the model parameters. We then evaluate the finite-sample performance of the developed EM algorithm and the asymptotic properties of the ML estimates through empirical experiments. Finally, we illustrate the obtained results with real data and display the robustness of the estimation procedure developed here.

20.
Rank tests are known to be robust to outliers and violation of distributional assumptions. Two major issues besetting microarray data are violation of the normality assumption and contamination by outliers. In this article, we formulate the normal theory simultaneous tests and their aligned rank transformation (ART) analog for detecting differentially expressed genes. These tests are based on the least-squares estimates of the effects when the data follow a linear model. Application of the two methods is then demonstrated on a real data set. To evaluate the performance of the aligned rank transform method against the corresponding normal theory method, data were simulated according to the characteristics of real gene expression data. These simulated data are then used to compare the two methods with respect to their sensitivity to the distributional assumption and to outliers, in terms of controlling the family-wise Type I error rate, power, and false discovery rate. It is demonstrated that the ART generally possesses the robustness-of-validity property even for microarray data with a small number of replications. Although these methods can be applied to more general designs, in this article the simulation study is carried out for a dye-swap design, since this design is broadly used in cDNA microarray experiments.
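The robustness of rank procedures such as the ART in item 20 comes from replacing observed values by their ranks, so the magnitude of an outlier is irrelevant. A minimal illustration via the Mann–Whitney U statistic (not the ART itself, which first aligns the data by subtracting estimated effects before ranking):

```python
def rank_sum_U(x, y):
    """Mann-Whitney U for sample x versus y (midranks for ties)."""
    combined = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        r = (i + 1 + j) / 2.0            # average of ranks i+1 .. j for tied values
        for k in range(i, j):
            ranks[combined[k]] = r       # same value -> same midrank
        i = j
    r_x = sum(ranks[v] for v in x)       # rank sum of the first sample
    return r_x - len(x) * (len(x) + 1) / 2.0
```

Replacing an observation by an arbitrarily larger one leaves U unchanged as long as its rank is unchanged, which is exactly the robustness property the abstract appeals to.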


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号