Estimation for the log-logistic and Weibull distributions can be performed using the equations used for probability plotting, and this technique often outperforms maximum likelihood (ML) estimation in small samples. This leads to a highly heteroskedastic regression problem. Exact expressions for the variances of the residuals are derived, which can be used to perform weighted regression. In large samples, ML performs best, but it is shown that in smaller samples, the weighted regression outperforms ML estimation with respect to bias and mean square error.
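
As a rough illustration of regression on the probability plot, the sketch below fits a Weibull shape and scale by least squares on the linearized CDF, ln(-ln(1 - F)) = shape * ln(t) - shape * ln(scale). It is not the paper's estimator: the function name `weibull_plot_fit`, the use of Bernard's median-rank plotting positions, and the `weights` argument are assumptions made here. The exact residual variances derived in the paper are not reproduced, so equal weights (ordinary least squares) are the default.

```python
import math
import random


def weibull_plot_fit(times, weights=None):
    """Fit Weibull (shape, scale) by weighted least squares on the probability plot."""
    t = sorted(times)
    n = len(t)
    if weights is None:
        weights = [1.0] * n  # equal weights reduce to ordinary least squares
    # Plotting positions: Bernard's median-rank approximation (an assumption).
    p = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    x = [math.log(ti) for ti in t]
    y = [math.log(-math.log(1.0 - pi)) for pi in p]
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    shape = sxy / sxx                     # slope of the fitted line
    intercept = ybar - shape * xbar       # equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return shape, scale


# Simulated sample; random.weibullvariate takes (scale, shape) in that order.
rng = random.Random(0)
sample = [rng.weibullvariate(2.0, 1.5) for _ in range(200)]
shape_hat, scale_hat = weibull_plot_fit(sample)
```

Passing per-observation weights inversely proportional to the residual variances would turn this into the weighted regression the abstract describes; those variances would have to come from the paper itself.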

A cluster methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is an agglomeration of procedures that runs from clustering the n observations, through a test of the ‘no-outlier hypothesis’ (TONH), to a weighted least-squares regression estimation. The cluster phase partitions the n observations into a set of size h, called the main cluster, and a minor cluster of size n − h. A robust distance emerges from the main cluster, upon which the test of the ‘no-outlier hypothesis’ is conducted. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. Until convergence, a re-weighted least-squares (RLS) regression estimate is updated with weights based on the normalized residuals. The proposed procedure blends an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, to the re-weighted regression estimation phase. Hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping. The performance of CBRR is further examined through a simulation experiment. The results obtained from the data set illustrations and the Monte Carlo study show that the CBRR is effective in detecting multivariate outliers where other methods are susceptible to masking and swamping. The CBRR does not require enormous computation and is largely not susceptible to masking and swamping.

Interval-censored survival data appear very frequently: the event of interest is not observed exactly but is only known to occur within some time interval. In this paper, we propose a location-scale regression model based on the log-generalized gamma distribution for modelling interval-censored data. We shall be concerned only with parametric forms. The proposed model for interval-censored data represents a parametric family of models that has, as special submodels, other regression models which are broadly used in lifetime data analysis. Assuming interval-censored data, we consider a frequentist analysis, a jackknife estimator and a non-parametric bootstrap for the model parameters. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some techniques to assess global influence.

There is currently much discussion about lasso-type regularized regression, which is a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling, owing to its sparsity, it suffers from outliers because it uses penalized least-squares methods. To overcome this issue, we propose a robust lasso-type estimation procedure that uses a robust criterion as the loss function and imposes an L1-type penalty called the elastic net. We also propose using efficient bootstrap information criteria to choose the optimal regularization parameters and a constant used in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.

This paper discusses the problem of statistical inference in multivariate linear regression models when the errors involved are non-normally distributed. We consider the multivariate t-distribution, a fat-tailed distribution, for the errors as an alternative to the normal distribution. Such non-normality is commonly observed in many data sets, e.g., financial data, which usually have excess kurtosis. This distribution has a number of applications in many other areas of research as well. We use the modified maximum likelihood estimation method, which provides the estimator, called the modified maximum likelihood estimator (MMLE), in closed form. These estimators are shown to be unbiased, efficient, and robust compared to the widely used least squares estimators (LSEs). Also, the tests based upon MMLEs are found to be more powerful than the similar tests based upon LSEs.


Statistical methods are effectively used in the evaluation of pharmaceutical formulations instead of laborious liquid chromatography. However, signal overlapping, nonlinearity, multicollinearity and the presence of outliers degrade the performance of statistical methods. Partial Least Squares Regression (PLSR) is a very popular method for the quantification of high-dimensional, spectrally overlapped drug formulations. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality for a PLSR model. We study the performance of the proposed diagnostics on two highly overlapping drug systems in everyday use: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

We introduce a log-linear regression model based on the odd log-logistic generalized half-normal distribution [7] (G.M. Cordeiro, M. Alizadeh, R.R. Pescim, and E.M.M. Ortega, The odd log-logistic generalized half-normal lifetime distribution: Properties and applications, Comm. Statist. Theory Methods (2015), accepted for publication). Some of its structural properties, including explicit expressions for the density function, quantile and generating functions and ordinary moments, are derived. We estimate the model parameters by the maximum likelihood method. For different parameter settings, proportions of censoring and sample sizes, some simulations are performed to investigate the behavior of the estimators. We derive the appropriate matrices for assessing local influence diagnostics on the parameter estimates under different perturbation schemes. We also define the martingale and modified deviance residuals to detect outliers and evaluate the model assumptions. In addition, we demonstrate that the extended regression model can be very useful in the analysis of real data and can provide more realistic fits than other special regression models. The potential of the new regression model is illustrated by means of a real data set.

In applications of survival analysis, the failure rate function may frequently present a unimodal shape. In such cases, the log-normal and log-logistic distributions are used. In this paper, we shall be concerned only with parametric forms, so a location-scale regression model based on the odd log-logistic Weibull distribution is proposed for modelling data with a decreasing, increasing, unimodal and bathtub failure rate function as an alternative to the log-Weibull regression model. For censored data, we consider a classic method to estimate the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Further, for different parameter settings, sample sizes and censoring percentages, various simulations are performed. In addition, the empirical distribution of some modified residuals is determined and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the new regression model applied to censored data. We analyse a real data set using the log-odd log-logistic Weibull regression model.

In this paper we discuss recursive (or on-line) estimation in (i) regression and (ii) autoregressive integrated moving average (ARIMA) time series models. The adopted approach uses Kalman filtering techniques to calculate estimates recursively. This approach is used for the estimation of constant as well as time-varying parameters. In the first section of the paper we consider the linear regression model. We discuss recursive estimation for both constant and time-varying parameters. For constant parameters, Kalman filtering specializes to recursive least squares. In general, we allow the parameters to vary according to an autoregressive integrated moving average process and update the parameter estimates recursively. Since the stochastic model for the parameter changes will rarely be known, simplifying assumptions have to be made. In particular, we assume a random walk model for the time-varying parameters and show how to determine whether the parameters are changing over time. This is illustrated with an example.
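
The link between Kalman filtering and recursive least squares can be sketched in a few lines. The function below is an illustration written for this summary, not code from the paper: the name `rls_update`, the observation variance `r`, the random-walk variance `q`, and the diffuse starting covariance are all assumptions. With `q = 0` the parameters are treated as constant and the recursion is ordinary recursive least squares; a positive `q` lets them drift as a random walk.

```python
def rls_update(theta, P, x, y, r=1.0, q=0.0):
    """One Kalman step for y = x'theta + e with Var(e) = r.

    theta follows a random walk with innovation variance q per coordinate;
    q = 0 gives constant parameters, i.e. recursive least squares.
    """
    k = len(theta)
    # Prediction step: a random walk only inflates the parameter covariance.
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(k)] for i in range(k)]
    Px = [sum(P[i][j] * x[j] for j in range(k)) for i in range(k)]
    s = r + sum(x[i] * Px[i] for i in range(k))        # innovation variance
    gain = [Px[i] / s for i in range(k)]               # Kalman gain
    err = y - sum(x[i] * theta[i] for i in range(k))   # one-step prediction error
    theta = [theta[i] + gain[i] * err for i in range(k)]
    # Covariance update P - gain * x'P, using the symmetry of P.
    P = [[P[i][j] - gain[i] * Px[j] for j in range(k)] for i in range(k)]
    return theta, P


# Noise-free data from y = 1 + 2t, processed one observation at a time.
theta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large diagonal: a nearly diffuse start
for t in range(10):
    theta, P = rls_update(theta, P, [1.0, float(t)], 1.0 + 2.0 * t)
```

After the loop, `theta` is close to the intercept and slope that generated the data. Rerunning with a small positive `q` and comparing the prediction errors is one informal way to see whether the parameters appear to change over time.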

The multinomial logistic regression model (MLRM) can be interpreted as a natural extension of the binomial model with logit link function to situations where the response variable can have three or more possible outcomes. In addition, when the categories of the response variable are nominal, the MLRM can be expressed in terms of two or more logistic models and analyzed using both frequentist and Bayesian approaches. However, few discussions of post-modeling diagnostics in categorical data models are found in the literature, and they mainly use Bayesian inference. The objective of this work is to present classic and Bayesian diagnostic measures for categorical data models. These measures are applied to a dataset (status) of patients undergoing kidney transplantation.
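
To make the "two or more logistic models" reading concrete, the sketch below computes MLRM category probabilities under the usual baseline-category parameterization, in which the log-odds of category j against the baseline is linear in the covariates. The function name `mlrm_probs` and the example coefficients are hypothetical and chosen only for illustration.

```python
import math


def mlrm_probs(x, betas):
    """Category probabilities for a baseline-category multinomial logit.

    x     : covariate vector (include a leading 1.0 for an intercept)
    betas : one coefficient vector per non-baseline category
    Category 0 is the baseline, with linear predictor fixed at 0.
    """
    etas = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    m = max(etas)  # subtract the maximum for numerical stability
    expo = [math.exp(e - m) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]


# Three response categories, so two logit equations (hypothetical coefficients).
probs = mlrm_probs([1.0, 0.5], [[0.2, -1.0], [1.0, 0.3]])
# log(probs[j] / probs[0]) equals the linear predictor of category j.
```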

Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several frequently cited data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

Partial linear varying coefficient models are often used in real data analysis for a good balance between flexibility and parsimony. In this paper, we propose a robust adaptive model selection method based on rank regression, which performs coefficient estimation simultaneously with three types of selection: varying-effect, constant-effect and relevant-variable selection. The new method is superior in robustness and efficiency because it inherits the advantages of the rank regression approach. Furthermore, consistency in the three types of selection and the oracle property in estimation are established as well. Simulation studies also support our method.

Various mixed models have been developed to capture the features of between- and within-individual variation for longitudinal data under the normality assumption for the random effect and the within-individual random error. However, the normality assumption may be violated in some applications. To this end, this article assumes that the random effect follows a skew-normal distribution and that the within-individual error is distributed as a reproductive dispersion model. An expectation conditional maximization either (ECME) algorithm together with the Metropolis-Hastings (MH) algorithm within the Gibbs sampler is presented to simultaneously obtain estimates of parameters and random effects. Several diagnostic measures are developed to identify potentially influential cases and to assess the effect of minor perturbations to model assumptions via the case-deletion method and local influence analysis. To reduce the computational burden, we derive first-order approximations to the case-deletion diagnostics. Several simulation studies and a real data example are presented to illustrate the newly developed methodologies.

