Five widely used test statistics for detecting outliers and influential observations were studied using the Monte Carlo method. The test statistic based on Studentized residuals, with critical values given by Tietjen, Moore and Beckman (1973), appears to be the best procedure for detecting a single outlier in simple linear regression.

In this paper, asymptotic normality is established for the parameters of the multivariate skew-normal distribution under two parametrizations. Also, an analytic expression and an asymptotic normal law are derived for the skewness vector of the skew-normal distribution. The estimates are derived using the method of moments. Convergence to the asymptotic distributions is examined both computationally and in a simulation experiment.

Grouped data are frequently used in several fields of study. In this work, we use the expectation-maximization (EM) algorithm to fit the skew-normal (SN) mixture model to grouped data. Implementing the EM algorithm requires computing one-dimensional integrals for each group or class. Our simulation study and real data analyses reveal that the EM algorithm not only always converges but also can be run in just a few seconds even when the number of components is large, in contrast to the Bayesian paradigm, which is computationally expensive. The accuracy of the EM algorithm and the superiority of the SN mixture model over the traditional normal mixture model in modelling grouped data are demonstrated through the simulation and three real data illustrations. To implement the EM algorithm, we use the ForestFit package developed for the R environment, available at https://cran.r-project.org/web/packages/ForestFit/index.html.

Various mixed models have been developed to capture the features of between- and within-individual variation for longitudinal data under the normality assumption for the random effect and the within-individual random error. However, the normality assumption may be violated in some applications. To address this, this article assumes that the random effect follows a skew-normal distribution and the within-individual error is distributed as a reproductive dispersion model. An expectation/conditional maximization either (ECME) algorithm, together with the Metropolis-Hastings (MH) algorithm within the Gibbs sampler, is presented to simultaneously obtain estimates of parameters and random effects. Several diagnostic measures are developed to identify potentially influential cases and to assess the effect of minor perturbations to model assumptions via the case-deletion method and local influence analysis. To reduce the computational burden, we derive first-order approximations to the case-deletion diagnostics. Several simulation studies and a real data example are presented to illustrate the newly developed methodologies.

Detecting outliers is an interesting and important aspect of data analysis, as outliers can affect inference. Various methods are available in the literature for detecting outliers in multivariate data [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994] using the Mahalanobis distance measure. We propose an alternative method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615–644]. The proposed method is computationally efficient, with a high breakdown value and low computation time. Further, important properties, namely success rates (SR) and false detection rates (FDR), are studied and compared with those of some well-known outlier detection methods through a simulation study. The comedian method has high SR and low FDR for all combinations of parameters. After removing or down-weighting the detected outliers, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.
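As a rough illustration of the building block named in this abstract, the comedian of two variables is the median of the products of their median-centred values, COM(X, Y) = med((X − med X)(Y − med Y)). The Python sketch below computes a comedian matrix only; it is not the paper's detection procedure, and the function names, the simulated sample and the planted outlier are assumptions made for the example.

```python
import numpy as np

def comedian(x, y):
    """Comedian of two samples: median of products of median-centred values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.median((x - np.median(x)) * (y - np.median(y))))

def comedian_matrix(data):
    """Symmetric p x p matrix of pairwise comedians for an n x p data array."""
    data = np.asarray(data, dtype=float)
    centered = data - np.median(data, axis=0)
    p = data.shape[1]
    com = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            com[i, j] = com[j, i] = np.median(centered[:, i] * centered[:, j])
    return com

# Hypothetical data: 101 standard normal rows with one gross outlier planted.
rng = np.random.default_rng(1)
sample = rng.standard_normal((101, 3))
sample[0] = [50.0, -50.0, 50.0]

com = comedian_matrix(sample)
# Unscaled median absolute deviation of each column, for comparison.
mad = np.median(np.abs(sample - np.median(sample, axis=0)), axis=0)
```

For an odd sample size the diagonal of the comedian matrix equals the squared (unscaled) MAD of each column, which is why the comedian is described as a robust analogue of the covariance. Because it is built from medians, a single extreme row changes it very little.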

The distribution of a weighted function of independent skew-normal random variables, which includes the sample mean, is useful in many applications. In this paper, we derive this distribution and study the null distribution of a linear form and a quadratic form. Finally, we discuss some of its applications in control charts, in which the skew-normal model plays a key role.

Following the paper by Genton and Loperfido [Generalized skew-elliptical distributions and their quadratic forms, Ann. Inst. Statist. Math. 57 (2005), pp. 389–401], we say that $Z$ has a generalized skew-normal distribution if its probability density function (p.d.f.) is given by $f(z) = 2\phi_p(z; \xi, \Omega)\,\pi(z - \xi)$, $z \in \mathbb{R}^p$, where $\phi_p(\cdot\,; \xi, \Omega)$ is the $p$-dimensional normal p.d.f. with location vector $\xi$ and scale matrix $\Omega$, $\xi \in \mathbb{R}^p$, $\Omega > 0$, and $\pi$ is a skewing function from $\mathbb{R}^p$ to $\mathbb{R}$, that is, $0 \le \pi(z) \le 1$ and $\pi(-z) = 1 - \pi(z)$ for all $z \in \mathbb{R}^p$. First, the distributions of linear transformations of $Z$ are studied, and some moments of $Z$ and its quadratic forms are derived. Next, we obtain the joint moment-generating functions (m.g.f.'s) of linear and quadratic forms of $Z$ and then investigate conditions for their independence. Finally, explicit forms for the above distributions, m.g.f.'s and moments are derived when $\pi(z) = \kappa(\alpha^{\top} z)$, where $\alpha \in \mathbb{R}^p$ and $\kappa$ is the normal, Laplace, logistic or uniform distribution function.
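To make the density in this abstract concrete, the following Python sketch evaluates the univariate case (p = 1) of f(z) = 2 φ(z; ξ, Ω) π(z − ξ) with the skewing function chosen as π(u) = Φ(αu), the standard normal distribution function, which is one of the choices of κ the abstract lists. The parameter values and function names are assumptions for illustration. The numerical check confirms that the factor 2 together with the condition π(−u) = 1 − π(u) makes f integrate to one.

```python
import math
import numpy as np

def normal_pdf(z, xi=0.0, omega=1.0):
    """Univariate normal density with location xi and variance omega."""
    return np.exp(-0.5 * (z - xi) ** 2 / omega) / math.sqrt(2.0 * math.pi * omega)

def normal_cdf(x):
    """Standard normal distribution function, used here as the skewing kernel."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def gsn_pdf(z, xi=0.0, omega=1.0, alpha=2.0):
    """f(z) = 2 * phi(z; xi, omega) * pi(z - xi), with pi(u) = Phi(alpha * u)."""
    return 2.0 * normal_pdf(z, xi, omega) * normal_cdf(alpha * (z - xi))

# Riemann-sum check that the density has total mass one (illustrative values).
grid = np.linspace(-12.0, 12.0, 200001)
dz = grid[1] - grid[0]
total_mass = float(np.sum(gsn_pdf(grid, xi=1.0, omega=4.0, alpha=3.0)) * dz)

# Skewing-function symmetry: pi(-u) + pi(u) should equal 1 for every u.
u = np.linspace(-3.0, 3.0, 7)
symmetry_gap = float(np.max(np.abs(normal_cdf(-u) + normal_cdf(u) - 1.0)))
```

Setting `alpha=0.0` gives π ≡ 1/2, and the density reduces to the ordinary normal density.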


We introduce here the truncated version of the unified skew-normal (SUN) distributions. By considering special truncations for both the univariate and multivariate cases, we derive the joint distribution of consecutive order statistics $X_{(r,\ldots,r+k)} = (X_{(r)}, \ldots, X_{(r+k)})^{\top}$ from an exchangeable $n$-dimensional normal random vector $X$. Further, we show that the conditional distributions of $X_{(r+j,\ldots,r+k)}$ given $X_{(r,\ldots,r+j-1)}$, $X_{(r,\ldots,r+k)}$ given $(X_{(r)} > t)$, and $X_{(r,\ldots,r+k)}$ given $(X_{(r+k)} < t)$ are special types of singular SUN distributions. We use these results to determine some measures in reliability theory, such as the mean past life (MPL) function and the mean residual life (MRL) function.


This paper studies decision-theoretic properties of Stein-type shrinkage estimators in simultaneous estimation of location parameters in a multivariate skew-normal distribution with known skewness parameters under a quadratic loss. The benchmark estimator is the best location equivariant estimator, which is minimax. A class of shrinkage estimators improving on the best location equivariant estimator is constructed when the dimension of the location parameters is larger than or equal to four. An empirical Bayes estimator is also derived and, motivated by the Bayesian procedure, we suggest a simple skew-adjusted shrinkage estimator and show its dominance property. The performance of these estimators is investigated by simulation.

Traditional statistical modeling of continuous outcome variables relies heavily on the assumption of a normal distribution. However, in some applications, such as the analysis of microRNA (miRNA) data, normality may not hold. Skewed distributions play an important role in such studies and might lead to robust results in the presence of extreme outliers. We apply a skew-normal (SN) distribution, which is indexed by three parameters (location, scale and shape), in the context of miRNA studies. We developed a test statistic for comparing the means of two conditions, replacing the normal assumption with an SN distribution. We compared the performance of the statistic with other Wald-type statistics through simulations. Two real miRNA datasets are analyzed to illustrate the methods. Our simulation findings showed that the use of an SN distribution can result in improved identification of differentially expressed miRNAs, especially with markedly skewed data and when the two groups have different variances. It also appeared that the statistic with the SN assumption performs comparably with other Wald-type statistics irrespective of the sample size or distribution. Moreover, the real dataset analyses suggest that the statistic with the SN assumption can be used effectively to identify important miRNAs. Overall, the statistic with the SN distribution is useful when data are asymmetric and when the samples have different variances for the two groups.

Existing studies on the spatial dynamic panel data model (SDPDM) mainly rely on the normality assumption for response variables and random effects. This assumption may be inappropriate in some applications. This paper proposes a new SDPDM by assuming that response variables and random effects follow the multivariate skew-normal distribution. A Markov chain Monte Carlo algorithm is developed to evaluate Bayesian estimates of unknown parameters and random effects in the skew-normal SDPDM by combining the Gibbs sampler and the Metropolis–Hastings algorithm. A Bayesian local influence analysis method is developed to simultaneously assess the effect of minor perturbations to the data, priors and sampling distributions. Simulation studies are conducted to investigate the finite-sample performance of the proposed methodologies. An example illustrates the proposed methodologies.

This paper explores the usefulness of the multivariate skew-normal distribution in the context of graphical models. A slight extension of the family recently discussed by Azzalini & Dalla Valle (1996) and Azzalini & Capitanio (1999) is described, the main motivation being the additional property of closure under conditioning. After consideration of the main probabilistic features, the focus of the paper is on the construction of conditional independence graphs for skew-normal variables. Necessary and sufficient conditions for conditional independence are stated, and the admissible structures of a graph under restrictions on the univariate marginal distributions are studied. Finally, parameter estimation is considered. It is shown how the factorization of the likelihood function according to a graph can be rearranged in order to obtain a parameter-based factorization.

Modern methods for detecting changes in the scale or covariance of multivariate distributions rely primarily on testing for the constancy of the covariance matrix. These depend on higher-order moment conditions, and also do not work well when the dimension of the data is large or even moderate relative to the sample size. In this paper, we propose a nonparametric change point test for multivariate data using rankings obtained from data depth measures. As the data depth of an observation measures its centrality relative to the sample, changes in data depth may signify a change of scale of the underlying distribution, and the proposed test is particularly responsive to such changes. We provide a full asymptotic theory for the proposed test statistic under the null hypothesis that the observations are stable, and natural conditions under which the test is consistent. The finite sample properties are investigated by means of a Monte Carlo simulation, and these, along with the theoretical results, confirm that the test is robust to heavy tails, skewness and high dimensionality. The proposed methods are demonstrated with an application to structural break detection in the rate of change of pollutants linked to acid rain measured in Turkey Lake, a lake in central Ontario, Canada. Our test suggests a change in the rate of acid rain in the late 1980s/early 1990s, which coincides with clean air legislation in Canada and the US. The Canadian Journal of Statistics 48: 417–446; 2020 © 2020 Statistical Society of Canada

Missing values are common in longitudinal data studies. The missing data mechanism is termed non-ignorable (NI) if the probability of missingness depends on the non-response (missing) observations. This paper presents a model for ordinal categorical longitudinal data with NI non-monotone missing values. We assume two separate models for the response and the missingness process. The response is modeled as ordinal logistic, whereas a binary logistic model is considered for the missingness process. We employ these models in the context of so-called shared-parameter models, where the outcome and missing data models are connected by a common set of random effects. It is commonly assumed that the random effect follows the normal distribution in longitudinal data with or without missing data. This can be extremely restrictive in practice, and it may result in misleading statistical inferences. In this paper, we instead adopt a more flexible alternative, the skew-normal distribution. The methodology is illustrated through an application to the Schizophrenia Collaborative Study data [D. Hedeker, Generalized linear mixed models, in Encyclopedia of Statistics in Behavioral Science, B. Everitt and D. Howell, eds., John Wiley, London, 2005, pp. 729–738] and a simulation.

Linear mixed models have been widely used to analyze repeated measures data, which arise in many studies. In most applications, it is assumed that both the random effects and the within-subject errors are normally distributed. This can be extremely restrictive, obscuring important features of within- and among-subject variation. Here, quantile regression in the Bayesian framework for linear mixed models is described to carry out robust inference. We also relax the normality assumption for the random effects by using a multivariate skew-normal distribution, which includes the normal distribution as a special case and provides robust estimation in linear mixed models. For posterior inference, we propose a Gibbs sampling algorithm based on a mixture representation of the asymmetric Laplace distribution and the multivariate skew-normal distribution. The procedures are demonstrated by both simulated and real data examples.

It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values, making the outlier less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a newly proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, this study compares not only imputation methods but also outlier detection methods. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as on the size and dimension of the data and the fraction of missing components. By taking these features into account, a better-suited MI method can be selected for a given data set.

In practice, a financial or actuarial data set may be skewed or heavy-tailed, and this motivates us to study a class of distribution functions in risk management theory that provide more information about these characteristics, resulting in a more accurate risk analysis. In this paper, we consider a multivariate tail conditional expectation (MTCE) for multivariate scale mixtures of skew-normal (SMSN) distributions. This class contains skewed distributions, and some of its members can be used to analyse heavy-tailed data sets. We also provide a closed form for the TCE in a univariate skew-normal distribution framework. Numerical examples are provided for illustration.

Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353–365] proposed a skew-normal distribution under the centred parameterization (SNCP), as studied in [R.B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362–1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour of the latent trait distribution. They also developed a Metropolis–Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271–275]. Our algorithm has only one Metropolis–Hastings step, in contrast to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146–178; R.J. Patz and B.W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342–366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599–607]. Moreover, we consider a modified beta prior (which generalizes the one considered by Azevedo et al.) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, the number of items and the asymmetry level on parameter recovery. Results of the simulation study indicated that our approach performed as well as that of Azevedo et al. in terms of parameter recovery, mainly when using the Jeffreys prior. They also indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fit assessment tools. The results are compared with those obtained by Azevedo et al. The results indicate that the hierarchical approach allows us to implement MCMC algorithms more easily, facilitates convergence diagnosis, and can be very useful for fitting more complex skew IRT models.
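The stochastic representation attributed to Henze in this abstract can be sketched as follows. In the direct parameterization with shape λ and δ = λ / sqrt(1 + λ²), the variable Z = δ|U0| + sqrt(1 − δ²) U1, with U0 and U1 independent standard normals, has a standard skew-normal distribution. The Python sketch below uses that direct parameterization rather than the centred one used in the paper (the SNCP would additionally standardize Z to zero mean and unit variance); the function name, seed and shape value are assumptions.

```python
import math
import numpy as np

def sample_skew_normal(lam, size, rng):
    """Draw from the standard skew-normal with shape lam via Henze's representation."""
    delta = lam / math.sqrt(1.0 + lam ** 2)
    u0 = np.abs(rng.standard_normal(size))   # half-normal latent variable
    u1 = rng.standard_normal(size)           # independent normal noise
    return delta * u0 + math.sqrt(1.0 - delta ** 2) * u1

rng = np.random.default_rng(0)
lam = 3.0                                    # illustrative shape value
draws = sample_skew_normal(lam, 200_000, rng)

# Known moments of the standard skew-normal, for comparison with the sample.
delta = lam / math.sqrt(1.0 + lam ** 2)
theoretical_mean = delta * math.sqrt(2.0 / math.pi)
theoretical_var = 1.0 - 2.0 * delta ** 2 / math.pi
```

Conditional on the half-normal variable |U0|, Z is normal. This hierarchical structure is what makes Gibbs-type updates convenient once the latent variable is included in the sampler.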

An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a proportion of outliers. It is important to identify and eliminate the outliers when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is used to construct a novel outlier detection measure based on distance correlation, and then an outlier detection procedure is proposed. The proposed method enjoys several advantages. First, the outlier detection measure can be calculated simply, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, which does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well in detecting outliers in high-dimensional regression models and performs better than some competing methods.

Liseo and Loperfido [A note on reference priors for the scalar skew-normal distribution. J Statist Plann Inference. 2006;136(2):373–389] studied some peculiar features of default Bayes analysis of the scalar skew-normal model. In particular, they showed that, in the simplest model with a single unknown skewness parameter $\lambda$, the reference – or Jeffreys’ – prior for this parameter is proper. They proved that the tails of Jeffreys’ prior are of order $O(\lambda^{-3/2})$. However, their proof contains a mistake. In this note, we correct their proof.

