Similar documents (20 results)
1.
Fitting multiplicative models by robust alternating regressions   Citations: 1 (self-citations: 0; by others: 1)
In this paper a robust approach for fitting multiplicative models is presented. The focus is on the factor analysis model, where factor loadings and scores are estimated by a robust alternating regression algorithm. The approach is highly robust and also works well when there are more variables than observations. The technique yields a robust biplot depicting the interaction structure between individuals and variables. This biplot is not distorted by outliers, which can instead be identified from the residual plot. A robust R²-plot is also provided to determine the appropriate number of factors. The approach is illustrated on real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.
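A minimal sketch of the alternating idea for a single factor, i.e. the rank-1 model x_ij ≈ a_i b_j, assuming plain unweighted L1 regressions at every half-step rather than the paper's exact weighting scheme. Each half-step is an L1 regression through the origin, and minimizing sum_j |x_ij − a_i b_j| over a_i equals minimizing sum_j |b_j| · |x_ij / b_j − a_i|, which is solved by a weighted median of ratios. The function names `weighted_median`, `l1_slope` and `fit_rank1_l1` are hypothetical.

```python
import statistics

def weighted_median(values, weights):
    """Lower weighted median; it minimizes sum_k w_k * |v_k - c| over c."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]

def l1_slope(y, z):
    """L1 regression through the origin: argmin_c sum_k |y_k - c * z_k|.

    Rewritten as sum_k |z_k| * |y_k / z_k - c|, i.e. a weighted median of ratios.
    Entries with z_k == 0 do not depend on c and are skipped.
    """
    ratios = [yk / zk for yk, zk in zip(y, z) if zk != 0]
    weights = [abs(zk) for zk in z if zk != 0]
    return weighted_median(ratios, weights)

def fit_rank1_l1(X, n_iter=50):
    """Fit x_ij ~ a_i * b_j by alternating L1 regressions (one-factor sketch)."""
    n, p = len(X), len(X[0])
    # Robust starting loadings: column medians.
    b = [statistics.median(X[i][j] for i in range(n)) for j in range(p)]
    a = [0.0] * n
    for _ in range(n_iter):
        # Scores: regress each row on the current loadings.
        a = [l1_slope(X[i], b) for i in range(n)]
        # Loadings: regress each column on the current scores.
        b = [l1_slope([X[i][j] for i in range(n)], a) for j in range(p)]
    return a, b
```

With one gross outlier planted in an otherwise exact rank-1 table, the fitted products a_i b_j still reproduce the clean cells, and the outlier appears as a large residual. Note that a and b are identified only up to a common scale factor; the products a_i b_j are not affected by it.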

2.
Huber (1964) found the minimax-variance M-estimate of location under the assumption that the scale parameter is known; Li and Zamar (1991) extended this result to the case of unknown scale. We consider the robust estimation of the regression coefficients (β1, …, βp) when the scale and intercept parameters are unknown. The minimax-variance estimates of (β1, …, βp), with respect to the trace of their asymptotic covariance matrix, are derived. The maximum is taken over ε-contamination neighbourhoods of a central regression model with Gaussian errors (asymmetric contamination is allowed), and the minimum is taken over a large class of generalized M-estimates of regression of the Mallows type. The optimal choice of estimates for the nuisance parameters (scale and intercept) is also considered.
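For orientation, the location case that Huber (1964) started from can be sketched as an M-estimate with the Huber ψ-function ψ_k(r) = max(−k, min(k, r)) and a known scale, computed by iteratively reweighted averaging with weights ψ_k(r)/r. The tuning constant k = 1.345 (roughly 95% efficiency at the normal) is an assumption here; Huber's minimax constant depends on the contamination level ε, and the paper's regression estimators are not implemented. `huber_location` is a hypothetical name.

```python
import statistics

def huber_psi(r, k=1.345):
    """Huber psi-function: identity in [-k, k], clipped at +/-k outside."""
    return max(-k, min(k, r))

def huber_location(x, scale, k=1.345, tol=1e-10, max_iter=200):
    """Solve sum_i psi((x_i - mu) / scale) = 0 for mu, with the scale known."""
    mu = statistics.median(x)  # robust starting value
    for _ in range(max_iter):
        r = [(xi - mu) / scale for xi in x]
        # IRLS weights psi(r)/r, defined as 1 at r == 0.
        w = [1.0 if ri == 0 else huber_psi(ri, k) / ri for ri in r]
        mu_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```

For the data 1, 2, 3, 4, 100 with unit scale, the clipped residuals at 1 and 100 cancel each other, so the estimate stays at 3 instead of being pulled towards the outlier as the mean (22) is.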

3.
Several methods have been suggested to compute robust M- and GM-estimators of the regression parameter β and of the error scale parameter σ in a linear model. This paper shows that, for some data sets well known in robust statistics, the nonlinear system of equations for the simultaneous estimation of β (by an M-estimate with a redescending ψ-function) and σ (by the median absolute deviation (MAD) of the residuals) has many solutions. This multiplicity is not caused by the possible non-uniqueness, for redescending ψ-functions, of the solution of the system defining β when σ is known; rather, it is the simultaneous estimation of β and σ that creates the problem. One way to avoid these multiple solutions is to proceed in two steps. First, take σ as the MAD of the residuals of a uniquely defined robust M-estimate, such as Huber's Proposal 2 or the L1-estimate. Then solve the nonlinear system for the M-estimate of β with σ fixed at the value obtained in the first step. Analytical conditions for the uniqueness of M- and GM-estimates are also given.
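The two-step recipe can be sketched in the simplest linear model, the location model (a regressor matrix consisting of a single column of ones), where the L1-estimate is the sample median. Assumptions not taken from the paper: the MAD is multiplied by the normal-consistency factor 1.4826, and Tukey's biweight with c = 4.685 serves as the redescending ψ-function. `two_step_location` is a hypothetical name.

```python
import statistics

def biweight_weight(r, c=4.685):
    """Tukey biweight IRLS weight psi(r)/r; redescends to 0 for |r| >= c."""
    return (1.0 - (r / c) ** 2) ** 2 if abs(r) < c else 0.0

def two_step_location(x, c=4.685, tol=1e-12, max_iter=500):
    """Two-step redescending M-estimate of location.

    Step 1: L1 estimate (median) and the scaled MAD of its residuals, then frozen.
    Step 2: biweight M-estimate with sigma held fixed, via IRLS from the L1 start.
    Returns (mu, sigma).
    """
    mu = statistics.median(x)
    sigma = 1.4826 * statistics.median(abs(xi - mu) for xi in x)
    if sigma == 0:
        raise ValueError("MAD is zero; the scale cannot be fixed this way")
    for _ in range(max_iter):
        w = [biweight_weight((xi - mu) / sigma, c) for xi in x]
        mu_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new, sigma
        mu = mu_new
    return mu, sigma
```

Because σ is fixed before the redescending step, the only equation left to solve is the one for the location, which is the point of the two-step proposal. In the data 1, 2, 3, 4, 100 the outlier gets weight zero and the estimate settles at the centre of the remaining points.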

4.
We consider the asymptotic behaviour of least-squares and M-estimates of the autoregressive parameter when the process is an infinite-variance random walk. It is shown that certain M-estimates converge faster than the least-squares estimate and that they are also asymptotically normal.
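To make the two estimators concrete, here is a sketch for the model x_t = ρ x_{t−1} + e_t: the least-squares estimate in closed form, and a Huber-type M-estimate computed by iteratively reweighted least squares. The Huber ψ-function, the tuning constant k = 1.345, and a user-supplied fixed residual scale are illustrative assumptions, not necessarily the M-estimates analysed in the paper; `ar1_ls` and `ar1_huber` are hypothetical names.

```python
def ar1_ls(x):
    """Least-squares estimate of rho in x_t = rho * x_{t-1} + e_t (no intercept)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def ar1_huber(x, scale, k=1.345, tol=1e-10, max_iter=200):
    """Huber-type M-estimate of rho by IRLS, with the residual scale held fixed."""
    pairs = [(x[t - 1], x[t]) for t in range(1, len(x))]
    rho = ar1_ls(x)  # least-squares starting value
    for _ in range(max_iter):
        # Huber weights psi(r)/r: 1 for |r| <= k, k/|r| otherwise.
        w = []
        for prev, cur in pairs:
            r = abs(cur - rho * prev) / scale
            w.append(1.0 if r <= k else k / r)
        num = sum(wi * prev * cur for wi, (prev, cur) in zip(w, pairs))
        den = sum(wi * prev * prev for wi, (prev, cur) in zip(w, pairs))
        rho_new = num / den
        if abs(rho_new - rho) < tol:
            return rho_new
        rho = rho_new
    return rho
```

To mimic the infinite-variance setting, one could feed in a random walk whose innovations are standard Cauchy, for example generated as tan(π(U − 1/2)) for uniform U; no particular numerical result is claimed here.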

5.
Biplots are useful tools for exploring the relationships among variables. In this paper, the regression relationship between a set of predictors X and a set of response variables Y, as fitted by partial least-squares (PLS) regression, is represented in a biplot. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.

6.
In multiple linear regression analysis, each lower-dimensional subspace L of a known linear subspace M of ℝⁿ corresponds to a nonempty subset of the columns of the regressor matrix. For a fixed subspace L, the Cp statistic is an unbiased estimator of the mean square error when the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the Cp statistic that can also be used to estimate this mean square error. The Cp statistic and its truncated versions are compared on two example data sets, illustrating that the truncated versions may lead to models different from those selected by the standard Cp.
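The standard statistic that the truncated versions modify is Mallows' Cp = RSS_L / σ̂² − n + 2p, where p = dim L and RSS_L is the residual sum of squares after projecting the response onto L. The sketch below assumes the usual convention of estimating σ² from the full model M as RSS_M / (n − dim M); the article's truncated versions are not reproduced, and `mallows_cp` is a hypothetical helper.

```python
def mallows_cp(rss_sub, p_sub, rss_full, p_full, n):
    """Mallows' Cp for a submodel L of dimension p_sub.

    sigma^2 is estimated from the full model M (dimension p_full) as
    RSS_M / (n - p_full); this is a common convention, assumed here.
    """
    sigma2 = rss_full / (n - p_full)
    return rss_sub / sigma2 - n + 2 * p_sub
```

With this convention the full model always has Cp equal to its own dimension (RSS_M / σ̂² = n − dim M), which makes a quick sanity check; submodels with Cp close to their dimension p are the usual candidates.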

7.
Biplots are a widely used statistical tool for visualizing the loadings and scores produced by a dimension reduction technique applied to multivariate data. If the data carry only relative information (i.e. compositional data expressed in proportions, mg/kg, etc.), they must be pre-processed with a logratio transformation before the dimension reduction is carried out. In the context of principal component analysis, the resulting biplot is called a compositional biplot. We introduce an alternative, the ilr biplot, which is based on a special choice of orthonormal coordinates resulting from an isometric logratio (ilr) transformation. This makes it possible to incorporate external non-compositional variables as well, and to study their relations to the compositional variables. The methodology is demonstrated on real data sets.
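One common orthonormal basis for ilr coordinates is the set of pivot coordinates, z_i = sqrt((D − i)/(D − i + 1)) · ln(x_i / g(x_{i+1}, …, x_D)) for i = 1, …, D − 1, where g is the geometric mean. Whether this matches the paper's "special choice" is an assumption. The sketch also computes centred logratio (clr) coordinates so that the isometry (equal Euclidean distances) can be checked; `clr` and `ilr_pivot` are hypothetical names.

```python
import math

def clr(x):
    """Centred logratio: log parts minus their mean (log of the geometric mean)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def ilr_pivot(x):
    """Pivot ilr coordinates of a composition x with D > 1 strictly positive parts."""
    logs = [math.log(v) for v in x]
    D = len(x)
    z = []
    for i in range(D - 1):
        rest = logs[i + 1:]          # logs of the parts after the pivot part
        k = len(rest)                # k = D - j, where j = i + 1 is the 1-based part index
        mean_rest = sum(rest) / k    # log of the geometric mean of those parts
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return z
```

The coordinates are scale invariant (multiplying every part by the same constant leaves them unchanged), and a composition with equal parts maps to the origin. Because the basis is orthonormal, ordinary PCA on these coordinates respects the Aitchison geometry, which is what allows external non-compositional variables to be placed in the same biplot.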

8.
In this paper, we consider the asymptotic distributions of functionals of the sample covariance matrix and the sample mean vector, obtained under the assumption that the matrix of observations follows a matrix-variate location mixture of normal distributions. A central limit theorem is derived for the product of the sample covariance matrix and the sample mean vector. Moreover, we consider the product of the inverse sample covariance matrix and the mean vector, for which a central limit theorem is established as well. All results are obtained under the large-dimensional asymptotic regime, where the dimension p and the sample size n tend to infinity such that p/n → c ∈ [0, +∞) when the sample covariance matrix does not need to be invertible, and p/n → c ∈ [0, 1) otherwise.

9.
Recently, several new robust multivariate estimators of location and scatter have been proposed that provide new and improved methods for detecting multivariate outliers. But for small sample sizes, there are no results on how these new multivariate outlier detection techniques compare in terms of p_n, their outside rate per observation (the expected proportion of points declared outliers) under normality. Nor are there results comparing their ability to detect truly unusual points based on the model that generated the data. Moreover, there are no results comparing these methods with two fairly new techniques that do not rely on a robust covariance matrix. It is found that for an approach based on the orthogonal Gnanadesikan–Kettenring estimator, p_n can be very unsatisfactory with small sample sizes, but a simple modification gives much more satisfactory results. Similar problems were found with the median ball algorithm, but a modification proved to be unsatisfactory. The translated-biweights (TBS) estimator generally performs well with a sample size of n ≥ 20 and p-variate data with p ≤ 5. But with p = 8 it can be unsatisfactory, even with n = 200. A projection method as well as the minimum generalized variance method generally perform best, but for p ≤ 5, conditions under which the TBS method is preferable are described. In terms of detecting truly unusual points, the methods can differ substantially depending on where the outliers happen to be, the number of outliers present, and the correlations among the variables.

10.
Although the t-type estimator is a kind of M-estimator with scale optimization, it has some advantages over the M-estimator. In this article, we first propose a t-type joint generalized linear model as a robust extension of the classical joint generalized linear models for modeling data containing extreme or outlying observations. Next, we develop a t-type pseudo-likelihood (TPL) approach, which can be viewed as a robust version of the existing pseudo-likelihood (PL) approach. To determine which variables significantly affect the variance of the response variable, we then propose a unified penalized maximum TPL method that simultaneously selects significant variables for the mean and dispersion models in t-type joint generalized linear models. The proposed variable selection method thus performs parameter estimation and variable selection in the mean and dispersion models simultaneously. With an appropriate choice of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. Simulation studies are conducted to illustrate the proposed methods.

11.
At the core of multivariate statistics is the investigation of relationships between different sets of variables, namely the inter-variable relationships and the causal relationships. The latter is a regression problem, in which one set of variables is referred to as the response variables and the other as the predictor variables. In this situation, the effect of the predictors on the response variables is revealed through the regression coefficients. The results of the regression analysis can be viewed graphically using the biplot. The resulting biplot provides a single graphical representation of the samples together with the predictor and response variables. In addition, their effect in terms of the regression coefficients can be visualized, although sub-optimally, in the same biplot.
Keywords: Biplot, regression analysis, multivariate regression, rank approximation

12.
Consider using values of variables X1, X2, …, Xp to classify entities into one of two classes. Kernel-based procedures such as support vector machines (SVMs) are well suited for this task. In general, the classification accuracy of SVMs can be substantially improved if a smaller subset of (say m) variables is used instead of all p candidate variables. A new two-step approach to variable selection for SVMs is therefore proposed: best variable subsets of size k = 1, 2, …, p are first identified, and then a new data-dependent criterion is used to determine a value for m. The new approach is evaluated in a Monte Carlo simulation study and on a sample of data sets.

13.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables while taking unobserved population heterogeneity into account. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can in principle approximate any distribution, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered as a robust alternative. Missing data are inevitable in many situations, and parameter estimates can be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, the proposed method can be used for (i) visualizing the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

14.
Approximate confidence intervals are given for the lognormal regression problem. The error in the nominal level can be reduced to O(n⁻²), where n is the sample size. An alternative procedure is given that avoids the non-robust assumption of lognormality. This amounts to finding a confidence interval, based on M-estimates, for a general smooth function of both the parameters of the general (possibly nonlinear) regression problem and F, the unknown distribution function of the residuals. The derived intervals are compared using theory, simulation and real data sets.

15.
We address the problem of robust inference about the stress–strength reliability parameter R = P(X < Y), where X and Y are independent random variables. Although classical likelihood-based procedures for inference on R are available, it is well known that they can be badly affected by mild departures from the model assumptions for both the stress and the strength data. The proposed robust method relies on the theory of bounded-influence M-estimators. We obtain large-sample test statistics with the standard asymptotic distribution by means of delta-method asymptotics. The finite-sample behavior of these tests is investigated in numerical studies in which X and Y are independent exponential or normal random variables. An illustrative application in a regression setting is also discussed.
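Under the two parametric models mentioned, R has closed forms: for independent X ~ Exp(rate λ_X) and Y ~ Exp(rate λ_Y), R = λ_X / (λ_X + λ_Y); for independent normals, R = Φ((μ_Y − μ_X) / sqrt(σ_X² + σ_Y²)). The sketch below codes these together with the classical maximum-likelihood plug-in for the exponential case. That plug-in is the non-robust baseline the abstract criticizes, not the bounded-influence procedure; the function names are hypothetical.

```python
import math

def reliability_exponential(rate_x, rate_y):
    """R = P(X < Y) for independent X ~ Exp(rate_x), Y ~ Exp(rate_y)."""
    return rate_x / (rate_x + rate_y)

def reliability_normal(mu_x, sd_x, mu_y, sd_y):
    """R = P(X < Y) = Phi((mu_y - mu_x) / sqrt(sd_x^2 + sd_y^2)) for independent normals."""
    z = (mu_y - mu_x) / math.sqrt(sd_x ** 2 + sd_y ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def reliability_exponential_mle(xs, ys):
    """Classical (non-robust) plug-in estimate under the exponential model.

    The rate MLE is 1 / sample mean, so R_hat = mean(ys) / (mean(xs) + mean(ys)).
    A single huge stress or strength value moves this estimate without bound.
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return my / (mx + my)
```

The last comment is the motivation for the robust approach: the sample means enter R̂ directly, so its influence function is unbounded, whereas a bounded-influence M-estimator of the rates limits the effect of any single observation.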

16.
Numerical approaches to developing accurate and efficient approximations to combined likelihoods of population correlation matrices in meta-analysis, under normality assumptions for the data, are studied. The likelihood is expressed as a multiple integral over the unit cube in (p − 1)-dimensional space, where p is the row and column dimension of the correlation matrix. Three types of computation are proposed for calculating the likelihood of any population correlation matrix P. As an application, inference is explored concerning the intercorrelations among math, spatial and verbal scores on an SAT exam. Comparisons are made with conventional methods.

17.
This article is concerned with testing the diagonality of a high-dimensional covariance matrix under non-normality, a hypothesis that is more practical than sphericity or identity in the high-dimensional setting. The existing testing procedure for diagonality is robust against neither the data dimension nor the data distribution, and it produces tests whose type I error rates are much larger than the nominal levels. This is mainly due to bias in estimating certain functions of the high-dimensional covariance matrix under non-normality. Compared with the sphericity and identity hypotheses, the asymptotic behavior under the diagonality hypothesis is more involved, and the bias must be handled more carefully. We develop a correction that makes the existing test statistic robust against both the data dimension and the data distribution. We show that the proposed test statistic is asymptotically normal without the normality assumption and without specifying an explicit relationship between the dimension p and the sample size n. Simulations show that it has good size and power for a wide range of settings.

18.
The mean vector associated with several independent variates from the exponential subclass of Hudson (1978) is estimated under weighted squared error loss. In particular, the formal Bayes and "Stein-like" estimators of the mean vector are given. Conditions are also given under which these estimators dominate any of the "natural estimators". Our conditions for dominance are motivated by a result of Stein (1981), who treated the N_p(θ, I) case with p ≥ 3. Stein showed that formal Bayes estimators dominate the usual estimator if the marginal density of the data is superharmonic. Our generalization to the present exponential class entails an elliptic differential inequality in some natural variables. In fact, we assume that each component of the data vector has a probability density function satisfying a certain differential equation. While the densities of Hudson (1978) are particular solutions of this equation, other solutions are not of the exponential class if certain parameters are unknown. Our approach opens the possibility of extending the parametric Stein theory to useful non-exponential cases, but the problem of nuisance parameters is not treated here.
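For reference, in the N_p(θ, I) case that motivates the dominance conditions, the best-known Stein-type estimator is the classical James–Stein estimator (1 − (p − 2)/‖x‖²) x, which dominates the usual estimator x under squared error loss when p ≥ 3. It is shown only to illustrate shrinkage; the paper's formal Bayes and Stein-like estimators for Hudson's exponential class are not reproduced, and `james_stein` is a hypothetical name.

```python
def james_stein(x):
    """James-Stein estimate of theta from one observation x ~ N_p(theta, I), p >= 3.

    Shrinks x towards the origin by the factor 1 - (p - 2) / ||x||^2.
    (x = 0 has probability zero under the model and is not handled.)
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage requires p >= 3")
    norm_sq = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) / norm_sq
    return [shrink * v for v in x]
```

For x = (3, 4, 0) the squared norm is 25, so every coordinate is multiplied by 1 − 1/25 = 0.96. In practice the positive-part variant, which truncates the factor at zero, is usually preferred.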

19.
Consider the canonical-form MANOVA setup X: n × p = (M1′, M2′, 0′)′ + E, where X = (X1′, X2′, X3′)′ with Xi: ni × p, i = 1, 2, 3, Mi: ni × p, i = 1, 2, and n = n1 + n2 + n3, and where E is a normally distributed error matrix with mean zero and dispersion In ⊗ Σ, Σ > 0 (positive definite). Assume (in contrast with the usual case) that M1 is normal with mean zero and dispersion In1 ⊗ Δ, and that M2 is either fixed or random normal with mean zero and a different dispersion matrix In2 ⊗ Γ, Γ being unknown. It is also assumed that M1, E, and M2 (if random) are all independent. For testing H0: Δ = 0 versus H1: Δ > 0, it is shown that when either n2 = 0 or M2 is fixed (if n2 > 0), the trace test of Pillai (1955) is uniformly most powerful invariant (UMPI) if min(n1, p) = 1 and locally best invariant (LBI) if min(n1, p) > 1 under the action of the full linear group GL(p). When p > 1, the LBI test is also derived under the smaller group GT(p) of p × p lower triangular matrices with positive diagonal elements. However, these results do not hold if n2 > 0 and M2 is random. The null, nonnull, and optimality robustness of Pillai's trace test under GL(p) for suitable deviations from normality is pointed out.

20.
Robust M-estimators for the partly linear model under stochastic adapted errors are considered. It is shown that the M-estimator of the parametric component is asymptotically normal and that the M-estimator of the nonparametric function achieves the optimal rate of convergence for nonparametric regression. Some known results are improved and generalized. Simulations and a real-data example are used to illustrate the proposed method.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417