Similar literature
Found 20 similar articles; search took 30 ms.
1.

Among the best-known estimators of multivariate location and scatter is the Minimum Volume Ellipsoid (MVE). Many algorithms have been proposed to compute it. Most of these attempt merely to approximate the exact MVE as closely as possible, but some of them have led to the definition of new estimators that maintain the properties of robustness and affine equivariance that make the MVE so attractive. Rousseeuw and van Zomeren (1990) used the (p+1)-subset estimator, which was modified by Croux and Haesbroeck (1997) to give rise to the averaged (p+1)-subset estimator. This note shows by means of simulations that the averaged (p+1)-subset estimator outperforms the exact estimator as far as finite-sample efficiency is concerned. We also present a new robust estimator for the MVE, closely related to the averaged (p+1)-subset estimator, but yielding a natural ranking of the data.

2.
L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve optimization problems for L1-type regularization. In particular, the coordinate descent algorithm has been shown to be effective in sparse regression modeling. Although the algorithm performs remarkably well on optimization problems for L1-type regularization, it suffers from outliers, since the procedure is based on the inner product of predictor variables and partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing especially on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
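To make concrete the step this abstract criticises, the following is a minimal sketch of the standard (non-robust) cyclic coordinate descent update for the lasso, not the robust algorithm the authors propose. The objective scaling (1/(2n)), the function names, and the toy data are assumptions chosen for illustration. The quantity `rho` is the inner product of a predictor with its partial residual, which is exactly where a single outlying response can distort the update.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z towards zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent (assumed scaling; plain, non-robust version)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in col) / n
            if norm_sq == 0.0:
                continue
            # Inner product of predictor j with the partial residual
            # (the residual with predictor j's own contribution added back).
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / norm_sq
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Hypothetical toy data with two orthogonal predictors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
```

With these toy inputs the second coefficient is shrunk exactly to zero, which is the variable-selection behaviour the abstract refers to. A robust variant would replace the plain inner product with an outlier-resistant quantity; the abstract does not give that detail, so it is not sketched here.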

3.
To summarize a set of data by a distribution function in Johnson's translation system, we use a least-squares approach to parameter estimation wherein we seek to minimize the distance between the vector of "uniformized" order statistics and the corresponding vector of expected values. We use the software package FITTRI to apply this technique to three problems arising respectively in medicine, applied statistics, and civil engineering. Compared to traditional methods of distribution fitting based on moment matching, percentile matching, L1 estimation, and L∞ estimation, the least-squares technique is seen to yield fits of similar accuracy and to converge more rapidly and reliably to a set of acceptable parameter estimates.

4.
We consider here a generalization of the skew-normal distribution, GSN(λ1,λ2,ρ), defined through a standard bivariate normal distribution with correlation ρ, which is a special case of the unified multivariate skew-normal distribution studied recently by Arellano-Valle and Azzalini [2006. On the unification of families of skew-normal distributions. Scand. J. Statist. 33, 561–574]. We then present some simple and useful properties of this distribution and also derive its moment generating function in an explicit form. Next, we show that distributions of order statistics from the trivariate normal distribution are mixtures of these generalized skew-normal distributions; thence, using the established properties of the generalized skew-normal distribution, we derive the moment generating functions of order statistics, and also present expressions for means and variances of these order statistics. Next, we introduce a generalized skew-tν distribution, which is a special case of the unified multivariate skew-elliptical distribution presented by Arellano-Valle and Azzalini [2006. On the unification of families of skew-normal distributions. Scand. J. Statist. 33, 561–574] and is in fact a three-parameter generalization of Azzalini and Capitanio's [2003. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. Roy. Statist. Soc. Ser. B 65, 367–389] univariate skew-tν form. We then use the relationship between the generalized skew-normal and skew-tν distributions to discuss some properties of generalized skew-tν as well as distributions of order statistics from bivariate and trivariate tν distributions. We show that these distributions of order statistics are indeed mixtures of generalized skew-tν distributions, and then use this property to derive explicit expressions for means and variances of these order statistics.

5.
The resistance of least absolute values (L1) estimators to outliers and their robustness to heavy-tailed distributions make these estimators useful alternatives to the usual least squares estimators. The recent development of efficient algorithms for L1 estimation in linear models has permitted their use in practical data analysis. Although in general the L1 estimators are not unique, there are a number of properties they all share. The set of all L1 estimators for a given model and data set can be characterized as the convex hull of some extreme estimators. Properties of the extreme estimators and of the L1-estimate set are considered.
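The non-uniqueness described in this abstract can be seen in the simplest possible case, a location-only model, where the L1 estimators are the sample medians. The sketch below is an illustration under that simplifying assumption, not the general linear-model characterisation the paper studies; the function names and data are hypothetical. For an even sample size, every point between the two middle order statistics minimises the sum of absolute deviations, so the estimate set is the convex hull (an interval) of two extreme estimators.

```python
def l1_loss(m, data):
    """Sum of absolute deviations of the data from a candidate location m."""
    return sum(abs(x - m) for x in data)


def l1_location_set(data):
    """Return (lo, hi): the extreme L1 location estimators.
    Every m in [lo, hi] attains the minimum L1 loss."""
    s = sorted(data)
    n = len(s)
    lo = s[(n - 1) // 2]  # lower middle order statistic
    hi = s[n // 2]        # upper middle order statistic
    return lo, hi


data = [1.0, 2.0, 4.0, 10.0]  # hypothetical sample, even size
lo, hi = l1_location_set(data)
midpoint = (lo + hi) / 2.0
```

Here `l1_loss` takes the same value at `lo`, `hi`, and `midpoint`, and a strictly larger value outside the interval. With an odd sample size the interval collapses to a single point and the estimator is unique.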

6.
By modifying the direct method to solve the overdetermined linear system we are able to present an algorithm for L1 estimation which appears to be superior computationally to any other known algorithm for the simple linear regression problem.

7.
In this paper, we discuss a parsimonious approach to estimation of high-dimensional covariance matrices via the modified Cholesky decomposition with lasso. Two different methods are proposed. They are the equi-angular and equi-sparse methods. We use simulation to compare the performance of the proposed methods with others available in the literature, including the sample covariance matrix, the banding method, and the L1-penalized normal loglikelihood method. We then apply the proposed methods to a portfolio selection problem using 80 series of daily stock returns. To facilitate the use of lasso in high-dimensional time series analysis, we develop the dynamic weighted lasso (DWL) algorithm that extends the LARS-lasso algorithm. In particular, the proposed algorithm can efficiently update the lasso solution as new data become available. It can also add or remove explanatory variables. The entire solution path of the L1-penalized normal loglikelihood method is also constructed.

8.
In healthcare studies, count data sets measured with covariates often exhibit heterogeneity and contain extreme values. To analyse such count data sets, we use a finite mixture of regression model framework and investigate a robust estimation approach, called the L2E [D.W. Scott, On fitting and adapting of density estimates, Comput. Sci. Stat. 30 (1998), pp. 124–133], to estimate the parameters. The L2E is based on an integrated L2 distance between parametric conditional and true conditional mass functions. In addition to studying the theoretical properties of the L2E estimator, we compare the performance of L2E with the maximum likelihood (ML) estimator and a minimum Hellinger distance (MHD) estimator via Monte Carlo simulations for correctly specified and gross-error contaminated mixture of Poisson regression models. These show that the L2E is a viable robust alternative to the ML and MHD estimators. More importantly, we use the L2E to perform a comprehensive analysis of a Western Australia hospital inpatient obstetrical length of stay (LOS) (in days) data that contains extreme values. It is shown that the L2E provides a two-component Poisson mixture regression fit to the LOS data which is better than those based on the ML and MHD estimators. The L2E fit identifies admission type as a significant covariate that profiles the predominant subpopulation of normal-stayers as planned patients and the small subpopulation of long-stayers as emergency patients.

9.
The nonlinear least squares algorithm of Gill and Murray (1978) is extended and modified to solve nonlinear Lp-norm estimation problems efficiently. The new algorithm uses a mixture of 1st-order derivative (Gauss-Newton) and 2nd-order derivative (Newton) search directions. A new rule for selecting the “grade” r of the p-Jacobian matrix Jp was also incorporated. This brought about rapid convergence of the algorithm on previously reported test examples.

10.
In Bayesian inference, the Bayes estimator is the alternative with the minimum expected loss. In most cases, the loss function measures the distance between the alternative and the parameter. Therefore, any distance can lead to a loss function. Among the best-known distance functions is the Lp distance, where the choice of the value of p may be difficult and arbitrary. This paper examines robust models where the loss function is modelled by the Lp family. Our solution concept is the non-dominated alternative. We characterize the non-dominated set by having the posterior distribution function satisfy a particular asymmetry property. We also include an example to illustrate the methodology described.

11.
Benoît Cadre. Statistics, 2013, 47(4): 509-521
Let E be a separable Banach space, which is the dual of a Banach space F. If X is an E-valued random variable, the set of L1-medians of X is ArgminE[(d)]. Assume that this set contains only one element. From any sequence of probability measures {(d) 1} on E, which converges in law to X, we give two approximating sequences of the L1-median, for the weak* topology induced by F.

12.
We propose the L1 distance between the distribution of a binned data sample and a probability distribution from which it is hypothetically drawn as a statistic for testing agreement between the data and a model. We study the distribution of this distance for N-element samples drawn from k bins of equal probability and derive asymptotic formulae for the mean and dispersion of L1 in the large-N limit. We argue that the L1 distance is asymptotically normally distributed, with the mean and dispersion being accurately reproduced by asymptotic formulae even for moderately large values of N and k.
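As a rough illustration of the statistic this abstract describes, the sketch below computes an L1 distance between observed bin counts and the counts expected under k equally probable bins. The abstract does not state the normalisation, so the unnormalised form, the sum over bins of |n_i - N/k|, is an assumption here, as are the function name and the sample data.

```python
from collections import Counter


def l1_binned_distance(bin_indices, k):
    """L1 distance between observed bin counts and the expected count N/k
    under k equally probable bins (unnormalised form; an assumption)."""
    n_total = len(bin_indices)
    counts = Counter(bin_indices)
    expected = n_total / k
    # Bins with no observations still contribute |0 - expected|.
    return sum(abs(counts.get(b, 0) - expected) for b in range(k))


# Hypothetical sample: each entry is the bin index of one observation.
sample = [0, 0, 1, 2]
distance = l1_binned_distance(sample, k=4)
```

A perfectly even sample gives a distance of zero, and larger values indicate poorer agreement with the equal-probability model. Turning this into a test would require the asymptotic mean and dispersion derived in the paper, which the abstract does not reproduce.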

13.
A number of efficient computer codes are available for the simple linear L1 regression problem. However, a number of these codes can be made more efficient by utilizing the least squares solution. In fact, a couple of available computer programs already do so.

We report the results of a computational study comparing several openly available computer programs for solving the simple linear L1 regression problem with and without computing and utilizing a least squares solution.

14.
In multiple linear regression analysis each lower-dimensional subspace L of a known linear subspace M of ℝ^n corresponds to a nonempty subset of the columns of the regressor matrix. For a fixed subspace L, the Cp statistic is an unbiased estimator of the mean square error if the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the Cp statistic that can also be used to estimate this mean square error. The Cp statistic and its truncated versions are compared in two example data sets, illustrating that use of the truncated versions may result in models different from those selected by standard Cp.

15.
We developed robust estimators that minimize a weighted L1 norm for the first-order bifurcating autoregressive model. When all of the weights are fixed, our estimate is an L1 estimate that is robust against outlying points in the response space and more efficient than the least squares estimate for heavy-tailed error distributions. When the weights are random and depend on the points in the factor space, the weighted L1 estimate is robust against outlying points in the factor space. Simulated and artificial examples are presented. The behavior of the proposed estimate is modeled through a Monte Carlo study.

16.
Process capability indices (PCIs) have been widely used in manufacturing industries to provide a quantitative measure of process potential and performance. While some effort has been dedicated in the literature to the statistical properties of PCI estimators, scarce attention has been given to the evaluation of these properties when sample data are affected by measurement errors. In this work we deal with the problem of the effects of measurement errors on the performance of PCIs. The analysis is illustrated with reference to Cp, i.e. the simplest and most common measure suggested to evaluate process capability. The authors would like to thank two anonymous referees for their comments and suggestions that were useful in the preparation and improvement of this paper. This work was partially supported by a MURST research grant.
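For readers unfamiliar with the index, the sketch below uses the usual textbook definition Cp = (USL - LSL) / (6σ) and shows how measurement error can depress the observed value. The additive, independent error model (observed variance equal to process variance plus error variance) is an assumption made for illustration and may differ from the model analysed in the paper; the function names and numbers are hypothetical.

```python
import math


def cp_index(usl, lsl, sigma):
    """Process capability index Cp = (USL - LSL) / (6 * sigma)."""
    return (usl - lsl) / (6.0 * sigma)


def observed_sigma(sigma_process, sigma_error):
    """Standard deviation of measured values, assuming independent
    additive measurement error: sqrt(sigma_process^2 + sigma_error^2)."""
    return math.hypot(sigma_process, sigma_error)


# Hypothetical specification limits and standard deviations.
usl, lsl = 10.0, 4.0
true_cp = cp_index(usl, lsl, sigma=1.0)
measured_cp = cp_index(usl, lsl, sigma=observed_sigma(1.0, 0.75))
```

Under this assumed error model the measured index can never exceed the true one, so a capable process may appear less capable than it is when the gauge is noisy.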

17.
Summary: Lp-norm weighted depth functions are introduced and the local and global robustness of these weighted Lp-depth functions and their induced multivariate medians are investigated via influence function and finite sample breakdown point. To study the global robustness of depth functions, a notion of finite sample breakdown point is introduced. The weighted Lp-depth functions turn out to have the same low breakdown point as some other popular depth functions. Their influence functions are also unbounded. On the other hand, the weighted Lp-depth induced medians are globally robust with the highest possible breakdown point for any reasonable estimator. The weighted Lp-medians are also locally robust with bounded influence functions for suitable weight functions. Unlike other existing depth functions and multivariate medians, the weighted Lp depth and medians are easy to calculate in high dimensions. The price for this advantage is the lack of affine invariance and equivariance of the weighted Lp depth and medians, respectively. *The author thanks the referees for their very insightful and constructive comments and suggestions which led to corrections and substantial improvements. Supported in part by NSF Grants DMS-0071976 and DMS-0134628.

18.
A novel method is proposed for choosing the tuning parameter associated with a family of robust estimators. It consists of minimising estimated mean squared error, an approach that requires pilot estimation of model parameters. The method is explored for the family of minimum distance estimators proposed by [Basu, A., Harris, I.R., Hjort, N.L. and Jones, M.C., 1998, Robust and efficient estimation by minimising a density power divergence. Biometrika, 85, 549–559.] Our preference in that context is for a version of the method using the L2 distance estimator [Scott, D.W., 2001, Parametric statistical modeling by minimum integrated squared error. Technometrics, 43, 274–285.] as pilot estimator.

19.
We present an estimating framework for quantile regression where the usual L1-norm objective function is replaced by its smooth parametric approximation. An exact path-following algorithm is derived, leading to the well-known ‘basic’ solutions interpolating exactly a number of observations equal to the number of parameters being estimated. We discuss briefly possible practical implications of the proposed approach, such as early stopping for large data sets, confidence intervals, and additional topics for future research.

20.
Let (X1, X2) be a bivariate Lp-norm generalized symmetrized Dirichlet (LpGSD) random vector with parameters α1, α2. If p12=2, then (X1, X2) is a spherical random vector. The estimation of the conditional distribution of Z_u* := X2 | X1 > u for u large is of some interest in statistical applications. When (X1, X2) is a spherical random vector with associated random radius in the Gumbel max-domain of attraction, the distribution of Z_u* can be approximated by a Gaussian distribution. Surprisingly, the same Gaussian approximation holds also for Z_u := X2 | X1 = u. In this paper, we are interested in conditional limit results in terms of convergence of the density functions considering a d-dimensional LpGSD random vector. Stating our results for the bivariate setup, we show that the density functions of Z_u* and Z_u can be approximated by the density function of a Kotz type I LpGSD distribution, provided that the associated random radius has distribution function in the Gumbel max-domain of attraction. Further, we present two applications concerning the asymptotic behaviour of concomitants of order statistics of bivariate Dirichlet samples and the estimation of the conditional quantile function.
