Similar Articles
20 similar articles found.
1.
The L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve optimization problems for L1-type regularization. In particular, the coordinate descent algorithm has been shown to be effective in sparse regression modeling. Although the algorithm shows remarkable performance in solving optimization problems for L1-type regularization, it is sensitive to outliers, since the procedure is based on inner products of the predictor variables and partial residuals that are obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing in particular on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
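For context, a minimal sketch of the non-robust baseline this abstract starts from: cyclic coordinate descent for the lasso, where each coefficient is updated by soft-thresholding the inner product of its predictor with the current partial residual. The robust modification proposed in the paper is not reproduced here; the function names and simulated data are illustrative only.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual that excludes predictor j
            r_j = y - X @ beta + X[:, j] * beta[j]
            # inner product of predictor j with the partial residual (the non-robust step)
            z_j = X[:, j] @ r_j / n
            beta[j] = soft_threshold(z_j, lam) / (X[:, j] @ X[:, j] / n)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(200)
print(np.round(lasso_coordinate_descent(X, y, lam=0.1), 2))
```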

2.
A new statistic, (p), is developed for variable selection in a system-of-equations model. The standardized total mean square error in the (p) statistic is weighted by the covariance matrix of the dependent variables instead of the error covariance matrix of the true model, as in the original definition. The new statistic can also be used for model selection among non-nested models. The estimate of (p), SC(p), is derived and shown to reduce to SCε(p), which has a form similar to that of Cp in a single-equation model, when the covariance matrix of the sampled dependent variables is replaced by the error covariance matrix under the full model.

3.
A robust estimator is developed for Poisson mixture models with a known number of components. The proposed estimator minimizes the L2 distance between a sample of data and the model. When the component distributions are completely known, the estimators for the mixing proportions are in closed form. When the parameters for the component Poisson distributions are unknown, numerical methods are needed to calculate the estimators. Compared to the minimum Hellinger distance estimator, the minimum L2 estimator can be less robust to extreme outliers, and often more robust to moderate outliers.
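A minimal sketch of the minimum L2-distance idea for a two-component Poisson mixture, not the authors' code: the criterion sum_k f(k)^2 - (2/n) sum_i f(x_i), which equals the squared L2 distance to the empirical pmf up to a constant, is minimized numerically. The parameter names (pi, lam1, lam2) and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def l2_criterion(params, x, k_max=200):
    pi, lam1, lam2 = params
    k = np.arange(k_max + 1)
    f = pi * poisson.pmf(k, lam1) + (1 - pi) * poisson.pmf(k, lam2)   # model pmf
    # squared L2 distance to the empirical pmf, up to an additive constant
    return np.sum(f ** 2) - 2.0 * np.mean(pi * poisson.pmf(x, lam1)
                                          + (1 - pi) * poisson.pmf(x, lam2))

rng = np.random.default_rng(1)
x = np.concatenate([rng.poisson(2.0, 300), rng.poisson(8.0, 200)])    # simulated mixture
res = minimize(l2_criterion, x0=[0.5, 1.0, 10.0], args=(x,),
               bounds=[(0.01, 0.99), (0.1, 50.0), (0.1, 50.0)])
print(res.x)   # estimated (pi, lam1, lam2)
```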

4.
A number of efficient computer codes are available for the simple linear L1 regression problem. However, a number of these codes can be made more efficient by utilizing the least squares solution. In fact, a couple of available computer programs already do so.

We report the results of a computational study comparing several openly available computer programs for solving the simple linear L1 regression problem, with and without computing and utilizing a least squares solution.

5.
Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to few covariates, because the presence of many predictors yields unstable estimates. The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models with reduced complexity. In contrast to common procedures, it can be used in high-dimensional settings where a large number of potentially influential explanatory variables is available. The method is investigated in simulation studies and illustrated by use of real data sets.

6.
We propose penalized-likelihood methods for parameter estimation of the high-dimensional t distribution. First, we show that a general class of commonly used shrinkage covariance matrix estimators for the multivariate normal can be obtained as penalized-likelihood estimators with a penalty that is proportional to the entropy loss between the estimate and an appropriately chosen shrinkage target. Motivated by this fact, we then apply this penalty to the multivariate t distribution. The penalized estimate can be computed efficiently using an EM algorithm for given tuning parameters. It can also be viewed as an empirical Bayes estimator. Taking advantage of its Bayesian interpretation, we propose a variant of the method of moments to effectively elicit the tuning parameters. Simulations and real data analysis demonstrate the competitive performance of the new methods.
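As a point of reference for the normal case mentioned above, the following sketch shows a generic linear shrinkage covariance estimator, i.e. a convex combination of the sample covariance and a structured target. It is an illustration only, not the penalized EM procedure for the t distribution proposed in the paper; the scaled-identity target and the weight alpha are assumptions.

```python
import numpy as np

def linear_shrinkage(S, alpha):
    """Shrink the sample covariance S towards a scaled-identity target."""
    p = S.shape[0]
    target = np.trace(S) / p * np.eye(p)        # assumed shrinkage target
    return (1 - alpha) * S + alpha * target

rng = np.random.default_rng(11)
X = rng.standard_normal((30, 10))               # n close to p: unstable sample covariance
S = np.cov(X, rowvar=False)
# shrinkage markedly improves the conditioning of the estimate
print(np.linalg.cond(S), np.linalg.cond(linear_shrinkage(S, alpha=0.3)))
```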

7.
Bien and Tibshirani (Biometrika, 98(4):807–820, 2011) have proposed a covariance graphical lasso method that applies a lasso penalty on the elements of the covariance matrix. This method is definitely useful because it not only produces sparse and positive definite estimates of the covariance matrix but also discovers marginal independence structures by generating exact zeros in the estimated covariance matrix. However, the objective function is not convex, making the optimization challenging. Bien and Tibshirani (Biometrika, 98(4):807–820, 2011) described a majorize-minimize approach to optimize it. We develop a new optimization method based on coordinate descent. We discuss the convergence property of the algorithm. Through simulation experiments, we show that the new algorithm has a number of advantages over the majorize-minimize approach, including its simplicity, computing speed and numerical stability. Finally, we show that the cyclic version of the coordinate descent algorithm is more efficient than the greedy version.
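A minimal sketch (not Bien and Tibshirani's code, nor the coordinate descent solver developed in this paper) of the covariance graphical lasso objective: log det(Sigma) + tr(S Sigma^{-1}) + lambda * sum |Sigma_jk|. Evaluating it directly makes the non-convexity in Sigma concrete; the toy data are assumptions.

```python
import numpy as np

def cov_glasso_objective(Sigma, S, lam):
    """Covariance graphical lasso objective at a candidate covariance Sigma."""
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        return np.inf                      # Sigma must be positive definite
    return logdet + np.trace(S @ np.linalg.inv(Sigma)) + lam * np.abs(Sigma).sum()

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
S = np.cov(X, rowvar=False)                # sample covariance of simulated data
print(cov_glasso_objective(S, S, lam=0.1))
```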

8.
We propose an ℓ1-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method is based on the assumption that the data are missing at random (MAR), which also covers the missing-completely-at-random case. The implementation of the method is non-trivial, as the observed negative log-likelihood generally is a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.

9.
In the high-dimensional setting, componentwise L2 boosting has been used to construct sparse models that perform well, but it tends to select many ineffective variables. Several sparse boosting methods, such as SparseL2Boosting and Twin Boosting, have been proposed to improve the variable selection of the L2 boosting algorithm. In this article, we propose a new general sparse boosting method (GSBoosting). Relations are established between GSBoosting and other well-known regularized variable selection methods in the orthogonal linear model, such as the adaptive lasso and hard thresholding. Simulation results show that GSBoosting performs well in both prediction and variable selection.
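For context, a minimal sketch of plain componentwise L2 boosting, the baseline GSBoosting refines: at each step the single predictor that best reduces the residual sum of squares is selected, and its coefficient is updated by a small shrunken step. The step size, iteration count, and simulated data are illustrative assumptions.

```python
import numpy as np

def componentwise_l2_boosting(X, y, n_steps=200, nu=0.1):
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()
    for _ in range(n_steps):
        # least-squares coefficient of each predictor against the current residual
        coefs = X.T @ resid / (X ** 2).sum(axis=0)
        rss = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = np.argmin(rss)                 # best-fitting component
        beta[j] += nu * coefs[j]           # shrunken update
        resid -= nu * coefs[j] * X[:, j]
    return beta

rng = np.random.default_rng(3)
X = rng.standard_normal((150, 20))
y = 3 * X[:, 0] - 2 * X[:, 5] + rng.standard_normal(150)
print(np.round(componentwise_l2_boosting(X, y), 2))
```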

10.
Incremental modelling of data streams is of great practical importance, as shown by its applications in advertising and financial data analysis. We propose two incremental covariance matrix decomposition methods for a compositional data type. The first method, exact incremental covariance decomposition of compositional data (C-EICD), gives an exact decomposition result. The second method, covariance-free incremental covariance decomposition of compositional data (C-CICD), is an approximate algorithm that can efficiently handle high-dimensional cases. Based on these two methods, many frequently used compositional statistical models can be calculated incrementally. We take multiple linear regression and principal component analysis as examples to illustrate the utility of the proposed methods via extensive simulation studies.
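A minimal sketch of the generic idea behind incremental covariance computation (not the C-EICD or C-CICD algorithms themselves, which work with compositional data): second moments are updated one observation at a time, so the covariance matrix never has to be recomputed from scratch as the stream grows. The class name and toy data are assumptions.

```python
import numpy as np

class IncrementalCovariance:
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))     # running sum of outer products of deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.M2 += np.outer(delta, x - self.mean)

    def covariance(self):
        return self.M2 / (self.n - 1)      # sample covariance of the data seen so far

rng = np.random.default_rng(4)
data = rng.standard_normal((1000, 3))
inc = IncrementalCovariance(3)
for row in data:
    inc.update(row)
print(np.allclose(inc.covariance(), np.cov(data, rowvar=False)))   # True
```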

11.
To summarize a set of data by a distribution function in Johnson's translation system, we use a least-squares approach to parameter estimation wherein we seek to minimize the distance between the vector of "uniformized" order statistics and the corresponding vector of expected values. We use the software package FITTRI to apply this technique to three problems arising respectively in medicine, applied statistics, and civil engineering. Compared to traditional methods of distribution fitting based on moment matching, percentile matching, L1 estimation, and L∞ estimation, the least-squares technique is seen to yield fits of similar accuracy and to converge more rapidly and reliably to a set of acceptable parameter estimates.
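A minimal sketch of the least-squares idea, not the FITTRI implementation: choose Johnson SU parameters so that the "uniformized" order statistics F(x_(i); theta) are close to their expected values i/(n+1). Using scipy's johnsonsu distribution, the starting values, and Nelder-Mead are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import johnsonsu

def ls_criterion(params, x_sorted):
    a, b, loc, scale = params
    if b <= 0 or scale <= 0:
        return np.inf
    n = len(x_sorted)
    u = johnsonsu.cdf(x_sorted, a, b, loc=loc, scale=scale)   # uniformized order statistics
    expected = np.arange(1, n + 1) / (n + 1)                  # their expected values
    return np.sum((u - expected) ** 2)

rng = np.random.default_rng(5)
x = np.sort(johnsonsu.rvs(1.0, 2.0, loc=0.0, scale=1.5, size=300, random_state=rng))
res = minimize(ls_criterion, x0=[0.0, 1.0, np.median(x), x.std()],
               args=(x,), method="Nelder-Mead")
print(res.x)   # fitted Johnson SU parameters
```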

12.
We propose a new adaptive L1 penalized quantile regression estimator for high-dimensional sparse regression models with heterogeneous error sequences. We show that, under weaker conditions than those required by alternative procedures, the adaptive L1 quantile regression selects the true underlying model with probability converging to one, and the unique estimates of the nonzero coefficients it provides have the same asymptotic normal distribution as the quantile estimator that uses only the covariates with non-zero impact on the response. Thus, the adaptive L1 quantile regression enjoys oracle properties. We propose a completely data-driven choice of the penalty level λn, which ensures good performance of the adaptive L1 quantile regression. Extensive Monte Carlo simulation studies have been conducted to demonstrate the finite sample performance of the proposed method.
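A minimal sketch of the adaptive L1 penalized quantile regression objective: the check loss plus a weighted L1 penalty, with weights taken from a pilot fit. The generic derivative-free optimizer, the least-squares pilot, and the penalty level below are illustrative assumptions, not the estimation procedure studied in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def adaptive_l1_qr_objective(beta, X, y, tau, lam, pilot):
    weights = 1.0 / (np.abs(pilot) + 1e-6)            # adaptive weights from the pilot fit
    return np.sum(check_loss(y - X @ beta, tau)) + lam * np.sum(weights * np.abs(beta))

rng = np.random.default_rng(6)
X = rng.standard_normal((200, 3))
y = 1.5 * X[:, 0] + rng.standard_normal(200)
pilot = np.linalg.lstsq(X, y, rcond=None)[0]          # crude pilot estimate (assumption)
res = minimize(adaptive_l1_qr_objective, x0=pilot,
               args=(X, y, 0.5, 5.0, pilot), method="Nelder-Mead")
print(np.round(res.x, 2))
```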

13.
The fused lasso penalizes a loss function by the L1 norm of both the regression coefficients and their successive differences, to encourage sparsity in both. In this paper, we propose a Bayesian generalized fused lasso modeling based on a normal-exponential-gamma (NEG) prior distribution. The NEG prior is placed on the differences of successive regression coefficients. The proposed method enables us to construct a more versatile sparse model than the ordinary fused lasso by using a flexible regularization term. Simulation studies and real data analyses show that the proposed method has superior performance to the ordinary fused lasso.
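A minimal sketch of the ordinary fused lasso objective that the Bayesian NEG-prior formulation generalizes: an L1 penalty on the coefficients plus an L1 penalty on their successive differences. The toy piecewise-constant signal and penalty levels are assumptions.

```python
import numpy as np

def fused_lasso_objective(beta, X, y, lam1, lam2):
    fit = 0.5 * np.sum((y - X @ beta) ** 2)
    sparsity = lam1 * np.sum(np.abs(beta))            # encourages zero coefficients
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))     # encourages flat (fused) segments
    return fit + sparsity + fusion

rng = np.random.default_rng(7)
X = rng.standard_normal((50, 8))
beta_true = np.array([0, 0, 2, 2, 2, 0, 0, 0], dtype=float)   # piecewise-constant signal
y = X @ beta_true + rng.standard_normal(50)
print(fused_lasso_objective(beta_true, X, y, lam1=1.0, lam2=1.0))
```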

14.
Hypothesis testing and confidence regions are considered for the common mean vector of several multivariate normal populations when the covariance matrices are unknown and possibly unequal. A generalized confidence region is derived using the concept of the generalized p-value. The generalized confidence region is illustrated with two numerical examples. The merits of the proposed method are numerically compared with those of existing methods with respect to their expected areas or expected d-dimensional volumes and coverage probabilities under different scenarios.

15.
The graphical lasso has become a useful tool for estimating high-dimensional Gaussian graphical models, but its practical applications suffer from the problem of choosing regularization parameters in a data-dependent way. In this article, we propose a model-averaged method for estimating sparse inverse covariance matrices for Gaussian graphical models. We consider the graphical lasso regularization path as the model space for Bayesian model averaging and use Markov chain Monte Carlo techniques for regularization path point selection. The numerical performance of our method is investigated using both simulated and real datasets, in comparison with some state-of-the-art model selection procedures.
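A minimal sketch of the model space the abstract refers to: graphical lasso estimates computed over a grid of penalty values, each giving a sparse precision matrix (graph). The Bayesian model-averaging step over this path via MCMC is not shown; the penalty grid and toy data are assumptions.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(10)
X = rng.standard_normal((200, 6))
S = np.cov(X, rowvar=False)                     # empirical covariance

for alpha in [0.05, 0.1, 0.2, 0.4]:             # assumed grid of regularization parameters
    cov_est, prec_est = graphical_lasso(S, alpha=alpha)
    n_edges = (np.abs(prec_est[np.triu_indices(6, k=1)]) > 1e-8).sum()
    print(f"alpha={alpha}: {n_edges} edges in the estimated graph")
```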

16.
L2Boosting is an effective method for constructing models. For the high-dimensional setting, Bühlmann and Yu (2003, J. Amer. Stat. Assoc. 98: 324–339) proposed componentwise L2Boosting, but componentwise L2Boosting can only fit a limited class of models. In this paper, by combining boosting with a sufficient dimension reduction method, e.g., sliced inverse regression (SIR), we propose a new method for regression, called dimension reduction boosting (DRBoosting). Compared with L2Boosting, the computation of DRBoosting is less intensive and its prediction is better, especially for high-dimensional data. Simulations confirm the advantage of the new method.

17.
Estimating multivariate location and scatter with both affine equivariance and positive breakdown has always been difficult. A well-known estimator which satisfies both properties is the Minimum Volume Ellipsoid Estimator (MVE). Computing the exact MVE is often not feasible, so one usually resorts to an approximate algorithm. In the regression setup, algorithms for positive-breakdown estimators like Least Median of Squares typically recompute the intercept at each step, to improve the result. This approach is called intercept adjustment. In this paper we show that a similar technique, called location adjustment, can be applied to the MVE. For this purpose we use the Minimum Volume Ball (MVB), in order to lower the MVE objective function. An exact algorithm for calculating the MVB is presented. As an alternative to MVB location adjustment we propose L1 location adjustment, which does not necessarily lower the MVE objective function but yields more efficient estimates for the location part. Simulations compare the two types of location adjustment. We also obtain the maxbias curves of L1 and the MVB in the multivariate setting, revealing the superiority of L1.
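A minimal sketch of an L1 location estimate (the spatial median), computed here with Weiszfeld's algorithm; in the paper this kind of estimate is used to adjust the location part of the MVE, which is not reproduced. The starting point, tolerance, and contaminated toy data are assumptions.

```python
import numpy as np

def spatial_median(X, n_iter=100, tol=1e-8):
    m = X.mean(axis=0)                     # start from the coordinate-wise mean
    for _ in range(n_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.maximum(d, 1e-12)           # guard against zero distances
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(8)
X = np.vstack([rng.standard_normal((95, 2)), rng.standard_normal((5, 2)) + 10])  # 5% outliers
print(spatial_median(X))                   # stays near the bulk of the data
```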

18.
Communications in Statistics: Theory and Methods, 2012, 41(13-14): 2465-2489
The Akaike information criterion, AIC, and Mallows' Cp statistic have been proposed for selecting a smaller number of regressors in multivariate regression models with a fully unknown covariance matrix. All of these criteria are, however, based on the implicit assumption that the sample size is substantially larger than the dimension of the covariance matrix. To obtain a stable estimator of the covariance matrix, it is required that its dimension be much smaller than the sample size. When the dimension is close to the sample size, it is necessary to use ridge-type estimators for the covariance matrix. In this article, we use a ridge-type estimator for the covariance matrix and obtain the modified AIC and modified Cp statistic under an asymptotic theory in which both the sample size and the dimension go to infinity. It is shown numerically that these modified procedures perform very well, in the sense of selecting the true model, in large-dimensional cases.

19.
The problem of modeling the relationship between a set of covariates and a multivariate response with correlated components often arises in many areas of research such as genetics, psychometrics, and signal processing. In the linear regression framework, such a task can be addressed using a number of existing methods. In the high-dimensional sparse setting, most of these methods rely on the idea of penalization in order to efficiently estimate the regression matrix. Examples of such methods include the lasso, the group lasso, the adaptive group lasso, and the simultaneous variable selection (SVS) method. Crucially, a suitably chosen penalty also allows for an efficient exploitation of the correlation structure within the multivariate response. In this paper we introduce a novel variant of such a method, called the adaptive SVS, which is closely linked with the adaptive group lasso. Via a simulation study we investigate its performance in the high-dimensional sparse regression setting. We provide a comparison with a number of other popular methods under different scenarios and show that the adaptive SVS is a powerful tool for efficient recovery of the signal in such a setting. The methods are applied to genetic data.

20.
Bayesian model averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional data sets. We apply our technique in simulations, as well as to some applications that arise in genomics.
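A minimal sketch of the basic idea of treating the ℓ1 regularization path as a model space: the distinct supports visited along a lasso path define candidate models. The MC3-style model composition over that space, as proposed in the paper, is not shown; the simulated data and path length are assumptions.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(9)
X = rng.standard_normal((100, 20))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(100)

alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
# each distinct support along the path defines one candidate model
supports = {tuple(np.flatnonzero(coefs[:, k])) for k in range(coefs.shape[1])}
for s in sorted(supports, key=len):
    print(s)
```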
