1.

Transductive versions of the LASSO and the Dantzig Selector

Pierre Alquier Mohamed Hebiri 《Journal of statistical planning and inference》2012

Transductive methods are useful in prediction problems when the training dataset is composed of a large number of unlabeled observations and a smaller number of labeled observations. In this paper, we propose an approach for developing transductive prediction procedures that are able to take advantage of the sparsity in the high dimensional linear regression. More precisely, we define transductive versions of the LASSO (Tibshirani, 1996) and the Dantzig Selector (Candès and Tao, 2007). These procedures combine labeled and unlabeled observations of the training dataset to produce a prediction for the unlabeled observations. We propose an experimental study of the transductive estimators that shows that they improve the LASSO and Dantzig Selector in many situations, and particularly in high dimensional problems when the predictors are correlated. We then provide non-asymptotic theoretical guarantees for these estimation methods. Interestingly, our theoretical results show that the Transductive LASSO and Dantzig Selector satisfy sparsity inequalities under weaker assumptions than those required for the “original” LASSO.
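
For orientation, the ordinary LASSO that the abstract takes as its baseline can be sketched in a few lines. This is a minimal illustration only: it is the standard (non-transductive) LASSO solved by cyclic coordinate descent, not the authors' transductive procedure. The objective scaling (1/(2n)), the function names `soft_threshold` and `lasso_cd`, and the fixed iteration count are assumptions made for the example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values in [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows with p entries each; y is a list of n responses.
    Columns are assumed centred (no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is the sparsity property the abstract refers to.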

2.

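Sparse Principal Component Analysis Based on the Fused Penalty

Zhang Bo Liu Xiaoqian 《Statistical Research (统计研究)》2019,36(4):119-128

This paper studies sparse principal component analysis based on the fused penalty, for data in which adjacent variables are highly correlated or even exactly equal. First, starting from a regression perspective, we propose a simple approach to computing sparse principal components, give a generalized sparse principal component model (the GSPCA model) together with an algorithm for solving it, and prove that when the penalty function is the 1-norm, this model yields the same solution as the existing sparse principal component model, the SPC model. Second, we combine the fused penalty with principal component analysis to obtain a fused sparse principal component analysis method, and give two model formulations, one from the perspective of penalized matrix decomposition and one from the perspective of regression analysis. We prove theoretically that the two formulations yield the same solution, and therefore refer to them jointly as the FSPCA model. Simulation experiments show that the FSPCA model performs well on datasets in which adjacent variables are highly correlated or even exactly equal. Finally, we apply the FSPCA model to handwritten digit recognition and find that, compared with the SPC model, the principal components extracted by the FSPCA model are more interpretable, which makes the model more useful in practice.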

3.

In-Sample Inference and Forecasting in Misspecified Factor Models

Marine Carrasco Barbara Rossi 《Journal of Business & Economic Statistics》2016,34(3):313-338

This article considers in-sample prediction and out-of-sample forecasting in regressions with many exogenous predictors. We consider four dimension-reduction devices: principal components, ridge, Landweber-Fridman, and partial least squares. We derive rates of convergence for two representative models: an ill-posed model and an approximate factor model. The theory is developed for a large cross-section and a large time-series. As all these methods depend on a tuning parameter to be selected, we also propose data-driven selection methods based on cross-validation and establish their optimality. Monte Carlo simulations and an empirical application to forecasting inflation and output growth in the U.S. show that data-reduction methods outperform conventional methods in several relevant settings, and might effectively guard against instabilities in predictors’ forecasting ability.

4.

General Sparse Boosting: Improving Feature Selection of L2 Boosting by Correlation-Based Penalty Family

Junlong Zhao 《Communications in Statistics - Simulation and Computation》2015,44(6):1612-1640

In high-dimensional settings, componentwise L₂ boosting has been used to construct sparse models that perform well, but it tends to select many ineffective variables. Several sparse boosting methods, such as SparseL₂Boosting and Twin Boosting, have been proposed to improve the variable selection of the L₂ boosting algorithm. In this article, we propose a new general sparse boosting method (GSBoosting). Relations are established between GSBoosting and other well-known regularized variable selection methods in the orthogonal linear model, such as the adaptive Lasso and hard thresholding. Simulation results show that GSBoosting performs well in both prediction and variable selection.
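
As a point of reference for the baseline named in this abstract, a minimal sketch of plain componentwise L₂ boosting follows. It is not GSBoosting, whose penalty family the abstract does not spell out. The function name `componentwise_l2_boost`, the step size `nu`, and the assumption of centred data without an intercept are choices made for illustration.

```python
def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2 boosting for a linear model.

    At each step, fit every single predictor to the current residuals by
    least squares, keep the one that reduces the residual sum of squares
    the most, and move its coefficient a fraction nu of the way.
    X is a list of n rows with p entries each; y is a list of n responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_steps):
        best_j, best_coef, best_gain = None, 0.0, 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # skip all-zero columns
            dot = sum(X[i][j] * resid[i] for i in range(n))
            gain = dot * dot / col_sq[j]  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_coef, best_gain = j, dot / col_sq[j], gain
        if best_j is None:
            break  # no predictor improves the fit any further
        beta[best_j] += nu * best_coef
        for i in range(n):
            resid[i] -= nu * best_coef * X[i][best_j]
    return beta
```

Predictors that are never selected keep a coefficient of exactly zero, so stopping early gives a sparse fit; running longer lets weakly related predictors enter, which is the over-selection the abstract describes.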

5.

Sparse additive models

Pradeep Ravikumar John Lafferty Han Liu Larry Wasserman 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(5):1009-1030

Summary. We present a new class of methods for high dimensional non-parametric regression and classification called sparse additive models. Our methods combine ideas from sparse linear modelling and additive non-parametric regression. We derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size. Sparse additive models are essentially a functional version of the grouped lasso of Yuan and Lin. They are also closely related to the COSSO model of Lin and Zhang but decouple smoothing and sparsity, enabling the use of arbitrary non-parametric smoothers. We give an analysis of the theoretical properties of sparse additive models and present empirical results on synthetic and real data, showing that they can be effective in fitting sparse non-parametric models in high dimensional data.

6.

Inference of Genetic Networks from Time Course Expression Data Using Functional Regression with Lasso Penalty

Zhaoping Hong 《Communications in Statistics - Theory and Methods》2013,42(10):1768-1779

Statistical inference of genetic regulatory networks is essential for understanding temporal interactions of regulatory elements inside the cells. In this work, we propose to infer the parameters of the ordinary differential equations using the techniques from functional data analysis (FDA) by regarding the observed time course expression data as continuous-time curves. For networks with a large number of genes, we take advantage of the sparsity of the networks by penalizing the linear coefficients with an L₁ norm. The ability of the algorithm to infer network structure is demonstrated using the cell-cycle time course data for Saccharomyces cerevisiae.

7.

Sparse partial least squares regression for simultaneous dimension reduction and variable selection

Hyonho Chun Sündüz Keleş 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2010,72(1):3-25

Summary. Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data.

8.

Persistence of plug-in rule in classification of high dimensional multivariate binary data

Junyong Park Jayanta K. Ghosh 《Journal of statistical planning and inference》2007


9.

Some challenges for statistics

A. C. Davison 《Statistical Methods and Applications》2008,17(2):167-181

The paper gives a highly personal sketch of some current trends in statistical inference. After an account of the challenges that new forms of data bring, there is a brief overview of some topics in stochastic modelling. The paper then turns to sparsity, illustrated using Bayesian wavelet analysis based on a mixture model and metabolite profiling. Modern likelihood methods including higher order approximation and composite likelihood inference are then discussed, followed by some thoughts on statistical education.

10.

A Thresholding Algorithm for Order Selection in Finite Mixture Models

Chen Xu Jiahua Chen 《Communications in Statistics - Simulation and Computation》2015,44(2):433-453

Order selection is an important step in the application of finite mixture models. Classical methods such as AIC and BIC discourage complex models with a penalty directly proportional to the number of mixing components. In contrast, Chen and Khalili propose to link the penalty to two types of overfitting. In particular, they introduce a regularization penalty to merge similar subpopulations in a mixture model, where the shrinkage idea of regularized regression is seamlessly employed. However, the new method requires an effective and efficient algorithm. When the popular expectation-maximization (EM)-algorithm is used, we need to maximize a nonsmooth and nonconcave objective function in the M-step, which is computationally challenging. In this article, we show that such an objective function can be transformed into a sum of univariate auxiliary functions. We then design an iterative thresholding descent algorithm (ITD) to efficiently solve the associated optimization problem. Unlike many existing numerical approaches, the new algorithm leads to sparse solutions and thereby avoids undesirable ad hoc steps. We establish the convergence of the ITD and further assess its empirical performance using both simulations and real data examples.
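
The classical criteria mentioned at the start of this abstract can be written down directly. The sketch below uses the standard definitions of AIC and BIC; the parameter count shown is an assumption for a univariate Gaussian mixture with component-specific variances, and the helper names are hypothetical. It illustrates only the classical penalties, not the Chen and Khalili penalty or the ITD algorithm.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2 k (smaller is better)."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def gaussian_mixture_n_params(n_components):
    """Free parameters of a univariate Gaussian mixture with K components:
    K means, K variances, and K - 1 mixing weights."""
    return 3 * n_components - 1
```

Under this parameter count each extra component adds three parameters, so both penalties grow linearly in the number of components, whatever the shape of the fitted subpopulations.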