Dealing with big data: comparing dimension reduction and shrinkage regression methods |
| |
Authors: | Hamideh D. Hamedani, Sara Sadat Moosavi |
| |
Institution: | Statistics Department, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran |
| |
Abstract: | In recent decades, the number of variables used to explain observations in many practical applications has grown steadily. Despite the widespread use of provisional variable selection methods in data processing, this growth has led to heavy computational burdens. Consequently, more methodological techniques have emerged that reduce the number of explanatory variables without losing much of the information. Two distinct approaches are apparent among these techniques: ‘shrinkage regression’ and ‘sufficient dimension reduction’. Surprisingly, there has been no communication or comparison between these two methodological categories, and it is not clear when each approach is appropriate. In this paper, we partly fill this gap by first briefly reviewing each category, with special attention to its most commonly used methods. We then compare commonly used methods from both categories in terms of accuracy, computation time, and ability to select effective variables. We also conduct a simulation study of the performance of the methods in each category. Finally, the selected methods are tested concurrently on two real data sets, which allows us to recommend conditions under which one approach is more appropriate than the other for high-dimensional data. |
| |
Keywords: | Sufficient dimension reduction; central subspace; SPICE method; shrinkage regression; LASSO; Elastic-Net; FLASH; OSCAR; SCAD; ridge regression |