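Robust and Efficient Imputation of Rounded Zeros in High-dimensional Compositional Data and Its Applications
Cite this article: Xiong Wei et al. Robust and Efficient Imputation of Rounded Zeros in High-dimensional Compositional Data and Its Applications[J]. Statistical Research (统计研究), 2020, 37(5): 104-116. DOI: 10.19343/j.cnki.11-1302/c.2020.05.009
Authors: Xiong Wei et al.
Affiliations: School of Statistics, University of International Business and Economics; Department of Data Science, School of Statistics, University of International Business and Economics
Funding: Ministry of Education Humanities and Social Sciences Project "Effects of Gene-Environment Interactions on Complex Diseases: Robust Identification Analysis and Applications" (16YJCZH22); Fundamental Research Funds for the Central Universities, University of International Business and Economics (CXTD10-09).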
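Abstract:
With the rapid development of computer technology, high-dimensional compositional data are emerging in ever greater volume, accompanied by large numbers of rounded zeros and missing values. High dimensionality poses a major challenge to traditional statistical methods, and the heavy tails and complex covariance structure of such data make theoretical analysis harder still. How to impute rounded zeros in high-dimensional compositional data robustly, and so uncover the underlying intrinsic structure, has therefore become a focus of current research. To address this, the paper combines a modified EM algorithm with a Lasso-quantile regression imputation method based on R-type clustering (SubLQR) to handle rounded zeros in high-dimensional compositional data. Compared with existing imputation methods for high-dimensional rounded zeros, the proposed SubLQR has the following advantages. (1) Robustness and comprehensiveness: Lasso-quantile regression not only characterizes the entire conditional distribution of the response variable effectively but also yields a more realistic high-dimensional sparsity pattern. (2) Efficiency and accuracy: imputing on the basis of R-type clustering reduces computational complexity and greatly improves imputation accuracy. Simulation studies confirm that the proposed SubLQR is efficient, flexible, and accurate, with a particular advantage when zeros and outliers are numerous. Finally, SubLQR is applied to a metabolomics study of a rare disease, which further indicates that the proposed method is widely applicable.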

Keywords: High-dimensional compositional data  Rounded zeros  Lasso-quantile regression  Modified EM algorithm  Robustness
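The robustness attributed to Lasso-quantile regression comes from its loss function: instead of squared error it uses the check (pinball) loss, and an L1 penalty on the coefficients encourages sparsity. The sketch below only illustrates that objective in plain Python. It is not the authors' SubLQR implementation, and the function names, the penalty weight `lam`, and the toy data are all assumptions made for illustration.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def lasso_quantile_objective(X, y, beta, intercept, tau, lam):
    """Mean pinball loss plus an L1 penalty on the slopes.

    The intercept is left unpenalized (an assumption of this sketch).
    """
    total = 0.0
    for row, target in zip(X, y):
        prediction = intercept + sum(b * x for b, x in zip(beta, row))
        total += pinball_loss(target - prediction, tau)
    return total / len(y) + lam * sum(abs(b) for b in beta)


def best_constant(y, tau):
    """Constant that minimizes the total pinball loss.

    The loss is convex and piecewise linear in the constant, so a
    minimizer is always attained at one of the observed values.
    """
    return min(y, key=lambda c: sum(pinball_loss(v - c, tau) for v in y))


if __name__ == "__main__":
    # Hypothetical toy sample with one gross outlier.
    sample = [1.0, 2.0, 3.0, 4.0, 100.0]
    print("mean:", sum(sample) / len(sample))
    print("tau=0.5 pinball minimizer:", best_constant(sample, 0.5))
```

With `tau = 0.5` the minimizer is the sample median, which the single outlier barely moves, whereas the mean is pulled far toward it. Varying `tau` traces out other conditional quantiles, which is the sense in which quantile regression describes the whole conditional distribution rather than only its center.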

Robust Efficient Imputation of Rounded Zeros in High-dimensional Compositional Data and Its Applications
Xiong Wei et al. Robust Efficient Imputation of Rounded Zeros in High-dimensional Compositional Data and Its Applications[J]. Statistical Research, 2020, 37(5): 104-116. DOI: 10.19343/j.cnki.11-1302/c.2020.05.009
Authors:Xiong Wei et al.
Abstract:
High-dimensional compositional data with many rounded zeros and missing values are arising with the rapid development of computer technology and pose a considerable challenge to traditional statistical methods. Heavy tails and a complicated covariance structure make the analysis even more difficult, so robust methods for imputing rounded zeros in high-dimensional compositional data have become a focus of current research. To this end, a robust method (SubLQR) based on a modified EM algorithm is proposed, combining R-type clustering and Lasso-Quantile regression. The proposed SubLQR improves on existing imputation methods in two respects: 1) Robustness: Lasso-Quantile regression provides a sparse pattern; 2) Efficiency: R-type clustering reduces computational cost and improves precision. Simulation results suggest that the proposed method performs better than existing methods, especially when the percentage of zeros and outliers is large. Finally, a real-data analysis from a metabolomics study of a rare disease indicates the wide applicability of the proposed SubLQR.
Keywords:High-dimensional Compositional Data  Rounded Zeros  Lasso-Quantile Regression  Modified EM Algorithm  Robustness  
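R-type clustering groups variables rather than samples, so each regression used for imputation can draw on a small block of related predictors instead of the full high-dimensional set. The abstract does not say which similarity measure or linkage SubLQR uses, so the sketch below makes its own assumptions: absolute Pearson correlation as the similarity, a fixed threshold, and single-linkage grouping via union-find. It also assumes the columns have already been moved off the simplex (for example by a log-ratio transform), since raw compositional parts carry spurious correlation. All names and data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def cluster_variables(columns, threshold):
    """Group variables linked by |correlation| >= threshold.

    Equivalent to cutting a single-linkage tree at that similarity level.
    Returns a sorted list of groups, each a list of column indices.
    """
    p = len(columns)
    parent = list(range(p))

    def find(i):
        # Follow parent pointers to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical transformed variables: 0 and 1 move together, as do 2 and 3.
    columns = [
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [2.1, 3.9, 6.2, 8.0, 9.9],
        [5.0, 1.0, 4.0, 2.0, 3.0],
        [10.0, 2.0, 8.1, 4.2, 6.0],
    ]
    print(cluster_variables(columns, threshold=0.9))
```

On this toy input the variables split into two blocks, `[0, 1]` and `[2, 3]`. In an imputation setting, a variable containing rounded zeros would then be regressed only on the other members of its block, which is one plausible reading of how clustering lowers the computational cost.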
This article is indexed in VIP (维普) and other databases.