

High dimensional variable selection with clustered data: an application of random multivariate survival forests for detection of outlier medical device components
Authors: Guy Cafri, Peter Calhoun, Juanjuan Fan
Institutions: 1. Surgical Outcomes and Analysis, Kaiser Permanente, San Diego, CA, USA; 2. Computational Science Research Center, San Diego State University, San Diego, CA, USA; 3. Department of Mathematics and Statistics, San Diego State University, San Diego, CA, USA
Abstract: In many medical studies, patients are nested or clustered within doctors. When there are many explanatory variables, variable selection with clustered data can be challenging. We propose a variable selection method based on random forests that handles clustered data through stratified binary splits. Our motivating example involves the detection of orthopedic device components from a large pool of candidates, where each patient belongs to a surgeon. Simulations compare the performance of survival forests grown using the stratified logrank statistic with forests grown using conventional and robust logrank statistics, and also evaluate a method that selects variables using a threshold value based on each variable's empirical null distribution. The stratified logrank test outperforms the conventional and robust methods when data are generated with cluster-specific effects and, when cluster sizes are sufficiently large, performs comparably to the splitting alternatives in the absence of cluster-specific effects. Thresholding was effective at distinguishing important from unimportant variables.
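To make the idea of a stratified binary split concrete, here is a minimal sketch of a stratified logrank statistic for one candidate split, assuming right-censored survival times, a split given as a boolean mask, and clusters (for example, surgeons) used as strata. Within each cluster, the observed-minus-expected event count for the left node and its variance are computed, and both are summed across clusters before standardizing, so patients are only compared with others in the same cluster. The function names `logrank_components`, `stratified_logrank` and `best_split` are hypothetical illustrations, not the authors' implementation; a tree-growing routine would pick the split with the largest absolute statistic.

```python
import numpy as np


def logrank_components(time, event, left):
    """Observed-minus-expected events and variance for the left node (one stratum)."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):          # distinct event times
        at_risk = time >= t                         # still under observation at t
        n = at_risk.sum()
        n1 = (at_risk & left).sum()
        d = ((time == t) & (event == 1)).sum()      # all events at t
        d1 = ((time == t) & (event == 1) & left).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:                                   # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var


def stratified_logrank(time, event, left, cluster):
    """Standardized logrank statistic with clusters as strata (assumption: z form)."""
    total_oe, total_var = 0.0, 0.0
    for c in np.unique(cluster):
        m = cluster == c                            # only compare within a cluster
        oe, v = logrank_components(time[m], event[m], left[m])
        total_oe += oe
        total_var += v
    if total_var == 0:
        return 0.0                                  # split carries no information
    return total_oe / np.sqrt(total_var)


def best_split(x, time, event, cluster):
    """Hypothetical helper: cut point of x maximizing the absolute statistic."""
    best_cut, best_stat = None, 0.0
    for cut in np.unique(x)[:-1]:
        z = stratified_logrank(time, event, x <= cut, cluster)
        if abs(z) > best_stat:
            best_cut, best_stat = cut, abs(z)
    return best_cut, best_stat


# Usage: x might be an indicator that a patient received a given device component.
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 1, 1])
x = np.array([0, 0, 1, 1])
cluster = np.array([0, 0, 0, 0])
cut, stat = best_split(x, time, event, cluster)
```

With a single cluster the statistic reduces to the ordinary two-sample logrank z; with several clusters, between-cluster differences in baseline risk do not enter the comparison, which is the motivation for stratifying.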
Keywords: Medical devices, multivariate, random forest, stratification, survival