Temporal prediction of future state occupation in a multistate model from high-dimensional baseline covariates via pseudo-value regression |
| |
Authors: | Sandipan Dutta, Susmita Datta, Somnath Datta |
| |
Institution: | 1. Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA (sandipan.dutta07@gmail.com); 3. Department of Biostatistics, University of Florida, Gainesville, FL, USA |
| |
Abstract: | In many complex diseases such as cancer, a patient passes through various disease stages before reaching a terminal state (say, disease-free or death). This fits a multistate model framework in which a prognosis may be equivalent to predicting the state occupation at a future time t. With the advent of high-throughput genomic and proteomic assays, a clinician may intend to use such high-dimensional covariates to make better predictions of state occupation. In this article, we offer a practical solution to this problem by combining a useful technique, called pseudo-value (PV) regression, with a latent factor or a penalized regression method such as partial least squares (PLS) or the least absolute shrinkage and selection operator (LASSO), or their variants. We explore the predictive performance of these combinations in various high-dimensional settings via extensive simulation studies. Overall, this strategy works fairly well provided the models are tuned properly. For the purpose of temporal prediction of future state occupation, PLS turns out to be slightly better than LASSO in most of the settings we investigated. We illustrate the utility of these PV-based high-dimensional regression methods using a lung cancer data set in which we use the patients’ baseline gene expression values. |
| |
Keywords: | Censoring; covariate; gene expression; LASSO; PLS; survival |
|
|
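As a rough illustration of the pseudo-value idea mentioned in the abstract, the sketch below computes jackknife pseudo-values for a survival probability at a fixed time t. It is a simplification and not the authors' implementation: it assumes a two-state (alive/dead) model, where the state occupation probability of "alive" is the survival function, and it substitutes the Kaplan-Meier estimator for whatever multistate estimator the article uses. The function names `km_survival` and `pseudo_values` are hypothetical.

```python
import numpy as np


def km_survival(times, events, t):
    """Kaplan-Meier estimate of P(T > t) from right-censored data.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply conditional survival factors over distinct event times <= t.
    for u in np.unique(times[events & (times <= t)]):
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & events)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times, events, t):
    """Jackknife pseudo-values n*theta - (n-1)*theta_(-i), with theta = S(t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = km_survival(times, events, t)
    keep = np.ones(n, dtype=bool)
    pv = np.empty(n)
    for i in range(n):
        keep[i] = False  # leave subject i out
        loo = km_survival(times[keep], events[keep], t)
        pv[i] = n * full - (n - 1) * loo
        keep[i] = True   # restore for the next iteration
    return pv
```

Without censoring, each pseudo-value reduces to the indicator that the subject is still event-free at time t; with censoring, the pseudo-values stand in for those unobserved indicators. They could then serve as responses in a penalized or latent-factor regression on the baseline covariates, in the spirit of the combination the abstract describes.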