Similar documents
20 similar documents found
1.
In many complex diseases such as cancer, a patient passes through various disease stages before reaching a terminal state (e.g., disease-free or death). This fits a multistate model framework, in which a prognosis may be equivalent to predicting the state occupation at a future time t. With the advent of high-throughput genomic and proteomic assays, a clinician may intend to use such high-dimensional covariates to make better predictions of state occupation. In this article, we offer a practical solution to this problem by combining a useful technique, called pseudo-value (PV) regression, with a latent factor or penalized regression method such as partial least squares (PLS) or the least absolute shrinkage and selection operator (LASSO), or their variants. We explore the predictive performance of these combinations in various high-dimensional settings via extensive simulation studies. Overall, this strategy works fairly well provided the models are tuned properly, and PLS turns out to be slightly better than LASSO in most of the settings we investigated for the purpose of temporal prediction of future state occupation. We illustrate the utility of these PV-based high-dimensional regression methods using a lung cancer data set in which the patients' baseline gene expression values serve as covariates.
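As a minimal sketch of the pseudo-value step, consider the two-state special case (alive to dead), in which the state occupation probability at time t is simply the survival probability S(t). The code assumes the Kaplan-Meier estimator (a general multistate model would use the Aalen-Johansen estimator instead) and computes the jackknife pseudo-values θ_i = n·θ̂ − (n − 1)·θ̂_(−i). These pseudo-values would then serve as the response in a PLS or LASSO fit on the high-dimensional covariates, which is not shown. The function names are hypothetical, not taken from the article.

```python
from typing import List, Sequence


def km_survival(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of S(t) = P(T > t); events[i] is 1 for an observed event, 0 if censored."""
    surv = 1.0
    # Distinct observed event times up to and including t, in increasing order.
    for u in sorted({ti for ti, di in zip(times, events) if di == 1 and ti <= t}):
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, di in zip(times, events) if ti == u and di == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def km_pseudo_values(times: Sequence[float], events: Sequence[int], t: float) -> List[float]:
    """Jackknife pseudo-values theta_i = n * theta_hat - (n - 1) * theta_hat_(-i) for S(t)."""
    times, events = list(times), list(events)
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = []
    for i in range(n):
        # Re-estimate S(t) with subject i left out.
        loo = km_survival(times[:i] + times[i + 1:], events[:i] + events[i + 1:], t)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo
```

With no censoring, the pseudo-values reduce to the indicators I(T_i > t), which makes a quick sanity check; with censoring they stay defined for every subject, which is what allows an ordinary (or penalized) regression on them.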

2.
The proportional hazards mixed-effects model (PHMM) was proposed to handle dependent survival data. Motivated by its application in genetic epidemiology, we study the interpretation of its parameter estimates under violations of the proportional hazards assumption. The estimated fixed effect turns out to be an averaged regression effect over time, while the estimated variance component could be unaffected, inflated or attenuated depending on whether the random effect is on the baseline hazard, and whether the non-proportional regression effect decreases or increases over time. Using the conditional distribution of the covariates we define the standardized covariate residuals, which can be used to check the proportional hazards assumption. The model checking technique is illustrated on a multi-center lung cancer trial.

3.
Positron emission tomography (PET) imaging can be used to study the effects of pharmacologic intervention on brain function. Partial least squares (PLS) regression is a standard tool that can be applied to characterize such effects throughout the brain volume and across time. We have extended the PLS regression methodology to adjust for covariate effects that may influence spatial and temporal aspects of the functional image data over the brain volume. The extension involves multi-dimensional latent variables, experimental design variables based upon sequential PET scanning, and covariates. An illustration is provided using a sequential PET data set acquired to study the effect of d-amphetamine on cerebral blood flow in baboons. An iterative algorithm is developed and implemented, and validation results are provided through computer simulation studies.

4.
Survival studies usually collect, on each participant, both the duration until some terminal event and repeated measures of a time-dependent covariate. Such a covariate is referred to as an internal time-dependent covariate. Usually, some subjects drop out of the study before the occurrence of the terminal event of interest. One may then wish to evaluate the relationship between time to dropout and the internal covariate. The Cox model is a standard framework for that purpose. Here, we address this problem in situations where the value of the covariate at dropout is unobserved. We suggest a joint model that combines a first-order Markov model for the longitudinally measured covariate with a time-dependent Cox model for the dropout process. We consider maximum likelihood estimation in this model and show how estimation can be carried out via the EM algorithm. We note that the suggested joint model may have applications in the context of longitudinal data with nonignorable dropout. Indeed, it can be viewed as generalizing Diggle and Kenward's model (1994) to situations where dropout may occur at any point in time and may be censored. Hence we apply both models and compare their results on a data set of longitudinal measurements from patients in a cancer clinical trial.

5.
Linear regression with compositional explanatory variables
Compositional explanatory variables should not be used directly in a linear regression model, because any inference statistic can become misleading. While various approaches to this problem have been proposed, here an approach based on the isometric logratio (ilr) transformation is used. It turns out that the resulting model is easy to handle, and that parameter estimation can be done as in ordinary linear regression. Moreover, it is possible to use the ilr variables for inference statistics in order to obtain an appropriate interpretation of the model.
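The ilr transformation maps a D-part composition to D − 1 unconstrained real coordinates. The sketch below assumes one common orthonormal basis, the so-called pivot coordinates z_i = sqrt((D − i)/(D − i + 1)) · ln(x_i / g(x_(i+1), …, x_D)), where g is the geometric mean; the abstract does not say which basis the authors use, and `ilr_pivot` is a hypothetical name. The resulting coordinates could enter an ordinary least-squares fit in place of the raw parts.

```python
import math
from typing import List, Sequence


def ilr_pivot(x: Sequence[float]) -> List[float]:
    """Pivot-coordinate ilr transform of a composition with strictly positive parts."""
    D = len(x)
    if D < 2 or any(v <= 0 for v in x):
        raise ValueError("need at least two strictly positive parts")
    logs = [math.log(v) for v in x]
    z = []
    for i in range(D - 1):
        k = D - i - 1  # number of parts after position i
        mean_rest = sum(logs[i + 1:]) / k  # log of the geometric mean of the remaining parts
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return z
```

Two properties worth checking: the coordinates are invariant to rescaling the whole composition (only ratios matter), and, because the basis is orthonormal, the squared norm of the ilr vector equals the squared norm of the centred logratio (clr) vector.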

6.
The authors propose graphical and numerical methods for checking the adequacy of the logistic regression model for matched case‐control data. Their approach is based on the cumulative sum of residuals over the covariate or linear predictor. Under the assumed model, the cumulative residual process converges weakly to a centered Gaussian limit whose distribution can be approximated via computer simulation. The observed cumulative residual pattern can then be compared both visually and analytically to a certain number of simulated realizations of the approximate limiting process under the null hypothesis. The proposed techniques allow one to check the functional form of each covariate, the logistic link function as well as the overall model adequacy. The authors assess the performance of the proposed methods through simulation studies and illustrate them using data from a cardiovascular study.
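A sketch of the observed cumulative residual process over one covariate, under the simplifying assumption of an ordinary (unmatched) logistic fit whose fitted probabilities p̂_i are already available. The authors work with the conditional likelihood for matched sets, and the simulation of null realizations (which must account for parameter estimation) is not shown here. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cumulative_residuals(
    x: Sequence[float], y: Sequence[int], p_hat: Sequence[float]
) -> List[Tuple[float, float]]:
    """Observed process W(x) = n^(-1/2) * sum over {i: x_i <= x} of (y_i - p_hat_i).

    Returns (x value, W) pairs at each distinct covariate value, in increasing order.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    points: List[Tuple[float, float]] = []
    running = 0.0
    for pos, i in enumerate(order):
        running += y[i] - p_hat[i]  # raw residual on the probability scale
        # Record W only once all observations tied at this x value are included.
        if pos == n - 1 or x[order[pos + 1]] != x[i]:
            points.append((x[i], running / math.sqrt(n)))
    return points


def sup_statistic(points: List[Tuple[float, float]]) -> float:
    """Kolmogorov-type summary sup_x |W(x)|, to be compared with simulated null paths."""
    return max(abs(w) for _, w in points)
```

Under a correctly specified model the path should fluctuate around zero; a systematic drift over a range of x suggests a misspecified functional form for that covariate.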

7.
Using logarithmic and integral transformations, we transform a continuous-covariate frailty model into a polynomial regression model with a random effect. The responses of this mixed model can be 'estimated' via conditional hazard function estimation. The random error in this model does not have zero mean and its variance is not constant along the covariate; consequently, these two quantities have to be estimated. Since the asymptotic expression for the bias is complicated, the two-large-bandwidth trick is proposed to estimate the bias. The proposed transformation is very useful for clustered incomplete data subject to left truncation and right censoring (and for complex clustered data in general). Indeed, in this case no standard software is available to fit the frailty model, whereas for the transformed model, standard software for mixed models can be used to estimate the unknown parameters of the original frailty model. A small simulation study illustrates the good behavior of the proposed method. This method is applied to a bladder cancer data set.

8.
Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem. A subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is mainly done by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their ability to recover the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights in situations with collinearity is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recoveries of regression weights. The picture is inconclusive, however, because, especially in the study with more realistic simulated data, PLS and PCR gave the poorest recoveries of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases it is better to use ordinary regression. As far as prediction is concerned, it suffers far less from collinearity than recovery of the regression weights does.
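To make the PLS/PCR distinction concrete, here is a sketch of PLS1 (NIPALS, single response) and PCR on column-centered data, assuming NumPy. Both restrict the coefficients to a low-dimensional subspace: PCR uses the leading principal components of X alone, whereas PLS chooses each direction from the covariance with y. This is a textbook formulation, not the implementation used in the study, and the function names are hypothetical.

```python
import numpy as np


def pls1_coef(X: np.ndarray, y: np.ndarray, n_components: int) -> np.ndarray:
    """PLS1 regression coefficients (for centered data) via NIPALS with deflation."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)  # weight vector: direction of maximal covariance with y
        t = Xk @ w  # component scores
        tt = t @ t
        p = Xk.T @ t / tt  # X loadings
        qk = (yk @ t) / tt  # y loading
        Xk = Xk - np.outer(t, p)  # deflate X and y before the next component
        yk = yk - qk * t
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def pcr_coef(X: np.ndarray, y: np.ndarray, n_components: int) -> np.ndarray:
    """PCR coefficients: regress centered y on the leading principal component scores of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    gamma = np.linalg.lstsq(Xc @ V, yc, rcond=None)[0]
    return V @ gamma
```

With all components retained (and X of full column rank) both reduce to ordinary least squares; truncating the number of components is what trades bias for stability when predictors are collinear.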

9.
Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) reformulates the Cox likelihood as a Poisson-type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches: it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables such as disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, yielding a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised versions outperform other Cox PLS methods.
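The gene-screening step can be sketched with the univariate Cox score statistic evaluated at β = 0. Several details are assumptions not stated in the abstract: ties are handled Breslow-style, genes are ranked by the absolute standardized score, and k% is rounded up to at least one gene. The subsequent Cox PLS fit on the retained genes is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cox_score_z(times: Sequence[float], events: Sequence[int], x: Sequence[float]) -> float:
    """Standardized univariate Cox score statistic U / sqrt(I) at beta = 0.

    U = sum over events of (x_i - mean of x over the risk set at t_i);
    I = sum over events of the variance of x over that risk set (Breslow handling of ties).
    """
    n = len(times)
    score, info = 0.0, 0.0
    for i in range(n):
        if events[i] != 1:
            continue
        risk = [x[j] for j in range(n) if times[j] >= times[i]]
        mean = sum(risk) / len(risk)
        mean_sq = sum(v * v for v in risk) / len(risk)
        score += x[i] - mean
        info += mean_sq - mean * mean
    return score / math.sqrt(info) if info > 0 else 0.0


def select_top_genes(
    times: Sequence[float], events: Sequence[int], genes: Sequence[Sequence[float]], k_percent: float
) -> List[int]:
    """Indices of the highest-ranking k% of genes by |Cox score z| (at least one gene is kept)."""
    z = [abs(cox_score_z(times, events, g)) for g in genes]
    n_keep = max(1, math.ceil(len(genes) * k_percent / 100.0))
    return sorted(range(len(genes)), key=lambda j: z[j], reverse=True)[:n_keep]
```

A positive score means that subjects with larger expression values tend to fail earlier than the average member of their risk set; a constant gene carries no information and scores zero.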

10.
11.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

12.
Count data often contain many zeros. In parametric regression analysis of zero-inflated count data, the effect of a covariate of interest is typically modelled via a linear predictor. This approach imposes a restrictive, and potentially questionable, functional form on the relation between the independent and dependent variables. To address the noted restrictions, a flexible parametric procedure is employed to model the covariate effect as a linear combination of fixed-knot cubic basis splines or B-splines. The semiparametric zero-inflated Poisson regression model is fitted by maximizing the likelihood function through an expectation–maximization algorithm. The smooth estimate of the functional form of the covariate effect can enhance modelling flexibility. Within this modelling framework, a log-likelihood ratio test is used to assess the adequacy of the covariate function. Simulation results show that the proposed test has excellent power in detecting the lack of fit of a linear predictor. A real-life data set is used to illustrate the practicality of the methodology.
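The EM structure for a zero-inflated Poisson (ZIP) model can be seen in its simplest form, with an intercept only. This sketch deliberately omits the B-spline covariate effect described above: the E-step computes the posterior probability that an observed zero is a structural zero, and the M-step updates the mixing probability and the Poisson mean in closed form. The function name and starting values are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def zip_em(y: Sequence[int], max_iter: int = 10_000, tol: float = 1e-12) -> Tuple[float, float]:
    """EM for an intercept-only zero-inflated Poisson model; returns (pi, lam).

    Model: P(Y = 0) = pi + (1 - pi) * exp(-lam); P(Y = k) = (1 - pi) * Poisson(k; lam) for k > 0.
    """
    n = len(y)
    total = sum(y)
    if total == 0:
        raise ValueError("all counts are zero; lam is not identifiable")
    n_zero = sum(1 for v in y if v == 0)
    pi, lam = 0.5, total / n  # crude starting values
    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is a structural zero.
        z0 = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: positive counts have posterior probability 0, so only zeros contribute to pi.
        new_pi = n_zero * z0 / n
        new_lam = total / (n - n_zero * z0)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam
```

At an interior maximum, λ̂ solves λ/(1 − e^(−λ)) = mean of the positive counts (the zero-truncated Poisson equation), and (1 − π̂)(1 − e^(−λ̂)) equals the observed fraction of positive counts; both make convenient convergence checks.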

13.
Inverse Gaussian first hitting time regression models sometimes provide an attractive representation of lifetime data. Various authors comment that dependence of both parameters on the same covariate may imply multicollinearity. The frequent appearance of conflicting signs for the two coefficients of the same covariate may be related to this. We carry out simulation studies to examine the reality of this possible multicollinearity. Although there is some dependence between estimates, multicollinearity does not seem to be a major problem. Fitting this model to data generated by a Weibull regression suggests that conflicting signs of estimates may be due to model misspecification.

14.
The mean residual life measures the expected remaining life of a subject who has survived up to a particular time. When the survival time distribution is highly skewed or heavy-tailed, the restricted mean residual life must be considered. In this paper, we propose an additive–multiplicative restricted mean residual life model to study the association between the restricted mean residual life function and potential regression covariates in the presence of right censoring. This model extends the proportional mean residual life model by using an additive model as its covariate-dependent baseline. In the suggested model, some covariate effects are allowed to be time-varying. To estimate the model parameters, martingale estimating equations are developed, and the large sample properties of the resulting estimators are established. In addition, to assess the adequacy of the model, we investigate an asymptotically justified goodness-of-fit test. The proposed methodology is evaluated via simulation studies and further applied to a kidney cancer data set collected from a clinical trial.
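The target quantity itself, without covariates, is m(t; τ) = E[min(T, τ) − t | T > t] = ∫_t^τ S(u) du / S(t) for t < τ. The sketch below estimates it nonparametrically by plugging in the Kaplan-Meier step function; it illustrates the restricted mean residual life only, not the authors' additive–multiplicative regression model, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def km_steps(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve as (event time, S just after that time) pairs in increasing order."""
    steps, surv = [], 1.0
    for u in sorted({ti for ti, di in zip(times, events) if di == 1}):
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, di in zip(times, events) if ti == u and di == 1)
        surv *= 1.0 - deaths / at_risk
        steps.append((u, surv))
    return steps


def surv_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous step function S(t)."""
    s = 1.0
    for u, su in steps:
        if u > t:
            break
        s = su
    return s


def integrate_surv(steps: List[Tuple[float, float]], a: float, b: float) -> float:
    """Exact integral of the KM step function over [a, b]."""
    total, left, level = 0.0, a, surv_at(steps, a)
    for u, su in steps:
        if u <= a:
            continue
        if u >= b:
            break
        total += level * (u - left)  # flat piece up to the next jump
        left, level = u, su
    return total + level * (b - left)


def restricted_mrl(times: Sequence[float], events: Sequence[int], t: float, tau: float) -> float:
    """Plug-in estimate of the restricted mean residual life m(t; tau)."""
    if not t < tau:
        raise ValueError("need t < tau")
    steps = km_steps(times, events)
    s_t = surv_at(steps, t)
    if s_t == 0:
        raise ValueError("no subjects survive past t")
    return integrate_surv(steps, t, tau) / s_t
```

Restricting to τ keeps the integral finite and stable when the right tail of S is poorly estimated, which is the motivation for the restricted version in heavily skewed settings.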

15.
In the parametric regression model, the problem of missing covariates under missing at random (MAR) is considered. It is often desirable to use flexible parametric or semiparametric models for the covariate distribution, which can reduce potential misspecification problems. Recently, a completely nonparametric approach was developed by [H.Y. Chen, Nonparametric and semiparametric models for missing covariates in parameter regression, J. Amer. Statist. Assoc. 99 (2004), pp. 1176–1189; Z. Zhang and H.E. Rockette, On maximum likelihood estimation in parametric regression with missing covariates, J. Statist. Plann. Inference 47 (2005), pp. 206–223]. Although it requires a model for neither the covariate distribution nor the missing data mechanism, the proposed method assumes that the covariate distribution is supported only on the observed values. Consequently, their estimator is a restricted maximum likelihood estimator (MLE) rather than the global MLE. In this article, we show that the restricted semiparametric MLE can be very misleading in some cases. We discuss why this problem occurs and suggest an algorithm to obtain the global MLE. We then assess the performance of the proposed method via simulation experiments.

16.
In this article, we propose a flexible parametric (FP) approach for adjusting for covariate measurement error in regression that can accommodate, as error assessment data, replicated measurements of the surrogate (mismeasured) version of the unobserved true covariate on all study subjects or on a subsample of them. We use the general framework of the FP approach proposed by Hossain and Gustafson in 2009 for adjusting for covariate measurement error in regression. The FP approach is then compared with existing nonparametric approaches when error assessment data are available on the entire sample of study subjects (complete error assessment data), considering covariate measurement error in a multiple logistic regression model. We also develop the FP approach for the case where error assessment data are available only on a subsample of the study subjects (partial error assessment data) and investigate its performance using both simulated and real-life data. Simulation results reveal that, in comparable situations, the FP approach performs as well as or better than the competing nonparametric approaches in eliminating the bias that arises in the estimated regression parameters due to covariate measurement error. It also yields more efficient parameter estimates. Finally, the FP approach is found to perform adequately in terms of bias correction, confidence interval coverage, and achieving appropriate statistical power under partial error assessment data.

17.
We extend four tests common in classical regression (Wald, score, likelihood ratio and F tests) to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.

18.
Generalized linear models with random effects and/or serial dependence are commonly used to analyze longitudinal data. However, the computation and interpretation of marginal covariate effects can be difficult. This led Heagerty (1999, 2002) to propose models for longitudinal binary data in which a logistic regression is first used to explain the average marginal response. The model is then completed by introducing a conditional regression that allows for the longitudinal, within‐subject, dependence, either via random effects or regressing on previous responses. In this paper, the authors extend the work of Heagerty to handle multivariate longitudinal binary response data using a triple of regression models that directly model the marginal mean response while taking into account dependence across time and across responses. Markov Chain Monte Carlo methods are used for inference. Data from the Iowa Youth and Families Project are used to illustrate the methods.

19.
In regression analyses of spatially structured data, it is common practice to introduce spatially correlated random effects into the regression model to reduce or even avoid unobserved variable bias in the estimation of other covariate effects. If besides the response the covariates are also spatially correlated, the spatial effects may confound the effect of the covariates or vice versa. In this case, the model fails to identify the true covariate effect due to multicollinearity. For highly collinear continuous covariates, path analysis and structural equation modeling techniques prove to be helpful to disentangle direct covariate effects from indirect covariate effects arising from correlation with other variables. This work discusses the applicability of these techniques in regression setups, where spatial and covariate effects coincide at least partly and classical geoadditive models fail to separate these effects. Supplementary materials for this article are available online.

20.
We propose a general framework for regression models with functional response containing a potentially large number of flexible effects of functional and scalar covariates. Special emphasis is put on historical functional effects, where functional response and functional covariate are observed over the same interval and the response is only influenced by covariate values up to the current grid point. Historical functional effects are mostly used when functional response and covariate are observed on a common time interval, as they account for chronology. Our formulation allows for flexible integration limits including, e.g., lead or lag times. The functional responses can be observed on irregular curve-specific grids. Additionally, we introduce different parameterizations for historical effects and discuss identifiability issues. The models are estimated by a component-wise gradient boosting algorithm, which is suitable for models with a potentially high number of covariate effects, even more than observations, and inherently performs model selection. By minimizing corresponding loss functions, different features of the conditional response distribution can be modeled, including generalized and quantile regression models as special cases. The methods are implemented in the open-source R package FDboost. The methodological developments are motivated by biotechnological data on Escherichia coli fermentations, but cover a much broader model class.
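A historical functional effect contributes h(t) = ∫ β(s, t) x(s) ds to the response at time t, where the integral runs only over s ≤ t (optionally over a lag window [t − δ, t]). The sketch below, assuming NumPy, evaluates this term on a common grid with the trapezoidal rule for a given coefficient surface β; it is an illustration of the model term only, not FDboost's interface or its boosting estimator, and `historical_effect` is a hypothetical name.

```python
from typing import Callable, Optional

import numpy as np


def historical_effect(
    grid: np.ndarray,
    x: np.ndarray,
    beta: Callable[[np.ndarray, float], np.ndarray],
    lag: Optional[float] = None,
    eps: float = 1e-9,
) -> np.ndarray:
    """Approximate h(t) = integral of beta(s, t) * x(s) ds over s in [lower(t), t].

    lower(t) is the start of the grid (no lag) or t - lag (restricted window).
    The integral is computed with the trapezoidal rule on the observed grid points.
    """
    h = np.zeros(len(grid))
    for k, t in enumerate(grid):
        lower = grid[0] if lag is None else t - lag
        # Only covariate values observed up to the current time point t enter h(t).
        mask = (grid >= lower - eps) & (grid <= t + eps)
        s = grid[mask]
        if s.size < 2:
            continue  # a single point spans no interval, so h(t) stays 0
        f = beta(s, t) * x[mask]
        h[k] = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))
    return h
```

With β ≡ 1 and x ≡ 1, h(t) reduces to the length of the integration window: t without a lag and min(t − t₀, δ) with a lag δ, which shows how the integration limits encode chronology.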


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417