This paper discusses regression analysis of current status or case I interval-censored failure time data arising from the additive hazards model. In this situation, some covariates could be missing for various reasons, but there may exist some auxiliary information about the missing covariates. To address the problem, we propose an estimated partial likelihood approach for estimation of regression parameters, which makes use of the available auxiliary information. The method can be easily implemented, and the asymptotic properties of the resulting estimates are established. To assess the finite sample performance of the proposed method, an extensive simulation study is conducted and indicates that the method works well.
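
For reference, the additive hazards model named in this abstract is usually written as below. This is the standard textbook form, not a formula quoted from the paper; the symbols ($T$ for the failure time, $C$ for the observation time, $Z$ for the covariate vector, $\lambda_0$ for the baseline hazard) are notation assumed here for illustration.

```latex
\lambda(t \mid Z) = \lambda_0(t) + \beta^{\top} Z,
\qquad
\text{observed data: } \{\, C_i,\ \delta_i = I(T_i \le C_i),\ Z_i \,\}_{i=1}^{n}.
```

With current status data, $T_i$ is never observed directly; only whether it occurred before $C_i$ is known. In the abstract's setting, some components of $Z_i$ are additionally missing, with auxiliary information available in their place.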

This paper deals with the analysis of the proportional rate model for recurrent event data when covariates are subject to missingness. The true covariate is measured only on a randomly chosen validation set, whereas auxiliary information is available for all cohort subjects. To further utilize the auxiliary information to improve study efficiency, we propose an estimated estimating equation for the regression parameters. The resulting estimators are shown to be consistent and asymptotically normal. Both graphical and numerical techniques for checking the adequacy of the model are presented. Simulations are conducted to evaluate the finite sample performance of the proposed estimators. An illustration with a real medical study is provided.

The random coefficient model (RCM) is a powerful statistical tool for analyzing correlated data collected from studies with different clusters or from longitudinal studies. In practice, there is a need for statistical methods that allow biomedical researchers to adjust for the measured and unmeasured covariates that might affect the regression model. This article studies two nonparametric methods dealing with auxiliary covariate data in linear random coefficient models. We demonstrate how to estimate the coefficients of the models and how to predict the random effects when the covariates are missing or mismeasured. We employ an empirical estimator and a kernel smoother to handle discrete and continuous auxiliary covariates, respectively. Simulation results show that the proposed methods perform better than an alternative method that only uses data in the validation data set and ignores the random effects in the random coefficient model.

This paper discusses regression analysis of current status failure time data with informative observations and continuous auxiliary covariates. Under the additive hazards model, we employ a frailty model to describe the relationship between the failure time of interest and the censoring time through some latent variables and propose an estimated partial likelihood estimator of regression parameters that makes use of the available auxiliary information. Asymptotic properties of the resulting estimators are established. To assess the finite sample performance of the proposed method, an extensive simulation study is conducted, and the results indicate that the proposed method works well. An illustrative example is also provided.

There are several procedures for fitting generalized additive models, i.e. regression models for an exponential family response where the influence of each single covariate is assumed to have an unknown, potentially non-linear shape. Simulated data are used to compare a smoothing parameter optimization approach for selection of smoothness and of covariates, a stepwise approach, a mixed model approach, and a procedure based on boosting techniques. In particular, it is investigated how the performance of the procedures is linked to the amount of information, type of response, total number of covariates, number of influential covariates, and extent of non-linearity. Measures for comparison are prediction performance, identification of influential covariates, and smoothness of fitted functions. One result is that the mixed model approach returns sparse fits with frequently over-smoothed functions, while the functions are less smooth for the boosting approach and variable selection is less strict. The other approaches are in between with respect to these measures. The boosting procedure is seen to perform very well when little information is available and/or when a large number of covariates is to be investigated. It is somewhat surprising that in scenarios with low information the fitting of a linear model, even with stepwise variable selection, does not have much advantage over the fitting of an additive model when the true underlying structure is linear. In cases with more information the prediction performance of all procedures is very similar. So, in difficult data situations the boosting approach can be recommended; in others the procedures can be chosen conditional on the aim of the analysis.


This paper analyses the behaviour of goodness-of-fit tests for regression models. To this end, it uses statistics based on an estimation of the integrated regression function with missing observations either in the response variable or in some of the covariates. It proposes several versions of one empirical process, constructed from a previous estimation, that uses only the complete observations or replaces the missing observations with imputed values. In the case of missing covariates, a link model is used to fill the missing observations with other complete covariates. In all the situations, bootstrap methodology is used to calibrate the distribution of the test statistics. A broad simulation study compares the different procedures based on empirical regression methodology with smoothed tests previously studied in the literature. The comparison reflects the effect of the correlation between the covariates in the tests based on the imputed sample for missing covariates. In addition, the paper proposes a computational binning strategy to evaluate the tests based on an empirical process for large data sets. Finally, two applications to real data illustrate the performance of the tests.

We derive the optimal regression function (i.e., the best approximation in the L2 sense) when the vector of covariates has a random dimension. Furthermore, we consider applications of these results to problems in statistical regression and classification with missing covariates. It will be seen, perhaps surprisingly, that the correct regression function for the case with missing covariates can sometimes perform better than the usual regression function corresponding to the case with no missing covariates. This is because even if some of the covariates are missing, an indicator random variable δ, which is always observable, and is equal to 1 if there are no missing values (and 0 otherwise), may have far more information and predictive power about the response variable Y than the missing covariates do. We also propose kernel-based procedures for estimating the correct regression function nonparametrically. As an alternative estimation procedure, we also consider the least-squares method.

Based on the inverse probability weighting method, in this article we construct the empirical likelihood (EL) and penalized empirical likelihood (PEL) ratios of the parameter in the linear quantile regression model when the covariates are missing at random, in the presence and absence of auxiliary information, respectively. It is proved that the EL ratio admits a limiting chi-square distribution. At the same time, the asymptotic normality of the maximum EL and PEL estimators of the parameter is established. Also, the variable selection of the model in the presence and absence of auxiliary information, respectively, is discussed. A simulation study and a real data analysis are conducted to evaluate the performance of the proposed methods.

In this paper, we consider how to incorporate quantile information to improve estimator efficiency for regression models with missing covariates. We combine the quantile information with least-squares normal equations and construct unbiased estimating equations (EEs). The lack of smoothness of the objective EEs is overcome by replacing them with smooth approximations. The maximum smoothed empirical likelihood (MSEL) estimators are established based on inverse probability weighted (IPW) smoothed EEs, and their asymptotic properties are studied under some regularity conditions. Moreover, we develop two novel testing procedures for the underlying model. The finite-sample performance of the proposed methodology is examined by simulation studies. A real example is used to illustrate our methods.
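
The inverse probability weighting idea behind the IPW estimating equations can be illustrated with a minimal sketch. It is not the paper's MSEL estimator: it only solves IPW-weighted least-squares normal equations for a simulated linear model, with no quantile information and no empirical likelihood. The function name `ipw_least_squares`, the simulated data, and the assumption that the selection probabilities `pi` are known are all hypothetical choices made for illustration.

```python
import numpy as np

def ipw_least_squares(x, y, observed, pi):
    """Solve sum_i (d_i / pi_i) * z_i * (y_i - z_i' beta) = 0 for beta.

    d_i = 1 if the covariate x_i is observed, pi_i = P(d_i = 1 | y_i).
    Hypothetical helper: pi is assumed known rather than estimated.
    """
    w = observed / pi                      # weight is 0 for incomplete cases
    x_filled = np.where(observed, x, 0.0)  # placeholder; these rows get weight 0
    z = np.column_stack([np.ones_like(x_filled), x_filled])
    zw = z * w[:, None]
    return np.linalg.solve(zw.T @ z, zw.T @ y)

# Simulated example (assumed data-generating model, true beta = (1, 2)).
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Missing at random: the chance of observing x depends only on the
# always-observed response y, and is bounded away from zero.
pi = 0.2 + 0.7 / (1.0 + np.exp(-(y - 1.0)))
observed = rng.random(n) < pi
x_obs = np.where(observed, x, np.nan)

beta_ipw = ipw_least_squares(x_obs, y, observed, pi)
# Complete-case fit for comparison: same equations with all weights equal.
beta_cc = ipw_least_squares(x_obs, y, observed, np.ones(n))
print("IPW:", beta_ipw, "complete-case:", beta_cc)
```

Because missingness here depends on the response, the unweighted complete-case fit solves a biased estimating equation, while weighting each complete case by `1 / pi` restores unbiasedness. In practice the selection probabilities would have to be estimated, for example with a logistic model.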

We consider graphs, confidence procedures and tests that can be used to compare transition probabilities in a Markov chain model with intensities specified by a Cox proportional hazard model. Under the assumptions of this model, the regression coefficients provide information about the relative risks of covariates in one-step transitions; however, they cannot in general be used to assess whether or not the covariates have a beneficial or detrimental effect on the endpoint events. To alleviate this problem, we consider graphical tests based on confidence procedures for a generalized Q-Q plot and for the difference between transition probabilities. The procedures are illustrated using data of the International Bone Marrow Transplant Registry.

In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with the penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion (AIC) become invalid for the considered models with incomplete data, some new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). The most attractive feature of our method is that it is rather general and can be extended to various situations with missing observations based on the EM algorithm; in particular, when no missing data are involved, our new model selection criteria reduce to the classical AIC. Therefore, we can compare not only models with missing observations under MAR/MNAR settings, but also missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

Most regression problems in practice require flexible semiparametric forms of the predictor for modelling the dependence of responses on covariates. Moreover, it is often necessary to add random effects accounting for overdispersion caused by unobserved heterogeneity or for correlation in longitudinal or spatial data. We present a unified approach for Bayesian inference via Markov chain Monte Carlo simulation in generalized additive and semiparametric mixed models. Different types of covariates, such as the usual covariates with fixed effects, metrical covariates with non-linear effects, unstructured random effects, trend and seasonal components in longitudinal data and spatial covariates, are all treated within the same general framework by assigning appropriate Markov random field priors with different forms and degrees of smoothness. We applied the approach in several case-studies and consulting cases, showing that the methods are also computationally feasible in problems with many covariates and large data sets. In this paper, we choose two typical applications.

We propose a method for estimating parameters in generalized linear models when the outcome variable is missing for some subjects and the missing data mechanism is non-ignorable. We assume throughout that the covariates are fully observed. One possible method for estimating the parameters is maximum likelihood with a non-ignorable missing data model. However, caution must be used when fitting non-ignorable missing data models because certain parameters may be inestimable for some models. Instead of fitting a non-ignorable model, we propose the use of auxiliary information in a likelihood approach to reduce the bias, without having to specify a non-ignorable model. The method is applied to a mental health study.

Right-censored and length-biased failure time data arise in many fields including cross-sectional prevalent cohort studies, and their analysis has recently attracted a great deal of attention. It is well known that for regression analysis of failure time data, two commonly used approaches are hazard-based and quantile-based procedures, and most of the existing methods are hazard-based. In this paper, we consider quantile regression analysis of right-censored and length-biased data and present a semiparametric varying-coefficient partially linear model. For estimation of regression parameters, a three-stage procedure that makes use of the inverse probability weighted technique is developed, and the asymptotic properties of the resulting estimators are established. In addition, the approach allows the censoring variable to depend on covariates, while most of the existing methods assume independence between censoring variables and covariates. A simulation study is conducted and suggests that the proposed approach works well in practical situations. Also, an illustrative example is provided.

Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships, and modelling the covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with 'spatially' indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient's information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follow a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but focusing on the problem of clustering functional relationships between the response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

The article's topic is logistic regression for direct data on the covariates, but indirect data on the endogenous variable. The indirect data may result from a privacy-protecting survey procedure for sensitive characteristics or from statistical disclosure control. Various procedures to generate the indirect data exist. However, we show that it is possible to develop a general approach for logistic regression analyses with indirect data that covers many procedures. We first derive a general algorithm for the maximum likelihood estimation and a general procedure for variance estimation. Subsequently, numerous examples demonstrate the broad applicability of our general framework.

Screening procedures play an important role in data analysis, especially in high-throughput biological studies where the datasets consist of more covariates than independent subjects. In this article, a Bayesian screening procedure is introduced for the binary response models with logit and probit links. In contrast to many screening rules based on marginal information involving one or a few covariates, the proposed Bayesian procedure simultaneously models all covariates and uses closed-form screening statistics. Specifically, we use the posterior means of the regression coefficients as screening statistics; by imposing a generalized g-prior on the regression coefficients, we derive the analytical form of their posterior means and compute the screening statistics without Markov chain Monte Carlo implementation. We evaluate the utility of the proposed Bayesian screening method using simulations and real data analysis. When the sample size is small, the simulation results suggest improved performance with comparable computational cost.

Accounting for an auxiliary covariate in a two-phase sampling strategy in order to reduce the experimental costs was initially proposed by Cochran (Sampling Techniques, 2nd Edition, Wiley, New York, 1963, Sampling Techniques, 3rd Edition, Wiley, New York, 1977) in the context of sample surveys. Conniffe and Moran (Biometrics 28 (1972) 1011) have extended this methodology to the estimation of linear regression functions. More recently, Conniffe (J. Econometrics 27 (1985) 179) and Causeur and Dhorne (Biometrics 54 (4) (1998) 1591) have derived two-phase sampling estimators of the linear regression function in the situation where many auxiliary covariates are available. A detailed study of the distributional aspects of these estimators is provided by Causeur (Statistics 32 (1999) 297). In the same multivariate context, this paper aims at an extension of the double-sampling strategies to monotone designs accounting for differences between the costs of subsets of covariates. In particular, the maximum-likelihood estimators are provided and asymptotic solutions for the optimal designs are derived.

This article considers a nonparametric varying coefficient regression model with longitudinal observations. The relationship between the dependent variable and the covariates is assumed to be linear at a specific time point, but the coefficients are allowed to change over time. A general formulation is used to treat mean regression, median regression, quantile regression, and robust mean regression in one setting. The local M-estimators of the unknown coefficient functions are obtained by the local linear method. The asymptotic distributions of the M-estimators of the unknown coefficient functions at both interior and boundary points are established. Various applications of the main results, including estimating conditional quantile coefficient functions and robustifying the mean regression coefficient functions, are derived. Finite sample properties of our procedures are studied through Monte Carlo simulations.
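
The local linear method mentioned in this abstract can be sketched for the simplest case covered by the general formulation: mean regression with squared-error loss. The code below is an illustrative assumption, not the authors' procedure. It uses a single covariate, independent observations (ignoring within-subject correlation in longitudinal data), an Epanechnikov kernel, and a hand-picked bandwidth; the name `local_linear_coef` and the simulated coefficient function are hypothetical.

```python
import numpy as np

def local_linear_coef(t0, t, x, y, h):
    """Local linear estimate of beta(t0) in the model y = x * beta(t) + error.

    Minimises sum_i K((t_i - t0) / h) * (y_i - x_i * (a + b * (t_i - t0)))**2
    over (a, b) and returns a, the estimate of beta(t0).
    """
    u = t - t0
    k = np.maximum(0.0, 0.75 * (1.0 - (u / h) ** 2))  # Epanechnikov kernel
    design = np.column_stack([x, x * u])              # columns for a and b
    wd = design * k[:, None]
    a, _slope = np.linalg.solve(wd.T @ design, wd.T @ y)
    return a

# Simulated data with an assumed coefficient function beta(t) = sin(2*pi*t).
rng = np.random.default_rng(1)
n = 5000
t = rng.uniform(0.0, 1.0, size=n)
x = rng.normal(size=n)
y = x * np.sin(2.0 * np.pi * t) + 0.2 * rng.normal(size=n)

est = local_linear_coef(0.25, t, x, y, h=0.1)
print("estimate of beta(0.25):", est, "true value: 1.0")
```

Replacing the squared-error loss with the check loss or a Huber-type loss would give the quantile and robust variants the abstract refers to, at the cost of an iterative solver in place of the closed-form weighted least-squares solve.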

We propose methods for Bayesian inference for missing covariate data with a novel class of semi-parametric survival models with a cure fraction. We allow the missing covariates to be either categorical or continuous and specify a parametric distribution for the covariates that is written as a sequence of one-dimensional conditional distributions. We assume that the missing covariates are missing at random (MAR) throughout. We propose an informative class of joint prior distributions for the regression coefficients and the parameters arising from the covariate distributions. The proposed class of priors is shown to be useful in recovering information on the missing covariates, especially in situations where the missing data fraction is large. Properties of the proposed prior and resulting posterior distributions are examined. Also, model checking techniques are proposed for sensitivity analyses and for checking the goodness of fit of a particular model. Specifically, we extend the Conditional Predictive Ordinate (CPO) statistic to assess goodness of fit in the presence of missing covariate data. Computational techniques using the Gibbs sampler are implemented. A real data set involving a melanoma cancer clinical trial is examined to demonstrate the methodology.
