Similar articles
20 similar articles found (search time: 31 ms)
1.
Covariate-informed product partition models incorporate the intuitively appealing notion that individuals or units with similar covariate values a priori have a higher probability of co-clustering than those with dissimilar covariate values. These methods have been shown to perform well if the number of covariates is relatively small. However, as the number of covariates increases, their influence on partition probabilities overwhelms any information the response may provide about clustering, and they often encourage partitions with either a large number of singleton clusters or one large cluster, resulting in poor model fit and poor out-of-sample prediction. The same phenomenon is observed in Bayesian nonparametric regression methods that induce a conditional distribution for the response given covariates through a joint model. In light of this, we propose two methods that calibrate the covariate-dependent partition model by capping the influence that covariates have on partition probabilities. We demonstrate the new methods' utility using simulation and two publicly available datasets.
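The covariate-domination effect described above is easy to see numerically. The following toy sketch (illustrative only, not the authors' calibration; the per-covariate similarity value 0.8 is an arbitrary assumption) shows how a product of per-covariate similarities collapses toward zero as the number of covariates grows, while a tempered version that caps covariate influence stays on a stable scale:

```python
# Toy illustration: product-form covariate similarity collapses as the
# number of covariates p grows, swamping any likelihood contribution.
def prod_similarity(p, per_dim=0.8):
    """Product of p identical per-covariate similarity factors."""
    return per_dim ** p

raw = {p: prod_similarity(p) for p in (2, 10, 50)}
# One simple way to cap covariate influence: temper by 1/p so the
# combined similarity stays on the scale of a single covariate.
tempered = {p: prod_similarity(p) ** (1.0 / p) for p in (2, 10, 50)}
```

Any response-likelihood contribution of moderate size is quickly dwarfed by the raw product, which is one way to see why partitions degenerate as covariates accumulate.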

2.
Under the case-cohort design introduced by Prentice (Biometrika 73:1–11, 1986), the covariate histories are ascertained only for the subjects who experience the event of interest (i.e., the cases) during the follow-up period and for a relatively small random sample from the original cohort (i.e., the subcohort). The case-cohort design has been widely used in clinical and epidemiological studies to assess the effects of covariates on failure times. Most statistical methods developed for the case-cohort design use the proportional hazards model, and few methods allow for time-varying regression coefficients. In addition, most methods disregard data from subjects outside of the subcohort, which can result in inefficient inference. Addressing these issues, this paper proposes an estimation procedure for the semiparametric additive hazards model with case-cohort/two-phase sampling data, allowing the covariates of interest to be missing for cases as well as for non-cases. A more flexible form of the additive model is considered that allows the effects of some covariates to be time varying while specifying the effects of others to be constant. An augmented inverse probability weighted estimation procedure is proposed. The proposed method allows utilizing the auxiliary information that correlates with the phase-two covariates to improve efficiency. The asymptotic properties of the proposed estimators are established. An extensive simulation study shows that the augmented inverse probability weighted estimation is more efficient than the widely adopted inverse probability weighted complete-case estimation method. The method is applied to analyze data from a preventive HIV vaccine efficacy trial.
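As a concrete sketch of the sampling scheme itself (not of the paper's augmented estimator), a case-cohort sample retains every case plus a random subcohort, and sampled non-cases are reweighted by the inverse subcohort sampling fraction. All the numbers below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
case = rng.random(n) < 0.05        # failures: covariates always ascertained
subcohort = rng.random(n) < 0.10   # 10% simple random subcohort

sampled = case | subcohort
# Classical design weights: cases get weight 1, sampled non-cases 1/0.10.
weights = np.where(case, 1.0, 1.0 / 0.10)[sampled]

# The weighted non-case count should recover the full-cohort non-case total.
noncase_total_est = weights[~case[sampled]].sum()
noncase_total = (~case).sum()
```

Only a small fraction of the cohort needs expensive covariate ascertainment, yet weighted totals (and weighted estimating equations built on them) remain approximately unbiased for full-cohort quantities.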

3.
A major survey of the determinants of access to primary education in Madagascar was carried out in 1994. The probability of enrolment, probability of admission, delay before beginning school, probability of repeating a year and probability of dropping out were studied. The results of the survey are briefly described. In the analysis, one major problem was non-random missing values in the covariates. Some simple methods were developed for detecting whether a response variable depends on the missingness of a given covariate and whether eliminating the missing values would distort the resulting model. A way of incorporating covariates with randomly missing values was used such that the individuals having the missing values did not need to be eliminated. These methods are described, and examples are given of how they were applied to one key covariate with a large number of non-random missing values and to one whose values appeared to be randomly missing.

4.
The authors consider semiparametric efficient estimation of parameters in the conditional mean model for a simple incomplete data structure in which the outcome of interest is observed only for a random subset of subjects but covariates and surrogate (auxiliary) outcomes are observed for all. They use optimal estimating function theory to derive the semiparametric efficient score in closed form. They show that when covariates and auxiliary outcomes are discrete, a Horvitz‐Thompson type estimator with empirically estimated weights is semiparametric efficient. The authors give simulation studies validating the finite‐sample behaviour of the semiparametric efficient estimator and its asymptotic variance; they demonstrate the efficiency of the estimator in realistic settings.
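A minimal sketch of the Horvitz–Thompson idea with empirically estimated weights, under an assumed discrete covariate and surrogate and a simulated selection mechanism (all parameter values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.integers(0, 2, n)            # discrete covariate, observed for all
s = rng.integers(0, 2, n)            # discrete surrogate outcome, observed for all
y = x + s + rng.normal(0, 1, n)      # outcome of interest, true mean 1.0
pi_true = 0.3 + 0.4 * x              # selection probability depends on x
r = rng.random(n) < pi_true          # y observed only when r is True

# Empirically estimated selection probabilities within each (x, s) cell
pi_hat = np.zeros(n)
for xv in (0, 1):
    for sv in (0, 1):
        cell = (x == xv) & (s == sv)
        pi_hat[cell] = r[cell].mean()

# Horvitz–Thompson estimator of E[Y] with estimated weights
ht_mean = np.sum(r * y / pi_hat) / n
naive_mean = y[r].mean()             # biased: over-represents x == 1
```

The complete-case mean is biased upward because units with x = 1 are over-sampled, while the weighted estimator corrects this; the paper's result is that, with discrete covariates, using *estimated* rather than known weights is in fact semiparametric efficient.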

5.
We consider surveys with one or more callbacks and use a series of logistic regressions to model the probabilities of nonresponse at first contact and subsequent callbacks. These probabilities are allowed to depend on covariates as well as on the categorical variable of interest, so the nonresponse mechanism is nonignorable. Explicit formulae for the score functions and information matrices are given for some important special cases to facilitate implementation of the method of scoring for obtaining maximum likelihood estimates of the model parameters. For estimating finite population quantities, we suggest the imputation and prediction approaches as alternatives to weighting adjustment. Simulation results suggest that the proposed methods work well in reducing the bias due to nonresponse. In our study, the imputation and prediction approaches perform better than weighting adjustment and they continue to perform quite well in simulations involving misspecified response models.
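The callback structure can be sketched as a sequence of per-attempt nonresponse logits, where dependence on the category of interest y is what makes the mechanism nonignorable. The coefficients below are hypothetical placeholders, not estimates from the paper:

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical coefficients: at contact attempt k,
# logit P(nonresponse | still unreached) = a_k + b*z + c*y,
# with covariate z and category of interest y (nonignorable when c != 0).
attempt_intercepts = [0.5, 0.0, -0.5]   # nonresponse less likely at callbacks
b, c = -0.3, 0.8

def prob_responds_by(k, z, y):
    """P(unit responds at one of the first k contact attempts)."""
    p_still_missing = 1.0
    for a in attempt_intercepts[:k]:
        p_still_missing *= expit(a + b * z + c * y)
    return 1.0 - p_still_missing
```

Each callback multiplies in another chance to respond, so the response probability rises with k; units with y = 1 here are systematically harder to reach, which is exactly the feature a nonignorable model must capture.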

6.
Latent Variable Models for Mixed Discrete and Continuous Outcomes
We propose a latent variable model for mixed discrete and continuous outcomes. The model accommodates any mixture of outcomes from an exponential family and allows for arbitrary covariate effects, as well as direct modelling of covariate effects on the latent variable. An EM algorithm is proposed for parameter estimation, and estimates of the latent variables are produced as a by-product of the analysis. A generalized likelihood ratio test can be used to test the significance of covariates affecting the latent outcomes. The method is applied to birth defects data, where the outcomes of interest are continuous measures of size and binary indicators of minor physical anomalies. Infants who were exposed in utero to anticonvulsant medications are compared with controls.

7.
This paper studies the problem of mean response estimation when the response is subject to missingness but multi-dimensional covariates are fully observable. Two main challenges arise in this situation: the curse of dimensionality and model specification. Nonparametric imputation avoids model specification but suffers from the curse of dimensionality, while model-based methods such as inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW) face the opposite trade-off. We propose a unified nonparametric method that overcomes both challenges with the aid of sufficient dimension reduction. It imposes no parametric structure on the propensity score or the conditional mean response, and thus retains its nonparametric flavor. Moreover, the estimator achieves the optimal efficiency that a doubly robust estimator can attain. Simulation studies demonstrate the excellent performance of our method in various situations.
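For orientation, the classical AIPW estimator of a mean with a missing response has the form n⁻¹ Σ [RᵢYᵢ/πᵢ + (1 − Rᵢ/πᵢ) m(Xᵢ)]. Below is a minimal numerical sketch with both nuisance functions taken as known for illustration (the paper instead estimates them nonparametrically after dimension reduction):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(0, 1, n)
y = 2.0 + x + rng.normal(0, 1, n)        # true mean of y is 2.0
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))    # propensity of observing y
r = rng.random(n) < pi                   # response indicator

m_hat = 2.0 + x                          # outcome regression E[y|x] (known here)
pi_hat = pi                              # propensity score (known here)

# AIPW / doubly robust estimator of E[Y]
aipw = np.mean(r * y / pi_hat + (1 - r / pi_hat) * m_hat)
# Plain IPW estimator for comparison
ipw = np.sum(r * y / pi_hat) / n
```

Both are consistent here, but the augmentation term typically shrinks the variance, and the estimator stays consistent if either (not necessarily both) nuisance function is correct, which is the double robustness the abstract refers to.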

8.
Assessing dose response from flexible‐dose clinical trials is problematic. The true dose effect may be obscured and even reversed in observed data because dose is related to both previous and subsequent outcomes. To remove selection bias, we propose marginal structural models with inverse probability of treatment weighting (IPTW). Potential clinical outcomes are compared across dose groups using a marginal structural model (MSM) based on a weighted pooled repeated-measures analysis (generalized estimating equations with robust standard error estimates), with the dose effect represented by current dose and recent dose history, and with weights estimated from the data (via logistic regression) as products of (i) the inverse probability of receiving the dose assignments actually received and (ii) the inverse probability of remaining on treatment by that time. In simulations, this method led to almost unbiased estimates of the true dose effect under various scenarios. Results were compared with those obtained by unweighted analyses and by weighted analyses under various model specifications. The simulations showed that the IPTW MSM methodology is highly sensitive to model misspecification even when the weights are known. Practitioners applying MSMs should be cautious about the challenges of implementing them with real clinical data. Clinical trial data are used to illustrate the methodology. Copyright © 2012 John Wiley & Sons, Ltd.
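The weight construction can be sketched directly: at each visit the cumulative weight multiplies in the inverse probability of the dose actually received and the inverse probability of remaining on treatment. The fitted probabilities below are random placeholders standing in for logistic-regression output:

```python
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_visits = 4, 3
# Hypothetical fitted probabilities (in practice, from logistic regressions):
p_dose = rng.uniform(0.3, 0.9, (n_subjects, n_visits))  # P(dose actually received)
p_stay = rng.uniform(0.7, 1.0, (n_subjects, n_visits))  # P(remaining on treatment)

# MSM weight at visit t: product over visits 1..t of
#   1 / P(observed dose)  *  1 / P(still on treatment)
weights = np.cumprod(1.0 / (p_dose * p_stay), axis=1)
```

These weights then enter the pooled repeated-measures GEE as observation weights; in practice stabilized versions (numerator models for the marginal dose probabilities) are usually preferred to tame extreme weights.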

9.
Suppose that data are generated according to the model f(y | x; θ)g(x), where y is a response and x are covariates. We derive and compare semiparametric likelihood and pseudolikelihood methods for estimating θ in situations where the generated units are not fully observed and where it is impossible or undesirable to model the covariate distribution. The probability that a unit is fully observed may depend on y, and there may be a subset of covariates that is observed only for a subsample of individuals. Our key assumptions are that the probability that a unit has missing data depends only on which of a finite number of strata (y, x) belongs to, and that stratum membership is observed for every unit. Applications include case–control studies in epidemiology, field reliability studies, and broad classes of missing data and measurement error problems. Our results make fully efficient estimation of θ feasible, and they generalize and provide insight into a variety of methods that have been proposed for specific problems.

10.
Efficient statistical inference with nonignorably missing data is a challenging problem. This paper proposes a new estimation procedure based on composite quantile regression (CQR) for linear regression models with nonignorable missing data that is applicable even with high-dimensional covariates. A parametric model is assumed for the response probability, which is estimated by the empirical likelihood approach. Local identifiability of the proposed strategy is guaranteed on the basis of an instrumental variable approach. A set of data-based adaptive weights constructed via an empirical likelihood method is used to weight the CQR functions. The proposed method is resistant to heavy-tailed errors or outliers in the response. An adaptive penalisation method for variable selection is proposed to achieve sparsity with high-dimensional covariates. Limiting distributions of the proposed estimators are derived. Simulation studies are conducted to investigate the finite-sample performance of the proposed methodologies, and the method is applied to the ACTG 175 data.

11.
Regression methods typically construct a mapping from the covariates into the real numbers. Here, however, we consider regression problems where the task is to form a mapping from the covariates into a set of (univariate) real-valued functions. Examples are given by conditional density estimation, hazard regression and regression with a functional response. Our approach starts by modeling the function of interest using a sum of B-spline basis functions. To model dependence on the covariates, the coefficients of this expansion are each modeled as functions of the covariates. We propose to estimate these coefficient functions using boosted tree models. Algorithms are provided for the above three situations, and real data sets are used to investigate their performance. The results indicate that the proposed methodology performs well. In addition, it is both straightforward and capable of handling a large number of covariates.

12.
This paper compares methods for modeling the probability of removal when variable amounts of removal effort are present. A hierarchical modeling framework can produce estimates of animal abundance and detection from replicated removal counts taken at different locations in a region of interest. A common method of specifying variation in detection probabilities across locations or replicates is with a logistic model that incorporates relevant detection covariates. As an alternative to this logistic model, we propose using a catch–effort (CE) model to account for heterogeneity in detection when a measure of removal effort is available for each removal count. This method models the probability of detection as a nonlinear function of removal effort and a removal probability parameter that can vary spatially. Simulation results demonstrate that the CE model can effectively estimate abundance and removal probabilities when average removal rates are large but both the CE and logistic models tend to produce biased estimates as average removal rates decrease. We also found that the CE model fits better than logistic models when estimating wild turkey abundance using harvest and hunter counts collected by the Minnesota Department of Natural Resources during the spring turkey hunting season.
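One common catch–effort parameterization (used here as an illustrative assumption, not necessarily the exact form in the paper) lets the detection probability rise with removal effort f as p = 1 − exp(−c·f), where c is the removal probability parameter:

```python
import math

def ce_detection_prob(effort, removal_rate):
    """Catch-effort form: detection probability increases with effort,
    governed by a per-unit-effort removal rate parameter."""
    return 1.0 - math.exp(-removal_rate * effort)

def expected_removals(N, efforts, removal_rate):
    """Expected removal counts over successive passes at a site
    with true abundance N (depletion across passes)."""
    counts, remaining = [], float(N)
    for e in efforts:
        p = ce_detection_prob(e, removal_rate)
        counts.append(remaining * p)
        remaining -= remaining * p
    return counts
```

Successive equal-effort passes remove fewer and fewer animals, and the rate of that depletion is what lets the hierarchical model separate abundance from detection.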

13.
The elderly population in the USA is expected to double in size by the year 2025, making longitudinal health studies of this population of increasing importance. The degree of loss to follow-up in studies of the elderly, often because participants enter a nursing home, die, or otherwise cannot remain in the study, makes longitudinal studies of this population problematic. We propose a latent class model for analysing multiple longitudinal binary health outcomes with multiple-cause non-response when the data are missing at random and a non-likelihood-based analysis is performed. We extend the estimating equations approach of Robins and co-workers to latent class models by reweighting the multiple binary longitudinal outcomes by the inverse probability of being observed. This yields consistent parameter estimates when the probability of non-response depends on observed outcomes and covariates (missing at random), provided that the model for non-response is correctly specified. We extend the non-response model so that institutionalization, death and missingness due to failure to locate, refusal or incomplete data each have their own set of non-response probabilities. Robust variance estimates are derived which account for the use of a possibly misspecified covariance matrix, estimation of the missing-data weights and estimation of the latent class measurement parameters. The approach is applied to a study of lower body function among a subsample of the elderly participating in the 6-year Longitudinal Study of Aging.

14.
Clinical studies aimed at identifying effective treatments to reduce the risk of disease or death often require long term follow-up of participants in order to observe a sufficient number of events to precisely estimate the treatment effect. In such studies, observing the outcome of interest during follow-up may be difficult and high rates of censoring may be observed which often leads to reduced power when applying straightforward statistical methods developed for time-to-event data. Alternative methods have been proposed to take advantage of auxiliary information that may potentially improve efficiency when estimating marginal survival and improve power when testing for a treatment effect. Recently, Parast et al. (J Am Stat Assoc 109(505):384–394, 2014) proposed a landmark estimation procedure for the estimation of survival and treatment effects in a randomized clinical trial setting and demonstrated that significant gains in efficiency and power could be obtained by incorporating intermediate event information as well as baseline covariates. However, the procedure requires the assumption that the potential outcomes for each individual under treatment and control are independent of treatment group assignment which is unlikely to hold in an observational study setting. In this paper we develop the landmark estimation procedure for use in an observational setting. In particular, we incorporate inverse probability of treatment weights (IPTW) in the landmark estimation procedure to account for selection bias on observed baseline (pretreatment) covariates. We demonstrate that consistent estimates of survival and treatment effects can be obtained by using IPTW and that there is improved efficiency by using auxiliary intermediate event and baseline information. We compare our proposed estimates to those obtained using the Kaplan–Meier estimator, the original landmark estimation procedure, and the IPTW Kaplan–Meier estimator. 
We illustrate our resulting reduction in bias and gains in efficiency through a simulation study and apply our procedure to an AIDS dataset to examine the effect of previous antiretroviral therapy on survival.
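For reference, the IPTW Kaplan–Meier comparator mentioned above reduces to an ordinary Kaplan–Meier computation on weighted risk sets. A minimal sketch (weights of 1 recover the unweighted estimator):

```python
import numpy as np

def weighted_km(time, event, w):
    """Weighted Kaplan-Meier: returns (distinct event times, survival curve).
    With w = 1 this is the ordinary Kaplan-Meier estimator; for IPTW,
    w would be 1 / P(treatment received | baseline covariates)."""
    times = np.unique(time[event == 1])
    surv, curve = 1.0, []
    for t in times:
        at_risk = w[time >= t].sum()             # weighted risk set at t
        deaths = w[(time == t) & (event == 1)].sum()
        surv *= 1.0 - deaths / at_risk
        curve.append(surv)
    return times, np.array(curve)
```

In an observational setting the weights are estimated from a treatment model (e.g., logistic regression on pretreatment covariates); the landmark procedure then layers intermediate-event and baseline information on top of this weighted estimator to gain efficiency.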

15.
Generalized linear models are commonly used to analyze categorical data such as binary, count, and ordinal outcomes. Adjusting for important prognostic factors or baseline covariates in generalized linear models may improve estimation efficiency. However, the model‐based mean for a treatment group produced by most software packages estimates the response at the mean covariate, not the mean response for that treatment group in the studied population. Although this is not an issue for linear models, the model‐based group mean estimates in generalized linear models can be seriously biased for the true group means. We propose a new method to estimate the group means consistently, with a corresponding variance estimator. Simulations showed that the proposed method produces unbiased estimators of the group means with the correct coverage probability. The proposed method was applied to analyze hypoglycemia data from clinical trials in diabetes. Copyright © 2014 John Wiley & Sons, Ltd.
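The bias described above is purely a consequence of the nonlinear inverse link: the inverse link evaluated at the mean covariate is not the mean of the inverse-linked responses. A small numerical sketch with hypothetical logistic coefficients, taken as known for illustration:

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(4)
x = rng.normal(0.0, 2.0, 100_000)   # baseline covariate within one treatment group
beta0, beta1 = 1.0, 1.0             # hypothetical logistic-model coefficients

# What many packages report as the "group mean": response at the mean covariate.
resp_at_mean_x = expit(beta0 + beta1 * x.mean())
# Population group mean: average the inverse link over the covariate values.
group_mean = expit(beta0 + beta1 * x).mean()
```

Here the response at the mean covariate overstates the true group mean by several percentage points; averaging model predictions over the covariate distribution removes the discrepancy, which is the direction a consistent group-mean estimator must take.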

16.
The case-cohort study design is widely used to reduce cost when collecting expensive covariates in large cohort studies with survival or competing risks outcomes. A case-cohort study dataset consists of two parts: (a) a random sample and (b) all cases or failures from a specific cause of interest. Clinicians often assess covariate effects on competing risks outcomes. The proportional subdistribution hazards model directly evaluates the effect of a covariate on the cumulative incidence function under the non-covariate-dependent censoring assumption for the full cohort study. However, the non-covariate-dependent censoring assumption is often violated in many biomedical studies. In this article, we propose a proportional subdistribution hazards model for case-cohort studies with stratified data with covariate-adjusted censoring weight. We further propose an efficient estimator when extra information from the other causes is available under case-cohort studies. The proposed estimators are shown to be consistent and asymptotically normal. Simulation studies show (a) the proposed estimator is unbiased when the censoring distribution depends on covariates and (b) the proposed efficient estimator gains estimation efficiency when using extra information from the other causes. We analyze a bone marrow transplant dataset and a coronary heart disease dataset using the proposed method.

17.
We consider Markov-switching regression models, i.e. models for time series regression analyses where the functional relationship between covariates and response is subject to regime switching controlled by an unobservable Markov chain. Building on the powerful hidden Markov model machinery and the methods for penalized B-splines routinely used in regression analyses, we develop a framework for nonparametrically estimating the functional form of the covariate effects in such a regression model, assuming an additive structure of the predictor. The resulting class of Markov-switching generalized additive models is immensely flexible, and contains as special cases the common parametric Markov-switching regression models as well as generalized additive and generalized linear models. The feasibility of the suggested maximum penalized likelihood approach is demonstrated by simulation. We further illustrate the approach using two real data applications, modelling (i) how sales data depend on advertising spending and (ii) how the energy price in Spain depends on the Euro/Dollar exchange rate.

18.
We study a binary regression model using the complementary log–log link, where the response variable Δ is the indicator of an event of interest (for example, the incidence of cancer or the detection of a tumour) and the set of covariates can be partitioned as (X, Z), where Z (real valued) is the primary covariate and X (vector valued) denotes a set of control variables. The conditional probability of the event of interest is assumed to be monotonic in Z for every fixed X. A finite-dimensional (regression) parameter β describes the effect of X. We show that the baseline conditional probability function (corresponding to X = 0) can be estimated by isotonic regression procedures, and we develop an asymptotically pivotal likelihood-ratio-based method for constructing (asymptotic) confidence sets for the regression function. We also show how likelihood-ratio-based confidence intervals for the regression parameter can be constructed using the chi-square distribution. An interesting connection to the Cox proportional hazards model under current status censoring emerges. We present simulation results to illustrate the theory and apply our results to a data set involving lung tumour incidence in mice.
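The isotonic regression step can be carried out with the classic pool-adjacent-violators algorithm (PAVA), sketched below in generic form (this is the standard algorithm, not code from the paper):

```python
def pava(y, w=None):
    """Pool-adjacent-violators: weighted least-squares fit constrained
    to be nondecreasing in the given index order."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [fitted value, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge adjacent blocks while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v1, w1, c1 = blocks.pop()
            v0, w0, c0 = blocks.pop()
            wt = w0 + w1
            blocks.append([(w0 * v0 + w1 * v1) / wt, wt, c0 + c1])
    fit = []
    for v, _, c in blocks:
        fit.extend([v] * c)
    return fit
```

Sorting the observations by Z and applying PAVA to the binary responses yields the monotone baseline probability estimate; the likelihood-ratio machinery in the paper then builds confidence sets around that fit.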

19.
Hierarchical models enable the encoding of a variety of parametric structures. However, when presented with a large number of covariates upon which some component of a model hierarchy depends, the modeller may be unwilling or unable to specify a form for that dependence. Data-mining methods are designed to automatically discover relationships between many covariates and a response surface, easily accommodating non-linearities and higher-order interactions. We present a method of wrapping hierarchical models around data-mining methods, preserving the best qualities of the two paradigms. We fit the resulting semi-parametric models using an approximate Gibbs sampler called HEBBRU. Using a simulated dataset, we show that HEBBRU is useful for exploratory analysis and displays excellent predictive accuracy. Finally, we apply HEBBRU to an ornithological dataset drawn from the eBird database.

20.
When confronted with multiple covariates and a response variable, analysts sometimes apply a variable‐selection algorithm to the covariate‐response data to identify a subset of covariates potentially associated with the response, and then wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, the parameters of interest could be estimated by using standard inference methods to fit the postulated marginal model to the independent data set. However, when applied to the same data set used by the variable selector, standard (“naive”) methods can lead to distorted inferences. The authors develop testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and response variable, based on the same data set used for variable selection. They provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess and compare their performance to a sample‐splitting approach. The methods are illustrated with data from a recent AIDS study. The Canadian Journal of Statistics 37: 625–644; 2009 © 2009 Statistical Society of Canada
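The sample-splitting comparator mentioned at the end can be sketched in a few lines: select covariates on one half of the data, then fit and draw inferences on the other, independent half. The marginal-correlation screen and its threshold below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 400, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0] = 1.0                     # only covariate 0 truly matters
y = X @ beta + rng.normal(size=n)

half = n // 2
X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]

# Step 1: variable selection on the first half (marginal correlation screen)
corr = np.abs([np.corrcoef(X1[:, j], y1)[0, 1] for j in range(p)])
selected = [j for j in range(p) if corr[j] > 0.3]

# Step 2: fit the marginal model on the independent second half, where
# standard OLS inference for the selected submodel is valid.
Xs = X2[:, selected]
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(half), Xs]), y2, rcond=None)
```

Splitting sacrifices half the sample for each task, which is exactly the efficiency cost the authors' same-data methods aim to avoid while still correcting the naive post-selection inference.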


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号