Covariate-informed product partition models incorporate the intuitively appealing notion that individuals or units with similar covariate values have, a priori, a higher probability of co-clustering than those with dissimilar covariate values. These methods have been shown to perform well if the number of covariates is relatively small. However, as the number of covariates increases, their influence on partition probabilities overwhelms any information the response may provide in clustering and often encourages partitions with either a large number of singleton clusters or one large cluster, resulting in poor model fit and poor out-of-sample prediction. This same phenomenon is observed in Bayesian nonparametric regression methods that induce a conditional distribution for the response given covariates through a joint model. In light of this, we propose two methods that calibrate the covariate-dependent partition model by capping the influence that covariates have on partition probabilities. We demonstrate the new methods’ utility using simulation and two publicly available datasets.

We present a new experimental design procedure that divides a set of experimental units into two groups in order to minimize error in estimating a treatment effect. One concern is the elimination of large covariate imbalance between the two groups before the experiment begins. Another concern is robustness of the design to misspecification in response models. We address both concerns in our proposed design: we first place subjects into pairs using optimal nonbipartite matching, making our estimator robust to complicated nonlinear response models. Our innovation is to keep the matched pairs extant, take differences of the covariate values within each matched pair, and then use the greedy switching heuristic of Krieger et al. (2019) or rerandomization on these differences. This latter step greatly reduces covariate imbalance. Furthermore, our resultant designs are shown to be nearly as random as matching, which is robust to unobserved covariates. When compared to previous designs, our approach exhibits significant improvement in the mean squared error of the treatment effect estimator when the response model is nonlinear and performs at least as well when the response model is linear. Our design procedure can be found as a method in the open source R package available on CRAN called GreedyExperimentalDesign.
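The idea of rerandomizing on within-pair covariate differences can be illustrated with a minimal sketch. It assumes the matched pairs have already been formed and that each pair is summarized by a vector of covariate differences. The function names, the Euclidean imbalance measure, and the best-of-N selection rule are assumptions made for illustration, not the procedure implemented in GreedyExperimentalDesign.

```python
import random


def imbalance(signs, diffs):
    """Euclidean norm of the signed sum of within-pair covariate differences."""
    p = len(diffs[0])
    totals = [sum(s * d[j] for s, d in zip(signs, diffs)) for j in range(p)]
    return sum(t * t for t in totals) ** 0.5


def rerandomize_pairs(diffs, n_draws=1000, seed=0):
    """Draw random within-pair assignments and keep the best-balanced one.

    A sign of +1 means the first unit of the pair is treated, -1 the second.
    Keeping the best of n_draws is an illustrative simplification; a
    threshold-based acceptance rule would be another option.
    """
    rng = random.Random(seed)
    best_signs, best_score = None, float("inf")
    for _ in range(n_draws):
        signs = [rng.choice((-1, 1)) for _ in diffs]
        score = imbalance(signs, diffs)
        if score < best_score:
            best_signs, best_score = signs, score
    return best_signs, best_score


# Hypothetical differences for six matched pairs on two covariates.
diffs = [(0.5, -1.0), (1.5, 0.2), (-0.7, 0.9), (2.0, -0.4), (-1.1, 1.3), (0.3, 0.6)]
signs, score = rerandomize_pairs(diffs)
print(signs, round(score, 3))
```

Because every candidate assignment only flips units within a pair, each draw remains a valid matched-pairs design. The search only chooses among such designs according to the remaining covariate imbalance.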

With competing risks data, one often needs to assess the treatment and covariate effects on the cumulative incidence function. Fine and Gray proposed a proportional hazards regression model for the subdistribution of a competing risk with the assumption that the censoring distribution and the covariates are independent. Covariate‐dependent censoring sometimes occurs in medical studies. In this paper, we study the proportional hazards regression model for the subdistribution of a competing risk with proper adjustments for covariate‐dependent censoring. We consider a covariate‐adjusted weight function by fitting the Cox model for the censoring distribution and using the predictive probability for each individual. Our simulation study shows that the covariate‐adjusted weight estimator is basically unbiased when the censoring time depends on the covariates, and the covariate‐adjusted weight approach works well for the variance estimator as well. We illustrate our methods with bone marrow transplant data from the Center for International Blood and Marrow Transplant Research. Here, cancer relapse and death in complete remission are two competing risks.  相似文献   

Response adaptive randomization (RAR) methods for clinical trials are susceptible to imbalance in the distribution of influential covariates across treatment arms. This can make the interpretation of trial results difficult, because observed differences between treatment groups may be a function of the covariates and not necessarily because of the treatments themselves. We propose a method for balancing the distribution of covariate strata across treatment arms within RAR. The method uses odds ratios to modify global RAR probabilities to obtain stratum‐specific modified RAR probabilities. We provide illustrative examples and a simple simulation study to demonstrate the effectiveness of the strategy for maintaining covariate balance. The proposed method is straightforward to implement and applicable to any type of RAR method or outcome.  相似文献   

Identifying the risk factors for comorbidity is important in psychiatric research. Empirically, studies have shown that testing multiple, correlated traits simultaneously is more powerful than testing a single trait at a time in association analysis. Furthermore, for complex diseases, especially mental illnesses and behavioral disorders, the traits are often recorded in different scales such as dichotomous, ordinal and quantitative. In the absence of covariates, nonparametric association tests have been developed for multiple complex traits to study comorbidity. However, genetic studies generally contain measurements of some covariates that may affect the relationship between the risk factors of major interest (such as genes) and the outcomes. While it is relatively easy to adjust these covariates in a parametric model for quantitative traits, it is challenging for multiple complex traits with possibly different scales. In this article, we propose a nonparametric test for multiple complex traits that can adjust for covariate effects. The test aims to achieve an optimal scheme of adjustment by using a maximum statistic calculated from multiple adjusted test statistics. We derive the asymptotic null distribution of the maximum test statistic, and also propose a resampling approach, both of which can be used to assess the significance of our test. Simulations are conducted to compare the type I error and power of the nonparametric adjusted test to the unadjusted test and other existing adjusted tests. The empirical results suggest that our proposed test increases the power through adjustment for covariates when there exist environmental effects, and is more robust to model misspecifications than some existing parametric adjusted tests. We further demonstrate the advantage of our test by analyzing a data set on genetics of alcoholism.  相似文献   

We propose methods for Bayesian inference for missing covariate data with a novel class of semi-parametric survival models with a cure fraction. We allow the missing covariates to be either categorical or continuous and specify a parametric distribution for the covariates that is written as a sequence of one-dimensional conditional distributions. We assume that the missing covariates are missing at random (MAR) throughout. We propose an informative class of joint prior distributions for the regression coefficients and the parameters arising from the covariate distributions. The proposed class of priors is shown to be useful in recovering information on the missing covariates, especially in situations where the missing data fraction is large. Properties of the proposed prior and resulting posterior distributions are examined. Also, model checking techniques are proposed for sensitivity analyses and for checking the goodness of fit of a particular model. Specifically, we extend the Conditional Predictive Ordinate (CPO) statistic to assess goodness of fit in the presence of missing covariate data. Computational techniques using the Gibbs sampler are implemented. A real data set involving a melanoma cancer clinical trial is examined to demonstrate the methodology.

In this contribution we aim to improve ordinal variable selection in the context of causal models for credit risk estimation. To this end, we propose an approach that provides a formal inferential tool to compare the explanatory power of each covariate and, therefore, to select an effective model for classification purposes. Our proposed model is Bayesian nonparametric and thus keeps the amount of model specification to a minimum. We consider the case in which information from the covariates is at the ordinal level. A notable instance is the situation in which ordinal variables result from rankings of companies that are evaluated according to different macro- and microeconomic aspects, leading to ordinal covariates that correspond to various ratings, which entail different magnitudes of the probability of default. For each given covariate, we suggest partitioning the statistical units into as many groups as there are observed levels of the covariate. We then assume individual defaults to be homogeneous within each group and heterogeneous across groups. Our aim is to compare, and therefore select among, the partition structures resulting from the consideration of different explanatory covariates. The metric we choose for variable comparison is the posterior probability of each partition. The application of our proposal to a European credit risk database shows that it performs well, leading to a coherent and clear method for variable averaging of the estimated default probabilities.

The case-cohort study design is widely used to reduce cost when collecting expensive covariates in large cohort studies with survival or competing risks outcomes. A case-cohort study dataset consists of two parts: (a) a random sample and (b) all cases or failures from a specific cause of interest. Clinicians often assess covariate effects on competing risks outcomes. The proportional subdistribution hazards model directly evaluates the effect of a covariate on the cumulative incidence function under the non-covariate-dependent censoring assumption for the full cohort study. However, the non-covariate-dependent censoring assumption is often violated in many biomedical studies. In this article, we propose a proportional subdistribution hazards model for case-cohort studies with stratified data with covariate-adjusted censoring weight. We further propose an efficient estimator when extra information from the other causes is available under case-cohort studies. The proposed estimators are shown to be consistent and asymptotically normal. Simulation studies show (a) the proposed estimator is unbiased when the censoring distribution depends on covariates and (b) the proposed efficient estimator gains estimation efficiency when using extra information from the other causes. We analyze a bone marrow transplant dataset and a coronary heart disease dataset using the proposed method.  相似文献   

We extend four tests common in classical regression (Wald, score, likelihood ratio and F tests) to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

Odile Pons 《Statistics》2013,47(4):273-293
A semi-Markov model with covariates is proposed for a multi-state process with a finite number of states such that the transition probabilities between the states and the distribution functions of the duration times between the occurrence of two states depend on a discrete covariate. The hazard rates for the time elapsed between two successive states depend on the covariate through a proportional hazards model involving a set of regression parameters, while the transition probabilities depend on the covariate in an unspecified way. We propose estimators for these parameters and for the cumulative hazard functions of the sojourn times. A difficulty comes from the fact that when a sojourn time in a state is right-censored, the next state is unknown. We prove that our estimators are consistent and asymptotically Gaussian under the model constraints.  相似文献   

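In longitudinal data studies, the random errors of mixed-effects models are usually assumed to be normally distributed. However, virological data such as viral load and CD4 cell counts are often skewed, so the normality assumption may affect model results and even lead to incorrect conclusions. In HIV dynamics studies, the viral response is often related to covariates, and covariate measurements are usually subject to error. This paper therefore models the covariate process jointly with the response, building a nonlinear mixed-effects joint model with skew-normal distributions, and estimates the model parameters by Bayesian inference. Because covariates can explain part of the within-individual variation, the choice of model for the covariate process has an important effect on the fit of the viral load model. The paper proposes a first-order moving average model as an improved model for the covariate process; a comparison shows that the viral load model fits better when the covariate follows the moving average model. This result provides important guidance for research on covariate models.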

The accelerated hazard model in survival analysis assumes that the covariate effect acts on the time scale of the baseline hazard rate. In this paper, we study the stochastic properties of the mixed accelerated hazard model, in which the covariate is considered unobservable. We build a dependence structure between the population variable and the covariate, and also present some preservation properties. Using some well-known stochastic orders, we compare two mixed accelerated hazards models arising from different choices of distributions for the unobservable covariates or from different baseline hazard rate functions.

Suppose that data are generated according to the model f(y | x; θ) g(x), where y is a response and x are covariates. We derive and compare semiparametric likelihood and pseudolikelihood methods for estimating θ for situations in which units generated are not fully observed and in which it is impossible or undesirable to model the covariate distribution. The probability that a unit is fully observed may depend on y, and there may be a subset of covariates which is observed only for a subsample of individuals. Our key assumptions are that the probability that a unit has missing data depends only on which of a finite number of strata (y, x) belongs to and that the stratum membership is observed for every unit. Applications include case–control studies in epidemiology, field reliability studies and broad classes of missing data and measurement error problems. Our results make fully efficient estimation of θ feasible, and they generalize and provide insight into a variety of methods that have been proposed for specific problems.

In the parametric regression model, the covariate missing problem under missing at random is considered. It is often desirable to use flexible parametric or semiparametric models for the covariate distribution, which can reduce a potential misspecification problem. Recently, a completely nonparametric approach was developed by [H.Y. Chen, Nonparametric and semiparametric models for missing covariates in parameter regression, J. Amer. Statist. Assoc. 99 (2004), pp. 1176–1189; Z. Zhang and H.E. Rockette, On maximum likelihood estimation in parametric regression with missing covariates, J. Statist. Plann. Inference 47 (2005), pp. 206–223]. Although it does not require a model for the covariate distribution or the missing data mechanism, the proposed method assumes that the covariate distribution is supported only by observed values. Consequently, their estimator is a restricted maximum likelihood estimator (MLE) rather than the global MLE. In this article, we show the restricted semiparametric MLE could be very misleading in some cases. We discuss why this problem occurs and suggest an algorithm to obtain the global MLE. Then, we assess the performance of the proposed method via some simulation experiments.  相似文献   

Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semi-parametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.  相似文献   

A. Galbete  J.A. Moler 《Statistics》2016,50(2):418-434
In a randomized clinical trial, response-adaptive randomization procedures use the information gathered, including the previous patients' responses, to allocate the next patient. In this setting, we consider randomization-based inference. We provide an algorithm to obtain exact p-values for statistical tests that compare two treatments with dichotomous responses. This algorithm can be applied to a family of response adaptive randomization procedures which share the following property: the distribution of the allocation rule depends only on the imbalance between treatments and on the imbalance between successes for treatments 1 and 2 in the previous step. This family includes some outstanding response adaptive randomization procedures. We study a randomization test to contrast the null hypothesis of equivalence of treatments and we show that this test has a similar performance to that of its parametric counterpart. Besides, we study the effect of a covariate in the inferential process. First, we obtain a parametric test, constructed assuming a logit model which relates responses to treatments and covariate levels, and we give conditions that guarantee its asymptotic normality. Finally, we show that the randomization test, which is free of model specification, performs as well as the parametric test that takes the covariate into account.  相似文献   

We introduce multicovariate-adjusted regression (MCAR), an adjustment method for regression analysis, where both the response (Y) and predictors (X_1, …, X_p) are not directly observed. The available data have been contaminated by unknown functions of a set of observable distorting covariates, Z_1, …, Z_s, in a multiplicative fashion. The proposed method substantially extends the current contaminated regression modelling capability, by allowing for multiple distorting covariate effects. MCAR is a flexible generalisation of the recently proposed covariate-adjusted regression method, an effective adjustment method in the presence of a single covariate, Z. For MCAR estimation, we establish a connection between the MCAR models and adaptive varying coefficient models. This connection leads to an adaptation of a hybrid backfitting estimation algorithm. Extensive simulations are used to study the performance and limitations of the proposed iterative estimation algorithm. In particular, the bias and mean square error of the proposed MCAR estimators are examined, relative to a baseline and a consistent benchmark estimator. The method is also illustrated with a Pima Indian diabetes data set, where the response and predictors are potentially contaminated by body mass index and triceps skin fold thickness. Both distorting covariates measure aspects of obesity, an important risk factor in type 2 diabetes.

In observational studies, unbalanced observed covariates between treatment groups often bias the estimation of treatment effects. Recently, the generalized propensity score (GPS) has been proposed to overcome this problem; however, a practical technique to apply the GPS is lacking. This study demonstrates how clustering algorithms can be used to group similar subjects based on transformed GPS. We compare four popular clustering algorithms (k-means clustering (KMC), model-based clustering, fuzzy c-means clustering and partitioning around medoids) on the following three criteria: average dissimilarity between subjects within clusters, average Dunn index and average silhouette width, under four different covariate scenarios. Simulation studies show that the KMC algorithm has overall better performance compared with the other three clustering algorithms. Therefore, we recommend using the KMC algorithm to group similar subjects based on the transformed GPS.
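Grouping subjects by k-means on a score can be sketched as follows. This is a plain one-dimensional Lloyd iteration applied to hypothetical, already-transformed GPS values. The abstract does not state the transformation, the number of clusters, or the initialisation, so all three are assumptions here, as is the function name `kmeans_1d`.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means for scalar values with a deterministic start."""
    ordered = sorted(values)
    # Spread the initial centres evenly across the sorted values (an assumption).
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Hypothetical transformed GPS values for nine subjects.
gps = [0.10, 0.12, 0.15, 0.48, 0.50, 0.55, 0.88, 0.90, 0.93]
labels, centres = kmeans_1d(gps, k=3)
print(labels)
```

Subjects sharing a label would then be treated as one stratum of similar GPS. In practice a library implementation with multiple random starts is the more usual choice.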

When confronted with multiple covariates and a response variable, analysts sometimes apply a variable‐selection algorithm to the covariate‐response data to identify a subset of covariates potentially associated with the response, and then wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, the parameters of interest could be estimated by using standard inference methods to fit the postulated marginal model to the independent data set. However, when applied to the same data set used by the variable selector, standard (“naive”) methods can lead to distorted inferences. The authors develop testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and response variable, based on the same data set used for variable selection. They provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess and compare their performance to a sample‐splitting approach. The methods are illustrated with data from a recent AIDS study. The Canadian Journal of Statistics 37: 625–644; 2009 © 2009 Statistical Society of Canada  相似文献   
