Weighted distributions, as an example of informative sampling, work appropriately under the missing at random mechanism, since they neglect missing values and only completely observed subjects are used in the study plan. However, length-biased distributions, a special case of weighted distributions, deliberately remove subjects with short lengths, which corresponds to the missing not at random mechanism. Accordingly, applying length-biased distributions jeopardizes the results by producing biased estimates. Hence, an alternative method has to be used so that valid inferences can be drawn. We propose methods based on weighted distributions and a joint modelling procedure and compare them in analysing longitudinal data. After the three methods are introduced, a set of simulation studies and the analysis of two real longitudinal datasets support our claim.
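
For orientation, the standard textbook definitions behind the abstract above can be written as follows. The notation (f, w, f_w, f_LB) is assumed here for illustration and is not taken from the paper itself.

```latex
% Weighted version of a density f with non-negative weight function w:
f_w(x) = \frac{w(x)\, f(x)}{\mathbb{E}[w(X)]}, \qquad 0 < \mathbb{E}[w(X)] < \infty .

% Length-biased special case, obtained with w(x) = x for a positive variable X:
f_{LB}(x) = \frac{x\, f(x)}{\mathbb{E}[X]}, \qquad x > 0 .
```

Because the weight grows with x, short lengths are under-represented in the sample, which is why the selection depends on the value of the variable itself.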

Joint modeling of associated mixed biomarkers in longitudinal studies leads to better clinical decisions by improving the efficiency of parameter estimates. In many clinical studies, the observation times for two biomarkers may not be the same, and one of the longitudinal responses may have been recorded over a longer period than the other. In addition, the response variables may have different missing patterns. In this paper, we propose a new joint model of associated continuous and binary responses that accounts for different missing patterns in the two longitudinal outcomes. A conditional model is used for joint modeling of the two responses, and two shared random effects models are considered for intermittent missingness of the two responses. A Bayesian approach using Markov Chain Monte Carlo (MCMC) is adopted for parameter estimation and model implementation. The validity and performance of the proposed model are investigated using simulation studies. The proposed model is also applied to a real data set on bariatric surgery.

Randomized response is an interview technique designed to eliminate response bias when sensitive questions are asked. In this paper, we present a logistic regression model on randomized response data when the covariates on some subjects are missing at random. In particular, we propose Horvitz and Thompson (1952)-type weighted estimators by using different estimates of the selection probabilities. We present large sample theory for the proposed estimators and show that they are more efficient than the estimator using the true selection probabilities. Simulation results support the theoretical analysis. We also illustrate the approach using data from a survey of cable TV.
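
The weighting idea behind Horvitz and Thompson-type estimators can be sketched in its simplest form, the estimation of a population mean. This is only an illustration under stated assumptions: the function name `horvitz_thompson_mean` and its arguments are hypothetical, the selection probabilities are taken as given, and the paper's actual estimators apply the same weighting to a logistic regression on randomized response data rather than to a mean.

```python
from typing import Optional, Sequence


def horvitz_thompson_mean(
    y: Sequence[Optional[float]],
    observed: Sequence[int],
    pi: Sequence[float],
) -> float:
    """Inverse-probability-weighted mean: (1/n) * sum(d_i * y_i / pi_i).

    y        -- responses; entries for unobserved subjects may be None
    observed -- 1 if subject i is fully observed, 0 otherwise
    pi       -- (assumed known or estimated) probability that subject i is observed
    """
    n = len(y)
    if not (len(observed) == len(pi) == n) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y_i, d_i, pi_i in zip(y, observed, pi):
        if d_i:
            if not 0.0 < pi_i <= 1.0:
                raise ValueError("selection probabilities must lie in (0, 1]")
            # Each observed subject stands in for 1 / pi_i subjects like it.
            total += y_i / pi_i
    return total / n


if __name__ == "__main__":
    print(horvitz_thompson_mean([2.0, None, 4.0, 6.0], [1, 0, 1, 1], [0.5, 0.5, 1.0, 0.5]))
```

Subjects with a low probability of being observed receive a large weight, which compensates for similar subjects who dropped out of the complete-case sample.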


In this article, inflation at an arbitrary point β of a member of the power series exponential family, and mean-inflation as a cause of a semi-continuous distribution, are discussed. A joint model for such a semi-continuous response and a β-inflated Poisson response is also presented. Simultaneous effects of covariates on both responses, which have two-component mixture distributions, are investigated. The maximum likelihood approach is used to obtain the parameter estimates. The proposed model is illustrated through simulation studies and applied to a real survey dataset.

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

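Yu Lichao, Jin Yongjin. 《统计研究》 (Statistical Research), 2016, 33(1): 95-102
In survey sampling, panel data are commonly obtained by following multiple respondents over time and are then used for statistical inference about population characteristics. Panel data often contain missing values, and most software for handling missing panel data simply deletes respondents with missing values to obtain a complete data set; when the missing-data mechanism is missing not at random, this leads to biased estimates of population parameters. This paper discusses how to analyze panel data statistically when the missing-data mechanism is missing not at random. It mainly uses model-based likelihood inference, modeling the joint distribution of the target variable, the missingness indicator, and the random-effects vector. Building on existing selection models and pattern-mixture models, it introduces random effects, studies how to compute the expectation of the target variable, and studies parameter estimation under a random-effects hybrid model. For relatively simple variable distributions, it gives the steps for inferring population parameters by maximum likelihood, and finally compares the methods through simulation.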

This article proposes a Bayesian approach, which can simultaneously obtain the Bayesian estimates of unknown parameters and random effects, to analyze nonlinear reproductive dispersion mixed models (NRDMMs) for longitudinal data with nonignorable missing covariates and responses. The logistic regression model is employed to model the missing data mechanisms for missing covariates and responses. A hybrid sampling procedure combining the Gibbs sampler and the Metropolis-Hastings algorithm is presented to draw observations from the conditional distributions. Because the missing data mechanism is not testable, we develop the logarithm of the pseudo-marginal likelihood, the deviance information criterion, the Bayes factor, and the pseudo-Bayes factor to compare several competing missing data mechanism models in the NRDMMs considered here with nonignorable missing covariates and responses. Three simulation studies and a real example taken from the paediatric AIDS clinical trial group ACTG are used to illustrate the proposed methodologies. Empirical results show that our proposed methods are effective in selecting missing data mechanism models.

Using a multivariate latent variable approach, this article proposes some new general models to analyze correlated bounded continuous and categorical (nominal and/or ordinal) responses with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses, motivated by the analysis of data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that some bounded continuous responses are substituted for continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. Sensitivity analysis is also performed to study the influence of small perturbations of the parameters of the missing mechanisms of the model on the maximal normal curvature. The proposed models are applied to two data sets: BMI, Steatosis and Osteoporosis data and Tehran household expenditure budgets.


Handling data with a nonignorable missingness mechanism is still a challenging problem in statistics. In this paper, we develop a fully Bayesian adaptive Lasso approach for quantile regression models with nonignorably missing response data, where the nonignorable missingness mechanism is specified by a logistic regression model. The proposed method extends the Bayesian Lasso by allowing different penalization parameters for different regression coefficients. Furthermore, a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is implemented to simulate the parameters from their posterior distributions, mainly the regression coefficients, the shrinkage coefficients, and the parameters in the nonignorable missingness model. Finally, some simulation studies and a real example are used to illustrate the proposed methodology.

Semiparametric models provide a more flexible form for modeling the relationship between the response and the explanatory variables. On the other hand, in the literature on modeling missing variables, a canonical form of the probability of the variable being missing (p) is modeled using a fully parametric approach. Here we consider a regression spline based semiparametric approach to model the missingness mechanism of nonignorably missing covariates. In this model the relationship between a suitable canonical form of p (e.g. probit p) and the missing covariate is modeled through several splines. A Bayesian procedure is developed to efficiently estimate the parameters. A computationally advantageous prior construction is proposed for the parameters of the semiparametric part. WinBUGS code is constructed to apply Gibbs sampling to obtain the posterior distributions. We show through an extensive Monte Carlo simulation experiment that the response model coefficient estimators maintain better (when the true missingness mechanism is nonlinear) or equivalent (when the true missingness mechanism is linear) bias and efficiency properties with the use of the proposed semiparametric missingness model compared to the conventional model.

The family of weighted Poisson distributions offers great flexibility in modeling discrete data due to its potential to capture over/under-dispersion by an appropriate selection of the weight function. In this paper, we introduce a flexible weighted Poisson distribution and further study its properties by using it in the context of cure rate modeling under a competing cause scenario. A special case of the new distribution is the COM-Poisson distribution, which in turn encompasses the Bernoulli, Poisson, and geometric distributions; hence, many of the well-studied cure rate models may be seen as special cases of the proposed model. We focus on the estimation, through the maximum likelihood method, of the cured proportion and the properties of the failure time of the susceptible (non-cured) individuals; a profile likelihood approach is also adopted for estimating the parameters of the weighted Poisson distribution. A Monte Carlo simulation study demonstrates the accuracy of the proposed inferential method. Finally, as an illustration, we fit the proposed model to a cutaneous melanoma data set.
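
The statement that the COM-Poisson distribution contains the Poisson and geometric distributions can be checked numerically. The sketch below uses the standard COM-Poisson form, with probability proportional to lam**x / (x!)**nu, and a truncated normalizing sum; the function name `com_poisson_pmf` and the truncation point `terms` are assumptions made for illustration, and the paper's more general weighted Poisson family is not reproduced here.

```python
import math


def com_poisson_pmf(x: int, lam: float, nu: float, terms: int = 200) -> float:
    """COM-Poisson probability P(X = x), with P(X = x) proportional to lam**x / (x!)**nu.

    The normalizing constant is approximated by summing the first `terms`
    weights, so the result is only reliable when the weights beyond that
    point are negligible (for nu = 0 this requires lam < 1).
    """
    if lam <= 0 or nu < 0:
        raise ValueError("need lam > 0 and nu >= 0")
    if not 0 <= x < terms:
        raise ValueError("x must lie in [0, terms)")

    def log_weight(k: int) -> float:
        # log of lam**k / (k!)**nu, kept in log space to avoid overflow
        return k * math.log(lam) - nu * math.lgamma(k + 1)

    log_w = [log_weight(k) for k in range(terms)]
    shift = max(log_w)
    normalizer = sum(math.exp(v - shift) for v in log_w)
    return math.exp(log_weight(x) - shift) / normalizer


if __name__ == "__main__":
    # nu = 1 recovers the Poisson pmf; nu = 0 with lam < 1 recovers the geometric pmf.
    print(com_poisson_pmf(3, 2.0, 1.0), math.exp(-2.0) * 2.0**3 / math.factorial(3))
    print(com_poisson_pmf(2, 0.5, 0.0), (1 - 0.5) * 0.5**2)
```

Values of nu above 1 give under-dispersion relative to the Poisson and values below 1 give over-dispersion, which is the flexibility the abstract refers to.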

Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy-tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ-type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.
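
The "check function" mentioned above is the standard quantile regression loss, and weighting it by inverse propensity scores can be sketched for the simplest case of a single unconditional quantile. The names `check_loss` and `weighted_quantile`, and the use of inverse-propensity weights supplied by the caller, are assumptions for illustration; this is not the paper's composite or penalized estimator.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_quantile(y: Sequence[float], weights: Sequence[float], tau: float) -> float:
    """Return the observed value q minimizing sum_i w_i * rho_tau(y_i - q).

    With weights w_i = 1 / (estimated propensity of observing y_i), this is
    an inverse-probability-weighted sample quantile. The objective is
    piecewise linear and convex in q, so a minimizer always exists among
    the observed values and a search over them is sufficient.
    """
    if len(y) != len(weights) or not y:
        raise ValueError("y and weights must be non-empty and of equal length")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

    def objective(q: float) -> float:
        return sum(w * check_loss(y_i - q, tau) for y_i, w in zip(y, weights))

    return min(sorted(y), key=objective)


if __name__ == "__main__":
    print(weighted_quantile([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5, 0.5))
```

Because the loss grows only linearly in the residual, a single extreme response has bounded influence on the fitted quantile, which is the source of the robustness to heavy tails described in the abstract.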


A general Bayesian random effects model for analyzing longitudinal mixed correlated continuous and negative binomial responses with and without missing data is presented. This Bayesian model, given some random effects, uses a normal distribution for the continuous response and a negative binomial distribution for the count response. A Markov Chain Monte Carlo sampling algorithm is described for estimating the posterior distribution of the parameters. This Bayesian model is illustrated by a simulation study. For sensitivity analysis, the use of posterior curvature is proposed to investigate how the parameter estimates change under a perturbation from the missing at random assumption to the not missing at random assumption. The model is applied to a medical data set, obtained from an observational study on women, where the correlated responses are the negative binomial response of joint damage and the continuous response of body mass index. The simultaneous effects of some covariates on both responses are also investigated.

The missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missingness, where the missingness of a response depends on its own value, is the most difficult missing data problem. In the statistical literature, unlike for the ignorable missing data problem, not many papers on non-ignorable missing data are available apart from the fully parametric model based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen (1988)'s empirical likelihood method, we obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and of the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of real AIDS trial data shows that the missingness of CD4 counts at around two years is non-ignorable and that the sample mean based on observed data only is biased.

Empirical likelihood for generalized linear models with missing responses (total citations: 1; self-citations: 0; citations by others: 1)
The paper uses the empirical likelihood method to study the construction of confidence intervals and regions for regression coefficients and the response mean in generalized linear models with missing responses. By using the inverse selection probability weighted imputation technique, the proposed empirical likelihood ratios are asymptotically chi-squared. Our approach is to directly calibrate the empirical likelihood ratio, which is called a bias-correction method. Also, a class of estimators for the parameters of interest is constructed, and the asymptotic distributions of the proposed estimators are obtained. A simulation study indicates that the proposed methods are comparable in terms of coverage probabilities and average lengths/areas of confidence intervals/regions. A real data example is used to illustrate our methods.

It may sometimes be clear from background knowledge that a population under investigation proportionally consists of a known number of subpopulations, whose distributions belong to the same, yet unknown, family. While a parametric family is commonly used in practice, one can also consider some nonparametric families to avoid distributional misspecification. In this article, we propose a solution using a mixture-based nonparametric family for the component distribution in a finite mixture model as opposed to some recent research that utilizes a kernel-based approach. In particular, we present a semiparametric maximum likelihood estimation procedure for the model parameters and tackle the bandwidth parameter selection problem via some popular means for model selection. Empirical comparisons through simulation studies and three real data sets suggest that estimators based on our mixture-based approach are more efficient than those based on the kernel-based approach, in terms of both parameter estimation and overall density estimation.

In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing at random situation, so that the missingness of the covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach that takes the missing covariates into consideration has much better efficiency in terms of mean square errors than the complete case analysis. Effects of increasing the cured fraction and the number of censored observations are also reported. We demonstrate the proposed methodology with two real data sets, one on the length of time to obtain a BS degree in Statistics and the other on the time to breast cancer recurrence.


In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of the model parameters and mixing proportions. Specifically, missing data are assumed to be missing not at random (MNAR), that is, non-ignorably missing (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the efficiency of the algorithm, a stochastic step is incorporated into the E-step based on data augmentation, whereas the M-step is solved by the method of conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The model considered is a general framework that captures important characteristics of count data analysis such as zero inflation/deflation, heterogeneity and missingness, providing more insight into the features of the data and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed to replace the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.
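
The hurdle Poisson distribution used as the mixture component above can be written down in a few lines. This is a minimal sketch of the standard single-component form only; the function name `hurdle_poisson_pmf` and the parameter names `pi0` and `lam` are assumptions for illustration, and the mixture structure, the probit missingness model, and the stochastic EM steps from the paper are not shown.

```python
import math


def hurdle_poisson_pmf(y: int, pi0: float, lam: float) -> float:
    """Hurdle Poisson probability mass at count y.

    P(Y = 0) = pi0
    P(Y = y) = (1 - pi0) * Poisson(y; lam) / (1 - exp(-lam))   for y >= 1

    pi0 above exp(-lam) corresponds to zero inflation relative to a plain
    Poisson, and pi0 below exp(-lam) corresponds to zero deflation.
    """
    if not 0.0 <= pi0 <= 1.0 or lam <= 0.0:
        raise ValueError("need 0 <= pi0 <= 1 and lam > 0")
    if y < 0:
        return 0.0
    if y == 0:
        return pi0
    # Poisson log-probability, computed in log space for numerical stability
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    # Divide by P(Poisson > 0) so the positive counts form a zero-truncated Poisson
    return (1.0 - pi0) * math.exp(log_poisson) / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    total = sum(hurdle_poisson_pmf(k, 0.3, 2.0) for k in range(100))
    print(hurdle_poisson_pmf(0, 0.3, 2.0), total)
```

Separating the zero probability from the positive counts is what lets a single family cover both zero inflation and zero deflation.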

Outliers are commonly observed in psychosocial research, generally resulting in biased estimates when comparing group differences using popular mean-based models such as the analysis of variance model. Rank-based methods such as the popular Mann–Whitney–Wilcoxon (MWW) rank sum test are more effective at addressing such outliers. However, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal studies with missing data. In this paper, we propose a generalized MWW test for comparing multiple groups with covariates within a longitudinal data setting, by utilizing the functional response models. Inference is based on a class of U-statistics-based weighted generalized estimating equations, providing consistent and asymptotically normal estimates under both complete and missing data. The proposed approach is illustrated with both real and simulated study data.

The tumor burden (TB) process is postulated to be the primary mechanism through which most anticancer treatments provide benefit. In phase II oncology trials, the biologic effects of a therapeutic agent are often analyzed using conventional endpoints for best response, such as objective response rate and progression-free survival, both of which cause loss of information. On the other hand, graphical methods including the spider plot and the waterfall plot lack any statistical inference when there is more than one treatment arm. Therefore, longitudinal analysis of TB data is well recognized as a better approach for treatment evaluation. However, the longitudinal TB process suffers from informative missingness because of progression or death. We propose to analyze the treatment effect on tumor growth kinetics using a joint modeling framework accounting for the informative missing mechanism. Our approach is illustrated by multisetting simulation studies and an application to a non-small-cell lung cancer data set. The proposed analyses can be performed in early-phase clinical trials to better characterize treatment effect and thereby inform decision-making. Copyright © 2014 John Wiley & Sons, Ltd.
