Similar Documents
20 similar documents found.
1.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed-form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.
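The factorization of the missing-data indicators into one-dimensional logistic conditionals can be sketched as follows. This is a minimal illustration, not the authors' code: the coefficient layout in `alphas` and the function names are hypothetical. Because each linear predictor may include the covariate whose missingness it models, the mechanism is non-ignorable.

```python
import itertools
import math


def logistic(z):
    """Standard logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def pattern_prob(r, x, y, alphas):
    """P(r_1, ..., r_p | x, y) as a product of one-dimensional logistic conditionals.

    r      : tuple of 0/1 indicators (1 = covariate j is missing)
    x      : full covariate vector, including values that would be unobserved
    y      : completely observed response
    alphas : hypothetical coefficients; alphas[j] = [intercept, coef of y,
             coefs of x_1..x_p, coefs of r_1..r_{j-1}]
    """
    p = len(x)
    prob = 1.0
    for j, rj in enumerate(r):
        a = alphas[j]
        eta = a[0] + a[1] * y
        eta += sum(c * xk for c, xk in zip(a[2:2 + p], x))       # may involve x_j itself
        eta += sum(c * rk for c, rk in zip(a[2 + p:], r[:j]))    # earlier indicators
        p_miss = logistic(eta)  # P(r_j = 1 | r_1..r_{j-1}, x, y)
        prob *= p_miss if rj == 1 else 1.0 - p_miss
    return prob


# Illustrative (made-up) coefficients for p = 2 covariates.
alphas = [[0.1, -0.5, 0.3, 0.8], [-0.2, 0.4, 0.0, -0.6, 1.1]]
patterns = list(itertools.product([0, 1], repeat=2))
probs = {r: pattern_prob(r, [1.0, -2.0], 0.5, alphas) for r in patterns}
print(probs)
```

Summing over every pattern returns 1, which is a quick sanity check that the sequential conditionals define a proper joint distribution.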

2.
The objective of this paper is to present a method which can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when only first- and second-moment assumptions can be made; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density which has the same mean and variance structure as the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. Then the loss function is minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.

3.
Covariate data were missing when a semiparametric regression model was used to study bird abundance in the Mai Po Sanctuary, Hong Kong. This paper proposes an EM‐type algorithm to estimate the regression parameters for that study. Analytical calculation of the expectation in the EM method is difficult, or even impossible, especially when missing covariates are continuous. A Monte Carlo method is used in the EM algorithm to ease the calculation complexity. Asymptotic variances of the parameter estimates are also derived. Properties of the proposed estimators are assessed through numerical simulations and a real example.
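A Monte Carlo E-step of the kind described above can be sketched for a much simpler setting. This is an illustrative assumption, not the paper's semiparametric model: a linear regression y = b0 + b1 x + e with a normal covariate x that is missing completely at random. Each missing x is replaced by m draws from p(x | y, current parameters), weighted 1/m, and the M-step is a weighted complete-data fit. The function name `mcem_missing_x` is hypothetical.

```python
import numpy as np


def mcem_missing_x(x, y, n_iter=50, m=100, seed=0):
    """Monte Carlo EM for y = b0 + b1*x + N(0, s2), x ~ N(mu, tau2), x partly NaN."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(x)
    xo, yo, ym = x[~miss], y[~miss], y[miss]
    # Starting values from the complete cases.
    mu, tau2 = xo.mean(), xo.var()
    b1, b0 = np.polyfit(xo, yo, 1)
    s2 = np.var(yo - b0 - b1 * xo)
    for _ in range(n_iter):
        # Monte Carlo E-step: x | y is normal under this joint model.
        v = 1.0 / (1.0 / tau2 + b1 ** 2 / s2)
        mpost = v * (mu / tau2 + b1 * (ym - b0) / s2)
        draws = mpost[:, None] + np.sqrt(v) * rng.standard_normal((ym.size, m))
        xa = np.concatenate([xo, draws.ravel()])
        ya = np.concatenate([yo, np.repeat(ym, m)])  # matches row-major ravel order
        w = np.concatenate([np.ones(xo.size), np.full(draws.size, 1.0 / m)])
        # M-step: weighted complete-data MLEs.
        mu = np.average(xa, weights=w)
        tau2 = np.average((xa - mu) ** 2, weights=w)
        X = np.column_stack([np.ones_like(xa), xa])
        sw = np.sqrt(w)
        b0, b1 = np.linalg.lstsq(X * sw[:, None], ya * sw, rcond=None)[0]
        s2 = np.average((ya - b0 - b1 * xa) ** 2, weights=w)
    return b0, b1, s2


rng = np.random.default_rng(1)
n = 2000
x_true = rng.normal(1.0, 1.0, n)
y = 2.0 + 0.5 * x_true + rng.normal(0.0, 0.5, n)
x = x_true.copy()
x[rng.random(n) < 0.3] = np.nan  # 30% missing completely at random
b0, b1, s2 = mcem_missing_x(x, y)
```

Here x | y happens to have a closed form, so the Monte Carlo step is not strictly needed; it stands in for cases where the conditional expectation cannot be computed analytically.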

4.
For capture–recapture models when covariates are subject to measurement errors and missing data, a set of estimating equations is constructed to estimate population size and relevant parameters. These estimating equations can be solved by an algorithm similar to the EM algorithm. The proposed method is also applicable to the situation when covariates with no measurement errors have missing data. Simulation studies are used to assess the performance of the proposed estimator. The estimator is also applied to a capture–recapture experiment on the bird species Prinia flaviventris in Hong Kong. The Canadian Journal of Statistics 37: 645–658; 2009 © 2009 Statistical Society of Canada
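For orientation, a common way to estimate population size from covariate-dependent capture probabilities is a Horvitz–Thompson-type sum over captured animals. The sketch below assumes (not stated in the abstract) a logistic per-occasion capture probability, T independent occasions, and known coefficients; it shows only the population-size step, not the paper's estimating equations for mismeasured or missing covariates. `ht_population_size` is a hypothetical name.

```python
import numpy as np


def ht_population_size(z_captured, beta, T):
    """Horvitz-Thompson-type estimate N_hat = sum over captured animals of 1 / p_i."""
    eta = beta[0] + beta[1] * z_captured
    pi = 1.0 / (1.0 + np.exp(-eta))       # per-occasion capture probability
    p_seen = 1.0 - (1.0 - pi) ** T        # P(captured at least once in T occasions)
    return np.sum(1.0 / p_seen)


# Simulated population with assumed (illustrative) parameter values.
rng = np.random.default_rng(2)
N, T, beta = 1000, 5, (-1.0, 0.5)
z = rng.normal(size=N)
pi = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * z)))
captures = rng.random((N, T)) < pi[:, None]
seen = captures.any(axis=1)
N_hat = ht_population_size(z[seen], beta, T)
```

In practice the coefficients would themselves be estimated (e.g. from a conditional likelihood), and covariate error would bias p_i, which is the problem the paper addresses.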

5.
Longitudinal data analysis in epidemiological settings is complicated by large multiplicities of short time series and the occurrence of missing observations. To handle such difficulties Rosner & Muñoz (1988) developed a weighted non-linear least squares algorithm for estimating parameters for first-order autoregressive (AR1) processes with time-varying covariates. This method proved efficient when compared to complete case procedures. Here that work is extended by (1) introducing a different estimation procedure based on the EM algorithm, and (2) formulating estimation techniques for second-order autoregressive models. The second development is important because some of the intended areas of application (adult pulmonary function decline, childhood blood pressure) have autocorrelation functions which decay more slowly than the geometric rate imposed by an AR1 model. Simulation studies are used to compare the three methodologies (non-linear, EM based and complete case) with respect to bias, efficiency and coverage both in the presence and in the absence of time-varying covariates. Differing degrees and mechanisms of missingness are examined. Preliminary results indicate the non-linear approach to be the method of choice: it has high efficiency and is easily implemented. An illustrative example concerning pulmonary function decline in the Netherlands is analyzed using this method.
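The point about AR1 decay can be made concrete with the standard Yule–Walker recursion for a stationary AR(2) process, rho_k = phi1 rho_{k-1} + phi2 rho_{k-2} with rho_1 = phi1 / (1 - phi2). The coefficients below are illustrative choices giving both models the same lag-1 autocorrelation of 0.6.

```python
def ar_acf(phi1, phi2, max_lag):
    """Autocorrelations rho_0..rho_max_lag of a stationary AR(2); phi2 = 0 gives AR(1)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho[: max_lag + 1]


ar1 = ar_acf(0.6, 0.0, 6)   # geometric decay: 0.6 ** k
ar2 = ar_acf(0.3, 0.5, 6)   # same lag-1 value, slower decay
for k, (a, b) in enumerate(zip(ar1, ar2)):
    print(k, round(a, 4), round(b, 4))
```

By lag 6 the AR1 autocorrelation has fallen to about 0.047, while the AR(2) value is still about 0.365, which is the kind of slow decay the AR(2) extension is meant to capture.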

6.
Ibrahim (1990) used the EM algorithm to obtain maximum likelihood estimates of the regression parameters in generalized linear models with partially missing covariates. The technique was termed EM by the method of weights. In this paper, we generalize this technique to Cox regression analysis with missing values in the covariates. We specify a full model letting the unobserved covariate values be random and then maximize the observed likelihood. The asymptotic covariance matrix is estimated by the inverse information matrix. The missing data are allowed to be missing at random, but the non-ignorable non-response situation may in principle also be considered. Simulation studies indicate that the proposed method is more efficient than the method suggested by Paik & Tsai (1997). We apply the procedure to a clinical trials example with six covariates, three of which have missing values.
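The core idea of EM by the method of weights can be sketched for the simplest GLM case: a logistic regression with one binary covariate that is missing at random for some subjects. Each incomplete row is expanded into one copy per possible covariate value, weighted by the posterior probability of that value, and the M-step is a weighted complete-data fit. This is an illustrative sketch, not Ibrahim's or the Cox-regression implementation; the Bernoulli covariate model and all function names are assumptions.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def weighted_logistic(X, y, w, beta, n_newton=10):
    """Weighted logistic-regression MLE by Newton-Raphson, warm-started at beta."""
    for _ in range(n_newton):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def em_by_weights(x, y, n_iter=100):
    """logit P(y=1) = b0 + b1*x, binary x ~ Bernoulli(pi), x = NaN when missing."""
    obs = ~np.isnan(x)
    y_mis = y[~obs]
    n_obs, n_mis = obs.sum(), y_mis.size
    # Augmented data: observed rows once; each incomplete row twice (x = 0 and x = 1).
    xa = np.concatenate([x[obs], np.zeros(n_mis), np.ones(n_mis)])
    ya = np.concatenate([y[obs], y_mis, y_mis])
    X = np.column_stack([np.ones_like(xa), xa])
    beta, pi = np.zeros(2), x[obs].mean()
    for _ in range(n_iter):
        # E-step: weight w1 = P(x = 1 | y; beta, pi) by Bayes' rule.
        p0, p1 = sigmoid(beta[0]), sigmoid(beta[0] + beta[1])
        like1 = pi * np.where(y_mis == 1, p1, 1.0 - p1)
        like0 = (1.0 - pi) * np.where(y_mis == 1, p0, 1.0 - p0)
        w1 = like1 / (like0 + like1)
        w = np.concatenate([np.ones(n_obs), 1.0 - w1, w1])
        # M-step: weighted MLEs for the outcome model and the covariate model.
        beta = weighted_logistic(X, ya, w, beta)
        pi = np.average(xa, weights=w)
    return beta, pi


rng = np.random.default_rng(3)
n = 10000
x_full = (rng.random(n) < 0.4).astype(float)
y = (rng.random(n) < sigmoid(-0.5 + 1.2 * x_full)).astype(float)
x = x_full.copy()
x[rng.random(n) < 0.3] = np.nan
beta_hat, pi_hat = em_by_weights(x, y)
```

The weights for each incomplete subject sum to one, so the augmented data set behaves like n subjects in the M-step.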

7.
A common occurrence in clinical trials with a survival end point is missing covariate data. With ignorably missing covariate data, Lipsitz and Ibrahim proposed a set of estimating equations to estimate the parameters of Cox's proportional hazards model. They proposed to obtain parameter estimates via a Monte Carlo EM algorithm. We extend those results to non-ignorably missing covariate data. We present a clinical trials example with three partially observed laboratory markers which are used as covariates to predict survival.

8.
Missing covariate data are a common issue in generalized linear models (GLMs). A model-based procedure that properly specifies joint models for both the partially observed covariates and the corresponding missing indicator variables is a sound and flexible methodology, which lends itself to maximum likelihood estimation because the likelihood function is available in computable form. In this paper, a novel model-based methodology is proposed for the regression analysis of GLMs when the partially observed covariates are categorical. Pair-copula constructions are used as graphical tools to facilitate the specification of the high-dimensional probability distributions of the underlying missingness components. The model parameters are estimated by maximizing the weighted log-likelihood function using an EM algorithm. To compare the performance of the proposed methodology with other well-established approaches, including complete-case analysis and multiple imputation, several simulation experiments of Binomial, Poisson and Normal regressions are carried out under both missing at random and missing not at random scenarios. The methods are illustrated by modeling data from a stage III melanoma clinical trial. The results show that the methodology is rather robust and flexible, representing a competitive alternative to traditional techniques.

9.
The EM algorithm is often used for finding the maximum likelihood estimates in generalized linear models with incomplete data. In this article, the author presents a robust method in the framework of the maximum likelihood estimation for fitting generalized linear models when nonignorable covariates are missing. His robust approach is useful for downweighting any influential observations when estimating the model parameters. To avoid computational problems involving irreducibly high‐dimensional integrals, he adopts a Metropolis‐Hastings algorithm based on a Markov chain sampling method. He carries out simulations to investigate the behaviour of the robust estimates in the presence of outliers and missing covariates; furthermore, he compares these estimates to the classical maximum likelihood estimates. Finally, he illustrates his approach using data on the occurrence of delirium in patients operated on for abdominal aortic aneurysm.

10.
Incomplete covariate data is a common occurrence in many studies in which the outcome is survival time. With generalized linear models, when the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM by the method of weights proposed in Ibrahim (1990). In this article, we extend the EM by the method of weights to survival outcomes whose distributions may not fall in the class of generalized linear models. This method requires the estimation of the parameters of the distribution of the covariates. We present a clinical trials example with five covariates, four of which have some missing values.

11.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with the penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function, respectively. Because classical model selection procedures such as Akaike's information criterion become invalid for these models with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). The most attractive feature of the method is its generality: it can be extended to various situations with missing observations based on the EM algorithm, and when no data are missing, the new model selection criteria reduce to the classical AIC. Therefore, we can compare not only models with missing observations under MAR/MNAR settings, but also missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including the consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

12.
The lognormal distribution is quite commonly used as a lifetime distribution. Data arising from life-testing and reliability studies are often left truncated and right censored. Here, the EM algorithm is used to estimate the parameters of the lognormal model based on left truncated and right censored data. The maximization step of the algorithm is carried out by two alternative methods, one involving approximation using a Taylor series expansion (leading to approximate maximum likelihood estimates) and the other based on the EM gradient algorithm (Lange, 1995). These two methods are compared based on Monte Carlo simulations. The Fisher scoring method for obtaining the maximum likelihood estimates shows a problem of convergence under this setup, except when the truncation percentage is small. The asymptotic variance-covariance matrix of the MLEs is derived by using the missing information principle (Louis, 1982), and then the asymptotic confidence intervals for scale and shape parameters are obtained and compared with corresponding bootstrap confidence intervals. Finally, some numerical examples are given to illustrate all the methods of inference developed here.
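The E-step for censored lognormal data reduces to truncated-normal moments on the log scale. The sketch below simplifies the setting described above: it handles right censoring only (no left truncation) and uses the closed-form normal M-step, so it is neither the Taylor-series nor the EM-gradient variant. For Z ~ N(mu, sigma^2) and a = (c - mu) / sigma, it uses E[Z | Z > c] = mu + sigma*lambda(a) and E[Z^2 | Z > c] = mu^2 + sigma^2 + sigma*(c + mu)*lambda(a), where lambda is the inverse Mills ratio. Function names are hypothetical.

```python
import math

import numpy as np


def inv_mills(a):
    """phi(a) / (1 - Phi(a)) for a standard normal, using erfc for the upper tail."""
    upper = 0.5 * math.erfc(a / math.sqrt(2.0))
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi) / upper


def em_lognormal_right_censored(t, censored, n_iter=200):
    """EM for (mu, sigma) of a lognormal lifetime; censored[i] means T_i > t[i]."""
    z = np.log(t)
    zo, zc = z[~censored], z[censored]
    mu, sigma = zo.mean(), zo.std()
    for _ in range(n_iter):
        # E-step: truncated-normal moments for each censored log-time.
        lam = np.array([inv_mills((c - mu) / sigma) for c in zc])
        ez = mu + sigma * lam
        ez2 = mu ** 2 + sigma ** 2 + sigma * (zc + mu) * lam
        # M-step: normal MLEs from the expected sufficient statistics.
        mu = (zo.sum() + ez.sum()) / z.size
        sigma = math.sqrt(((zo ** 2).sum() + ez2.sum()) / z.size - mu ** 2)
    return mu, sigma


rng = np.random.default_rng(4)
n = 5000
life = np.exp(rng.normal(2.0, 0.5, n))
c = math.exp(2.3)                 # fixed (Type I) censoring time, an assumption
censored = life > c
t = np.minimum(life, c)
mu_hat, sigma_hat = em_lognormal_right_censored(t, censored)
```

Adding left truncation would change both the likelihood and the E-step, which is where the paper's alternative maximization methods come in.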

13.
In this paper we study the cure rate survival model involving a competing risks structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing at random situation, so that the missingness of the covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach that accounts for the missing covariates is much more efficient, in terms of mean square error, than the complete case analysis. Effects of increasing the cured fraction and the proportion of censored observations are also reported. We demonstrate the proposed methodology with two real data sets: one involves the length of time to obtain a BS degree in Statistics, and the other the time to breast cancer recurrence.
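One common cure rate model with a competing-causes structure (assumed here; the abstract does not name the exact form) takes the number of latent causes N ~ Poisson(theta) and the event time as the minimum of N latent times with CDF F. The population survival is then S_pop(t) = exp(-theta F(t)), with cure fraction exp(-theta). The sketch assumes exponential latent times and checks the formula by simulation.

```python
import math

import numpy as np


def pop_survival(t, theta, rate):
    """S_pop(t) = exp(-theta * F(t)) with exponential latent times, F(t) = 1 - exp(-rate t)."""
    F = 1.0 - math.exp(-rate * t)
    return math.exp(-theta * F)


# Monte Carlo check under the assumed model.
rng = np.random.default_rng(5)
theta, rate, n = 1.5, 0.8, 200_000
N = rng.poisson(theta, n)
T = np.full(n, np.inf)          # N = 0: no competing cause, the subject is cured
has = N > 0
# The minimum of N iid Exp(rate) times is Exp(N * rate).
T[has] = rng.exponential(1.0 / (rate * N[has]))
emp = np.mean(T > 1.0)
print(emp, pop_survival(1.0, theta, rate))
```

As t grows, S_pop(t) levels off at exp(-theta), the cured fraction.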

14.
In an attempt to provide a statistical tool for disease screening and prediction, we propose a semiparametric approach to analysis of the Cox proportional hazards cure model in situations where the observations on the event time are subject to right censoring and some covariates are missing not at random. To facilitate the methodological development, we begin with semiparametric maximum likelihood estimation (SPMLE) assuming that the (conditional) distribution of the missing covariates is known. A variant of the EM algorithm is used to compute the estimator. We then adapt the SPMLE to a more practical situation where the distribution is unknown and there is a consistent estimator based on available information. We establish the consistency and weak convergence of the resulting pseudo-SPMLE, and identify a suitable variance estimator. The application of our inference procedure to disease screening and prediction is illustrated via empirical studies. The proposed approach is used to analyze the tuberculosis screening study data that motivated this research. Its finite-sample performance is examined by simulation.

15.
This paper deals with the regression analysis of failure time data when there are censoring and multiple types of failures. We propose a semiparametric generalization of a parametric mixture model of Larson & Dinse (1985), for which the marginal probabilities of the various failure types are logistic functions of the covariates. Given the type of failure, the conditional distribution of the time to failure follows a proportional hazards model. A marginal likelihood approach to estimating regression parameters is suggested, whereby the baseline hazard functions are eliminated as nuisance parameters. The Monte Carlo method is used to approximate the marginal likelihood; the resulting function is maximized easily using existing software. Some guidelines for choosing the number of Monte Carlo replications are given. With the regression parameters fixed at their estimated values, the full likelihood is maximized via an EM algorithm to estimate the baseline survivor functions. The methods suggested are illustrated using the Stanford heart transplant data.

16.
This article proposes a Bayesian approach, which can simultaneously obtain the Bayesian estimates of unknown parameters and random effects, to analyze nonlinear reproductive dispersion mixed models (NRDMMs) for longitudinal data with nonignorable missing covariates and responses. Logistic regression models are employed for the missing data mechanisms of the covariates and responses. A hybrid sampling procedure combining the Gibbs sampler and the Metropolis-Hastings algorithm is presented to draw observations from the conditional distributions. Because the missing data mechanism is not testable, we develop the logarithm of the pseudo-marginal likelihood, the deviance information criterion, the Bayes factor, and the pseudo-Bayes factor to compare several competing missing data mechanism models in the NRDMMs considered here with nonignorable missing covariates and responses. Three simulation studies and a real example taken from the paediatric AIDS clinical trial group ACTG are used to illustrate the proposed methodologies. Empirical results show that the proposed methods are effective in selecting missing data mechanism models.
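Two of the model-comparison criteria named above can be computed directly from posterior draws. The sketch uses a deliberately simple model (normal data with known unit variance and unknown mean) rather than an NRDMM: DIC = D(theta_bar) + 2 p_D with p_D = mean deviance minus deviance at the posterior mean, and LPML = sum_i log CPO_i with CPO_i estimated as the harmonic mean of f(y_i | theta_s) over draws. Function names are hypothetical.

```python
import numpy as np


def normal_loglik(y, mu):
    """log N(y_i | mu_s, 1) for every draw s and observation i, shape (S, n)."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2


def dic_and_lpml(y, mu_draws):
    ll = normal_loglik(y, mu_draws)                     # (S, n)
    d_bar = np.mean(-2.0 * ll.sum(axis=1))              # posterior mean deviance
    d_hat = -2.0 * normal_loglik(y, np.array([mu_draws.mean()])).sum()
    p_d = d_bar - d_hat                                 # effective number of parameters
    # log CPO_i = -log(mean_s exp(-ll_si)), computed with a log-sum-exp shift.
    neg = -ll
    m = neg.max(axis=0)
    log_cpo = -(m + np.log(np.mean(np.exp(neg - m), axis=0)))
    return d_hat + 2.0 * p_d, p_d, log_cpo.sum()


rng = np.random.default_rng(6)
n = 50
y = rng.normal(0.3, 1.0, n)
# Under a flat prior the posterior of mu is N(ybar, 1/n); draw from it directly.
draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), 20000)
dic, p_d, lpml = dic_and_lpml(y, draws)
```

For this one-parameter model p_D should come out close to 1. Lower DIC and higher LPML favour a model, which is how competing missing data mechanism models would be ranked.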

17.
Based on hybrid censored data, the problem of making statistical inference on the parameters of a two-parameter Burr Type XII distribution is taken up. Maximum likelihood estimates of the unknown parameters are developed using the EM algorithm. The Fisher information matrix is obtained by applying the missing value principle and is further used to construct approximate confidence intervals. Some Bayes estimates and the corresponding highest posterior density intervals of the unknown parameters are also obtained. Lindley's approximation method and a Markov chain Monte Carlo (MCMC) technique have been applied to evaluate these Bayes estimates. Further, MCMC samples are utilized to construct the highest posterior density intervals as well. A numerical comparison is made between the proposed estimates in terms of their mean square error values, and comments are given. Finally, two data sets are analyzed using the proposed methods.

18.
We consider the problem of full information maximum likelihood (FIML) estimation in factor analysis when a majority of the data values are missing. The expectation–maximization (EM) algorithm is often used to find the FIML estimates, in which the missing values on manifest variables are included in complete data. However, the ordinary EM algorithm has an extremely high computational cost. In this paper, we propose a new algorithm that is based on the EM algorithm but that efficiently computes the FIML estimates. A significant improvement in the computational speed is realized by not treating the missing values on manifest variables as a part of complete data. When there are many missing data values, it is not clear if the FIML procedure can achieve good estimation accuracy. In order to investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.
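The objective that FIML maximizes can be written without any imputation: each row contributes the multivariate normal log-density of only its observed entries, using the matching sub-vector of the mean and sub-matrix of the covariance. The sketch evaluates this objective for a one-factor covariance Sigma = Lambda Lambda^T + Psi with made-up loadings; it shows the likelihood only, not the paper's faster EM-based algorithm.

```python
import numpy as np


def fiml_loglik(X, mu, Sigma):
    """Observed-data (FIML) log-likelihood; NaN marks a missing value."""
    total = 0.0
    for row in X:
        o = ~np.isnan(row)
        if not o.any():
            continue  # a fully missing row carries no information
        d = row[o] - mu[o]
        S = Sigma[np.ix_(o, o)]
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * (o.sum() * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(S, d))
    return total


# One-factor model with hypothetical loadings and uniquenesses.
Lam = np.array([[0.9], [0.7], [0.5]])
Psi = np.diag([0.2, 0.5, 0.7])
Sigma = Lam @ Lam.T + Psi
mu = np.zeros(3)
X = np.array([[0.3, np.nan, -1.0],
              [np.nan, np.nan, 0.4],
              [1.2, 0.1, 0.5]])
print(fiml_loglik(X, mu, Sigma))
```

An optimizer (or an EM scheme) would maximize this function over the loadings, uniquenesses and means.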

19.
We consider statistical inference of unknown parameters in estimating equations (EEs) when some covariates have nonignorably missing values, which is quite common in practice but has rarely been discussed in the literature. When an instrument, a fully observed covariate vector that helps identify parameters under nonignorable missingness, is available, the conditional distribution of the missing covariates given other covariates can be estimated by the pseudolikelihood method of Zhao and Shao (2015, 'Semiparametric pseudo likelihoods in generalised linear models with nonignorable missing data', Journal of the American Statistical Association, 110, 1577–1590) and used to construct unbiased EEs. These modified EEs then constitute a basis for valid inference by empirical likelihood. Our method is applicable to a wide range of EEs used in practice. It is semiparametric since no parametric model for the propensity of missing covariate data is assumed. Asymptotic properties of the proposed estimator and the empirical likelihood ratio test statistic are derived. Some simulation results and a real data analysis are presented for illustration.
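The final empirical likelihood step can be sketched for a scalar estimating function g. Under H0: E[g] = 0, the EL weights are p_i = 1 / (n (1 + lambda g_i)), where lambda solves sum g_i / (1 + lambda g_i) = 0, and -2 log R = 2 sum log(1 + lambda g_i). The example uses the complete-data mean equation g = x - mu as a stand-in; it does not construct the paper's modified EEs for nonignorably missing covariates. `el_log_ratio` is a hypothetical name.

```python
import numpy as np


def el_log_ratio(g, max_iter=100, tol=1e-12):
    """-2 log empirical likelihood ratio for H0: E[g] = 0 (scalar g).

    Newton's method on the Lagrange multiplier, halving the step so that every
    weight 1 + lam * g_i stays positive. Requires min(g) < 0 < max(g).
    """
    lam = 0.0
    for _ in range(max_iter):
        d = 1.0 + lam * g
        f = np.sum(g / d)
        fprime = -np.sum(g ** 2 / d ** 2)
        step = f / fprime
        while np.any(1.0 + (lam - step) * g <= 1e-10):
            step /= 2.0
        lam -= step
        if abs(step) < tol:
            break
    return 2.0 * np.sum(np.log1p(lam * g))


rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 200)
stat0 = el_log_ratio(x - x.mean())          # evaluated at the sample mean: ~0
mu0 = x.mean() + 0.1
stat1 = el_log_ratio(x - mu0)
quad = x.size * (x.mean() - mu0) ** 2 / x.var()  # large-sample quadratic approximation
```

By Wilks-type results, stat1 is compared with a chi-squared distribution with one degree of freedom; the quadratic value shows it behaves like a Wald statistic near the estimate.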

20.
In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of the model parameters and mixing proportions. Specifically, the missing data are assumed to be missing not at random (MNAR), i.e. non-ignorably missing (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm's efficiency, a stochastic step based on data augmentation is incorporated into the E-step, whereas the M-step is solved by the method of conditional maximization. A variation of the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The model is a general framework that captures important characteristics of count data, such as zero inflation/deflation, heterogeneity and missingness, providing more insight into the data and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed in place of the conditional expectation, the approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of the methodology.
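The hurdle Poisson component treats zeros separately from positive counts: P(Y = 0) = pi0 and P(Y = k) = (1 - pi0) Pois(k; lambda) / (1 - e^{-lambda}) for k >= 1, so the zero probability can sit above (inflation) or below (deflation) the Poisson value e^{-lambda}. A finite mixture then weights several such components. The parameter values below are illustrative; the sketch covers only the count distribution, not the probit missingness model or the stochastic EM.

```python
import math


def hurdle_poisson_pmf(k, pi0, lam):
    """Hurdle Poisson probability of count k."""
    if k == 0:
        return pi0
    pois = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return (1.0 - pi0) * pois / (1.0 - math.exp(-lam))   # zero-truncated Poisson part


def mixture_pmf(k, weights, components):
    """Finite mixture of hurdle Poisson components given as (pi0, lam) pairs."""
    return sum(w * hurdle_poisson_pmf(k, pi0, lam)
               for w, (pi0, lam) in zip(weights, components))


weights = [0.6, 0.4]
# Component 1: pi0 = 0.5 > e^{-2}, zero-inflated; component 2: pi0 = 0.1 < e^{-1}, zero-deflated.
components = [(0.5, 2.0), (0.1, 1.0)]
print([round(mixture_pmf(k, weights, components), 4) for k in range(6)])
```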
