Continuous-time multi-state models are commonly used to study diseases with multiple stages. Potential risk factors associated with the disease are added to the transition intensities of the model as covariates, but missing covariate measurements arise frequently in practice. We propose a likelihood-based method that deals efficiently with a missing covariate in these models. Our simulation study showed that the method performs well for both “missing completely at random” and “missing at random” mechanisms. We also applied our method to a real dataset, the Einstein Aging Study.

We consider a multinomial distribution in which the cell probabilities are known arbitrary functions of a vector parameter θ. It is desired to estimate θ by least squares. Three variations of the least squares approach are investigated, and each is found to be equivalent, in the very strong sense of being algebraically identical, to one of the following estimation procedures: maximum likelihood, minimum χ² and minimum modified χ². Two of these results also apply to the multiple hypergeometric distribution.
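
As a rough illustration of the three criteria named above, the sketch below evaluates the multinomial log-likelihood, Pearson's χ² and the modified (Neyman) χ² for a scalar θ and minimizes each numerically. The cell-probability model (Hardy-Weinberg proportions), the counts, and all function names are assumptions chosen for the example; the sketch does not reproduce the least-squares variations studied in the paper.

```python
import math

def cell_probs(theta):
    # Hypothetical model for illustration: Hardy-Weinberg proportions.
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]

def neg_log_lik(theta, counts):
    # Negative multinomial log-likelihood, up to an additive constant.
    return -sum(n * math.log(p) for n, p in zip(counts, cell_probs(theta)))

def pearson_chi2(theta, counts):
    # Pearson chi-square: expected counts in the denominator.
    total = sum(counts)
    return sum((n - total * p) ** 2 / (total * p)
               for n, p in zip(counts, cell_probs(theta)))

def modified_chi2(theta, counts):
    # Modified (Neyman) chi-square: observed counts in the denominator,
    # so every observed count must be positive.
    total = sum(counts)
    return sum((n - total * p) ** 2 / n
               for n, p in zip(counts, cell_probs(theta)))

def minimize(f, lo=1e-6, hi=1 - 1e-6, tol=1e-10):
    # Golden-section search; assumes f has a single minimum on (lo, hi).
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c, d = b - g * (b - a), a + g * (b - a)
    return (a + b) / 2

counts = [30, 50, 20]  # made-up cell counts
theta_ml = minimize(lambda t: neg_log_lik(t, counts))
theta_chi2 = minimize(lambda t: pearson_chi2(t, counts))
theta_mod = minimize(lambda t: modified_chi2(t, counts))
```

For this particular model the maximum likelihood estimate has the closed form (2n₁ + n₂)/(2n), which gives a quick check on the numerical search; the two χ² estimates are close to it but generally not identical.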

Logistic regression plays an important role in many fields. In practice, we often encounter missing covariates in different applied sectors, particularly in biomedical sciences. Ibrahim (1990) proposed a method to handle missing covariates in the generalized linear model (GLM) setup. It is well known that logistic regression estimates from small or medium-sized samples with missing data are biased. Considering data that are missing at random, in this paper we reduce the bias by two methods: first, we derive a closed-form bias expression using Cox and Snell (1968), and second, we use a likelihood-based modification similar to Firth (1993). We show analytically that the Firth-type likelihood modification of Ibrahim's method leads to second-order bias reduction. The proposed methods are simple to apply to an existing method and need no analytical work, with the exception of a small change in the optimization function. We have carried out extensive simulation studies comparing the methods, and our simulation results are also supported by real-world data.

This paper is concerned with the maximum likelihood estimation and the likelihood ratio test for hierarchical loglinear models of multidimensional contingency tables with missing data. The problems of estimation and test for a high dimensional contingency table can be reduced into those for a class of low dimensional tables. In some cases, the incomplete data in the high dimensional table can become complete in the low dimensional tables through the reduction. The reduction can indicate how much the incomplete data contribute to the estimation and the test.

Several survival regression models have been developed to assess the effects of covariates on failure times. In various settings, including surveys, clinical trials and epidemiological studies, missing data may often occur due to incomplete covariate data. Most existing methods for lifetime data are based on the assumption of missing at random (MAR) covariates. However, in many substantive applications, it is important to assess the sensitivity of key model inferences to the MAR assumption. The index of sensitivity to non-ignorability (ISNI) is a local sensitivity tool to measure the potential sensitivity of key model parameters to small departures from the ignorability assumption, without the need to estimate a complicated non-ignorable model. We extend this sensitivity index to evaluate the impact of a covariate that is potentially missing not at random in survival analysis, using parametric survival models. The approach will be applied to investigate the impact of missing tumor grade on post-surgical mortality outcomes in individuals with pancreas-head cancer in the Surveillance, Epidemiology, and End Results data set. For patients suffering from cancer, tumor grade is an important risk factor. Many individuals in these data with pancreas-head cancer have missing tumor grade information. Our ISNI analysis shows that the magnitude of effect for most covariates (with significant effect on the survival time distribution), specifically surgery and tumor grade as some important risk factors in cancer studies, highly depends on the missing mechanism assumption of the tumor grade. A simulation study is also conducted to evaluate the performance of the proposed index in detecting sensitivity of key model parameters.

Ranked set sampling (RSS), a cost-effective sampling design, is a powerful tool in situations where measuring the variable of interest is costly and time-consuming but ranking information about sampling units can be obtained easily, at little or no cost, through inexpensive and easy-to-measure characteristics. In this paper, we study RSS data for analysis of an ordinal population. First, we compare the problem of non-representative extreme samples under RSS and commonly-used simple random sampling. Using RSS data with tie information, we propose non-parametric and maximum likelihood estimators for population parameters. Through extensive numerical studies, we investigate the effect of various factors including ranking ability, tie generating mechanisms, the number of categories and population setting on the performance of the estimators. Finally, we apply the proposed methods to the bone disorder data to estimate the proportions of patients with osteopenia and osteoporosis status.

Various methods have been suggested in the literature to handle a missing covariate in the presence of surrogate covariates. These methods belong to one of two paradigms. In the imputation paradigm, Pepe and Fleming (1991) and Reilly and Pepe (1995) suggested filling in missing covariates using the empirical distribution of the covariate obtained from the observed data. We can proceed one step further by imputing the missing covariate using nonparametric maximum likelihood estimates (NPMLE) of the density of the covariate. Recently Murphy and Van der Vaart (1998a) showed that such an approach yields a consistent, asymptotically normal, and semiparametric efficient estimate for the logistic regression coefficient. In the weighting paradigm, Zhao and Lipsitz (1992) suggested an estimating function using completely observed records after weighting inversely by the probability of observation. An extension of this weighting approach designed to achieve the semiparametric efficiency bound is considered by Robins, Hsieh and Newey (RHN) (1995). The two ends of each paradigm (NPMLE and RHN) attain the efficiency bound and are asymptotically equivalent. However, both require a substantial amount of computation. A question arises whether and when, in practical situations, this extensive computation is worthwhile. In this paper we investigate the performance of single and multiple imputation estimates, weighting estimates, semiparametric efficient estimates, and two new imputation estimates. Simulation studies suggest that the sample size should be substantially large (e.g. n = 2000) for NPMLE and RHN to be more efficient than simpler imputation estimates. When the sample size is moderately large (n ≤ 1500), simpler imputation estimates have as small a variance as semiparametric efficient estimates.

In this article, based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained when the responses of linear models are missing at random. It is proved that the proposed estimators are asymptotically normal. In simulation studies and a real example, the proposed estimators show improved performance relative to the usual augmented inverse probability weighted estimators.

In earlier work, Kirchner [An estimation procedure for the Hawkes process. Quant Financ. 2017;17(4):571–595], we introduced a nonparametric estimation method for the Hawkes point process. In this paper, we present a simulation study that compares this specific nonparametric method to maximum-likelihood estimation. We find that the standard deviations of both estimation methods decrease as power-laws in the sample size. Moreover, the standard deviations are proportional. For example, for a specific Hawkes model, the standard deviation of the branching coefficient estimate is roughly 20% larger than for MLE – over all sample sizes considered. This factor becomes smaller when the true underlying branching coefficient becomes larger. In terms of runtime, our method clearly outperforms MLE. The present bias of our method can be well explained and controlled. As an incidental finding, we see that also MLE estimates seem to be significantly biased when the underlying Hawkes model is near criticality. This asks for a more rigorous analysis of the Hawkes likelihood and its optimization.

The authors show that for balanced data, the estimates of effects of interest and of their standard errors are unaffected when a covariate is removed from a multiplicative Poisson model. As they point out, this is not verified in the analogous linear model, nor in the logistic model. In the first case, only the estimated coefficients remain the same, while in the second case, both the estimated effects and their standard errors can change.

We consider the problem of full information maximum likelihood (FIML) estimation in factor analysis when a majority of the data values are missing. The expectation–maximization (EM) algorithm is often used to find the FIML estimates, in which the missing values on manifest variables are included in complete data. However, the ordinary EM algorithm has an extremely high computational cost. In this paper, we propose a new algorithm that is based on the EM algorithm but that efficiently computes the FIML estimates. A significant improvement in the computational speed is realized by not treating the missing values on manifest variables as a part of complete data. When there are many missing data values, it is not clear if the FIML procedure can achieve good estimation accuracy. In order to investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.

The EM algorithm is often used for finding the maximum likelihood estimates in generalized linear models with incomplete data. In this article, the author presents a robust method in the framework of the maximum likelihood estimation for fitting generalized linear models when nonignorable covariates are missing. His robust approach is useful for downweighting any influential observations when estimating the model parameters. To avoid computational problems involving irreducibly high‐dimensional integrals, he adopts a Metropolis‐Hastings algorithm based on a Markov chain sampling method. He carries out simulations to investigate the behaviour of the robust estimates in the presence of outliers and missing covariates; furthermore, he compares these estimates to the classical maximum likelihood estimates. Finally, he illustrates his approach using data on the occurrence of delirium in patients operated on for abdominal aortic aneurysm.

Parameter dependency within data sets in simulation studies is common, especially in models such as continuous-time Markov chains (CTMCs). Additionally, the literature lacks a comprehensive examination of estimation performance for the likelihood-based general multi-state CTMC. Among studies attempting to assess the estimation, none have accounted for dependency among parameter estimates. The purpose of this research is twofold: (1) to develop a multivariate approach for assessing accuracy and precision for simulation studies, and (2) to add to the literature a comprehensive examination of the estimation of a general 3-state CTMC model. Simulation studies are conducted to analyze longitudinal data with a trinomial outcome using a CTMC with and without covariates. Measures of performance including bias, component-wise coverage probabilities, and joint coverage probabilities are calculated. An application is presented using Alzheimer's disease caregiver stress levels. Comparisons of joint and component-wise parameter estimates yield conflicting inferential results in simulations from models with and without covariates. In conclusion, caution should be taken when conducting simulation studies aiming to assess performance, and the choice of inference should properly reflect the purpose of the simulation.
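
For readers unfamiliar with the likelihood of a 3-state CTMC, the building block is the transition probability matrix P(t) = exp(Qt), where Q is the transition intensity matrix. The sketch below computes it with a plain Taylor series plus scaling and squaring. The intensity values and function names are made up for illustration and are not taken from the study.

```python
def mat_mul(a, b):
    # Product of two square matrices stored as lists of rows.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=30, squarings=10):
    # Matrix exponential: Taylor series of exp(m / 2**squarings),
    # then repeated squaring to undo the scaling.
    n = len(m)
    scale = 2 ** squarings
    scaled = [[v / scale for v in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_probs(q, t):
    # P(t) = exp(Q t) for a time-homogeneous CTMC.
    return mat_exp([[v * t for v in row] for row in q])

# Hypothetical intensity matrix: off-diagonals are transition rates,
# each diagonal entry is minus the sum of the other entries in its row.
Q = [[-0.3, 0.2, 0.1],
     [0.1, -0.4, 0.3],
     [0.2, 0.1, -0.3]]

P = transition_probs(Q, 2.0)  # P[i][j] = Pr(state j at time 2 | state i at time 0)
```

With panel data, the likelihood contribution of a subject observed in state i at one visit and state j at the next, t time units later, is the (i, j) entry of P(t). In practice a tested routine such as a library matrix exponential would be preferable to this hand-rolled version.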

In this paper, a nonlinear model with response variables missing at random is studied. In order to improve the coverage accuracy for model parameters, the empirical likelihood (EL) ratio method is considered. On the complete data, the EL statistic for the parameters and its approximation have a χ² asymptotic distribution. When the responses are reconstituted using a semi-parametric method, the empirical log-likelihood on the response variables associated with the imputed data is also asymptotically χ². The Wilks theorem for EL on the parameters, based on reconstituted data, is also satisfied. These results can be used to construct the confidence region for the model parameters and the response variables. It is shown via Monte Carlo simulations that the EL methods outperform the normal approximation-based method in terms of coverage probability for the unknown parameter, including on the reconstituted data. The advantages of the proposed method are exemplified on real data.

In this study, we consider the problem of selecting explanatory variables of fixed effects in linear mixed models under covariate shift, which is when the values of covariates in the model for prediction differ from those in the model for observed data. We construct a variable selection criterion based on the conditional Akaike information introduced by Vaida & Blanchard (2005). We focus especially on covariate shift in small area estimation and demonstrate the usefulness of the proposed criterion. In addition, numerical performance is investigated through simulations, one of which is a design‐based simulation using a real dataset of land prices. The Canadian Journal of Statistics 46: 316–335; 2018 © 2018 Statistical Society of Canada

Summary. The paper considers canonical link generalized linear models with stratum-specific nuisance intercepts and missing covariate data. This family includes the conditional logistic regression model. Existing methods for this problem, each of which uses a conditioning argument to eliminate the nuisance intercept, model either the missing covariate data or the missingness process. The paper compares these methods under a common likelihood framework. The semiparametric efficient estimator is identified, and a new estimator, which reduces dependence on the model for the missing covariate, is proposed. A simulation study compares the methods with respect to efficiency and robustness to model misspecification.

Missing observations in both responses and covariates arise frequently in longitudinal studies. When missing data are missing not at random, inferences under the likelihood framework often require joint modelling of response and covariate processes, as well as missing data processes associated with incompleteness of responses and covariates. Specification of these four joint distributions is a nontrivial issue from the perspectives of both modelling and computation. To get around this problem, we employ pairwise likelihood formulations, which avoid the specification of third or higher order association structures. In this paper, we consider three specific missing data mechanisms which lead to further simplified pairwise likelihood (SPL) formulations. Under these missing data mechanisms, inference methods based on SPL formulations are developed. The resultant estimators are consistent, and enjoy better robustness and computational convenience. The performance is evaluated empirically through simulation studies. Longitudinal data from the National Population Health Survey and Waterloo Smoking Prevention Project are analysed to illustrate the usage of our methods.

Matched case–control designs are commonly used in epidemiological studies for estimating the effect of exposure variables on the risk of a disease by controlling the effect of confounding variables. Due to the retrospective nature of the study, information on a covariate could be missing for some subjects. A straightforward application of the conditional logistic likelihood for analyzing matched case–control data with the partially missing covariate may yield inefficient estimators of the parameters. A robust method has been proposed to handle this problem using an estimated conditional score approach when the missingness mechanism does not depend on the disease status. Within the conditional logistic likelihood framework, an empirical procedure is used to estimate the odds of the disease for the subjects with missing covariate values. The asymptotic distribution and the asymptotic variance of the estimator are derived when the matching variables and the completely observed covariates are categorical. The finite sample performance of the proposed estimator is assessed through a simulation study. Finally, the proposed method has been applied to analyze two matched case–control studies. The Canadian Journal of Statistics 38: 680–697; 2010 © 2010 Statistical Society of Canada

Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at item level and multivariate normal imputation at scale level. We investigated a hybrid approach which is a combination of MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. MLE occasionally seemed to provide the best results in terms of bias, but hardly ever on MSE. All imputation methods at the scale level and logistic regression at item level hardly ever showed the best performance. The hybrid approach is similar to or better than its original MI. The PMM-hybrid approach at item level demonstrated the best MSE for most settings and in some cases also the smallest bias.
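
To make the idea behind predictive mean matching concrete, here is a deliberately simplified sketch: a line is fitted to the complete cases, and each missing value is replaced by the observed value of the donor whose predicted mean is closest. It uses a single nearest donor and a single imputation, whereas PMM as used in multiple imputation typically draws at random among several close donors and repeats the process. The data and function names are invented for the example and do not come from the study.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pmm_impute(xs, ys):
    # ys uses None for missing values; xs is fully observed.
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])

    def predict(x):
        return intercept + slope * x

    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            # Donor = complete case with the closest predicted mean;
            # the imputed value is the donor's observed y, not the prediction.
            donor = min(observed, key=lambda d: abs(predict(d[0]) - predict(x)))
            y = donor[1]
        completed.append(y)
    return completed

xs = [1.0, 2.0, 2.8, 4.0, 5.3, 6.0]     # made-up covariate
ys = [1.1, 1.9, None, 4.2, None, 6.1]   # made-up outcome with two gaps
completed = pmm_impute(xs, ys)
```

Because imputed values are always borrowed from observed cases, they stay within the range of real data, which is one reason PMM is popular for questionnaire items with bounded or discrete scales.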

