Similar literature
 20 similar articles found (search time: 62 ms)
1.
Kang (2006) and Kang and Larsen (in press) used the log likelihood function with Lagrangian multipliers for estimation of cell probabilities in two-way incomplete contingency tables. This paper extends results and simulations to three-way and multi-way tables. Numerous studies cross-classify subjects by three or more categorical factors. Constraints on cell probabilities are incorporated through Lagrangian multipliers. Variances of the MLEs are derived from the matrix of second derivatives of the log likelihood with respect to cell probabilities and the Lagrange multiplier. Wald and likelihood ratio tests of independence are derived using the estimates and estimated variances. In simulation results in Kang and Larsen (in press), for data missing at random, maximum likelihood estimation (MLE) produced more efficient estimates of population proportions than either multiple imputation (MI) based on data augmentation or complete case (CC) analysis. Neither MLE nor MI, however, leads to an improvement over CC analysis with respect to power of tests for independence in two-way tables. Results are extended to multidimensional tables with arbitrary patterns of missing data when the variables are recorded on individual subjects. In three-way and higher-way tables, however, there is information relevant for judging independence in partially classified information, as long as two or more variables are jointly observed. Simulations study three-dimensional tables with three patterns of association and two levels of missing information.

2.
Logistic regression plays an important role in many fields. In practice, missing covariates are often encountered in applied work, particularly in the biomedical sciences. Ibrahim (1990) proposed a method to handle missing covariates in the generalized linear model (GLM) setup. It is well known that logistic regression estimates from small or medium-sized data sets with missing values are biased. Considering data that are missing at random, in this paper we reduce the bias by two methods: first, we derive a closed-form bias expression using Cox and Snell (1968); second, we use a likelihood-based modification similar to Firth (1993). We show analytically that the Firth-type likelihood modification in Ibrahim's (1990) method leads to second-order bias reduction. The proposed methods are simple to apply to an existing method and need no analytical work beyond a small change in the optimization function. We have carried out extensive simulation studies comparing the methods, and our simulation results are also supported by a real-world data set.

3.
We consider statistical inference of unknown parameters in estimating equations (EEs) when some covariates have nonignorably missing values, which is quite common in practice but has rarely been discussed in the literature. When an instrument, a fully observed covariate vector that helps identify parameters under nonignorable missingness, is available, the conditional distribution of the missing covariates given other covariates can be estimated by the pseudolikelihood method of Zhao and Shao [(2015), ‘Semiparametric pseudo likelihoods in generalised linear models with nonignorable missing data’, Journal of the American Statistical Association, 110, 1577–1590] and be used to construct unbiased EEs. These modified EEs then constitute a basis for valid inference by empirical likelihood. Our method is applicable to a wide range of EEs used in practice. It is semiparametric since no parametric model for the propensity of missing covariate data is assumed. Asymptotic properties of the proposed estimator and the empirical likelihood ratio test statistic are derived. Some simulation results and a real data analysis are presented for illustration.

4.
We propose a profile conditional likelihood approach to handle missing covariates in the general semiparametric transformation regression model. The method estimates the marginal survival function by the Kaplan-Meier estimator, and then estimates the parameters of the survival model and the covariate distribution from a conditional likelihood, substituting the Kaplan-Meier estimator for the marginal survival function in the conditional likelihood. This method is simpler than full maximum likelihood approaches, and yields a consistent and asymptotically normally distributed estimator of the regression parameter when censoring is independent of the covariates. The estimator demonstrates very high relative efficiency in simulations. When compared with complete-case analysis, the proposed estimator can be more efficient when the missing data are missing completely at random and can correct bias when the missing data are missing at random. The potential application of the proposed method to the generalized probit model with missing continuous covariates is also outlined.

5.
Efficient statistical inference on nonignorable missing data is a challenging problem. This paper proposes a new estimation procedure based on composite quantile regression (CQR) for linear regression models with nonignorable missing data, which is applicable even with high-dimensional covariates. A parametric model is assumed for the response probability, which is estimated by the empirical likelihood approach. Local identifiability of the proposed strategy is guaranteed on the basis of an instrumental variable approach. A set of data-based adaptive weights constructed via an empirical likelihood method is used to weight CQR functions. The proposed method is resistant to heavy-tailed errors or outliers in the response. An adaptive penalisation method for variable selection is proposed to achieve sparsity with high-dimensional covariates. Limiting distributions of the proposed estimators are derived. Simulation studies are conducted to investigate the finite sample performance of the proposed methodologies. The method is applied to the ACTG 175 data.

6.
For the Poisson distribution, a posterior distribution for the complete sample size, N, is derived from an incomplete sample when any specified subset of the classes is missing. Means as well as other posterior characteristics of N are obtained for two examples with various classes removed. For the special case of a truncated ‘missing zero class’ Poisson sample, a simulation experiment is performed for the small-sample situation (N=25), applying both Bayesian and maximum likelihood methods of estimation.
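
The maximum likelihood side of the ‘missing zero class’ case can be illustrated with a minimal sketch. It assumes the standard zero-truncated Poisson model, in which the observed mean satisfies x̄ = λ/(1 − e^(−λ)), and it estimates the complete sample size as n/(1 − e^(−λ̂)). The function name and the fixed-point iteration are illustrative choices, not taken from the paper, and the Bayesian posterior for N is not reproduced here.

```python
import math


def zero_truncated_poisson_mle(counts, tol=1e-12, max_iter=10_000):
    """Estimate the Poisson rate and the complete sample size N from a
    sample in which the zero class was never observed.

    Hypothetical helper for illustration only (not the paper's procedure).
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean <= 1.0:
        # All observations equal 1: the likelihood has no interior maximum.
        raise ValueError("sample mean must exceed 1 for a positive rate estimate")

    # Solve mean = lam / (1 - exp(-lam)) by fixed-point iteration.
    lam = mean
    for _ in range(max_iter):
        new_lam = mean * (1.0 - math.exp(-lam))
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break

    # n observed units are a fraction (1 - exp(-lam)) of the complete sample.
    n_hat = n / (1.0 - math.exp(-lam))
    return lam, n_hat
```

Under these assumptions the two estimates satisfy n_hat · lam = sum(counts), which gives a quick sanity check on any fitted values.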

7.
The missing response problem is ubiquitous in survey sampling and in medical, social science and epidemiological studies. It is well known that non-ignorable missingness, where the missingness of a response depends on its own value, is the most difficult missing data problem. In the statistical literature, unlike for the ignorable missing data problem, few papers on non-ignorable missing data are available apart from fully parametric model-based approaches. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method, we obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and of the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of real AIDS trial data shows that the missingness of CD4 counts at around two years is non-ignorable and that the sample mean based on observed data only is biased.

8.
We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Rubin (1974), is applied to a more general case of non-monotone missing data. The proposed method is asymptotically equivalent to the Fisher scoring method from the observed likelihood, but avoids the burden of computing the first and second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is presented to illustrate the method.

9.
In many clinical studies where time to failure is of primary interest, patients may fail or die from one of many causes where failure time can be right censored. In some circumstances, it might also be the case that patients are known to die but the cause of death information is not available for some patients. Under the assumption that cause of death is missing at random, we compare the Goetghebeur and Ryan (1995, Biometrika, 82, 821–833) partial likelihood approach with the Dewanji (1992, Biometrika, 79, 855–857) partial likelihood approach. We show that the estimator for the regression coefficients based on the Dewanji partial likelihood is not only consistent and asymptotically normal, but also semiparametric efficient. While the Goetghebeur and Ryan estimator is more robust than the Dewanji partial likelihood estimator against misspecification of proportional baseline hazards, the Dewanji partial likelihood estimator allows the probability of missing cause of failure to depend on covariate information without the need to model the missingness mechanism. Tests for proportional baseline hazards are also suggested and a robust variance estimator is derived.

10.
In this paper, we consider empirical likelihood inference for the partial functional linear model with missing responses. Two empirical log-likelihood ratios of the parameters of interest are constructed, and the corresponding maximum empirical likelihood estimators of the parameters are derived. Under some regularity conditions, we show that the two proposed empirical log-likelihood ratios are asymptotically standard chi-squared. Thus, the asymptotic results can be used to construct confidence intervals/regions for the parameters of interest. We also establish the asymptotic distribution theory of the corresponding maximum empirical likelihood estimators. A simulation study indicates that the proposed methods are comparable in terms of coverage probabilities and average lengths of confidence intervals. A real data example is also used to illustrate our proposed methods.

11.
The receiver operating characteristic (ROC) curve is one of the most commonly used methods to compare the diagnostic performance of two or more laboratory or diagnostic tests. In this paper, we propose semi-empirical likelihood based confidence intervals for ROC curves of two populations, where one population is parametric and the other one is non-parametric and both have missing data. After imputing missing values, we derive the semi-empirical likelihood ratio statistic and the corresponding likelihood equations. It is shown that the log-semi-empirical likelihood ratio statistic is asymptotically scaled chi-squared. The estimating equations are solved simultaneously to obtain the estimated lower and upper bounds of semi-empirical likelihood confidence intervals. We conduct extensive simulation studies to evaluate the finite sample performance of the proposed empirical likelihood confidence intervals with various sample sizes and different missing probabilities.

12.
Ibrahim (1990) used the EM algorithm to obtain maximum likelihood estimates of the regression parameters in generalized linear models with partially missing covariates. The technique was termed EM by the method of weights. In this paper, we generalize this technique to Cox regression analysis with missing values in the covariates. We specify a full model letting the unobserved covariate values be random and then maximize the observed likelihood. The asymptotic covariance matrix is estimated by the inverse information matrix. The missing data are allowed to be missing at random, but the non-ignorable non-response situation may in principle also be considered. Simulation studies indicate that the proposed method is more efficient than the method suggested by Paik & Tsai (1997). We apply the procedure to a clinical trial example with six covariates, three of which have missing values.

13.
In this paper, we propose an estimation method when sample data are incomplete. We decompose the likelihood according to missing patterns and combine the estimators based on each likelihood weighting by the Fisher information ratio. This approach provides a simple way of estimating parameters, especially for non‐monotone missing data. Numerical examples are presented to illustrate this method.
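
The abstract does not give the combining formula, so the following is only a sketch of one common reading for a single scalar parameter: each missing pattern yields its own estimate and Fisher information, and the estimates are averaged with weights proportional to information. The function name and the scalar restriction are assumptions made for illustration; the paper's estimator may differ, particularly for vector parameters.

```python
def combine_by_information(estimates, informations):
    """Combine per-pattern estimates of one scalar parameter, weighting
    each by its share of the total Fisher information.

    Hypothetical helper: returns the combined estimate and its
    approximate variance, 1 / (total information).
    """
    if len(estimates) != len(informations):
        raise ValueError("need one information value per estimate")
    total = sum(informations)
    if total <= 0.0:
        raise ValueError("total Fisher information must be positive")

    combined = sum(i * e for e, i in zip(estimates, informations)) / total
    return combined, 1.0 / total


# Two missing patterns: the first carries three times the information.
estimate, variance = combine_by_information([1.0, 3.0], [3.0, 1.0])
print(estimate, variance)  # 1.5 0.25
```

A pattern that contributes little information moves the combined estimate only slightly, which is why this kind of weighting is attractive when some patterns contain few observations.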

14.
Missing observations in both responses and covariates arise frequently in longitudinal studies. When missing data are missing not at random, inferences under the likelihood framework often require joint modelling of response and covariate processes, as well as missing data processes associated with incompleteness of responses and covariates. Specification of these four joint distributions is a nontrivial issue from the perspectives of both modelling and computation. To get around this problem, we employ pairwise likelihood formulations, which avoid the specification of third or higher order association structures. In this paper, we consider three specific missing data mechanisms which lead to further simplified pairwise likelihood (SPL) formulations. Under these missing data mechanisms, inference methods based on SPL formulations are developed. The resultant estimators are consistent, and enjoy better robustness and computational convenience. The performance is evaluated empirically through simulation studies. Longitudinal data from the National Population Health Survey and Waterloo Smoking Prevention Project are analysed to illustrate the usage of our methods.

15.
Empirical Likelihood-based Inference in Linear Models with Missing Data   Total citations: 18, self-citations: 0, citations by others: 18
The missing response problem in linear regression is studied. An adjusted empirical likelihood approach to inference on the mean of the response variable is developed. A non-parametric version of Wilks's theorem for the adjusted empirical likelihood is proved, and the corresponding empirical likelihood confidence interval for the mean is constructed. With auxiliary information, an empirical likelihood-based estimator with asymptotic normality is defined and an adjusted empirical log-likelihood function with an asymptotic χ² distribution is derived. A simulation study is conducted to compare the adjusted empirical likelihood methods and the normal approximation methods in terms of coverage accuracies and average lengths of the confidence intervals. Based on biases and standard errors, a comparison is also made between the empirical likelihood-based estimator and related estimators by simulation. Our simulation indicates that the adjusted empirical likelihood methods perform competitively and the use of auxiliary information provides improved inferences.
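
For orientation, the unadjusted building block behind such methods can be sketched as follows. This computes the ordinary complete-data empirical log-likelihood ratio statistic for a mean, in the sense of Owen (1988), by solving for the Lagrange multiplier with bisection. It is not the adjusted statistic of the paper and does not handle missing responses or imputation; the function name and iteration settings are assumptions made for illustration.

```python
import math


def el_log_ratio_mean(xs, mu, iters=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for
    the hypothesis that the mean equals mu (complete data only).

    Hypothetical helper for illustration.
    """
    z = [x - mu for x in xs]
    lo_z, hi_z = min(z), max(z)
    if not (lo_z < 0.0 < hi_z):
        raise ValueError("mu must lie strictly inside the range of the data")

    # Every weight must stay positive: 1 + lam * z_i > 0 for all i.
    lo, hi = -1.0 / hi_z, -1.0 / lo_z
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        # Strictly decreasing in lam; its root is the Lagrange multiplier.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    return 2.0 * sum(math.log1p(lam * zi) for zi in z)
```

In the complete-data case this statistic is compared with a χ² distribution with one degree of freedom to form a confidence interval for the mean; it equals zero at the sample mean and grows as mu moves away from it.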

16.
We consider parametric regression problems with some covariates missing at random. It is shown that the regression parameter remains identifiable under natural conditions. When the always observed covariates are discrete, we propose a semiparametric maximum likelihood method, which does not require parametric specification of the missing data mechanism or the covariate distribution. The global maximum likelihood estimator (MLE), which maximizes the likelihood over the whole parameter set, is shown to exist under simple conditions. For ease of computation, we also consider a restricted MLE which maximizes the likelihood over covariate distributions supported by the observed values. Under regularity conditions, the two MLEs are asymptotically equivalent and strongly consistent for a class of topologies on the parameter set.

17.
In this article, by using constant and random selection matrices, several properties of the maximum likelihood (ML) estimates and the ML estimator of a normal distribution with missing data are derived. The constant selection matrix allows us to obtain an explicit form of the ML estimates and the exact relationship between the EM algorithm and the score function. The random selection matrix allows us to clarify how the missing-data mechanism works in the proof of the consistency of the ML estimator, to derive the asymptotic properties of the sequence generated by the EM algorithm, and to derive the information matrix.

18.
This paper examines the formation of maximum likelihood estimates of cell means in analysis of variance problems for cells with missing observations. Methods of estimating the means for missing cells have a long history, which includes iterative maximum likelihood techniques, approximation techniques and ad hoc techniques. The use of the EM algorithm to form maximum likelihood estimates has resolved most of the issues associated with this problem. Implementation of the EM algorithm entails specification of a reduced model. As demonstrated in this paper, when there are several missing cells, it is possible to specify a reduced model that results in an unidentifiable likelihood. The EM algorithm in this case does not converge, although the slow divergence may often be mistaken by the unwary for convergence. This paper presents a simple matrix method of determining whether or not the reduced model results in an identifiable likelihood, and consequently in an EM algorithm that converges. We also show the EM algorithm in this case to be equivalent to a method which yields a closed form solution.

19.
Record linkage databases have become increasingly available and are used in pharmacoepidemiology, pharmacoeconomic and outcome studies, where the relationship between drug exposure or intervention and outcome is the main concern. Sometimes the linkage between outcome data and exposure data may be missing so that only a proportion of patients in the outcome database can be linked to other databases. This paper proposes maximum likelihood (ML) and GEE procedures to obtain consistent estimates of parameters in the model relating the outcome and risk factors. Asymptotic variances of the estimates were derived for the situation where the missing rate is estimated from the same dataset. We show that using the estimated missing rate, rather than the known missing rate, may result in more accurate estimates of the parameters. The confidence interval of the predicted occurrence rate, when the missing rate was estimated, was derived. Simulations for different scenarios were performed in order to explore the small-sample behaviour of the ML procedure using the estimated missing rate. The results confirmed the greater efficiency of using the estimated missing rate instead of the true one for large sample sizes. However, this may not be true for small samples. The ML procedure was applied to an analysis of coronary artery bypass operations in patients with acute coronary syndrome.

20.
We propose a method for estimating parameters in generalized linear models when the outcome variable is missing for some subjects and the missing data mechanism is non-ignorable. We assume throughout that the covariates are fully observed. One possible method for estimating the parameters is maximum likelihood with a non-ignorable missing data model. However, caution must be used when fitting non-ignorable missing data models because certain parameters may be inestimable for some models. Instead of fitting a non-ignorable model, we propose the use of auxiliary information in a likelihood approach to reduce the bias, without having to specify a non-ignorable model. The method is applied to a mental health study.
