1.

A Bayesian Approach in Estimating Transition Probabilities of a Discrete-time Markov Chain for Ignorable Intermittent Missing Data

Junsheng Ma Xiaoying Yu Elaine Symanski Rachelle Doody 《Communications in Statistics - Simulation and Computation》2016,45(7):2598-2616

This article focuses on data analyses under the scenario of missing at random within discrete-time Markov chain models. The naive method, nonlinear (NL) method, and Expectation-Maximization (EM) algorithm are discussed. We extend the NL method into a Bayesian framework, using an adjusted rejection algorithm to sample the posterior distribution, and estimating the transition probabilities with a Monte Carlo algorithm. We compare the Bayesian nonlinear (BNL) method with the naive method and the EM algorithm with various missing rates, and comprehensively evaluate estimators in terms of biases, variances, mean square errors, and coverage probabilities (CPs). Our simulation results show that the EM algorithm usually offers the smallest variances but the poorest CP, while the BNL method has smaller variances and better or similar CP compared to the naive method. When the missing rate is low (about 9%, MAR), the three methods are comparable. When the missing rate is high (about 25%, MAR), the BNL method overall performs slightly but consistently better than the naive method regarding variances and CP. Data from a longitudinal study of stress level among caregivers of individuals with Alzheimer’s disease are used to illustrate these methods.
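
The abstract does not define the naive method. As a rough illustration only, the sketch below assumes it means estimating each transition probability from consecutive pairs in which both states are observed, skipping any pair that touches a missing value. The function name `naive_transition_matrix` and the use of `None` to mark a missing observation are assumptions made for this example, not taken from the article.

```python
def naive_transition_matrix(sequences, states):
    """Estimate transition probabilities from fully observed consecutive pairs.

    Assumption for illustration: a missing observation is encoded as None,
    and any pair (a, b) with a missing member is simply dropped.
    """
    counts = {s: {t: 0 for t in states} for s in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a is None or b is None:
                continue  # skip pairs that involve a missing state
            counts[a][b] += 1

    probs = {}
    for s in states:
        total = sum(counts[s].values())
        # Rows with no observed transitions are left undefined (NaN).
        probs[s] = {
            t: (counts[s][t] / total if total else float("nan"))
            for t in states
        }
    return probs
```

Dropping pairs this way discards information carried by observations that sit next to a gap, which is one reason likelihood-based or Bayesian alternatives are of interest when the missing rate is high.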

2.

Proportional hazards regression in the presence of missing study eligibility information

Qing Pan Douglas E. Schaubel 《Lifetime data analysis》2014,20(3):424-443

We consider the study of censored survival times in the situation where the available data consist of both eligible and ineligible subjects, and information distinguishing the two groups is sometimes missing. A complete-case analysis in this context would use only subjects known to be eligible, resulting in inefficient and potentially biased estimators. We propose a two-step procedure which resembles the EM algorithm but is computationally much faster. In the first step, one estimates the conditional expectation of the missing eligibility indicators given the observed data using a logistic regression based on the complete cases (i.e., subjects with non-missing eligibility indicator). In the second step, maximum likelihood estimators are obtained from a weighted Cox proportional hazards model, with the weights being either observed eligibility indicators or estimated conditional expectations thereof. Under ignorable missingness, the estimators from the second step are proven to be consistent and asymptotically normal, with explicit variance estimators. We demonstrate through simulation that the proposed methods perform well for moderate-sized samples and are robust in the presence of eligibility indicators that are missing not at random. The proposed procedure is more efficient and more robust than the complete-case analysis and, unlike the EM algorithm, does not require time-consuming iteration. Although the proposed methods are applicable generally, they would be most useful for large data sets (e.g., administrative data), for which the computational savings outweigh the price one has to pay for making various approximations in avoiding iteration. We apply the proposed methods to national kidney transplant registry data.

3.

Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

J. G. Ibrahim S. R. Lipsitz & M.-H. Chen 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):173-190

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

4.

INCOMPLETE DATA IN GENERALIZED LINEAR MODELS WITH CONTINUOUS COVARIATES

Joseph G. Ibrahim Sanford Weisberg 《Australian & New Zealand Journal of Statistics》1992,34(3):461-470

This paper proposes a method for estimating the parameters in a generalized linear model with missing covariates. The missing covariates are assumed to come from a continuous distribution, and are assumed to be missing at random. In particular, Gaussian quadrature methods are used on the E-step of the EM algorithm, leading to an approximate EM algorithm. The parameters are then estimated using the weighted EM procedure given in Ibrahim (1990). This approximate EM procedure leads to approximate maximum likelihood estimates, whose standard errors and asymptotic properties are given. The proposed procedure is illustrated on a data set.

5.

Stochastic EM algorithm of a finite mixture model from hurdle Poisson distribution with missing responses

Ying-zi Fu 《Communications in Statistics - Theory and Methods》2013,42(20):5918-5932

In this article, a finite mixture model of hurdle Poisson distribution with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of model parameters and mixing proportions. Specifically, missing data are assumed to be missing not at random (MNAR)/non ignorable missing (NINR) and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm efficiency, a stochastic step is incorporated into the E-step based on data augmentation, whereas the M-step is solved by the method of conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The considered model is a general framework that captures important characteristics of count data such as zero inflation/deflation, heterogeneity, and missingness, providing more insight into the data and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed to replace the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.

6.

The EM algorithm for the quasi-likelihood regression model

Myunghee Cho Paik 《Communications in Statistics - Theory and Methods》2013,42(6):1403-1430

The objective of this paper is to present a method which can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when we can make first and second moment assumptions only; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density which has the same mean and variance structure as the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. Then the loss function is minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.

7.

Cure rate survival models with missing covariates: a simulation study

Renata Santana Fonseca Heleno Bolfarine 《Journal of Statistical Computation and Simulation》2013,83(1):97-113

In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing at random situation, so that the missingness of the covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach that takes the missing covariates into consideration is much more efficient in terms of mean square errors than the complete-case analysis. Effects of increasing cured fraction and censored observations are also reported. We demonstrate the proposed methodology with two real data sets. One involves the length of time to obtain a BS degree in Statistics, and the other the time to breast cancer recurrence.

8.

Evaluation of incomplete multiple diagnostic tests, with an application in the colon cancer family registry study

Yi Zhang Haitao Chu Donglin Zeng 《Journal of applied statistics》2014,41(3):688-700

Accurate diagnosis of a molecularly defined subtype of cancer is often an important step toward its effective control and treatment. For the diagnosis of some subtypes of a cancer, a gold standard with perfect sensitivity and specificity may be unavailable. In those scenarios, tumor subtype status is commonly measured by multiple imperfect diagnostic markers. Additionally, in many such studies, some subjects are only measured by a subset of diagnostic tests and the missing probabilities may depend on the unknown disease status. In this paper, we present statistical methods based on the EM algorithm to evaluate incomplete multiple imperfect diagnostic tests under a missing at random assumption and one missing not at random scenario. We apply the proposed methods to a real data set from the National Cancer Institute (NCI) colon cancer family registry on diagnosing microsatellite instability for hereditary non-polyposis colorectal cancer to estimate diagnostic accuracy parameters (i.e. sensitivities and specificities), prevalence, and potential differential missing probabilities for 11 biomarker tests. Simulations are also conducted to evaluate the small-sample performance of our methods.

9.

On the use of the selection matrix in the maximum likelihood estimation of normal distribution models with missing data

Keiji Takai 《Communications in Statistics - Theory and Methods》2018,47(14):3392-3407

In this article, by using the constant and random selection matrices, several properties of the maximum likelihood (ML) estimates and the ML estimator of a normal distribution with missing data are derived. The constant selection matrix allows us to obtain an explicit form of the ML estimates and the exact relationship between the EM algorithm and the score function. The random selection matrix allows us to clarify how the missing-data mechanism works in the proof of the consistency of the ML estimator, to derive the asymptotic properties of the sequence by the EM algorithm, and to derive the information matrix.

10.

AN ALTERNATIVE PARAMETRIC APPROACH FOR DISCRETE MISSING DATA PROBLEMS

《Communications in Statistics - Theory and Methods》2013,42(10):1969-1988

We propose an iterative method of estimation for discrete missing data problems that is conceptually different from the Expectation–Maximization (EM) algorithm and that does not in general yield the observed data maximum likelihood estimate (MLE). The proposed approach is based conceptually upon weighting the set of possible complete-data MLEs. Its implementation avoids the expectation step of EM, which can sometimes be problematic. In the simple case of Bernoulli trials missing completely at random, the iterations of the proposed algorithm are equivalent to the EM iterations. For a familiar genetics-oriented multinomial problem with missing count data and for the motivating example with epidemiologic applications that involves a mixture of a left censored normal distribution with a point mass at zero, we investigate the finite sample performance of the proposed estimator and find it to be competitive with that of the MLE. We give some intuitive justification for the method, and we explore an interesting connection between our algorithm and multiple imputation in order to suggest an approach for estimating standard errors.
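
For reference, the standard EM iteration for the Bernoulli case mentioned in this abstract can be written in a few lines. This is a textbook sketch of EM itself, not of the article's proposed weighting method, and the function name `em_bernoulli_mcar` and its parameters are chosen here for illustration. With trials missing completely at random, the E-step replaces each missing outcome by the current success probability and the M-step averages over all trials.

```python
def em_bernoulli_mcar(observed, n_missing, p0=0.5, n_iter=200):
    """Standard EM for a Bernoulli success probability with MCAR missing trials.

    observed  : list of 0/1 outcomes that were actually recorded
    n_missing : number of trials whose outcome is missing
    p0        : starting value for the success probability
    """
    n_total = len(observed) + n_missing
    successes = sum(observed)
    p = p0
    for _ in range(n_iter):
        # E-step: expected number of successes, filling each missing trial with p
        expected_successes = successes + n_missing * p
        # M-step: complete-data MLE using the expected counts
        p = expected_successes / n_total
    return p
```

The fixed point of this update satisfies p = (s + m p) / n, so the iteration converges to the observed proportion s / (n - m), at a geometric rate equal to the fraction of missing trials.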

11.

Likelihood estimation of missing cell means in the fixed model analysis of variance

G.W. Fellingham H.D. Tolley D.T. Scott 《Communications in Statistics - Theory and Methods》2013,42(9):2429-2447

This paper examines the formation of maximum likelihood estimates of cell means in analysis of variance problems for cells with missing observations. Methods of estimating the means for missing cells have a long history which includes iterative maximum likelihood techniques, approximation techniques and ad hoc techniques. The use of the EM algorithm to form maximum likelihood estimates has resolved most of the issues associated with this problem. Implementation of the EM algorithm entails specification of a reduced model. As demonstrated in this paper, when there are several missing cells, it is possible to specify a reduced model that results in an unidentifiable likelihood. The EM algorithm in this case does not converge, although the slow divergence may often be mistaken by the unwary for convergence. This paper presents a simple matrix method of determining whether or not the reduced model results in an identifiable likelihood, and consequently in an EM algorithm that converges. We also show the EM algorithm in this case to be equivalent to a method which yields a closed form solution.

12.

Some Further Issues Concerning Likelihood Inference for Left Truncated and Right Censored Lognormal Data

N. Balakrishnan Debanjan Mitra 《Communications in Statistics - Simulation and Computation》2013,42(2):400-416

The maximum likelihood estimates (MLEs) of the parameters of a two-parameter lognormal distribution with left truncation and right censoring are developed through the Expectation Maximization (EM) algorithm. For comparison, the MLEs are also obtained by the Newton–Raphson method. The asymptotic variance-covariance matrix of the MLEs is obtained by using the missing information principle, under the EM framework. Then, using asymptotic normality of the MLEs, asymptotic confidence intervals for the parameters are constructed. Asymptotic confidence intervals are also obtained using the variance of the MLEs estimated from the observed information matrix, and by using a parametric bootstrap technique. The different confidence intervals are then compared in terms of coverage probabilities, through a Monte Carlo simulation study. A prediction problem concerning the future lifetime of a right censored unit is also considered. A numerical example is given to illustrate all the inferential methods developed here.

13.

A Monte Carlo comparison of the smoothing, scoring and EM algorithms for dispersion matrix estimation with incomplete growth curve data

《Journal of Statistical Computation and Simulation》2012,82(1-2):77-92

Incomplete growth curve data often result from missing or mistimed observations in a repeated measures design. Virtually all methods of analysis rely on the dispersion matrix estimates. A Monte Carlo simulation was used to compare three methods of estimation of dispersion matrices for incomplete growth curve data. The three methods were: 1) maximum likelihood estimation with a smoothing algorithm, which finds the closest positive semidefinite estimate of the pairwise estimated dispersion matrix; 2) a mixed effects model using the EM (expectation-maximization) algorithm; and 3) a mixed effects model with the scoring algorithm. The simulation included 5 dispersion structures, 20 or 40 subjects with 4 or 8 observations per subject and 10 or 30% missing data. In all the simulations, the smoothing algorithm was the poorest estimator of the dispersion matrix. In most cases, there were no significant differences between the scoring and EM algorithms. The EM algorithm tended to be better than the scoring algorithm when the variances of the random effects were close to zero, especially for the simulations with 4 observations per subject and two random effects.

14.

Fitting finite mixture models using iterative Monte Carlo classification

Jing Xu Jun Ma 《Communications in Statistics - Theory and Methods》2017,46(13):6684-6693

Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm where the observed data log-likelihood function is maximized. This paper proposes an alternative approach for fitting finite mixture models. Our method, called the iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability of a data point belonging to a particular mixing component given the observed value of the data point; it then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method. It finally updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with some other algorithms for fitting mixture normal and mixture t densities.
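
The three steps described in this abstract can be sketched for a univariate normal mixture as follows. This is a minimal reading of the abstract, not the authors' implementation: the function name `imcc_normal_mixture`, the fixed iteration count, the maximum-likelihood style parameter updates, and the handling of nearly empty components are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def imcc_normal_mixture(data, mus, sigmas, weights, n_iter=50, seed=0):
    """Sketch of iterative Monte Carlo classification for a normal mixture.

    mus, sigmas, weights are starting values, one entry per component.
    Returns the parameter values after the final iteration.
    """
    rng = random.Random(seed)
    k = len(mus)
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)

    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            # Step 1: membership probabilities given the observed value
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens)
            if total == 0.0:
                probs = [1.0 / k] * k  # guard against numerical underflow
            else:
                probs = [d / total for d in dens]
            # Step 2: Monte Carlo classification, one random draw per point
            j = rng.choices(range(k), weights=probs)[0]
            groups[j].append(x)

        # Step 3: update each component from the points classified into it
        for j, g in enumerate(groups):
            weights[j] = len(g) / len(data)
            if len(g) >= 2:  # assumption: keep old values for tiny groups
                mus[j] = sum(g) / len(g)
                var = sum((x - mus[j]) ** 2 for x in g) / len(g)
                sigmas[j] = max(math.sqrt(var), 1e-6)

    return mus, sigmas, weights
```

Because the classification step is random, the returned values fluctuate from one iteration to the next rather than settling on a single point; in practice one might average over the later iterations, which this sketch leaves out.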

15.

On the estimation of the extreme value and normal distribution parameters based on progressive type-II hybrid-censored data

《Journal of Statistical Computation and Simulation》2012,82(3):569-596

A progressive hybrid censoring scheme is a mixture of type-I and type-II progressive censoring schemes. In this paper, we mainly consider the analysis of progressive type-II hybrid-censored data when the lifetime distribution of the individual item is a normal or an extreme value distribution. Since the maximum likelihood estimators (MLEs) of these parameters cannot be obtained in closed form, we propose to use the expectation-maximization (EM) algorithm to compute the MLEs. The Newton–Raphson method is also used to estimate the model parameters. The asymptotic variance–covariance matrix of the MLEs under the EM framework is obtained from the Fisher information matrix using the missing information principle, and asymptotic confidence intervals for the parameters are then constructed. Finally, the two estimation methods are compared, along with the coverage probabilities of the asymptotic confidence intervals based on the missing information principle and on the observed information matrix, through a simulation study, illustrative examples and a real data analysis.

16.

Inference based on progressive Type I interval censored data from log-normal distribution

Soumya Roy E. V. Gijo Biswabrata Pradhan 《Communications in Statistics - Simulation and Computation》2017,46(8):6495-6512

This article considers inference for the log-normal distribution based on progressive Type I interval censored data by both frequentist and Bayesian methods. First, the maximum likelihood estimates (MLEs) of the unknown model parameters are computed by the expectation-maximization (EM) algorithm. The asymptotic standard errors (ASEs) of the MLEs are obtained by applying the missing information principle. Next, the Bayes estimates of the model parameters are obtained by the Gibbs sampling method under both symmetric and asymmetric loss functions. The Gibbs sampling scheme is facilitated by adopting a data augmentation scheme similar to that of the EM algorithm. The performance of the MLEs and various Bayesian point estimates is judged via a simulation study. A real dataset is analyzed for the purpose of illustration.

17.

Inference methods for saturated models in longitudinal clinical trials with incomplete binary data

Song JX 《Pharmaceutical statistics》2006,5(4):295-304

In longitudinal studies with binary response, it is often of interest to estimate the percentage of positive responses at each time point and the percentage of subjects having at least one positive response by each time point. When missing data exist, the conventional method based on observed percentages could result in erroneous estimates. This study demonstrates two methods that use the expectation-maximization (EM) and data augmentation (DA) algorithms to estimate the marginal and cumulative probabilities for incomplete longitudinal binary response data. Both methods provide unbiased estimates when the missing at random (MAR) assumption holds. Sensitivity analyses have been performed for cases when the MAR assumption is in question.

18.

Missing values: sparse inverse covariance estimation and an extension to sparse regression

Nicolas Städler Peter Bühlmann 《Statistics and Computing》2012,22(1):219-235

We propose an ℓ₁-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method is based on the assumption that the data are missing at random (MAR), which also covers the missing completely at random case. The implementation of the method is non-trivial as the observed negative log-likelihood generally is a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.

19.

The one-step-late PXEM algorithm

Van Dyk David A. Tang Ruoxi 《Statistics and Computing》2003,13(2):137-152

The EM algorithm is a popular method for computing maximum likelihood estimates or posterior modes in models that can be formulated in terms of missing data or latent structure. Although easy implementation and stable convergence help to explain the popularity of the algorithm, its convergence is sometimes notoriously slow. In recent years, however, various adaptations have significantly improved the speed of EM while maintaining its stability and simplicity. One especially successful method for maximum likelihood is known as the parameter expanded EM or PXEM algorithm. Unfortunately, PXEM does not generally have a closed form M-step when computing posterior modes, even when the corresponding EM algorithm is in closed form. In this paper we confront this problem by adapting the one-step-late EM algorithm to PXEM to establish a fast closed form algorithm that improves on the one-step-late EM algorithm by ensuring monotone convergence. We use this algorithm to fit a probit regression model and a variety of dynamic linear models, showing computational savings of as much as 99.9%, with the biggest savings occurring when the EM algorithm is the slowest to converge.

20.

Random effects regression models for count data with excess zeros in caries research

D. Todem Y. Zhang A. Ismail W. Sohn 《Journal of applied statistics》2010,37(10):1661-1679

We extend the family of Poisson and negative binomial models to derive the joint distribution of clustered count outcomes with extra zeros. Two random effects models are formulated. The first model assumes a shared random effects term between the conditional probability of perfect zeros and the conditional mean of the imperfect state. The second formulation relaxes the shared random effects assumption by relating the conditional probability of perfect zeros and the conditional mean of the imperfect state to two different but correlated random effects variables. Under the conditional independence and the missing data at random assumption, a direct optimization of the marginal likelihood and an EM algorithm are proposed to fit the proposed models. Our proposed models are fitted to dental caries counts of children under the age of six in the city of Detroit.