Similar Articles
20 similar articles found (search time: 15 ms)
1.
This article focuses on data analysis under a missing-at-random (MAR) scenario within discrete-time Markov chain models. The naive method, the nonlinear (NL) method, and the Expectation-Maximization (EM) algorithm are discussed. We extend the NL method into a Bayesian framework, using an adjusted rejection algorithm to sample the posterior distribution and estimating the transition probabilities with a Monte Carlo algorithm. We compare the Bayesian nonlinear (BNL) method with the naive method and the EM algorithm under various missing rates, and comprehensively evaluate the estimators in terms of bias, variance, mean square error, and coverage probability (CP). Our simulation results show that the EM algorithm usually offers the smallest variances but the poorest CP, while the BNL method has smaller variances and better or similar CP compared to the naive method. When the missing rate is low (about 9%, MAR), the three methods are comparable; when the missing rate is high (about 25%, MAR), the BNL method performs slightly but consistently better than the naive method in terms of variance and CP. Data from a longitudinal study of stress levels among caregivers of individuals with Alzheimer's disease are used to illustrate these methods.
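The naive method compared above simply counts the fully observed one-step transitions and ignores anything adjacent to a gap. A minimal sketch (state sequences as Python lists with `None` marking a missing observation; both representation choices are assumptions of this example, not the paper's notation):

```python
from collections import defaultdict

def naive_transition_estimates(sequences, states):
    """Complete-case ("naive") estimate of the one-step transition
    probabilities: count only transitions where both the current and the
    next state are observed; None marks a missing observation."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a is not None and b is not None:
                counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values())
        probs[a] = {b: (counts[a][b] / total if total else 0.0)
                    for b in states}
    return probs
```

Transitions that straddle a missing value are simply discarded, which is exactly why the method loses precision as the missing rate grows.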

2.
This article considers a discrete-time Markov chain for modeling transition probabilities when multiple successive observations are missing at random between two observed outcomes, using three methods: a naïve analog of complete-case analysis using the observed one-step transitions alone, a non-data-augmentation method (NL) that solves nonlinear equations, and a data-augmentation method, the Expectation-Maximization (EM) algorithm. The explicit form of the conditional log-likelihood given the observed information, as required by the E step, is provided, and the iterative formula in the M step is expressed in closed form. An empirical study was performed to examine the accuracy and precision of the estimates obtained by the three methods under the ignorable missingness mechanisms of missing completely at random and missing at random. A dataset from the mental health arena was used for illustration. Both the data-augmentation and non-augmentation methods provide accurate and precise point estimation, while the naïve method resulted in estimates of the transition probabilities with similar bias but larger MSE. The NL method and the EM algorithm in general provide similar results, although the latter provides conditional expected row margins, leading to smaller standard errors.
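Both the NL and EM approaches above rest on the same building block: a gap of k - 1 missing observations between two observed states contributes a k-step transition probability, i.e. the corresponding entry of P raised to the k-th power. A pure-Python sketch of that quantity (the nested-list matrix representation is an assumption of this example):

```python
def matmul(A, B):
    """Multiply two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def k_step_probability(P, i, j, k):
    """(i, j) entry of P**k: the probability of reaching state j from
    state i in k steps when the k - 1 intermediate states are unobserved."""
    M = P
    for _ in range(k - 1):
        M = matmul(M, P)
    return M[i][j]
```

The NL method equates these multi-step probabilities to their empirical counterparts and solves for the entries of P, while EM augments the missing intermediate states instead.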

3.
We propose a method for estimating parameters in generalized linear models when the outcome variable is missing for some subjects and the missing data mechanism is non-ignorable. We assume throughout that the covariates are fully observed. One possible method for estimating the parameters is maximum likelihood with a non-ignorable missing data model. However, caution must be used when fitting non-ignorable missing data models because certain parameters may be inestimable for some models. Instead of fitting a non-ignorable model, we propose the use of auxiliary information in a likelihood approach to reduce the bias, without having to specify a non-ignorable model. The method is applied to a mental health study.

4.
This paper deals with the prediction of time series with missing data using an alternative formulation of Holt's model with additive errors. This formulation simplifies both the computation of maximum likelihood estimates of all the unknowns in the model and the computation of point forecasts. In the presence of missing data, the EM algorithm is used to obtain maximum likelihood estimates and point forecasts. Based on this application, we propose a leave-one-out algorithm for the data-transformation selection problem, which allows us to analyse Holt's model with multiplicative errors. Some numerical results show the performance of these procedures for obtaining robust forecasts.
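For reference, a textbook form of Holt's linear smoothing with the simplest possible handling of a missing value (advance the level along the current trend and leave the trend unchanged). The initialisation and the missing-value rule are illustrative choices for this sketch, not the paper's alternative formulation:

```python
def holt_forecast(y, alpha, beta, h):
    """Holt's linear (additive-error) exponential smoothing.
    A missing value (None) triggers no update: the level is advanced
    along the current trend. Initialised from the first two values,
    which are assumed observed."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[2:]:
        prev = level
        if obs is None:
            level = level + trend            # propagate, no correction
        else:
            level = alpha * obs + (1 - alpha) * (level + trend)
            trend = beta * (level - prev) + (1 - beta) * trend
    return level + h * trend                 # h-step-ahead point forecast
```

In the paper the smoothing parameters and missing values are handled by maximum likelihood via EM rather than by this carry-forward rule; the sketch only shows the recursion being estimated.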

5.
In this article, by using the constant and random selection matrices, several properties of the maximum likelihood (ML) estimates and the ML estimator of a normal distribution with missing data are derived. The constant selection matrix allows us to obtain an explicit form of the ML estimates and the exact relationship between the EM algorithm and the score function. The random selection matrix allows us to clarify how the missing-data mechanism works in the proof of the consistency of the ML estimator, to derive the asymptotic properties of the sequence by the EM algorithm, and to derive the information matrix.

6.
The magnitude–frequency distribution (MFD) of earthquakes is a fundamental statistic in seismology. The so-called b-value in the MFD is of particular interest in geophysics. A continuous-time hidden Markov model (HMM) is proposed for characterizing the variability of b-values. The HMM-based approach to modeling the MFD has some appealing properties over the widely used sliding-window approach: large variability often appears in the estimation of the b-value due to window-size tuning, which may cause difficulties in interpreting b-value heterogeneities. Continuous-time hidden Markov models (CT-HMMs) are widely applied in various fields. They bear some advantages over their discrete-time counterparts in that they can characterize heterogeneities appearing in time series on a finer time scale, particularly for highly irregularly spaced time series such as earthquake occurrences. We demonstrate an expectation–maximization algorithm for the estimation of a general exponential-family CT-HMM. In parallel with discrete-time hidden Markov models, we develop a continuous-time version of the Viterbi algorithm to retrieve the overall optimal path of the latent Markov chain. The methods are applied to New Zealand deep earthquakes. Before the analysis, we first assess the completeness of the catalogue events to ensure that the analysis is not biased by missing data. The estimation of the b-value is stable over the selection of magnitude thresholds, which is ideal for the interpretation of b-value variability.

7.
Consider an ergodic Markov chain X(t) in continuous time with an infinitesimal matrix Q = (qij) defined on a finite state space {0, 1,…, N}. In this note, we prove that if X(t) is skip-free positive (negative, respectively), i.e., qij = 0 for j > i + 1 (i > j + 1), then the transition probability pij(t) = Pr[X(t) = j | X(0) = i] can be represented as a linear combination of p0N(m)(t) (pN0(m)(t), respectively), 0 ≤ m ≤ N, where f(m)(t) denotes the mth derivative of a function f(t), with f(0)(t) = f(t). If X(t) is a birth–death process, then pij(t) can be represented as a linear combination of p0N(m)(t), 0 ≤ m ≤ N − |i − j|.

8.
The mixture transition distribution (MTD) model was introduced by Raftery to meet the need for parsimony in the modeling of high-order Markov chains in discrete time. The particularity of this model comes from the fact that the effect of each lag upon the present is considered separately and additively, so that the number of parameters required is drastically reduced. However, the efficiency of the MTD parameter estimation methods proposed to date remains problematic on account of the large number of constraints on the parameters. In this article, an iterative procedure, the expectation–maximization (EM) algorithm, is developed in conjunction with the principle of maximum likelihood estimation (MLE) to estimate the MTD parameters. Applications of MTD modeling show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, the EM parameter estimates for high-order MTD models fitted to DNA sequences outperform the corresponding fully parametrized Markov chain in terms of the Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.
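The parsimony of the MTD model comes from replacing the full high-order transition tensor by a single matrix Q whose lagged contributions are mixed additively. A sketch of the resulting conditional probability (the argument layout is an illustrative choice, not the seq++ API):

```python
def mtd_probability(history, next_state, lambdas, Q):
    """Conditional probability of next_state under Raftery's MTD model:
    each lag contributes through the single transition matrix Q,
    weighted by the lag weights. history is ordered oldest to newest;
    lambdas[0] weights lag 1 (the most recent state), and the lambdas
    sum to one, so the result is a proper probability."""
    return sum(lam * Q[past][next_state]
               for lam, past in zip(lambdas, reversed(history)))
```

With an m-state chain of order l, this needs m(m − 1) + (l − 1) free parameters instead of the m^l (m − 1) of a fully parametrized chain; the constraints on the lambdas and on Q are what make the estimation, and hence the EM algorithm of this article, nontrivial.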

9.
Social data often contain missing information, and the problem is inevitably severe when analysing historical data. Conventionally, researchers analyse complete records only. Listwise deletion not only reduces the effective sample size but may also result in biased estimation, depending on the missingness mechanism. We analyse household types using population registers from ancient China (618–907 AD), comparing a simple classification, a latent class model of the complete data, and a latent class model of the complete and partially missing data assuming four types of ignorable and non-ignorable missingness mechanisms. The findings show that either a frequency classification or a latent class analysis using the complete records only yields biased estimates and incorrect conclusions in the presence of partially missing data under a non-ignorable mechanism. Although simply assuming an ignorable or non-ignorable missing-data mechanism produced consistently similar, higher estimates of the proportion of complex households, specifying the relationship between the latent variable and the degree of missingness through a row-effect uniform association model helped to capture the missingness mechanism better and improved the model fit.

10.
In this paper, we investigate the consistency of the Expectation Maximization (EM) algorithm-based information criteria for model selection with missing data. The criteria correspond to a penalization of the conditional expectation of the complete data log-likelihood given the observed data and with respect to the missing data conditional density. We present asymptotic properties related to maximum likelihood estimation in the presence of incomplete data and we provide sufficient conditions for the consistency of model selection by minimizing the information criteria. Their finite sample performance is illustrated through simulation and real data studies.

11.
When data from several independent Markov chains are aggregated over each time point, least squares estimation of transition probabilities faces the problem of multicollinearity. We propose here an estimation procedure which applies ridge regression to the ordinary least squares estimators. The performance of this estimator is then compared with that of ordinary least squares.
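The ridge adjustment amounts to solving (X'X + αI)β = X'y instead of the ordinary normal equations, so the near-singular X'X produced by multicollinear aggregated counts stays invertible. A self-contained sketch using plain Gaussian elimination (illustrative only, not the authors' implementation):

```python
def ridge_solve(X, y, alpha):
    """Ridge estimate beta = (X'X + alpha*I)^(-1) X'y for X given as a
    list of rows; alpha > 0 stabilises a near-singular X'X."""
    n, p = len(X), len(X[0])
    # Build the penalised normal equations A beta = b.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Forward elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][k] * beta[k] for k in range(r + 1, p))) / A[r][r]
    return beta
```

With α = 0 this reduces to ordinary least squares; the bias introduced by α > 0 is the price paid for the variance reduction under multicollinearity.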

12.
Maximum likelihood (ML) estimation with spatial econometric models is a long-standing problem that finds application in several areas of economic importance. The problem is particularly challenging in the presence of missing data, since there is an implied dependence between all units, irrespective of whether they are observed or not. Out of the several approaches adopted for ML estimation in this context, that of LeSage and Pace [Models for spatially dependent missing data. J Real Estate Financ Econ. 2004;29(2):233–254] stands out as one of the most commonly used with spatial econometric models due to its ability to scale with the number of units. Here, we review their algorithm, and consider several similar alternatives that are also suitable for large datasets. We compare the methods through an extensive empirical study and conclude that, while the approximate approaches are suitable for large sampling ratios, for small sampling ratios the only reliable algorithms are those that yield exact ML or restricted ML estimates.

13.
We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data, as well as after the deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. The populations compared were: (1) multivariate normal with covariance matrices Σ1 = Σ2, (2) multivariate normal with Σ1 ≠ Σ2, and (3) multivariate non-normal with Σ1 = Σ2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly, but all were better than non-replacement.
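Of the three replacement methods compared, variable-mean replacement is the simplest. A sketch (data rows as Python lists with `None` for deleted entries, a representation assumed for this example):

```python
def impute_column_means(rows):
    """Replace each missing entry (None) by the mean of the observed
    values in its column: the variable-means replacement method."""
    p = len(rows[0])
    means = []
    for j in range(p):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(p)]
            for r in rows]
```

Mean replacement preserves column means but shrinks variances and distorts covariances, which is why projection- and EM-based replacements are usually preferred when the covariance structure matters, as it does in discriminant analysis.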

14.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed-form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism is itself not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

15.
It is well known that the nonparametric maximum likelihood estimator (NPMLE) of a survival function may severely underestimate the survival probabilities at very early times for left-truncated data. This problem might be overcome by instead computing a smoothed nonparametric estimator (SNE) via the EMS algorithm. The close connection between the SNE and the maximum penalized likelihood estimator is also established. Extensive Monte Carlo simulations demonstrate the superior performance of the SNE over that of the NPMLE, in terms of either bias or variance, even for moderately large samples. The methodology is illustrated with an application to the Massachusetts Health Care Panel Study dataset to estimate the probability of being functionally independent for non-poor male and female groups, respectively.

16.
The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed to analyze incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing-data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing-data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally-distributed repeated measures data using linear mixed models, while Park and Davis (1992) derived their test for incomplete repeated categorical data in the framework of Grizzle, Starmer, and Koch (1969). The proposed procedure can be applied easily to any other multivariate general linear model that allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).

17.
For multivariate normal data with non-monotone (i.e. arbitrary) missing data patterns, lattice conditional independence (LCI) models determined by the observed data patterns can be used to obtain closed-form MLEs (Andersson and Perlman, 1991, 1993). In this paper, three procedures (LCI models, the EM algorithm, and the complete-data method) are compared by means of a Monte Carlo experiment. When the LCI model is accepted by the LR test, the LCI estimate is more efficient than those based on the EM algorithm and the complete-data method. When the LCI model is not accepted, the LCI estimate may lose efficiency but still may be more efficient than the EM estimate if the observed data are sparse. When the LCI model appears too restrictive, it may be possible to obtain a less restrictive LCI model by discarding only a small portion of the incomplete observations. LCI models appear to be especially useful when the observed data are sparse, even in cases where the suitability of the LCI model is uncertain.

18.
In some fields, we are forced to work with missing data in multivariate time series. Unfortunately, the data analysis in this context cannot be carried out in the same way as in the case of complete data. To deal with this problem, a Bayesian analysis of multivariate threshold autoregressive models with exogenous inputs and missing data is carried out. In this paper, Markov chain Monte Carlo methods are used to obtain samples from the involved posterior distributions, including threshold values and missing data. In order to identify autoregressive orders, we adapt the Bayesian variable selection method in this class of multivariate process. The number of regimes is estimated using marginal likelihood or product parameter-space strategies.

19.
This article develops quasi-likelihood estimation for generalized varying-coefficient partially linear models when the response is not always observable. Two estimation methods are considered, and the resulting estimators are shown to be asymptotically normal under the assumption of selection on the observables. As an application of these results, a new estimator for the average treatment effect parameter is proposed. A simulation study illustrates the finite sample properties of the proposed estimators.

20.
Large cohort studies are commonly launched to study the risk effect of genetic variants or other risk factors on a chronic disorder. In these studies, family data are often collected to provide additional information for the purpose of improving the inference results. Statistical analysis of the family data can be very challenging due to the missing observations of genotypes, incomplete records of disease occurrences in family members, and the complicated dependence attributed to the shared genetic background and environmental factors. In this article, we investigate a class of logistic models with family-shared random effects to tackle these challenges, and develop a robust regression method based on the conditional logistic technique for statistical inference. An expectation–maximization (EM) algorithm with fast computation speed is developed to handle the missing genotypes. The proposed estimators are shown to be consistent and asymptotically normal. Additionally, a score test based on the proposed method is derived to test the genetic effect. Extensive simulation studies demonstrate that the proposed method performs well in finite samples in terms of estimation accuracy, robustness, and computational speed. The proposed procedure is applied to an Alzheimer's disease study.

