1.

A Bayesian Approach in Estimating Transition Probabilities of a Discrete-time Markov Chain for Ignorable Intermittent Missing Data

Junsheng Ma, Xiaoying Yu, Elaine Symanski, Rachelle Doody 《Communications in Statistics - Simulation and Computation》2016,45(7):2598-2616

This article focuses on data analysis under the missing-at-random (MAR) scenario within discrete-time Markov chain models. The naive method, the nonlinear (NL) method and the Expectation-Maximization (EM) algorithm are discussed. We extend the NL method into a Bayesian framework, using an adjusted rejection algorithm to sample from the posterior distribution and a Monte Carlo algorithm to estimate the transition probabilities. We compare the Bayesian nonlinear (BNL) method with the naive method and the EM algorithm under various missing rates, and comprehensively evaluate the estimators in terms of bias, variance, mean square error and coverage probability (CP). Our simulation results show that the EM algorithm usually gives the smallest variances but the poorest CP, while the BNL method has smaller variances and better or similar CP compared with the naive method. When the missing rate is low (about 9%, MAR), the three methods are comparable. When the missing rate is high (about 25%, MAR), the BNL method overall performs slightly but consistently better than the naive method in terms of variance and CP. Data from a longitudinal study of stress levels among caregivers of individuals with Alzheimer’s disease are used to illustrate these methods.
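As a reference point for the baseline, a minimal Python sketch of one common "naive" estimator follows. It assumes (this is not taken from the article) that the naive method counts only transitions between consecutive time points where both states are observed and then normalizes each row. Missing states are encoded as `None`, and the name `naive_transition_matrix` is hypothetical. The NL, BNL and EM methods are not shown.

```python
from collections import Counter


def naive_transition_matrix(sequences, n_states):
    """Estimate P[i][j] = Pr(X_{t+1} = j | X_t = i) from observed consecutive pairs.

    Missing states are None; any pair with a missing end is skipped
    (an assumed "naive" treatment of intermittent missing data).
    """
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a is not None and b is not None:
                counts[(a, b)] += 1
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # No transitions observed out of state i: the row is not estimable.
            matrix.append([float("nan")] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix
```

For example, the sequences `[0, 0, 1, None, 1, 0]` and `[1, 1, None, 0, 0]` contribute five usable pairs, giving estimated rows of about [0.667, 0.333] and [0.5, 0.5].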

2.

Using auxiliary data for parameter estimation with non-ignorably missing outcomes

Joseph G. Ibrahim, Stuart R. Lipsitz & Nick Horton 《Journal of the Royal Statistical Society. Series C, Applied statistics》2001,50(3):361-373

We propose a method for estimating parameters in generalized linear models when the outcome variable is missing for some subjects and the missing data mechanism is non-ignorable. We assume throughout that the covariates are fully observed. One possible method for estimating the parameters is maximum likelihood with a non-ignorable missing data model. However, caution must be used when fitting non-ignorable missing data models because certain parameters may be inestimable for some models. Instead of fitting a non-ignorable model, we propose the use of auxiliary information in a likelihood approach to reduce the bias, without having to specify a non-ignorable model. The method is applied to a mental health study.

3.

Forecasting time series with missing data using Holt's model

José D. Bermúdez, Ana Corberán-Vallet, Enriqueta Vercher 《Journal of statistical planning and inference》2009

This paper deals with the prediction of time series with missing data using an alternative formulation of Holt's model with additive errors. This formulation simplifies both the computation of the maximum likelihood estimators of all the unknowns in the model and the computation of point forecasts. In the presence of missing data, the EM algorithm is used to obtain maximum likelihood estimates and point forecasts. Building on this application, we propose a leave-one-out algorithm for the data transformation selection problem, which allows us to analyse Holt's model with multiplicative errors. Numerical results show the performance of these procedures for obtaining robust forecasts.
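Holt's linear-trend recursions can be sketched as below. This is not the authors' alternative formulation or their EM approach; it is a common simple device, assumed here for illustration, in which a missing observation is replaced by its one-step-ahead forecast, so the level advances by the trend and the trend is left unchanged. The smoothing constants `alpha` and `beta` and the starting level and trend are user-supplied, and `holt_forecast` is a hypothetical name.

```python
def holt_forecast(y, alpha, beta, level0, trend0, horizon=1):
    """Holt's linear trend method with additive errors.

    Missing observations (None) are replaced by their one-step-ahead forecast,
    so the level advances by the trend and the trend is unchanged.
    Returns point forecasts for steps 1..horizon after the last observation.
    """
    level, trend = level0, trend0
    for obs in y:
        forecast = level + trend
        if obs is None:
            level = forecast  # no new information: carry the forecast forward
            continue          # trend stays as it was
        new_level = alpha * obs + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return [level + h * trend for h in range(1, horizon + 1)]
```

On an exactly linear series with a gap, such as `[1, 2, 3, None, 5]` started from level 0 and trend 1, the two-step forecasts are 6 and 7 for any smoothing constants.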

4.

Shaochuan Lu 《Journal of applied statistics》2017,44(1):71-88

The magnitude-frequency distribution (MFD) of earthquakes is a fundamental statistic in seismology, and the so-called b-value of the MFD is of particular interest in geophysics. A continuous-time hidden Markov model (HMM) is proposed for characterizing the variability of b-values. The HMM-based approach to modeling the MFD has some appealing properties compared with the widely used sliding-window approach. With sliding windows, the b-value estimate often varies greatly with the tuning of the window size, which may make b-value heterogeneities difficult to interpret. Continuous-time hidden Markov models (CT-HMMs) are widely applied in various fields. They have some advantages over their discrete-time counterparts in that they can characterize heterogeneities in time series on a finer time scale, particularly for highly irregularly spaced time series such as earthquake occurrences. We demonstrate an expectation–maximization algorithm for the estimation of general exponential-family CT-HMMs. In parallel with discrete-time hidden Markov models, we develop a continuous-time version of the Viterbi algorithm to retrieve the overall optimal path of the latent Markov chain. The methods are applied to New Zealand deep earthquakes. Before the analysis, we first assess the completeness of the catalogue events to ensure that the analysis is not biased by missing data. The estimation of the b-value is stable over the selection of magnitude thresholds, which is ideal for interpreting b-value variability.
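The article develops a continuous-time Viterbi algorithm; as a point of reference, a minimal sketch of the standard discrete-time Viterbi algorithm it parallels is given below, working in log space to avoid numerical underflow. The parameter layout (`init`, `trans`, `emit` as nested lists with discrete emission symbols) is an assumption for illustration, not the article's exponential-family CT-HMM.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Most probable hidden-state path of a discrete-time HMM.

    obs         : non-empty list of observation symbols (ints)
    init[i]     : Pr(S_1 = i)
    trans[i][j] : Pr(S_t = j | S_{t-1} = i)
    emit[i][o]  : Pr(O_t = o | S_t = i)
    """
    n = len(init)
    # score[i] = best log-probability of any path ending in state i so far
    score = [_log(init[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    backptr = []  # backptr[t][j] = best predecessor of state j at observation t + 1
    for o in obs[1:]:
        best_prev, new_score = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: score[i] + _log(trans[i][j]))
            best_prev.append(i_best)
            new_score.append(score[i_best] + _log(trans[i_best][j]) + _log(emit[j][o]))
        backptr.append(best_prev)
        score = new_score
    # Trace the optimal path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for best_prev in reversed(backptr):
        state = best_prev[state]
        path.append(state)
    return path[::-1]
```

With sticky transitions (`[[0.8, 0.2], [0.2, 0.8]]`) and emissions that mostly copy the hidden state (`[[0.9, 0.1], [0.1, 0.9]]`), the observations `[0, 0, 1, 1, 1]` decode to the state path `[0, 0, 1, 1, 1]`.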

5.

On transition probabilities of skip-free Markov chains

Masaaki Kijima 《Australian & New Zealand Journal of Statistics》1989,31(2):309-314

Consider an ergodic Markov chain $X(t)$ in continuous time with infinitesimal matrix $Q = (q_{ij})$ defined on a finite state space $\{0, 1, \ldots, N\}$. In this note, we prove that if $X(t)$ is skip-free positive (respectively, negative), i.e., $q_{ij} = 0$ for $j > i + 1$ (respectively, $i > j + 1$), then the transition probability $p_{ij}(t) = \Pr[X(t) = j \mid X(0) = i]$ can be represented as a linear combination of $p_{0N}^{(m)}(t)$ (respectively, $p_{N0}^{(m)}(t)$), $0 \le m \le N$, where $f^{(m)}(t)$ denotes the $m$th derivative of a function $f(t)$ with $f^{(0)}(t) = f(t)$. If $X(t)$ is a birth-death process, then $p_{ij}(t)$ is represented as a linear combination of $p_{0N}^{(m)}(t)$, $0 \le m \le N - |i - j|$.

6.

Estimating household structure in ancient China by using historical data: a latent class analysis of partially missing patterns

Tim Futing Liao 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2004,167(1):125-139

Social data often contain missing information, and the problem is inevitably severe when analysing historical data. Conventionally, researchers analyse complete records only. Such listwise deletion not only reduces the effective sample size but may also bias estimation, depending on the missingness mechanism. We analyse household types using population registers from ancient China (618–907 AD), comparing a simple classification, a latent class model of the complete data, and a latent class model of the complete and partially missing data under four types of ignorable and non-ignorable missingness mechanisms. The findings show that both a frequency classification and a latent class analysis using only the complete records yielded biased estimates and incorrect conclusions in the presence of partially missing data with a non-ignorable mechanism. Although simply assuming ignorable or non-ignorable missingness consistently produced similar, higher estimates of the proportion of complex households, specifying the relationship between the latent variable and the degree of missingness through a row-effect uniform association model helped to capture the missingness mechanism better and improved the model fit.

7.

An EM algorithm for estimation in the mixture transition distribution model

《Journal of Statistical Computation and Simulation》2012,82(8):713-729

The mixture transition distribution (MTD) model was introduced by Raftery to meet the need for parsimony in modeling high-order Markov chains in discrete time. The particularity of this model is that the effect of each lag on the present is considered separately and additively, so that the number of parameters required is drastically reduced. However, efficient estimation of the MTD parameters remains problematic for the methods proposed to date, because of the large number of constraints on the parameters. In this article, an iterative procedure, commonly known as the expectation–maximization (EM) algorithm, is developed in combination with the principle of maximum likelihood estimation (MLE) to estimate the MTD parameters. Some applications of MTD modeling show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, high-order MTD models fitted to DNA sequences with the EM estimates outperform the corresponding fully parametrized Markov chains in terms of the Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.
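In Raftery's original MTD formulation, with a single transition matrix $Q = (q_{ij})$ and lag weights $\lambda_g \ge 0$ summing to one, $\Pr(X_t = i_0 \mid X_{t-1} = i_1, \ldots, X_{t-l} = i_l) = \sum_{g=1}^{l} \lambda_g q_{i_g i_0}$. The sketch below evaluates this probability and compares free-parameter counts with a fully parametrized order-$l$ chain on $m$ states. It does not implement the article's EM estimator, and the function names are hypothetical.

```python
def mtd_prob(history, next_state, lambdas, Q):
    """Raftery's single-matrix MTD: Pr(X_t = next_state | X_{t-1}, ..., X_{t-l}).

    history[g - 1] is the state at lag g (most recent first),
    lambdas[g - 1] >= 0 is the weight of lag g (weights sum to 1),
    Q is one m x m transition matrix shared by all lags.
    """
    return sum(lam * Q[past][next_state] for lam, past in zip(lambdas, history))


def n_free_params(m, order):
    """Free parameters: full order-l chain vs. single-matrix MTD model."""
    full = m ** order * (m - 1)          # one probability row per history
    mtd = m * (m - 1) + (order - 1)      # one matrix plus l - 1 free lag weights
    return full, mtd
```

For DNA sequences (m = 4) and order 3, `n_free_params(4, 3)` gives 192 free parameters for the full chain against 14 for the MTD model, which is the parsimony the abstract refers to.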

8.

Inderdeep Kaur, M. B. Rajarshi 《Communications in Statistics - Simulation and Computation》2013,42(4):524-530

When data from several independent Markov chains are aggregated over each time point, least squares estimation of the transition probabilities faces the problem of multicollinearity. We propose an estimation procedure that uses ridge regression in place of ordinary least squares. The performance of this estimator is then compared with that of ordinary least squares.
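The ridge estimator itself is $\hat\beta(k) = (X^\top X + kI)^{-1} X^\top y$, which reduces to ordinary least squares at $k = 0$. A minimal pure-Python sketch follows. How the aggregated Markov-chain data are arranged into `X` and `y`, and how the ridge constant `k` is chosen, are not specified in the abstract and are left to the caller; the function name `ridge` is hypothetical.

```python
def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; k = 0 gives ordinary least squares.

    X is a list of rows (n x p), y a list of n responses, k >= 0.
    """
    p = len(X[0])
    # Normal equations A b = c with A = X'X + kI and c = X'y.
    A = [[sum(row[a] * row[b] for row in X) + (k if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    c = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for j in range(col, p):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (c[r] - sum(A[r][j] * b[j] for j in range(r + 1, p))) / A[r][r]
    return b
```

With perfectly collinear columns, for example `X = [[1, 1], [1, 1], [1, 1]]`, the matrix $X^\top X$ is singular and `k = 0` fails with a division by zero, whereas `k = 1` returns the unique solution `[3/7, 3/7]`.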

9.

Comparison of algorithms for replacing missing data in discriminant analysis

Daniel J. Twedt, D.S. Gill 《Communications in Statistics - Theory and Methods》2013,42(6):1567-1578

We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data, as well as after the deleted data were replaced using (1) variable means, (2) principal component projections, and (3) the EM algorithm. The populations compared were (1) multivariate normal with covariance matrices Σ₁ = Σ₂, (2) multivariate normal with Σ₁ ≠ Σ₂, and (3) multivariate non-normal with Σ₁ = Σ₂. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly, but all were better than non-replacement.
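Of the three replacement methods, replacement by variable means is the simplest. A minimal sketch, assuming rows of numeric data with missing entries encoded as `None`, is shown below; the function name `mean_impute` is hypothetical, and each column must have at least one observed value.

```python
def mean_impute(rows):
    """Replace each missing entry (None) with the mean of its column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))  # fails if a column is entirely missing
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]
```

For instance, `[[1.0, None], [3.0, 4.0], [None, 8.0]]` becomes `[[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]`. Because every imputed value sits at the column centre, this method shrinks the within-class spread, one reason it is compared against projection- and EM-based replacement.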

10.

Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

J. G. Ibrahim, S. R. Lipsitz & M.-H. Chen 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):173-190

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not "testable" from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have "passed" the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.
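The sequential logistic factorization of the missing-data indicators can be sketched as follows, assuming each conditional logit uses an intercept, the earlier indicators, and a caller-chosen vector of covariate and response values. That vector may include the very covariates that are missing, which is what makes the mechanism non-ignorable. The coefficient layout and the names `missingness_prob` and `expit` are hypothetical; the article's exact parametrization is not given in the abstract.

```python
import math


def expit(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def missingness_prob(r, features, coefs):
    """Joint Pr(r_1, ..., r_p) as a product of one-dimensional logistic conditionals.

    r        : 0/1 missing-data indicators (1 = missing)
    features : covariate/response values the mechanism may depend on
    coefs[j] : coefficients of the j-th conditional logit, of length
               1 (intercept) + j (earlier indicators) + len(features)
    """
    prob = 1.0
    for j, rj in enumerate(r):
        z = [1.0] + list(r[:j]) + list(features)
        p = expit(sum(c * v for c, v in zip(coefs[j], z)))
        prob *= p if rj == 1 else 1.0 - p
    return prob
```

Summing `missingness_prob` over all $2^p$ indicator patterns gives 1 for any coefficients, which is how the factorization guarantees a valid multinomial model.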

11.

Estimating survival curves with left truncated and interval censored data via the EMS algorithm

Wei Pan, Rick Chappell 《Communications in Statistics - Theory and Methods》2013,42(4):777-793

It is well known that the nonparametric maximum likelihood estimator (NPMLE) of a survival function may severely underestimate the survival probabilities at very early times for left-truncated data. This problem might be overcome by instead computing a smoothed nonparametric estimator (SNE) via the EMS algorithm. The close connection between the SNE and the maximum penalized likelihood estimator is also established. Extensive Monte Carlo simulations demonstrate the superior performance of the SNE over that of the NPMLE, in terms of either bias or variance, even for moderately large samples. The methodology is illustrated with an application to the Massachusetts Health Care Panel Study dataset to estimate the probability of being functionally independent for non-poor male and female groups, respectively.

12.

A test of the missing data mechanism for repeated measures data

Taesung Park, Seungyeoun Lee, Robert F. Woolson 《Communications in Statistics - Theory and Methods》2013,42(10):2813-2829

The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed to analyze incomplete normally distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing data mechanism for incomplete normally distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally distributed repeated measures data using linear mixed models, while Park and Davis (1992) derived the test for incomplete repeated categorical data in the framework of Grizzle, Starmer and Koch (1969). The proposed procedure can easily be applied to any other multivariate general linear model that allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).

13.

Efficiency of lattice conditional independence models for multinormal samples with non-monotone missing data

Lang Wu, Michael D. Perlman 《Communications in Statistics - Simulation and Computation》2013,42(2):481-509

For multivariate normal data with non-monotone (i.e. arbitrary) missing data patterns, lattice conditional independence (LCI) models determined by the observed data patterns can be used to obtain closed-form MLEs (Andersson and Perlman, 1991, 1993). In this paper, three procedures (LCI models, the EM algorithm, and the complete-data method) are compared by means of a Monte Carlo experiment. When the LCI model is accepted by the LR test, the LCI estimate is more efficient than those based on the EM algorithm and the complete-data method. When the LCI model is not accepted, the LCI estimate may lose efficiency but may still be more efficient than the EM estimate if the observed data are sparse. When the LCI model appears too restrictive, it may be possible to obtain a less restrictive LCI model by discarding only a small portion of the incomplete observations. LCI models appear to be especially useful when the observed data are sparse, even in cases where the suitability of the LCI model is uncertain.

14.

Semiparametric quasi-likelihood estimation with missing data

Francesco Bravo 《Communications in Statistics - Theory and Methods》2013,42(5):1345-1369

This article develops quasi-likelihood estimation for generalized varying coefficient partially linear models when the response is not always observable. It considers two estimation methods and shows that, under the assumption of selection on the observables, the resulting estimators are asymptotically normal. As an application of these results, it proposes a new estimator for the average treatment effect parameter. A simulation study illustrates the finite-sample properties of the proposed estimators.

15.

Robust logistic regression of family data in the presence of missing genotypes

Yanping Qiu 《Journal of applied statistics》2019,46(5):926-945

Large cohort studies are commonly launched to study the risk effect of genetic variants or other risk factors on a chronic disorder. In these studies, family data are often collected to provide additional information for improving the inference results. Statistical analysis of family data can be very challenging because of missing genotype observations, incomplete records of disease occurrence in family members, and the complicated dependence attributable to shared genetic background and environmental factors. In this article, we investigate a class of logistic models with family-shared random effects to tackle these challenges, and develop a robust regression method based on the conditional logistic technique for statistical inference. A fast expectation–maximization (EM) algorithm is developed to handle the missing genotypes. The proposed estimators are shown to be consistent and asymptotically normal. Additionally, a score test based on the proposed method is derived to test the genetic effect. Extensive simulation studies demonstrate that the proposed method performs well in finite samples in terms of estimation accuracy, robustness and computational speed. The proposed procedure is applied to an Alzheimer's disease study.

16.

Bayesian comparison of diagnostic tests with largely non-informative missing data

Carlos Daniel Paulino 《Journal of Statistical Computation and Simulation》2019,89(10):1877-1886

This work was motivated by a real problem of comparing binary diagnostic tests against a gold standard, where the collected data showed that the large majority of classifications were incomplete and the feedback received from the medical doctors allowed us to treat the missingness as non-informative. Taking into account the degree of data incompleteness, we used a Bayesian approach via MCMC methods to draw inferences on the accuracy measures of interest. Implementing it directly in well-known software led to serious chain-convergence problems. These difficulties were overcome by proposing a simple, efficient and easily adaptable data augmentation algorithm, implemented in an ad hoc computer program.

17.

Estimating the parameters of mixture models with modal estimators

Richard A. Redner, Richard J. Hathaway, James C. Bezdek 《Communications in Statistics - Theory and Methods》2013,42(9):2639-2660

This paper extends some of the work presented in Redner and Walker (1984) on the maximum likelihood estimate of parameters in a mixture model to a Bayesian modal estimate. The problem of determining the mode of the joint posterior distribution is discussed. Necessary conditions are given for a choice of parameters to be the mode, and a numerical scheme based on the EM algorithm is presented. Some theoretical remarks on the resulting iterative scheme and simulation results are also given.
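A minimal sketch of EM for a Bayesian modal estimate is given below, simplified (an assumption, not the paper's setting) to the mixing proportions of a univariate normal mixture whose component means and standard deviations are known, under a Dirichlet($\alpha$) prior. The E-step computes membership probabilities $w_{ik}$; the M-step maximizes the expected complete-data log posterior, giving $\pi_k = (\sum_i w_{ik} + \alpha_k - 1)/(n + \sum_k \alpha_k - K)$, which requires $\alpha_k \ge 1$. With all $\alpha_k = 1$ it reduces to the maximum likelihood EM update. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def map_em_weights(data, mus, sigmas, alpha, n_iter=200):
    """Posterior-mode mixing weights of a normal mixture with known components.

    alpha is the Dirichlet prior parameter vector (each alpha_k >= 1).
    """
    K = len(mus)
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        # E-step: expected component memberships summed over observations.
        totals = [0.0] * K
        for x in data:
            dens = [pi[k] * normal_pdf(x, mus[k], sigmas[k]) for k in range(K)]
            s = sum(dens)
            for k in range(K):
                totals[k] += dens[k] / s
        # M-step: mode of the Dirichlet-augmented expected log posterior.
        denom = len(data) + sum(alpha) - K
        pi = [(totals[k] + alpha[k] - 1) / denom for k in range(K)]
    return pi
```

With three points at −5, seven at 5 and unit-variance components centred there, a flat prior (`alpha = [1, 1]`) gives weights close to [0.3, 0.7], while `alpha = [2, 2]` pulls them to about [1/3, 2/3].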

18.

Model-based clustering of multivariate skew data with circular components and missing values

Francesco Lagona, Marco Picone 《Journal of applied statistics》2012,39(5):927-945

Motivated by classification issues that arise in marine studies, we propose a latent-class mixture model for the unsupervised classification of incomplete quadrivariate data with two linear and two circular components. The model integrates bivariate circular densities and bivariate skew normal densities to capture the association between toroidal clusters of bivariate circular observations and planar clusters of bivariate linear observations. Maximum-likelihood estimation of the model is facilitated by an expectation-maximization (EM) algorithm that treats unknown class membership and missing values as different sources of incomplete information. The model is applied to hourly observations of wind speed and direction and wave height and direction to identify a number of sea regimes, which represent specific distributional shapes that the data take under latent environmental conditions.

19.

Multivariate Poisson regression with covariance structure

Dimitris Karlis, Loukia Meligkotsidou 《Statistics and Computing》2005,15(4):255-265

In recent years the applications of multivariate Poisson models have increased, mainly because of the gradual increase in computer performance. The multivariate Poisson model used in practice is based on a common covariance term for all the pairs of variables. This is rather restrictive and does not allow for modelling the covariance structure of the data in a flexible way. In this paper we propose inference for a multivariate Poisson model with larger structure, i.e. different covariance for each pair of variables. Maximum likelihood estimation, as well as Bayesian estimation methods are proposed. Both are based on a data augmentation scheme that reflects the multivariate reduction derivation of the joint probability function. In order to enlarge the applicability of the model we allow for covariates in the specification of both the mean and the covariance parameters. Extension to models with complete structure with many multi-way covariance terms is discussed. The method is demonstrated by analyzing a real life data set.
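The multivariate reduction behind these models can be illustrated in the bivariate case: with independent $Y_0 \sim \mathrm{Poisson}(\theta_0)$, $Y_1 \sim \mathrm{Poisson}(\theta_1)$ and $Y_2 \sim \mathrm{Poisson}(\theta_2)$, setting $X_1 = Y_1 + Y_0$ and $X_2 = Y_2 + Y_0$ gives $\mathrm{Cov}(X_1, X_2) = \theta_0$. The sketch below evaluates the resulting joint probability function by summing over the shared term. It covers only this smallest case, not the paper's model with a separate covariance term for each pair or its covariates, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Pr(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bivariate_poisson_pmf(x1, x2, theta1, theta2, theta0):
    """Pr(X1 = x1, X2 = x2) for X1 = Y1 + Y0, X2 = Y2 + Y0 with independent Poisson Y's.

    The sum runs over k, the value of the shared component Y0.
    """
    return sum(
        poisson_pmf(x1 - k, theta1) * poisson_pmf(x2 - k, theta2) * poisson_pmf(k, theta0)
        for k in range(min(x1, x2) + 1)
    )
```

In the data augmentation schemes the abstract mentions, the shared components such as $Y_0$ play the role of the missing data.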

20.

Bias correction in logistic regression with missing categorical covariates

Ujjwal Das, Tapabrata Maiti, Vivek Pradhan 《Journal of statistical planning and inference》2010

Logistic regression plays an important role in many fields. In practice, we often encounter missing covariates in various applied areas, particularly in the biomedical sciences. Ibrahim (1990) proposed a method to handle missing covariates in the generalized linear model (GLM) setup. It is well known that logistic regression estimates from small or medium-sized samples with missing data are biased. Considering data that are missing at random, in this paper we reduce the bias by two methods: first, we derive a closed-form bias expression using Cox and Snell (1968), and second, we use a likelihood-based modification similar to Firth (1993). We show analytically that the Firth-type likelihood modification of Ibrahim's method leads to second-order bias reduction. The proposed methods are simple to apply to an existing method and need no analytical work beyond a small change to the optimization function. We have carried out extensive simulation studies comparing the methods, and the simulation results are supported by a real-world data set.
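Firth's modification replaces the score with $U^*(\beta) = \sum_i \{y_i - \pi_i + h_i(1/2 - \pi_i)\} x_i$, where $h_i$ are the hat-matrix diagonals. In the simplest case, an intercept-only logistic model without missing covariates (an illustrative assumption, far from the paper's setting), every $h_i = 1/n$, and the equation solves in closed form to $\hat\pi = (s + 1/2)/(n + 1)$ for $s$ successes in $n$ trials. The estimate therefore stays finite even when $s = 0$, where the ordinary MLE diverges. A minimal sketch, with the hypothetical name `firth_intercept`:

```python
import math


def firth_intercept(successes, n):
    """Firth-penalized estimate of the intercept in an intercept-only logistic model.

    With only an intercept every hat value equals 1/n, so the modified score
    sum(y_i - p) + sum(h_i * (1/2 - p)) = 0 solves to p = (s + 1/2) / (n + 1).
    """
    p = (successes + 0.5) / (n + 1)
    return math.log(p / (1 - p))  # logit of the penalized success probability
```

For example, 0 successes in 10 trials gives an intercept of log(0.5/10.5), about −3.04, instead of −∞.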