Similar Literature
20 similar documents found.
1.
In the presence of interval-censored data, we propose a general three-state disease model with covariates. Such data can arise, for example, in epidemiologic studies of infectious disease where neither the time of infection nor the time of disease onset is directly observed, or in cancer studies where the time of disease metastasis is known only up to a specified interval. The proposed model allows the distributions of the transition times between states to depend on covariates and on the time spent in the previous state. An estimation procedure for the underlying distributions and the model coefficients is developed using the EM algorithm. The EMS (smoothed EM) algorithm is also considered to obtain smooth estimates of the distributions. The proposed method is illustrated with data from an AIDS study and a study of patients with malignant melanoma.

2.
Gomez and Lagakos (1994) propose a nonparametric method for estimating the distribution of a survival time when the origin and end points defining the survival time suffer interval-censoring and right-censoring, respectively. In some situations, the end point also suffers interval-censoring as well as truncation. In this paper, we consider this general situation and propose a two-step estimation procedure for the estimation of the distribution of a survival time based on doubly interval-censored and truncated data. The proposed method generalizes the methods proposed by DeGruttola and Lagakos (1989) and Sun (1995) and is more efficient than that given in Gomez and Lagakos (1994). The approach is based on self-consistency equations. The method is illustrated by an analysis of an AIDS cohort study.
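For illustration only (not the authors' two-step estimator): the self-consistency idea underlying such methods can be sketched, in the simpler singly interval-censored case, as a Turnbull-type iteration. The intervals, support grid, and tolerance below are invented; Python/NumPy is used for all code sketches on this page.

```python
import numpy as np

def self_consistency(intervals, support, tol=1e-8, max_iter=5000):
    """Turnbull-type self-consistency iteration for interval-censored data.

    intervals : list of (L, R) pairs; the event time lies in [L, R].
    support   : candidate support points for the event-time distribution.
    Returns the estimated probability mass at each support point.
    """
    # alpha[i, j] = 1 if support point j lies inside observation i's interval
    alpha = np.array([[1.0 if L <= s <= R else 0.0 for s in support]
                      for (L, R) in intervals])
    n, m = alpha.shape
    p = np.full(m, 1.0 / m)                   # start from a uniform mass
    for _ in range(max_iter):
        denom = alpha @ p                     # P(event falls in interval i)
        weights = alpha * p / denom[:, None]  # E-step: allocate each event
        p_new = weights.sum(axis=0) / n       # M-step: average allocations
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# toy example: five interval-censored observations on a grid of five times
obs = [(1, 3), (2, 4), (0, 2), (3, 5), (1, 5)]
print(self_consistency(obs, support=[1, 2, 3, 4, 5]).round(3))
```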

3.
Semiparametric models: a generalized self-consistency approach
In semiparametric models, the dimension d of the maximum likelihood problem is potentially unlimited. Conventional estimation methods generally behave like O(d^3). A new O(d) estimation procedure is proposed for a large class of semiparametric models. Potentially unlimited dimension is handled in a numerically efficient way through a Nelson–Aalen-like estimator. Discussion of the new method is put in the context of recently developed minorization–maximization algorithms based on surrogate objective functions. The procedure for semiparametric models is used to demonstrate three methods of constructing a surrogate objective function: using the difference of two concave functions, the EM way, and the new quasi-EM (QEM) approach. The QEM approach generalizes the EM-like construction of the surrogate objective function so that it does not depend on a missing-data representation of the model. Like the EM algorithm, the QEM method has a dual interpretation, a result of merging the idea of surrogate maximization with the ideas of imputation and self-consistency. The new approach is compared with other possible approaches by using simulations and analysis of real data. The proportional odds model is used as an example throughout the paper.
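A hedged aside on surrogate objective functions: a classical example outside this paper is the MM algorithm for logistic regression, which minorizes the log-likelihood with a quadratic surrogate using the uniform curvature bound of Böhning and Lindsay, -X'X/4, in place of the true Hessian. A minimal sketch with simulated data:

```python
import numpy as np

def mm_logistic(X, y, n_iter=200):
    """MM algorithm for logistic regression: each iteration maximizes a
    quadratic surrogate that minorizes the log-likelihood, using the fixed
    curvature bound -X'X/4.  Monotone ascent, like EM, but with no
    missing-data representation involved."""
    H_inv = np.linalg.inv(X.T @ X / 4.0)       # fixed surrogate curvature
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # current fitted probabilities
        beta = beta + H_inv @ (X.T @ (y - p))  # maximizer of the surrogate
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ np.array([-0.5, 1.0, -2.0]))))
print(mm_logistic(X, y))   # should be close to (-0.5, 1.0, -2.0)
```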

4.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm, and we derive it for polynomial, spline, and B-spline regression mixtures. The approach is unsupervised in two senses: (i) it infers the model parameters and the optimal number of regression mixture components from the data simultaneously as learning proceeds, rather than in a two-stage scheme that applies model selection criteria after fitting, as in standard model-based clustering; and (ii) it does not require careful initialization, unlike the standard EM algorithm for regression mixtures. The approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, confirming its benefit for practical applications.
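To fix ideas, here is a plain EM for a Gaussian regression mixture with fixed K and no penalty, i.e. the baseline that the penalized approach improves on; the data and settings are simulated, not from the paper.

```python
import numpy as np

def em_regression_mixture(X, y, K=2, n_iter=100, seed=0):
    """Plain EM for a K-component Gaussian regression mixture with fixed K
    and no penalty -- the standard baseline."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pi, beta, sigma2 = np.full(K, 1.0 / K), rng.normal(size=(K, d)), np.ones(K)
    for _ in range(n_iter):
        # E-step: responsibility of each regression component for each point
        resid = y[:, None] - X @ beta.T                          # (n, K)
        dens = np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares and variance update per component
        for k in range(K):
            w = r[:, k]
            beta[k] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            sigma2[k] = (w * (y - X @ beta[k])**2).sum() / w.sum()
        pi = r.mean(axis=0)
    return pi, beta, sigma2

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 400)
X = np.column_stack([np.ones(400), x])
z = rng.binomial(1, 0.5, 400)
y = np.where(z == 0, 1 + 2 * x, -1 - 2 * x) + 0.3 * rng.normal(size=400)
print(em_regression_mixture(X, y))
```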

5.
We derive and investigate a variant of AIC, the Akaike information criterion, for model selection in settings where the observed data is incomplete. Our variant is based on the motivation provided for the PDIO ('predictive divergence for incomplete observation models') criterion of Shimodaira (1994, in: Selecting Models from Data: Artificial Intelligence and Statistics IV, Lecture Notes in Statistics, vol. 89, Springer, New York, pp. 21–29). However, our variant differs from PDIO in its 'goodness-of-fit' term. Unlike AIC and PDIO, which require the computation of the observed-data empirical log-likelihood, our criterion can be evaluated using only complete-data tools, readily available through the EM algorithm and the SEM ('supplemented' EM) algorithm of Meng and Rubin (Journal of the American Statistical Association 86 (1991) 899–909). We compare the performance of our AIC variant to that of both AIC and PDIO in simulations where the data being modeled contains missing values. The results indicate that our criterion is less prone to overfitting than AIC and less prone to underfitting than PDIO.
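Background on why complete-data tools can suffice (standard EM theory, not the paper's exact criterion): AIC penalizes the observed-data log-likelihood, which decomposes into complete-data quantities through the familiar Q - H identity; the SEM algorithm supplies the derivative information needed to work with these terms.

```latex
\mathrm{AIC} \;=\; -2\,\ell_{\mathrm{obs}}(\hat\theta) + 2k,
\qquad
\ell_{\mathrm{obs}}(\theta) \;=\; Q(\theta \mid \theta') - H(\theta \mid \theta'),
```

where, for observed data Y and missing data Z, Q(θ|θ') = E[log f(Y, Z; θ) | Y, θ'] is the complete-data surrogate maximized by EM and H(θ|θ') = E[log f(Z | Y; θ) | Y, θ'].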

6.
Projections of AIDS incidence are critical for assessing future healthcare needs. This paper focuses on the method of back-calculation for obtaining forecasts. The first problem faced was the need to account for reporting delays and underreporting of cases and to adjust the incidence data accordingly. The method used to estimate the reporting-delay distribution is based on Poisson regression and involves cross-classifying each reported case by calendar time of diagnosis and reporting delay. The adjusted AIDS incidence data are then used to obtain short-term projections and lower bounds on the size of the AIDS epidemic. The estimation procedure 'back-calculates' from AIDS incidence data, using the incubation period distribution, to obtain estimates of the numbers previously infected; these numbers are then projected forward. The problem can be shown to reduce to estimating the size of a multinomial population, and the expectation-maximization (EM) algorithm is used to obtain maximum likelihood estimates when the density of infection times is parametrized as a step function. The methodology is applied to AIDS incidence data in Portugal for four transmission categories (injecting drug users, homosexual/bisexual contact, heterosexual contact, and other, mainly haemophilia and blood transfusion related) to obtain short-term projections and an estimate of the minimum size of the epidemic.
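A minimal back-calculation sketch (not the paper's Portuguese analysis): the EM update for a step-function infection curve, assuming a known incubation distribution. The counts and incubation probabilities below are invented.

```python
import numpy as np

def back_calculate(aids_counts, incubation, n_iter=500):
    """EM back-calculation (Poisson deconvolution) for a step-function
    infection curve: aids_counts[t] ~ Poisson(sum_s lam[s] * f(t - s)),
    where lam[s] is the expected number of infections in period s and
    f(d) is the probability of an incubation time of d periods."""
    T = len(aids_counts)
    f = lambda d: incubation[d] if 0 <= d < len(incubation) else 0.0
    lam = np.full(T, aids_counts.sum() / T)    # flat starting curve
    for _ in range(n_iter):
        mu = np.array([sum(lam[s] * f(t - s) for s in range(t + 1))
                       for t in range(T)])     # expected diagnoses
        lam = np.array([
            # E-step: reallocate observed cases over infection periods;
            # M-step: normalize by the chance of diagnosis before period T
            lam[s] * sum(f(t - s) * aids_counts[t] / mu[t]
                         for t in range(s, T) if mu[t] > 0)
                   / sum(f(t - s) for t in range(s, T))
            for s in range(T)])
    return lam

incubation = np.array([0.02, 0.05, 0.08, 0.10, 0.12, 0.12, 0.11])
cases = np.array([3, 5, 9, 14, 22, 30, 41, 50, 62, 70])
print(back_calculate(cases, incubation).round(1))
```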

7.
This paper addresses the problem of identifying groups whose feature-variable means satisfy specified conditions; we refer to the identified groups as "target clusters" (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of means, and we provide an expectation-maximization (EM) algorithm for fitting the restricted NMM by maximum likelihood. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method yields several types of useful clusters that would be difficult to obtain with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target-clustering approach shows that the proposed method is promising for this identification task.

8.
For multivariate normal data with non-monotone (i.e. arbitrary) missing-data patterns, lattice conditional independence (LCI) models determined by the observed data patterns can be used to obtain closed-form MLEs (Andersson and Perlman, 1991, 1993). In this paper, three procedures (LCI models, the EM algorithm, and the complete-data method) are compared by means of a Monte Carlo experiment. When the LCI model is accepted by the LR test, the LCI estimate is more efficient than those based on the EM algorithm and the complete-data method. When the LCI model is not accepted, the LCI estimate may lose efficiency but may still be more efficient than the EM estimate if the observed data are sparse. When the LCI model appears too restrictive, it may be possible to obtain a less restrictive LCI model by discarding only a small portion of the incomplete observations. LCI models appear to be especially useful when the observed data are sparse, even in cases where the suitability of the LCI model is uncertain.

9.
In this article, we consider a competing-cause scenario and assume the wider family of Conway–Maxwell–Poisson (COM–Poisson) distributions to model the number of competing causes. Assuming the data are interval-censored, the main contribution is in developing the steps of the expectation-maximization (EM) algorithm to determine the maximum likelihood estimates (MLEs) of the model parameters. A profile likelihood approach within the EM framework is proposed to estimate the COM–Poisson shape parameter. An extensive simulation study is conducted to evaluate the performance of the proposed EM algorithm. Model selection within the wider class of COM–Poisson distributions is carried out using the likelihood ratio test and information-based criteria. A study demonstrating the effect of model mis-specification is also carried out. Finally, the proposed estimation method is applied to data on smoking cessation, and a detailed analysis of the obtained results is presented.

10.
We propose a hidden Markov model for longitudinal count data in which sources of unobserved heterogeneity arise, making the data overdispersed. Conditionally on the hidden states, the observed process is assumed to follow an inhomogeneous Poisson kernel, with the unobserved heterogeneity modeled in a generalized linear model (GLM) framework by adding individual-specific random effects to the link function. Because of the complexity of the likelihood within the GLM framework, model parameters may be estimated by numerical maximization of the log-likelihood function or by simulation methods; we propose a more flexible approach based on the expectation-maximization (EM) algorithm, with parameter estimation carried out using a non-parametric maximum likelihood (NPML) approach in a finite mixture context. Simulation results and two empirical examples are provided.

11.
The expectation-maximization (EM) algorithm is a popular approach for obtaining maximum likelihood estimates in incomplete-data problems because of its simplicity and stability (e.g. monotonic increase of the likelihood). However, in many applications the stability of EM is attained at the expense of slow, linear convergence. We have developed a new class of iterative schemes, called squared iterative methods (SQUAREM), to accelerate EM without compromising its simplicity and stability. SQUAREM generally achieves superlinear convergence in problems with a large fraction of missing information. Globally convergent schemes are easily obtained by viewing SQUAREM as a continuation of EM. SQUAREM is especially attractive in high-dimensional problems and in problems where model-specific analytic insights are not available. It can be readily implemented as an 'off-the-shelf' accelerator of any EM-type algorithm, as it requires only the EM parameter-updating map. We present four examples to demonstrate the effectiveness of SQUAREM. A general-purpose implementation (written in R) is available.
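A sketch of one SQUAREM step around an arbitrary EM map, using the common steplength rule alpha = -||r||/||v||; the toy fixed-point map below stands in for a real EM update and is not from the paper.

```python
import numpy as np

def squarem_step(theta, em_update):
    """One SQUAREM acceleration step around an arbitrary EM fixed-point map
    em_update(theta) -> theta'."""
    theta1 = em_update(theta)
    theta2 = em_update(theta1)
    r = theta1 - theta                     # first EM increment
    v = (theta2 - theta1) - r              # change in the increment
    if np.linalg.norm(v) == 0.0:           # already at a fixed point
        return theta2
    alpha = -np.linalg.norm(r) / np.linalg.norm(v)     # steplength rule
    theta_sq = theta - 2.0 * alpha * r + alpha**2 * v  # extrapolation
    return em_update(theta_sq)             # stabilizing EM step on top

# toy fixed-point map (Babylonian square-root iteration) standing in
# for a real EM update; its fixed point is sqrt(2)
em = lambda t: 0.5 * (t + 2.0 / t)
t = np.array([5.0])
for _ in range(5):
    t = squarem_step(t, em)
print(t)   # rapidly approaches sqrt(2) ~ 1.41421
```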

12.
Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation-maximisation (EM) algorithm and the parameter-expanded EM (PX-EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton-Raphson type scheme such as the average information (AI) algorithm. The EM and PX-EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete-data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete-block soybean variety trial. In the cases where the AI algorithm failed, a REML PX-EM based on the new incomplete-data specification converged in 28% to 30% fewer iterations than the alternative REML PX-EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX-EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.

13.
This paper proposes a method for estimating the parameters in a generalized linear model with missing covariates. The missing covariates are assumed to come from a continuous distribution, and are assumed to be missing at random. In particular, Gaussian quadrature methods are used on the E-step of the EM algorithm, leading to an approximate EM algorithm. The parameters are then estimated using the weighted EM procedure given in Ibrahim (1990). This approximate EM procedure leads to approximate maximum likelihood estimates, whose standard errors and asymptotic properties are given. The proposed procedure is illustrated on a data set.
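The key numerical device, sketched under simplifying assumptions (a single scalar covariate with a known normal distribution; g is a stand-in for the complete-data quantity being averaged in the E-step):

```python
import numpy as np

# Gauss-Hermite approximation of E[g(X)] for X ~ N(mu, sigma^2):
#   E[g(X)] ~= sum_k (w_k / sqrt(pi)) * g(mu + sqrt(2) * sigma * x_k)
nodes, weights = np.polynomial.hermite.hermgauss(20)

def gh_expectation(g, mu, sigma):
    """Quadrature stand-in for the integral over a missing normal covariate."""
    return (weights / np.sqrt(np.pi)) @ g(mu + np.sqrt(2.0) * sigma * nodes)

# sanity check: E[X^2] for X ~ N(1, 2^2) equals mu^2 + sigma^2 = 5
print(gh_expectation(lambda x: x**2, mu=1.0, sigma=2.0))
```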

14.
This paper proposes a method for assessing local influence under minor perturbations of a statistical model with incomplete data. The idea is to apply Cook's approach to the conditional expectation of the complete-data log-likelihood function (the Q-function) in the EM algorithm. The proposed method is shown to produce analytic results very similar to those obtained from a classical local influence approach based on the observed-data likelihood function, and it has the potential to handle a variety of complicated models that cannot be treated by existing methods. An application to the generalized linear mixed model is investigated, and some illustrative artificial and real examples are presented.
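For orientation (standard expressions in this literature, written from memory rather than reproduced from the paper): the EM-based variant replaces the observed-data log-likelihood in Cook's likelihood displacement with the Q-function, leading to a Q-displacement and normal curvature of roughly the form

```latex
f_Q(\omega) = 2\bigl[\, Q(\hat\theta \mid \hat\theta) - Q(\hat\theta_\omega \mid \hat\theta) \,\bigr],
\qquad
C_{l} = 2\,\Bigl|\, l^{\top} \Delta^{\top} \bigl\{-\ddot{Q}(\hat\theta \mid \hat\theta)\bigr\}^{-1} \Delta\, l \,\Bigr|,
```

where θ̂_ω maximizes the perturbed Q-function, Δ = ∂²Q(θ, ω | θ̂)/∂θ∂ωᵀ evaluated at (θ̂, ω₀), Q̈ is the Hessian of Q in θ, and l is a unit direction in the perturbation space.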

15.
Scale mixtures of normal distributions form a class of symmetric thick-tailed distributions that includes the normal distribution as a special case. In this paper we consider local influence analysis for measurement error models (MEM) in which the random error and the unobserved value of the covariates jointly follow scale mixtures of normal distributions, providing an appealing robust alternative to the usual Gaussian process in measurement error models. To avoid difficulties in estimating the parameter of the mixing variable, we fix it in advance, as recommended by Lange et al. (J Am Stat Assoc 84:881–896, 1989) and Berkane et al. (Comput Stat Data Anal 18:255–267, 1994). The local influence method is used to assess the robustness of the parameter estimates under some usual perturbation schemes. However, because the observed log-likelihood associated with this model involves integrals, Cook's well-known approach may be hard to apply to obtain measures of local influence. Instead, we develop local influence measures following the approach of Zhu and Lee (J R Stat Soc Ser B 63:121–126, 2001), which is based on the EM algorithm. Results obtained from a real data set are reported, illustrating the usefulness of the proposed methodology, its relative simplicity, adaptability and practical usage.

16.
Probabilistic Principal Component Analysis
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
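Tipping and Bishop's EM iteration has a compact form in terms of the sample covariance S; a sketch with simulated data (the dimensions, noise level, and iteration count are illustrative):

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """EM for probabilistic PCA: x = W z + mu + eps, z ~ N(0, I_q),
    eps ~ N(0, sigma2 * I_d), using the compact updates in terms of
    the sample covariance S."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)
    W, sigma2 = rng.normal(size=(d, q)), 1.0
    for _ in range(n_iter):
        M = W.T @ W + sigma2 * np.eye(q)     # q x q posterior matrix
        Minv = np.linalg.inv(M)
        SW = S @ W
        W_new = SW @ np.linalg.inv(sigma2 * np.eye(q) + Minv @ W.T @ SW)
        sigma2 = np.trace(S - SW @ Minv @ W_new.T) / d
        W = W_new
    return mu, W, sigma2

rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 5))
X = Z @ A + 0.1 * rng.normal(size=(500, 5))
mu, W, s2 = ppca_em(X, q=2)
print(s2)   # should be near the true noise variance 0.01
```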

17.
The paper describes the methodology developed to carry out predictions of the acquired immune deficiency syndrome (AIDS) epidemic in Scotland. Information on the human immunodeficiency virus (HIV) epidemic comes from formal case reports of AIDS cases and HIV positive tests, reports from surveillance schemes and from special studies. These sources of information, up to the end of 1994, are reviewed. Prior information on aspects of HIV disease is available from various published and unpublished sources. A simple model of the HIV epidemic in Scotland is proposed and the information is summarized in terms of this model. Bayesian methodology, using Markov chain Monte Carlo methods, is described and used to predict future cases of AIDS in Scotland and people who will be living with AIDS in the years 1995–1999.
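The paper's epidemic model is specific to the Scottish data; purely as a generic illustration of the MCMC machinery, here is a random-walk Metropolis sampler for a Poisson rate with a gamma prior (made-up counts; the conjugate posterior makes the sampler easy to check):

```python
import numpy as np

def metropolis_poisson_rate(counts, a=1.0, b=1.0, n_samples=5000, seed=0):
    """Random-walk Metropolis on log(lambda) for counts ~ Poisson(lambda)
    with a Gamma(a, b) prior.  This toy target has a conjugate posterior,
    which makes the sampler easy to verify."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)

    def log_post(ll):
        lam = np.exp(ll)
        # log-likelihood + log-prior + log-Jacobian of the log transform
        return counts.sum() * ll - len(counts) * lam + a * ll - b * lam

    log_lam, draws = 0.0, np.empty(n_samples)
    for i in range(n_samples):
        proposal = log_lam + 0.2 * rng.normal()
        if np.log(rng.uniform()) < log_post(proposal) - log_post(log_lam):
            log_lam = proposal
        draws[i] = np.exp(log_lam)
    return draws

counts = [4, 7, 5, 6, 8, 5]
print(metropolis_poisson_rate(counts).mean())  # near (a + sum y)/(b + n) ~ 5.14
```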

18.
Principal component analysis (PCA) is a popular technique for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has motivated the development of new approaches. The effects of replacing outliers with estimates obtained by expectation-maximization (EM) and by multiple imputation (MI) were examined on an artificial data set and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on EM estimates substituted for outliers, and PCA based on MI estimates substituted for outliers were compared with the results of CPCA. In this study, we show the effects of using EM- and MI-based estimates in place of outliers as a function of the proportion of outliers in the data set. Finally, when the proportion of outliers exceeds 20%, we suggest replacing outliers with EM- or MI-based estimates as an alternative approach.
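A hedged sketch of the workflow compared in such studies: flag extreme values, treat them as missing, impute with an iterative (EM-like) conditional-regression scheme, then run classical PCA. scikit-learn's IterativeImputer stands in for the EM/MI machinery; the outlier rule and data are invented.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.eye(3) + 0.5, size=200)
X[rng.choice(200, 10, replace=False), 0] += 15.0    # plant gross outliers

# flag outliers with a simple z-score rule and treat them as missing
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
X_missing = np.where(z > 3.0, np.nan, X)

# EM-like iterative conditional imputation, then classical PCA
X_imputed = IterativeImputer(max_iter=25, random_state=0).fit_transform(X_missing)
print(PCA(n_components=2).fit(X_imputed).explained_variance_ratio_)
```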

19.
The expectation maximization (EM) algorithm is a widely used approach for estimating the parameters of multivariate multinomial mixtures in a latent class model, but its computing efficiency can be unsatisfactory. This study proposes a fuzzy clustering algorithm (FCA) based on both the maximum penalized likelihood (MPL) for the latent class model and the modified penalty fuzzy c-means (PFCM) for normal mixtures. Numerical examples confirm that the FCA-MPL algorithm is more efficient (that is, it requires fewer iterations) and more effective (as measured by the approximate relative ratio of accurate classification) than the EM algorithm.
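For reference, the classic fuzzy c-means updates that penalty variants such as PFCM build on (the penalty term itself is not reproduced here):

```python
import numpy as np

def fuzzy_c_means(X, K, m=2.0, n_iter=100, seed=0):
    """Classic fuzzy c-means: alternate soft-membership and centroid updates
    for fuzzifier m > 1.  (PFCM adds a penalty term to this objective.)"""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        u = d2 ** (-1.0 / (m - 1.0))          # unnormalized memberships
        u /= u.sum(axis=1, keepdims=True)
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
    return u, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
print(fuzzy_c_means(X, K=2)[1].round(2))   # centers near (0, 0) and (5, 5)
```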

20.
This paper presents an alternative approach to modeling longitudinal data in which a lower detection limit (LOD) and unobserved population heterogeneity are both present. Longitudinal data on viral loads in HIV/AIDS studies, for instance, show strong positive skewness and left-censoring, and normalizing such data with a logarithmic transformation is often unsuccessful. An alternative to such a transformation is a finite mixture model, which is suitable for analyzing data with skewed or multi-modal distributions. Little work has been done to take these features of longitudinal data into account simultaneously. This paper develops a growth mixture Tobit model that deals with an LOD and with heterogeneity among growth trajectories. The proposed methods are illustrated using simulated data and real data from an AIDS clinical study.
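The Tobit building block that handles the LOD is standard: observations below the limit contribute a normal CDF term rather than a density. For a single trajectory with left-censoring at L,

```latex
\ell(\beta, \sigma) \;=\;
\sum_{i:\, y_i > L} \log\!\left[ \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i^{\top}\beta}{\sigma}\right) \right]
\;+\;
\sum_{i:\, y_i \le L} \log \Phi\!\left(\frac{L - x_i^{\top}\beta}{\sigma}\right),
```

where L is the detection limit and φ, Φ are the standard normal density and distribution function; the growth mixture model then mixes several such trajectory classes.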
