Similar Articles
20 similar articles retrieved.
1.
Spatio-temporal processes are often high-dimensional, exhibiting complicated variability across space and time. Traditional state-space model approaches to such processes in the presence of uncertain data have been shown to be useful. However, estimation of state-space models in this context is often problematic, since parameter vectors and matrices are of high dimension and can have complicated dependence structures. We propose a spatio-temporal dynamic model formulation with parameter matrices restricted based on prior scientific knowledge and/or common spatial models. Estimation is carried out via the expectation–maximization (EM) algorithm or a general EM algorithm. Several parameterization strategies are proposed, and analytical or computational closed-form EM update equations are derived for each. We apply the methodology to a model based on an advection–diffusion partial differential equation in a simulation study, and also to a dimension-reduced model for a Palmer Drought Severity Index (PDSI) data set.

2.
The skew-generalized-normal distribution [Arellano-Valle, RB, Gómez, HW, Quintana, FA. A new class of skew-normal distributions. Comm Statist Theory Methods 2004;33(7):1465–1480] is a class of asymmetric normal distributions that contains the normal and skew-normal distributions as special cases. The main virtues of this distribution are that it is easy to simulate from and that it supplies a genuine expectation–maximization (EM) algorithm for maximum likelihood estimation. In this paper, we extend the EM algorithm to linear regression models with skew-generalized-normal random errors and develop diagnostic analyses via local influence and generalized leverage, following Zhu and Lee's approach, since Cook's well-known approach would be more complicated to use for obtaining measures of local influence. Finally, results obtained for a real data set are reported, illustrating the usefulness of the proposed method.

3.
In this paper, we consider two well-known parametric long-term survival models, namely, the Bernoulli cure rate model and the promotion time (or Poisson) cure rate model. Assuming the long-term survival probability to depend on a set of risk factors, the main contribution is in the development of the stochastic expectation maximization (SEM) algorithm to determine the maximum likelihood estimates of the model parameters. We carry out a detailed simulation study to demonstrate the performance of the proposed SEM algorithm. For this purpose, we assume the lifetimes due to each competing cause to follow a two-parameter generalized exponential distribution. We also compare the results obtained from the SEM algorithm with those obtained from the well-known expectation maximization (EM) algorithm. Furthermore, we investigate a simplified estimation procedure for both the SEM and EM algorithms that allows the objective function to be maximized to be split into simpler functions of lower dimension in the model parameters. Moreover, we present examples where the EM algorithm fails to converge but the SEM algorithm still works. For illustrative purposes, we analyze a breast cancer survival data set. Finally, we use a graphical method to assess the goodness-of-fit of the model with generalized exponential lifetimes.
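To illustrate the stochastic E-step idea behind an SEM algorithm, the sketch below runs SEM for a much simpler setting than the paper's: a Bernoulli (mixture) cure rate model with a constant cure probability and exponential (rather than generalized exponential) lifetimes. All names and defaults are ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

def sem_mixture_cure(t, delta, n_iter=200, burn_in=100):
    """SEM for S_pop(t) = pi + (1 - pi) * exp(-lam * t).

    t     : observed times (event or censoring)
    delta : 1 = event observed, 0 = right-censored
    Returns (pi, lam) averaged over the post-burn-in SEM draws.
    """
    pi, lam = 0.5, 1.0 / np.mean(t)          # crude starting values
    draws = []
    for it in range(n_iter):
        # Stochastic E-step: impute cure indicators C_i (1 = cured).
        # Observed failures cannot be cured; a censored unit is cured with
        # conditional probability pi / (pi + (1 - pi) * S_u(t_i)).
        surv = np.exp(-lam * t)
        p_cured = np.where(delta == 1, 0.0, pi / (pi + (1.0 - pi) * surv))
        cured = rng.random(t.shape) < p_cured
        # M-step on the completed data.
        pi = cured.mean()
        uncured = ~cured
        lam = delta[uncured].sum() / t[uncured].sum()   # censored exponential MLE
        if it >= burn_in:
            draws.append((pi, lam))
    return np.mean(draws, axis=0)

# Simulated example: 40% cured, exponential(rate = 2) lifetimes, uniform censoring.
n = 2000
cured_true = rng.random(n) < 0.4
life = rng.exponential(1 / 2.0, n)
cens = rng.uniform(0, 3, n)
t_obs = np.where(cured_true, cens, np.minimum(life, cens))
delta = (~cured_true & (life <= cens)).astype(int)
print(sem_mixture_cure(t_obs, delta))        # roughly (0.4, 2.0)
```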

4.
Estimators derived from the expectation–maximization (EM) algorithm are not robust, since they are based on maximization of the likelihood function. We propose an iterative proximal-point algorithm based on the EM algorithm to minimize a divergence criterion between a mixture model and the unknown distribution that generates the data. In each iteration, the algorithm estimates the proportions and the parameters of the mixture components in two separate steps. The resulting estimators are generally robust against outliers and misspecification of the model. Convergence properties of our algorithm are studied. The convergence of the introduced algorithm is discussed for a two-component Weibull mixture, which entails a condition on the initialization of the EM algorithm in order for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences are provided to confirm the validity of our work and the robustness of the resulting estimators against outliers, in comparison to the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada

5.
We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximate posterior for the model parameters based on the variational Bayes (VB) method. Compared to the EM algorithm, the VB method has the advantage that the variance of the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.

6.
Parametric incomplete-data models defined by ordinary differential equations (ODEs) are widely used in biostatistics to describe biological processes accurately. Their parameters are estimated on approximate models, whose regression functions are evaluated by a numerical integration method. Accurate and efficient estimation of these parameters is a critical issue. This paper proposes parameter estimation methods involving either a stochastic approximation EM algorithm (SAEM) for maximum likelihood estimation, or a Gibbs sampler for the Bayesian approach. Both algorithms involve the simulation of non-observed data from conditional distributions using Hastings–Metropolis (H–M) algorithms. A modified H–M algorithm, including an original local linearization scheme to solve the ODEs, is proposed to reduce the computational time significantly. The convergence of all these algorithms on the approximate model is proved. The errors induced by the numerical solving method on the conditional distribution, the likelihood and the posterior distribution are bounded. The Bayesian and maximum likelihood estimation methods are illustrated on a simulated pharmacokinetic nonlinear mixed-effects model defined by an ODE. Simulation results illustrate the ability of these algorithms to provide accurate estimates.

7.
In this article, we consider a competing cause scenario and assume the wider family of Conway–Maxwell–Poisson (COM–Poisson) distributions to model the number of competing causes. Assuming the data to be interval censored, the main contribution is in developing the steps of the expectation maximization (EM) algorithm to determine the maximum likelihood estimates (MLEs) of the model parameters. A profile likelihood approach within the EM framework is proposed to estimate the COM–Poisson shape parameter. An extensive simulation study is conducted to evaluate the performance of the proposed EM algorithm. Model selection within the wider class of COM–Poisson distributions is carried out using the likelihood ratio test and information-based criteria. A study to demonstrate the effect of model mis-specification is also carried out. Finally, the proposed estimation method is applied to data on smoking cessation, and a detailed analysis of the obtained results is presented.

8.
In this article, a general approach to latent variable models based on an underlying generalized linear model (GLM) with a factor-analysis observation process is introduced. We call these models Generalized Linear Factor Models (GLFMs). The observations are produced from a general model framework involving observed and latent variables that are assumed to be distributed in the exponential family. More specifically, we concentrate on situations where the observed variables are both discretely measured (e.g., binomial, Poisson) and continuously distributed (e.g., gamma). The common latent factors are assumed to be independent with a standard multivariate normal distribution. Practical details of training such models with a new local expectation-maximization (EM) algorithm, which can be considered a generalized EM-type algorithm, are also discussed. In conjunction with an approximate version of the Fisher scoring algorithm (FSA), we show how to calculate maximum likelihood estimates of the model parameters and to draw inferences about the unobservable path of the common factors. The methodology is illustrated by an extensive Monte Carlo simulation study, and the results show promising performance.

9.
We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The developed approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as the learning proceeds, rather than in a two-stage scheme as in standard model-based clustering, where model selection criteria are applied afterward, and (ii) it does not require accurate initialization, unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, and they confirm its benefit for practical applications.

10.
Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm, in which the observed-data log-likelihood function is maximized. This paper proposes an alternative approach for fitting finite mixture models. Our method, called iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability that a data point belongs to a particular mixing component given its observed value. It then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method, and finally updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with some other algorithms for fitting mixtures of normal and mixtures of t densities.
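A minimal sketch of the Monte Carlo classification idea for a two-component normal mixture is given below. It is our own illustration under the description above, not the authors' algorithm: membership probabilities are computed, a component label is drawn for each point, and each component is re-estimated from its classified data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def imcc_normal_mixture(x, n_iter=300):
    """Iterative Monte Carlo classification for a 2-component normal mixture."""
    med = np.median(x)                                   # crude median-split start
    mu = np.array([x[x <= med].mean(), x[x > med].mean()])
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # membership probabilities P(component k | x_i)
        dens = w * norm.pdf(x[:, None], loc=mu, scale=sd)     # shape (n, 2)
        prob = dens / dens.sum(axis=1, keepdims=True)
        # Monte Carlo classification: sample a label for each data point
        labels = (rng.random(len(x)) < prob[:, 1]).astype(int)
        # update each component distribution from its classified data
        for k in (0, 1):
            xk = x[labels == k]
            if len(xk) > 1:
                w[k] = len(xk) / len(x)
                mu[k] = xk.mean()
                sd[k] = xk.std()
    return w, mu, sd

# Example: mixture 0.3*N(-2, 1) + 0.7*N(3, 0.5)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 700)])
print(imcc_normal_mixture(x))
```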

11.
Latent variable models are widely used for the joint modeling of mixed data, including nominal, ordinal, count and continuous data. In this paper, we consider a latent variable model for jointly modeling relationships between mixed binary, count and continuous variables with some observed covariates. We assume that, given a latent variable, the mixed variables of interest are independent, and that the count and continuous variables have Poisson and normal distributions, respectively. As such data may be extracted from different subpopulations, unobserved heterogeneity has to be taken into account. A mixture distribution is considered for the latent variable, which accounts for this heterogeneity. A generalized EM algorithm, which uses the Newton–Raphson algorithm inside the EM iterations, is used to compute the maximum likelihood estimates of the parameters. The standard errors of the maximum likelihood estimates are computed using the supplemented EM algorithm. An analysis of the primary biliary cirrhosis data is presented as an application of the proposed model.

12.
Partially linear models (PLMs) are an important tool for modelling economic and biometric data and are considered a flexible generalization of the linear model, obtained by including a nonparametric component of some covariate in the linear predictor. Usually, the error component is assumed to follow a normal distribution. However, theory and applications (through simulation or experimentation) often generate data sets that are skewed. The objective of this paper is to extend PLMs by allowing the errors to follow a skew-normal distribution [A. Azzalini, A class of distributions which includes the normal ones, Scand. J. Statist. 12 (1985), pp. 171–178], increasing the flexibility of the model. In particular, we develop the expectation-maximization (EM) algorithm for linear regression models and diagnostic analysis via local influence as well as generalized leverage, following [H. Zhu and S. Lee, Local influence for incomplete-data models, J. R. Stat. Soc. Ser. B 63 (2001), pp. 111–126]. A simulation study is also conducted to evaluate the efficiency of the EM algorithm. Finally, a suitable transformation is applied to a data set on ragweed pollen concentration in order to fit PLMs under asymmetric distributions. An illustrative comparison between normal and skew-normal errors is performed.

13.
The fitting of age-dependent HIV incidence models to AIDS data is a computationally intensive task, particularly when allowance is made for non-proportional dependence of the infection rate on age. This paper presents a computational alternative to the very intensive method described by Rosenberg (1994). Our approach is to use the EM algorithm on a discretized form of the model used by Rosenberg (1994). The EM approach has certain attractive features, including ease of implementation and flexibility of model specification. It also conveniently generalizes to allow smoothed estimation and less detailed forms of age-specific AIDS data.

14.
The EM algorithm is a popular method for computing maximum likelihood estimates or posterior modes in models that can be formulated in terms of missing data or latent structure. Although easy implementation and stable convergence help to explain the popularity of the algorithm, its convergence is sometimes notoriously slow. In recent years, however, various adaptations have significantly improved the speed of EM while maintaining its stability and simplicity. One especially successful method for maximum likelihood is known as the parameter-expanded EM or PXEM algorithm. Unfortunately, PXEM does not generally have a closed-form M-step when computing posterior modes, even when the corresponding EM algorithm is in closed form. In this paper we confront this problem by adapting the one-step-late EM algorithm to PXEM to establish a fast closed-form algorithm that improves on the one-step-late EM algorithm by ensuring monotone convergence. We use this algorithm to fit a probit regression model and a variety of dynamic linear models, showing computational savings of as much as 99.9%, with the biggest savings occurring when the EM algorithm is slowest to converge.
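For context, the sketch below shows the plain EM for probit regression via its latent-Gaussian representation (the baseline that PXEM and one-step-late variants accelerate), not the paper's own algorithm; the notation and data are ours.

```python
import numpy as np
from scipy.stats import norm

def probit_em(X, y, n_iter=200):
    """Plain EM for probit regression: y_i = 1{z_i > 0}, z_i ~ N(x_i' beta, 1).
    Each M-step is ordinary least squares of the imputed E[z_i | y_i] on X."""
    beta = np.zeros(X.shape[1])
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)        # precompute (X'X)^{-1} X'
    for _ in range(n_iter):
        eta = X @ beta
        # E-step: means of the truncated-normal latent variables
        ez = np.where(y == 1,
                      eta + norm.pdf(eta) / norm.cdf(eta),
                      eta - norm.pdf(eta) / norm.cdf(-eta))
        # M-step: closed-form least-squares update
        beta = XtX_inv_Xt @ ez
    return beta

# Example with simulated data
rng = np.random.default_rng(2)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.2])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)
print(probit_em(X, y))        # approaches beta_true
```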

15.
In the lifetime analysis of electric transformers, maximum likelihood estimation via the EM algorithm has been proposed. However, it is not clear whether the EM algorithm offers a better solution than the simpler Newton-Raphson (NR) algorithm. In this article, the first objective is a systematic comparison of the EM algorithm with the NR algorithm in terms of convergence performance. The second objective is to examine, via simulations, the performance of Akaike's information criterion (AIC) for selecting a suitable distribution among candidate models. These methods are illustrated on an electric power transformer dataset.
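AIC-based selection among candidate lifetime distributions boils down to AIC = 2k - 2 log L. The following sketch compares Weibull, gamma and lognormal fits on simulated stand-in data (not the transformer data set); the candidate set and all names are our own choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
lifetimes = rng.weibull(1.8, 200) * 15.0          # stand-in for transformer lifetimes

candidates = {
    "weibull":   stats.weibull_min,
    "gamma":     stats.gamma,
    "lognormal": stats.lognorm,
}

for name, dist in candidates.items():
    params = dist.fit(lifetimes, floc=0)          # fix location at 0 for lifetime data
    loglik = dist.logpdf(lifetimes, *params).sum()
    k = len(params) - 1                           # shape + scale (location is fixed)
    aic = 2 * k - 2 * loglik
    print(f"{name:10s}  log-lik = {loglik:8.2f}  AIC = {aic:8.2f}")
```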

16.
In this paper, the Rayleigh–Lindley (RL) distribution is introduced, obtained by compounding the Rayleigh and Lindley discrete distributions, where the compounding procedure follows an approach similar to the one previously studied by Adamidis and Loukas in some other contexts. The resulting distribution is a two-parameter model that is competitive with other parsimonious models such as the gamma and Weibull distributions. We study some properties of this new model, such as the moments and the mean residual life. Estimation is approached via the EM algorithm, and the behavior of the resulting estimators is studied in finite samples through a simulation study. Finally, we report two real-data illustrations in order to show the performance of the proposed model versus other common two-parameter models in the literature. The main conclusion is that the proposed model can be a valid alternative to other well-established competing models in the literature.

17.
Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation‐maximisation (EM) algorithm and the parameter expanded EM (PX‐EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton‐Raphson type scheme such as the average information (AI) algorithm. The EM and PX‐EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX‐EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX‐EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX‐EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.

18.
The three-parameter asymmetric Laplace distribution (ALD) has received increasing attention in the field of quantile regression due to an important relationship between its location and asymmetry parameters. On the basis of the representation of the ALD as a normal variance–mean mixture with an exponential mixing distribution, this article develops EM and generalized EM algorithms, respectively, for computing regression quantiles of linear and nonlinear regression models. It is interesting to show that the proposed EM algorithm and the MM (majorization–minimization) algorithm for quantile regression are really the same in terms of computation, since their updating formulas coincide. This provides a good example connecting the EM and MM algorithms. Simulation studies show that the EM algorithm can successfully recover the true parameters in quantile regression.
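The shared EM/MM update mentioned above is an iteratively reweighted least-squares step. The sketch below (our own notation, not the paper's code) implements that update for linear quantile regression, with a small constant eps guarding against division by a zero residual.

```python
import numpy as np

def quantile_regression_mm(X, y, tau=0.5, n_iter=100, eps=1e-6):
    """Iteratively reweighted least squares for the tau-th regression quantile.

    Each iteration solves the weighted normal equations
        (X' W X) beta = X' W y + (2*tau - 1) X' 1,
    with w_i = 1 / (|r_i| + eps); for tau = 0.5 this reduces to the classical
    IRLS scheme for median (LAD) regression."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS starting values
    ones = np.ones(n)
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / (np.abs(r) + eps)
        XtW = X.T * w                                     # X' W, shape (p, n)
        beta = np.linalg.solve(XtW @ X, XtW @ y + (2 * tau - 1) * (X.T @ ones))
    return beta

# Example: 75th-percentile regression on heteroscedastic data
rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + (0.5 + 0.3 * x) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
print(quantile_regression_mm(X, y, tau=0.75))
```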

19.
In most applications, the parameters of a mixture of linear regression models are estimated by maximum likelihood using the expectation–maximization (EM) algorithm. In this article, we compare three algorithms for computing the maximum likelihood estimates of the parameters of these models: the EM algorithm, the classification EM algorithm and the stochastic EM algorithm. The three procedures are compared through a simulation study of their performance (computational effort, statistical properties of the estimators and goodness of fit) on simulated data sets.

Simulation results show that the choice of the approach depends essentially on the configuration of the true regression lines and the initialization of the algorithms.
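For reference, a bare-bones EM for a two-component mixture of linear regressions is sketched below (our own minimal illustration, not the code compared in the article): the E-step computes responsibilities from the normal residual densities, and the M-step performs one weighted least-squares fit and variance update per component.

```python
import numpy as np
from scipy.stats import norm

def em_mixture_regression(X, y, n_iter=200):
    """EM for y ~ sum_k pi_k * N(X beta_k, sigma_k^2), with K = 2 components."""
    n, p = X.shape
    rng = np.random.default_rng(5)
    betas = rng.normal(size=(2, p))
    sigmas = np.array([y.std(), y.std()])
    pis = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities resp[i, k]
        dens = np.column_stack([
            pis[k] * norm.pdf(y, loc=X @ betas[k], scale=sigmas[k]) for k in (0, 1)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted least squares and variance update per component
        for k in (0, 1):
            w = resp[:, k]
            XtW = X.T * w
            betas[k] = np.linalg.solve(XtW @ X, XtW @ y)
            res = y - X @ betas[k]
            sigmas[k] = np.sqrt((w * res ** 2).sum() / w.sum())
            pis[k] = w.mean()
    return pis, betas, sigmas

# Example: two crossing regression lines
rng = np.random.default_rng(6)
n = 1000
x = rng.uniform(-3, 3, n)
z = rng.random(n) < 0.4
y = np.where(z, 1.0 + 2.0 * x, 4.0 - 1.0 * x) + 0.5 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
print(em_mixture_regression(X, y))
```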

20.
The paper focuses on some recent developments in nonparametric mixture distributions. It discusses nonparametric maximum likelihood estimation (NPMLE) of the mixing distribution and emphasizes gradient-type results, especially global results and the global convergence of algorithms such as the vertex direction or vertex exchange method. However, the NPMLE (or the algorithms constructing it) also provides an estimate of the number of components of the mixing distribution, which might not be desirable for theoretical reasons or might not be allowed by the physical interpretation of the mixture model. When the number of components is fixed in advance, the aforementioned algorithms cannot be used, and globally convergent algorithms do not exist up to now. Instead, the EM algorithm is often used to find maximum likelihood estimates; however, in this case multiple maxima often occur. An example from a meta-analysis of vitamin A and childhood mortality is used to illustrate the considerable inferential importance of identifying the correct global maximum of the likelihood. To improve the behavior of the EM algorithm, we suggest a combination of gradient function steps and EM steps to achieve global convergence, leading to the EM algorithm with gradient function update (EMGFU). This algorithm retains exactly k components and typically converges to the global maximum. The behavior of the algorithm is illustrated with several examples.
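The gradient (directional derivative) function that drives the vertex direction and vertex exchange methods is straightforward to evaluate. For a candidate mixing distribution G, D_G(theta) = sum_i f(x_i; theta) / f_G(x_i) - n, and G is the NPMLE exactly when sup_theta D_G(theta) <= 0. A minimal sketch for normal components with unit variance (our own notation, not the paper's code) follows.

```python
import numpy as np
from scipy.stats import norm

def gradient_function(theta_grid, support, weights, x):
    """Evaluate D_G(theta) = sum_i f(x_i; theta) / f_G(x_i) - n for a discrete
    mixing distribution G = sum_j weights[j] * delta(support[j]), with N(theta, 1)
    component densities."""
    # mixture density f_G(x_i) under the current mixing distribution
    f_G = (weights * norm.pdf(x[:, None], loc=support, scale=1.0)).sum(axis=1)
    # candidate component densities f(x_i; theta) over the grid of thetas
    f_theta = norm.pdf(x[:, None], loc=theta_grid, scale=1.0)       # shape (n, m)
    return (f_theta / f_G[:, None]).sum(axis=0) - len(x)

# Example: data from 0.5*N(-2, 1) + 0.5*N(2, 1); check a crude candidate G
rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
grid = np.linspace(-5, 5, 201)
D = gradient_function(grid, support=np.array([-2.0, 2.0]),
                      weights=np.array([0.5, 0.5]), x=x)
print(grid[np.argmax(D)], D.max())   # where adding a support point would help most
```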
