Similar literature
1.
Weak consistency and asymptotic normality are shown for a stochastic EM algorithm for censored data from a mixture of distributions under lognormal assumptions. The asymptotic properties hold for all parameters of the distributions, including the mixing parameter. In order to make parameter estimation meaningful it is necessary to know that the censored mixture distribution is identifiable. General conditions under which this is the case are given. The stochastic EM algorithm addressed in this paper is used for estimation of wood fibre length distributions based on optically measured data from cylindric wood samples (increment cores).

2.
The paper considers the clustering of two large sets of Internet traffic data consisting of information measured from headers of transmission control protocol packets collected on a busy arc of a university network connecting with the Internet. Packets are grouped into 'flows' thought to correspond to particular movements of information between one computer and another. The clustering is based on representing the flows as each sampled from one of a finite number of multinomial distributions and seeks to identify clusters of flows containing similar packet‐length distributions. The clustering uses the EM algorithm, and the data‐analytic and computational details are given.

3.
In this paper, we introduce a bivariate Kumaraswamy (BVK) distribution whose marginals are Kumaraswamy distributions. The cumulative distribution function of this bivariate model has absolutely continuous and singular parts. Representations for the cumulative and density functions are presented and properties such as marginal and conditional distributions, product moments and conditional moments are obtained. We show that the BVK model can be obtained from the Marshall and Olkin survival copula and obtain a tail dependence measure. The estimation of the parameters by maximum likelihood is discussed and the Fisher information matrix is determined. We propose an EM algorithm to estimate the parameters. Some simulations are presented to verify the performance of the direct maximum-likelihood estimation and the proposed EM algorithm. We also present a method to generate bivariate distributions from our proposed BVK distribution. Furthermore, we introduce a BVK distribution which has only an absolutely continuous part and discuss some of its properties. Finally, a real data set is analysed for illustrative purposes.

4.
In the presence of interval-censored data, we propose a general three-state disease model with covariates. Such data can arise, for example, in epidemiologic studies of infectious disease where both the times of infection and disease onset are not directly observed, or in cancer studies where the time of disease metastasis is known up to a specified interval. The proposed model allows the distributions of the transition times between states to depend on covariates and the time in the previous state. An estimation procedure for the underlying distributions and the model coefficients is suggested with the EM algorithm. The EMS algorithm (Smoothed EM algorithm) is also considered to obtain smooth estimates of the distributions. The proposed method is illustrated with data from an AIDS study and a study of patients with malignant melanoma.

5.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

6.
For the data from multivariate t distributions, it is very hard to make an influence analysis based on the probability density function since its expression is intractable. In this paper, we present a technique for influence analysis based on the mixture distribution and EM algorithm. In fact, the multivariate t distribution can be considered as a particular Gaussian mixture by introducing the weights from the Gamma distribution. We treat the weights as the missing data and develop the influence analysis for the data from multivariate t distributions based on the conditional expectation of the complete-data log-likelihood function in the EM algorithm. Several case-deletion measures are proposed for detecting influential observations from multivariate t distributions. Two numerical examples are given to illustrate our methodology.
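The Gaussian scale-mixture view of the t distribution mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's influence-analysis method: it only shows the standard EM updates for a univariate t distribution (dimension p = 1) with the degrees of freedom held fixed, where the latent Gamma weight of each observation is replaced by its conditional expectation. The function names `t_weights` and `em_location_scale` and the toy data are hypothetical, chosen for illustration.

```python
def t_weights(xs, mu, sigma2, nu):
    """E-step (p = 1): expected latent Gamma weight of each observation.

    For a t distribution with nu degrees of freedom the conditional
    expectation is (nu + 1) / (nu + squared standardized distance).
    """
    return [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in xs]


def em_location_scale(xs, nu, iters=50):
    """EM for the location and scale of a univariate t with fixed nu (a sketch)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    for _ in range(iters):
        u = t_weights(xs, mu, sigma2, nu)           # E-step
        mu = sum(ui * x for ui, x in zip(u, xs)) / sum(u)            # M-step: location
        sigma2 = sum(ui * (x - mu) ** 2 for ui, x in zip(u, xs)) / n  # M-step: scale
    return mu, sigma2, t_weights(xs, mu, sigma2, nu)


# Toy data (hypothetical): five central points and one outlier.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 10.0]
mu_hat, sigma2_hat, weights = em_location_scale(xs, nu=3.0)
```

Observations far from the fitted location receive small weights, which is why these weights are a natural starting point for flagging influential cases.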

7.
We present a maximum likelihood estimation procedure for the multivariate frailty model. The estimation is based on a Monte Carlo EM algorithm. The expectation step is approximated by averaging over random samples drawn from the posterior distribution of the frailties using rejection sampling. The maximization step reduces to a standard partial likelihood maximization. We also propose a simple rule based on the relative change in the parameter estimates to decide on sample size in each iteration and a stopping time for the algorithm. An important new concept is acquiring absolute convergence of the algorithm through sample size determination and an efficient sampling technique. The method is illustrated using a rat carcinogenesis dataset and data on vase lifetimes of cut roses. The estimation results are compared with approximate inference based on penalized partial likelihood using these two examples. Unlike the penalized partial likelihood estimation, the proposed full maximum likelihood estimation method accounts for all the uncertainty while estimating standard errors for the parameters.

8.
In this paper, we consider three different mixture models based on the Birnbaum-Saunders (BS) distribution, viz., (1) mixture of two different BS distributions, (2) mixture of a BS distribution and a length-biased version of another BS distribution, and (3) mixture of a BS distribution and its length-biased version. For all these models, we study their characteristics including the shape of their density and hazard rate functions. For the maximum likelihood estimation of the model parameters, we use the EM algorithm. For the purpose of illustration, we analyze two data sets related to enzyme and depressive condition problems. In the case of the enzyme data, it is shown that Model 1 provides the best fit, while for the depressive condition data, it is shown that all three models fit well, with Model 3 providing the best fit.

9.
The maximum likelihood estimation of parameters of the Poisson binomial distribution, based on a sample with exact and grouped observations, is considered by applying the EM algorithm (Dempster et al., 1977). The results of Louis (1982) are used in obtaining the observed information matrix and accelerating the convergence of the EM algorithm substantially. The maximum likelihood estimation from samples consisting entirely of complete (Sprott, 1958) or grouped observations is treated as a special case of the estimation problem mentioned above. A brief account is given for the implementation of the EM algorithm when the sampling distribution is the Neyman Type A since the latter is a limiting form of the Poisson binomial. Numerical examples based on real data are included.

10.
This paper proposes a method for estimating the parameters in a generalized linear model with missing covariates. The missing covariates are assumed to come from a continuous distribution, and are assumed to be missing at random. In particular, Gaussian quadrature methods are used on the E-step of the EM algorithm, leading to an approximate EM algorithm. The parameters are then estimated using the weighted EM procedure given in Ibrahim (1990). This approximate EM procedure leads to approximate maximum likelihood estimates, whose standard errors and asymptotic properties are given. The proposed procedure is illustrated on a data set.

11.
The EM algorithm is a popular method for parameter estimation in situations where the data can be viewed as being incomplete. As each E-step visits each data point on a given iteration, the EM algorithm requires considerable computation time in its application to large data sets. Two versions, the incremental EM (IEM) algorithm and a sparse version of the EM algorithm, were proposed recently by Neal R.M. and Hinton G.E. in Jordan M.I. (Ed.), Learning in Graphical Models, Kluwer, Dordrecht, 1998, pp. 355–368 to reduce the computational cost of applying the EM algorithm. With the IEM algorithm, the available n observations are divided into B (B ≤ n) blocks and the E-step is implemented for only a block of observations at a time before the next M-step is performed. With the sparse version of the EM algorithm for the fitting of mixture models, only those posterior probabilities of component membership of the mixture that are above a specified threshold are updated; the remaining component-posterior probabilities are held fixed. In this paper, simulations are performed to assess the relative performances of the IEM algorithm with various numbers of blocks and the standard EM algorithm. In particular, we propose a simple rule for choosing the number of blocks with the IEM algorithm. For the IEM algorithm in the extreme case of one observation per block, we provide efficient updating formulas, which avoid the direct calculation of the inverses and determinants of the component-covariance matrices. Moreover, a sparse version of the IEM algorithm (SPIEM) is formulated by combining the sparse E-step of the EM algorithm and the partial E-step of the IEM algorithm. This SPIEM algorithm can further reduce the computation time of the IEM algorithm.
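The block-wise E-step described in this abstract can be sketched for a one-dimensional Gaussian mixture. This is only an illustration under stated assumptions, not the authors' implementation: the names `block_stats`, `m_step`, `incremental_em` and `demo_data`, the strided block assignment, the starting values and the synthetic data are all hypothetical. After each partial E-step on a single block, the stored sufficient statistics of that block are refreshed and an M-step is run on the totals over all blocks.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def block_stats(block, pis, mus, variances):
    """Partial E-step: per-component sums of r, r*x and r*x^2 over one block."""
    k = len(pis)
    s0, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in block:
        dens = [pis[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total  # posterior probability of component j
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return s0, s1, s2


def m_step(totals, n):
    """M-step from the sufficient statistics summed over all blocks."""
    s0, s1, s2 = totals
    pis = [a / n for a in s0]
    mus = [b / a for a, b in zip(s0, s1)]
    variances = [max(c / a - m * m, 1e-6) for a, c, m in zip(s0, s2, mus)]
    return pis, mus, variances


def incremental_em(data, n_blocks, mus, sweeps=20):
    n, k = len(data), len(mus)
    blocks = [data[b::n_blocks] for b in range(n_blocks)]  # strided split (assumption)
    pis, variances = [1.0 / k] * k, [1.0] * k
    # One full E-step to initialise the stored statistics of every block.
    stored = [block_stats(blk, pis, mus, variances) for blk in blocks]
    for _ in range(sweeps):
        for b, blk in enumerate(blocks):
            stored[b] = block_stats(blk, pis, mus, variances)  # refresh one block only
            # Totals are re-summed here for clarity; subtracting the old block
            # statistics and adding the new ones would avoid the extra pass.
            totals = tuple(
                [sum(st[i][j] for st in stored) for j in range(k)] for i in range(3)
            )
            pis, mus, variances = m_step(totals, n)
    return pis, mus, variances


def demo_data(seed=0):
    """Synthetic two-component sample (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(-2.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]


pis, mus, variances = incremental_em(demo_data(), n_blocks=4, mus=[-1.0, 1.0])
```

With B blocks, the parameters are updated B times per sweep through the data instead of once, which is the source of the potential speed-up discussed in the abstract.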

12.
We consider mixtures of general angular central Gaussian distributions as models for multimodal directional data. We prove consistency of the maximum‐likelihood estimates of model parameters and convergence of their numerical approximations based on an expectation–maximization algorithm. Then, we focus on mixtures of special angular central Gaussian distributions and discuss the details of a fast numerical algorithm, which allows fitting multimodal distributions to massive data, occurring, for example, in the study of the microstructure of materials. We illustrate the applicability with some data from fibre composites and from ceramic foams.

13.
The skew-generalized-normal distribution [Arellano-Valle, RB, Gómez, HW, Quintana, FA. A new class of skew-normal distributions. Comm Statist Theory Methods 2004;33(7):1465–1480] is a class of asymmetric normal distributions, which contains the normal and skew-normal distributions as special cases. The main virtues of this distribution are that it is easy to simulate from and that it supplies a genuine expectation–maximization (EM) algorithm for maximum likelihood estimation. In this paper, we extend the EM algorithm for linear regression models assuming skew-generalized-normal random errors and we develop diagnostic analyses via local influence and generalized leverage, following Zhu and Lee's approach. This is because Cook's well-known approach would be more complicated to use to obtain measures of local influence. Finally, results obtained for a real data set are reported, illustrating the usefulness of the proposed method.

14.
Mixture distribution models are more useful than pure distributions in modeling heterogeneous data sets. The aim of this paper is to propose, for the first time, a mixture of Weibull–Poisson (WP) distributions to model heterogeneous data sets. Thus, a powerful alternative mixture distribution is created for modeling heterogeneous data sets. In the study, many features of the proposed mixture of WP distributions are examined. Also, the expectation maximization (EM) algorithm is used to determine the maximum-likelihood estimates of the parameters, and a simulation study is conducted to evaluate the performance of the proposed EM scheme. Applications to two real heterogeneous data sets are given to show the flexibility and potential of the new mixture distribution.

15.
Matrix-analytic Models and their Analysis (total citations: 2; self-citations: 0; citations by others: 2)
We survey phase-type distributions and Markovian point processes, aspects of how to use such models in applied probability calculations and how to fit them to observed data. A phase-type distribution is defined as the time to absorption in a finite continuous time Markov process with one absorbing state. This class of distributions is dense and contains many standard examples like all combinations of exponential in series/parallel. A Markovian point process is governed by a finite continuous time Markov process (typically ergodic), such that points are generated at a Poisson intensity depending on the underlying state and at transitions; a main special case is a Markov-modulated Poisson process. In both cases, the analytic formulas typically contain matrix-exponentials, and the matrix formalism carries over when the models are used in applied probability calculations as in problems in renewal theory, random walks and queueing. The statistical analysis is typically based upon the EM algorithm, viewing the whole sample path of the background Markov process as the latent variable.
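As an illustration of the matrix-exponential formulas this abstract refers to, the standard representation of a phase-type distribution can be written as follows. The notation is an assumption introduced here and does not appear in the abstract: $\boldsymbol{\alpha}$ is the initial distribution over the transient states (a row vector), $T$ is the sub-intensity matrix restricted to the transient states, $\mathbf{1}$ is a column vector of ones, and $\mathbf{t} = -T\mathbf{1}$ is the vector of absorption rates.

```latex
\begin{aligned}
F(x) &= 1 - \boldsymbol{\alpha}\, e^{Tx}\, \mathbf{1}, \qquad x \ge 0,\\
f(x) &= \boldsymbol{\alpha}\, e^{Tx}\, \mathbf{t}, \qquad \mathbf{t} = -T\mathbf{1},\\
\mathbb{E}[X^{n}] &= (-1)^{n}\, n!\; \boldsymbol{\alpha}\, T^{-n}\, \mathbf{1}.
\end{aligned}
```

With a single transient state and $T = (-\lambda)$, these reduce to the exponential distribution: $F(x) = 1 - e^{-\lambda x}$ and $\mathbb{E}[X] = 1/\lambda$.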

16.
This article puts forward a new method for solving linear quantile regression problems, based on the EM algorithm and a location-scale mixture of the asymmetric Laplace error distribution. A closed form of the estimator of the unknown parameter vector β based on the EM algorithm is obtained. In addition, some simulations are conducted to illustrate the performance of the proposed method. Simulation results demonstrate that the proposed algorithm performs well. Finally, the classical Engel data is fitted and bootstrap confidence intervals for the estimators are provided.

17.
In this paper, the Rayleigh–Lindley (RL) distribution is introduced, obtained by compounding the Rayleigh and Lindley discrete distributions, where the compounding procedure follows an approach similar to the one previously studied by Adamidis and Loukas in some other contexts. The resulting distribution is a two-parameter model, which is competitive with other parsimonious models such as the gamma and Weibull distributions. We study some properties of this new model such as the moments and the mean residual life. Estimation is carried out via the EM algorithm. The behavior of these estimators is studied in finite samples through a simulation study. Finally, we report two real data illustrations in order to show the performance of the proposed model versus other common two-parameter models in the literature. The main conclusion is that the proposed model can be a valid alternative to other competing models well established in the literature.

18.
In this paper, a new censoring scheme, called the adaptive progressively interval censoring scheme, is introduced. The competing risks data come from the Marshall–Olkin extended Chen distribution under the new censoring scheme with random removals. We obtain the maximum likelihood estimators of the unknown parameters and the reliability function by using the EM algorithm based on the failure data. In addition, the bootstrap percentile confidence intervals and bootstrap-t confidence intervals of the unknown parameters are obtained. To test the equality of the competing risks model, likelihood ratio tests are performed. Then, Monte Carlo simulation is conducted to evaluate the performance of the estimators under different sample sizes and removal schemes. Finally, a real data set is analyzed for illustration purposes.

19.
Zero-inflated models are commonly used for modeling count and continuous data with extra zeros. Inflation at one or two points other than zero for modeling continuous data has been discussed less than zero inflation. In this article, inflation at an arbitrary point α as a semicontinuous distribution is presented and the mean imputation for a continuous response is discussed as a cause of having semicontinuous data. Also, inflation at two points and generally at k arbitrary points and their relation to cell-mean imputation in the mixture of continuous distributions are studied. To analyze the imputed data, a mixture of semicontinuous distributions is used. The effects of covariates on the dependent variable in a mixture of k semicontinuous distributions with inflation at k points are also investigated. In order to find the parameter estimates, the expectation–maximization (EM) algorithm is used. Using real data from the Iranian Households Income and Expenditure Survey (IHIES), it is shown how to obtain a proper estimate of the population variance when continuous missing at random responses are mean imputed.

20.
In this article, we propose mixtures of skew Laplace normal (SLN) distributions to model both skewness and heavy-tailedness in heterogeneous data sets as an alternative to mixtures of skew Student-t-normal (STN) distributions. We give the expectation–maximization (EM) algorithm to obtain the maximum likelihood (ML) estimators for the parameters of interest. We also analyze the mixture regression model based on the SLN distribution and provide the ML estimators of the parameters using the EM algorithm. The performance of the proposed mixture model is illustrated by a simulation study and two real data examples.

