Similar literature
Found 20 similar documents (search time: 15 ms)
1.
In this work, the multinomial mixture model is studied through a maximum likelihood approach. The convergence of the maximum likelihood estimator to a set with characteristics of interest is shown. A method to select the number of mixture components is developed based on the form of the maximum likelihood estimator. A simulation study is then carried out to verify its behavior. Finally, two applications of multinomial mixtures to real data are presented.

2.
The paper considers the clustering of two large sets of Internet traffic data consisting of information measured from headers of transmission control protocol packets collected on a busy arc of a university network connecting with the Internet. Packets are grouped into 'flows' thought to correspond to particular movements of information between one computer and another. The clustering is based on representing the flows as each sampled from one of a finite number of multinomial distributions and seeks to identify clusters of flows containing similar packet‐length distributions. The clustering uses the EM algorithm, and the data‐analytic and computational details are given.
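
As a rough illustration of the kind of clustering described in this abstract, the sketch below runs EM on a finite mixture of multinomial distributions. It is not the authors' implementation: the function name `em_multinomial_mixture`, the deterministic initialisation from evenly spaced rows, the smoothing constant `tiny`, and the fixed iteration count are all assumptions made for the example.

```python
import math

def em_multinomial_mixture(counts, k, n_iter=50):
    """EM for a k-component multinomial mixture (illustrative sketch).

    counts: list of equal-length rows of non-negative category counts,
            e.g. one row of packet-length bin counts per flow.
    Returns (weights, probs, resp), where resp[i][j] is the posterior
    probability that row i belongs to component j.
    """
    n, d = len(counts), len(counts[0])
    tiny = 1e-9  # keeps every probability strictly positive so log() is safe
    weights = [1.0 / k] * k
    # Assumed initialisation: seed each component from an evenly spaced row,
    # with add-one smoothing. Real analyses usually try several starts.
    probs = []
    for j in range(k):
        seed_row = counts[j * n // k]
        total = sum(seed_row) + d
        probs.append([(c + 1) / total for c in seed_row])
    resp = [[1.0 / k] * k for _ in range(n)]
    for _ in range(n_iter):
        # E-step: posterior memberships, computed in the log domain.
        # The multinomial coefficient is the same for every component and cancels.
        for i, row in enumerate(counts):
            logp = [
                math.log(weights[j])
                + sum(c * math.log(probs[j][m]) for m, c in enumerate(row))
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            s = sum(unnorm)
            resp[i] = [u / s for u in unnorm]
        # M-step: update mixing weights and category probabilities.
        for j in range(k):
            nj = sum(resp[i][j] for i in range(n))
            weights[j] = (nj + tiny) / (n + k * tiny)
            raw = [
                sum(resp[i][j] * counts[i][m] for i in range(n)) + tiny
                for m in range(d)
            ]
            total = sum(raw)
            probs[j] = [r / total for r in raw]
    return weights, probs, resp
```

Each row of `resp` can be turned into a hard cluster label by taking the component with the largest posterior probability. Choosing the number of components `k` is a separate model-selection problem that this sketch does not address.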

3.
This article presents a method for modeling endogenous selectivity in count data. As in the case of the switching-regression model, two regimes are distinguished with potentially different data-generating processes. The regime choice is allowed to be correlated with the observed count in each of the regimes. An estimable model is obtained by transforming the underlying processes to the bivariate normal distribution. An empirical application to trip counts is provided.

4.
We consider the use of an EM algorithm for fitting finite mixture models when mixture component size is known. This situation can occur in a number of settings, where individual membership is unknown but aggregate membership is known. When the mixture component size, i.e., the aggregate mixture component membership, is known, it is common practice to treat only the mixing probability as known. This approach does not, however, entirely account for the fact that the number of observations within each mixture component is known, which may result in artificially incorrect estimates of parameters. By fully capitalizing on the available information, the proposed EM algorithm shows robustness to the choice of starting values and exhibits numerically stable convergence properties.

5.
The paper develops methods for the statistical analysis of outcomes of methadone maintenance treatment (MMT). Subjects for this study were a cohort of patients entering MMT in Sydney in 1986. Urine drug tests on these subjects were performed weekly during MMT, and were reported as either positive or negative for morphine, the marker of recent heroin use. To allow correlation between the repeated binary measurements, a marginal logistic model was fitted using the generalized estimating equation (GEE) approach and the alternating logistic regression approach. Conditional logistic models are also considered. Results of separate fitting to each patient and score tests suggest that there is substantial between-patient variation in response to MMT. To account for the population heterogeneity and to facilitate subject-specific inference, the conditional logistic model is extended by introducing random intercepts. The two, three and four group mixture models are also investigated. The model of best fit is a three group mixture model, in which about a quarter of the subjects have a poor response to MMT, with continued heroin use independent of daily dose of methadone; about a quarter of the subjects have a very good response, with little or no heroin use, again independent of dose; and about half the subjects responded in a dose-dependent fashion, with reduced heroin use while receiving higher doses of methadone. These findings are consistent with clinical experience. There is also an association between reduced drug use and increased duration in treatment. The mixture model is recommended since it is quite tractable in terms of estimation and model selection as well as being supported by clinical experience.

6.
We propose a latent variable model for informative missingness in longitudinal studies which is an extension of the latent dropout class model. In our model, the value of the latent variable is affected by the missingness pattern and it is also used as a covariate in modeling the longitudinal response. The latent variable thus links the longitudinal response and the missingness process. In our model, the latent variable is continuous instead of categorical and we assume that it is from a normal distribution. The EM algorithm is used to obtain estimates of the parameters of interest, and Gauss–Hermite quadrature is used to approximate the integration over the latent variable. The standard errors of the parameter estimates can be obtained from the bootstrap method or from the inverse of the Fisher information matrix of the final marginal likelihood. Comparisons are made to the mixed model and complete-case analysis using a clinical trial dataset from the Weight Gain Prevention among Women (WGPW) study. We use the generalized Pearson residuals to assess the fit of the proposed latent variable model.

7.
A developmental trajectory describes the course of behavior over time. Identifying multiple trajectories within an overall developmental process permits a focus on subgroups of particular interest. We introduce a framework for identifying trajectories by using the Expectation-Maximization (EM) algorithm to fit semiparametric mixtures of logistic distributions to longitudinal binary data. For performance comparison, we consider full maximization algorithms (PROC TRAJ in SAS), standard EM, and two other EM-based algorithms for speeding up convergence. Simulation shows that EM methods produce more accurate parameter estimates. The EM methodology is illustrated with a longitudinal dataset involving adolescents' smoking behaviors.

8.
Among the diverse frameworks that have been proposed for regression analysis of angular data, the projected multivariate linear model provides a particularly appealing and tractable methodology. In this model, the observed directional responses are assumed to correspond to the angles formed by latent bivariate normal random vectors that are assumed to depend upon covariates through a linear model. This implies an angular normal distribution for the observed angles, and incorporates a regression structure through a familiar and convenient relationship. In this paper we extend this methodology to accommodate clustered data (e.g., longitudinal or repeated measures data) by formulating a marginal version of the model and basing estimation on an EM‐like algorithm in which correlation among within‐cluster responses is taken into account by incorporating a working correlation matrix into the M step. A sandwich estimator is used for the parameter estimates’ covariance matrix. The methodology is motivated and illustrated using an example involving clustered measurements of microfibril angle on loblolly pine (Pinus taeda L.). Simulation studies are presented that evaluate the finite sample properties of the proposed fitting method. In addition, the relationship between within‐cluster correlation on the latent Euclidean vectors and the corresponding correlation structure for the observed angles is explored.

9.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 3079-3093
The paper presents an extension of a new class of multivariate latent growth models (Bianconcini and Cagnone, 2012) to allow for covariate effects on manifest, latent variables and random effects. The new class of models combines: (i) multivariate latent curves that describe the temporal behavior of the responses, and (ii) a factor model that specifies the relationship between manifest and latent variables. Based on the Generalized Linear and Latent Variable Model framework (Bartholomew and Knott, 1999), the response variables are assumed to follow different distributions of the exponential family, with item-specific linear predictors depending on both latent variables and measurement errors. A full maximum likelihood method is used to estimate all the model parameters simultaneously. Data coming from the Data WareHouse of the University of Bologna are used to illustrate the methodology.

10.
This article considers a class of estimators for the location and scale parameters in the location-scale model based on ‘synthetic data’ when the observations are randomly censored on the right. The asymptotic normality of the estimators is established using counting process and martingale techniques when the censoring distribution is known and unknown, respectively. In the case when the censoring distribution is known, we show that the asymptotic variances of this class of estimators depend on the data transformation and have a lower bound which is not achievable by this class of estimators. However, in the case that the censoring distribution is unknown and estimated by the Kaplan–Meier estimator, this class of estimators has the same asymptotic variance and attains the lower bound for variance for the case of known censoring distribution. This is different from censored regression analysis, where asymptotic variances depend on the data transformation. Our method has three valuable advantages over the method of maximum likelihood estimation. First, our estimators are available in a closed form and do not require an iterative algorithm. Second, simulation studies show that our estimators, being moment-based, are comparable to maximum likelihood estimators and outperform them when the sample size is small and the censoring rate is high. Third, our estimators are more robust to model misspecification than maximum likelihood estimators. Therefore, our method can serve as a competitive alternative to the method of maximum likelihood in estimation for location-scale models with censored data. A numerical example is presented to illustrate the proposed method.

11.
This article proposes a mixture double autoregressive model by introducing the flexibility of mixture models to the double autoregressive model, a novel conditional heteroscedastic model recently proposed in the literature. To make it more flexible, the mixing proportions are further assumed to be time varying, and probabilistic properties including strict stationarity and higher order moments are derived. Inference tools including the maximum likelihood estimation, an expectation–maximization (EM) algorithm for searching the estimator and an information criterion for model selection are carefully studied for the logistic mixture double autoregressive model, which has two components and is encountered more frequently in practice. Monte Carlo experiments give further support to the new models, and the analysis of an empirical example is also reported.

12.
In this article, we consider the order estimation of autoregressive models with incomplete data using the expectation–maximization (EM) algorithm-based information criteria. The criteria take the form of a penalization of the conditional expectation of the log-likelihood. The evaluation of the penalization term generally involves numerical differentiation and matrix inversion. We introduce a simplification of the penalization term for autoregressive model selection and we propose a penalty factor based on a resampling procedure in the criteria formula. The simulation results show the improvements yielded by the proposed method when compared with the classical information criteria for model selection with incomplete data.

13.
We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximate posterior for the model parameters based on the variational Bayes (VB) method. Compared to the EM algorithm, the VB method has the advantage that the variance for the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.

14.
A model for survival analysis is studied that is relevant for samples which are subject to multiple types of failure. In comparison with a more standard approach, through the appropriate use of hazard functions and transition probabilities, the model allows for a more accurate study of cause-specific failure with regard to both the timing and type of failure. A semiparametric specification of a mixture model is employed that is able to adjust for concomitant variables and allows for the assessment of their effects on the probabilities of eventual causes of failure through a generalized logistic model, and their effects on the corresponding conditional hazard functions by employing the Cox proportional hazards model. A carefully formulated estimation procedure is presented that uses an EM algorithm based on a profile likelihood construction. The methods discussed, which could also be used for reliability analysis, are applied to a prostate cancer data set.

15.
In this paper, a stochastic individual data model is considered. It accommodates occurrence times, reporting and settlement delays, and the severity of each individual claim. This formulation gives rise to a model for the corresponding aggregate data under which the classical chain ladder and Bornhuetter–Ferguson algorithms apply. A claims reserving algorithm is developed under this individual data model, and its performance is compared with that of the chain ladder and Bornhuetter–Ferguson algorithms to reveal the effects of using individual data instead of aggregate data. The research findings indicate a remarkable improvement in the accuracy of loss reserving, especially when the claim amounts are not too heavy-tailed.
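
For readers unfamiliar with the aggregate baseline mentioned in this abstract, the following is a minimal sketch of the classical chain ladder method on a cumulative run-off triangle. It does not reproduce the individual-data algorithm of the paper; the function name `chain_ladder`, the triangle layout (one shortening row per origin period), and the toy figures in the usage note are assumptions for illustration only.

```python
def chain_ladder(triangle):
    """Classical chain ladder on a cumulative run-off triangle (sketch).

    triangle: list of rows, one per origin period; row i holds the observed
    cumulative claims, so each later row is one entry shorter.
    Returns (factors, ultimates).
    """
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        # Volume-weighted development factor from column j to column j + 1,
        # using only the rows where both columns have been observed.
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    ultimates = []
    for row in triangle:
        value = row[-1]  # latest observed cumulative amount
        for j in range(len(row) - 1, n_dev - 1):
            value *= factors[j]  # roll forward to the final development period
        ultimates.append(value)
    return factors, ultimates
```

With the toy triangle `[[100, 150, 165], [110, 165], [120]]`, the development factors are 1.5 and 1.1, and the reserve for each origin period is its projected ultimate minus the latest observed cumulative amount.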

16.
In this paper we discuss graphical models for mixed types of continuous and discrete variables with incomplete data. We use a set of hyperedges to represent an observed data pattern. A hyperedge is a set of variables observed for a group of individuals. In a mixed graph with two types of vertices and two types of edges, dots and circles represent discrete and continuous variables respectively. A normal graph represents a graphical model and a hypergraph represents an observed data pattern. In terms of the mixed graph, we discuss decomposition of mixed graphical models with incomplete data, and we present a partial imputation method which can be used in the EM algorithm and the Gibbs sampler to speed their convergence. For a given mixed graphical model and an observed data pattern, we try to decompose a large graph into several small ones so that the original likelihood can be factored into a product of likelihoods with distinct parameters for small graphs. For the case that a graph cannot be decomposed due to its observed data pattern, we can impute missing data partially so that the graph can be decomposed.

17.
This article studies some properties of a mixture periodically correlated n-variate vector autoregressive (MPVAR) time series model, which extends the mixture time-invariant parameter n-vector autoregressive (MVAR) model recently studied by Fong et al. (2007; Fong, P.W., Li, W.K., Yau, C.W., Wong, C.S., "On a mixture vector autoregressive model", The Canadian Journal of Statistics 35:135-150). Our main contributions are, on the one hand, the derivation of the second-moment periodic stationarity condition for an n-variate MPVARS(n; K; 2, …, 2) model, together with a closed form for the second moment, and, on the other hand, the estimation, via the Expectation-Maximization (EM) algorithm, of the coefficient matrices and the error variance matrix.

18.
In this article, we consider a model allowing the analysis of multivariate data, which can contain data attributes of different types (e.g., continuous, discrete, binary). This model is a two-level hierarchical model which supports a wide range of correlation structures and can accommodate overdispersed data. Maximum likelihood estimation of the model parameters is achieved with an automated Monte Carlo expectation maximization algorithm. Our method is tested in a simulation study in the bivariate case and applied to a data set dealing with beehive activity.

19.
We consider bivariate current status data with death, which often occur in animal tumorigenicity experiments. Instead of the exact tumor onset time being observed, only the presence of a tumor at the death time or sacrifice time is known. Such an incomplete data structure makes it difficult to investigate the effect of treatment on tumor onset times. Furthermore, when tumor onsets occur at two sites, the order of their onsets is unknown. A multistate model is applied to incorporate the sequential occurrence of events. For the inference of parameters, an EM algorithm is applied and a real NTP (National Toxicology Program) dataset is analyzed as an illustrative example.

20.
Stochastic Models, 2013, 29(2): 235-254
We propose a family of extended thinning operators, indexed by a parameter γ in [0, 1), with the boundary case of γ=0 corresponding to the well-known binomial thinning operator. The extended thinning operators can be used to construct a class of continuous-time Markov processes for modeling count time series data. The class of stationary distributions of these processes is called generalized discrete self-decomposable, denoted by DSD(γ). We obtain characterization results for the DSD(γ) class and investigate relationships among the classes for different γ's.
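
The boundary case γ=0 mentioned in this abstract, the binomial thinning operator, can be sketched as follows: given a count x and a retention probability alpha, each of the x units is kept independently with probability alpha. The function name `binomial_thinning` and its parameters are assumptions for this illustration, and the extended operators for γ > 0 are not defined in the abstract, so they are not attempted here.

```python
import random

def binomial_thinning(alpha, x, rng=random):
    """Return one draw of alpha ∘ x, i.e. a Binomial(x, alpha) count.

    alpha: retention probability in [0, 1]
    x:     non-negative integer count to be thinned
    rng:   object with a random() method (the random module by default)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Keep each of the x units independently with probability alpha.
    return sum(1 for _ in range(x) if rng.random() < alpha)
```

Because the result never exceeds x and is always a non-negative integer, thinning plays the role that scalar multiplication plays in real-valued autoregressive models while keeping the series integer-valued.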
