1.
We consider the use of an EM algorithm for fitting finite mixture models when mixture component size is known. This situation can occur in a number of settings, where individual membership is unknown but aggregate membership is known. When the mixture component size, i.e., the aggregate mixture component membership, is known, it is common practice to treat only the mixing probability as known. This approach does not, however, entirely account for the fact that the number of observations within each mixture component is known, which may result in artificially incorrect estimates of parameters. By fully capitalizing on the available information, the proposed EM algorithm shows robustness to the choice of starting values and exhibits numerically stable convergence properties.
2.
Bimodal truncated count distributions are frequently observed in aggregate survey data and in user ratings when respondents are mixed in their opinion. They also arise in censored count data, where the highest category might create an additional mode. Modeling bimodal behavior in discrete data is useful for various purposes, from comparing shapes of different samples (or survey questions) to predicting future ratings by new raters. The Poisson distribution is the most common distribution for fitting count data and can be modified to achieve mixtures of truncated Poisson distributions. However, it is suitable only for modeling equidispersed distributions and is limited in its ability to capture bimodality. The Conway–Maxwell–Poisson (CMP) distribution is a two-parameter generalization of the Poisson distribution that allows for over- and underdispersion. In this work, we propose a mixture of CMPs for capturing a wide range of truncated discrete data, which can exhibit unimodal and bimodal behavior. We present methods for estimating the parameters of a mixture of two CMP distributions using an EM approach. Our approach introduces a special two-step optimization within the M step to estimate multiple parameters. We examine computational and theoretical issues. The methods are illustrated for modeling ordered rating data as well as truncated count data, using simulated and real examples.
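As a rough illustration of the distribution this abstract builds on, the sketch below evaluates a truncated CMP probability mass function, with weights proportional to λ^y / (y!)^ν, and a two-component mixture of such distributions. The function names, the support 0..upper, and the parameter names are assumptions made for illustration only; the paper's own parameterization and its EM steps are not reproduced here.

```python
import math

def truncated_cmp_pmf(lam, nu, upper):
    """Return [P(Y=0), ..., P(Y=upper)] for a CMP(lam, nu) truncated to 0..upper.

    Hypothetical helper: the weight of y is lam**y / (y!)**nu, normalized
    over the truncated support. nu = 1 gives a truncated Poisson.
    """
    # Work on the log scale to avoid overflow for large lam or y.
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(upper + 1)]
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def cmp_mixture_pmf(p, lam1, nu1, lam2, nu2, upper):
    """Mixture pmf: p * CMP(lam1, nu1) + (1 - p) * CMP(lam2, nu2) on 0..upper."""
    first = truncated_cmp_pmf(lam1, nu1, upper)
    second = truncated_cmp_pmf(lam2, nu2, upper)
    return [p * a + (1.0 - p) * b for a, b in zip(first, second)]
```

Setting ν = 1 in both components recovers a mixture of truncated Poisson distributions, which is the baseline the abstract describes as limited to equidispersed shapes.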
3.
Jennifer S.K. Chan, Anthony Y.C. Kuk, James Bell & Charles McGilchrist 《Australian & New Zealand Journal of Statistics》1998,40(1):1-10
The paper develops methods for the statistical analysis of outcomes of methadone maintenance treatment (MMT). Subjects for this study were a cohort of patients entering MMT in Sydney in 1986. Urine drug tests on these subjects were performed weekly during MMT, and were reported as either positive or negative for morphine, the marker of recent heroin use. To allow correlation between the repeated binary measurements, a marginal logistic model was fitted using the generalized estimating equation (GEE) approach and the alternating logistic regression approach. Conditional logistic models are also considered. Results of separate fitting to each patient and score tests suggest that there is substantial between-patient variation in response to MMT. To account for the population heterogeneity and to facilitate subject-specific inference, the conditional logistic model is extended by introducing random intercepts. The two, three and four group mixture models are also investigated. The model of best fit is a three group mixture model, in which about a quarter of the subjects have a poor response to MMT, with continued heroin use independent of daily dose of methadone; about a quarter of the subjects have a very good response, with little or no heroin use, again independent of dose; and about half the subjects responded in a dose-dependent fashion, with reduced heroin use while receiving higher doses of methadone. These findings are consistent with clinical experience. There is also an association between reduced drug use and increased duration in treatment. The mixture model is recommended since it is quite tractable in terms of estimation and model selection as well as being supported by clinical experience.
4.
J. Portela 《Communications in Statistics - Theory and Methods》2013,42(20):3250-3263
In this work, the multinomial mixture model is studied, through a maximum likelihood approach. The convergence of the maximum likelihood estimator to a set with characteristics of interest is shown. A method to select the number of mixture components is developed based on the form of the maximum likelihood estimator. A simulation study is then carried out to verify its behavior. Finally, two applications on real data of multinomial mixtures are presented.
5.
This article proposes a mixture double autoregressive model by introducing the flexibility of mixture models to the double autoregressive model, a novel conditional heteroscedastic model recently proposed in the literature. To make it more flexible, the mixing proportions are further assumed to be time varying, and probabilistic properties including strict stationarity and higher order moments are derived. Inference tools including the maximum likelihood estimation, an expectation–maximization (EM) algorithm for searching the estimator and an information criterion for model selection are carefully studied for the logistic mixture double autoregressive model, which has two components and is encountered more frequently in practice. Monte Carlo experiments give further support to the new models, and the analysis of an empirical example is also reported.
6.
When a vector of sample proportions is not obtained through simple random sampling, the covariance matrix for the sample vector can differ substantially from the one corresponding to the multinomial model (Wilson 1989). For example, clustering effects or subject effects in repeated-measure experiments can cause the variance of the observed proportions to be much larger than variances under the multinomial model. The phenomenon is generally referred to as overdispersion. Tallis (1962) proposed a model for identically distributed multinomials with a common measure of correlation and referred to it as the generalized multinomial model. This generalized multinomial model is extended in this article to account for overdispersion by allowing the vectors of proportions to vary according to a Dirichlet distribution. The generalized Dirichlet-multinomial model (as it is referred to here) allows for a second order of pairwise correlation among units, a type of assumption found reasonable in some biological data (Kupper and Haseman 1978) and introduced here to business data. An alternative derivation allowing for two kinds of variation is also considered. Asymptotic normal properties of parameter estimators are used to construct Wald statistics for testing hypotheses. The methods are illustrated with applications to performance evaluation monthly data and an integrated circuit yield analysis.
7.
Maddalena Cavicchioli 《Scandinavian Journal of Statistics》2016,43(4):1192-1213
In this paper, we reconsider the mixture vector autoregressive model, which was proposed in the literature for modelling non‐linear time series. We complete and extend the stationarity conditions, derive a matrix formula in closed form for the autocovariance function of the process and prove a result on stable vector autoregressive moving‐average representations of mixture vector autoregressive models. For these results, we apply techniques related to a Markovian representation of vector autoregressive moving‐average processes. Furthermore, we analyse maximum likelihood estimation of model parameters by using the expectation–maximization algorithm and propose a new iterative algorithm for getting the maximum likelihood estimates. Finally, we study the model selection problem and testing procedures. Several examples, simulation experiments and an empirical application based on monthly financial returns illustrate the proposed procedures.
8.
Selection of the important variables is one of the most important model selection problems in statistical applications. In this article, we address variable selection in finite mixtures of generalized semiparametric models. To overcome the computational burden, we introduce a class of variable selection procedures for finite mixtures of generalized semiparametric models using a penalized approach. Estimation of the nonparametric component is done via multivariate kernel regression. It is shown that the new method is consistent for variable selection, and the performance of the proposed method is assessed via simulation.
9.
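Feature selection for high-dimensional sparse data is the key to cluster analysis of internet public-opinion text. Borrowing the idea of penalized models, a penalized multinomial mixture model performs feature selection by imposing heavier penalties on features that do not significantly affect the clustering result. This effectively selects typical words that represent each class of opinion, and the approach performs well in empirical applications.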
10.
Mixture separation for mixed-mode data (cited 3 times in total: 0 self-citations, 3 citations by others)
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.
11.
Mahdi Teimouri 《Journal of Applied Statistics》2021,48(7):1154
Grouped data are frequently used in several fields of study. In this work, we use the expectation-maximization (EM) algorithm for fitting the skew-normal (SN) mixture model to the grouped data. Implementing the EM algorithm requires computing the one-dimensional integrals for each group or class. Our simulation study and real data analyses reveal that the EM algorithm not only always converges but also can be implemented in just a few seconds even when the number of components is large, contrary to the Bayesian paradigm that is computationally expensive. The accuracy of the EM algorithm and superiority of the SN mixture model over the traditional normal mixture model in modelling grouped data are demonstrated through the simulation and three real data illustrations. For implementing the EM algorithm, we use the package called ForestFit developed for R environment available at https://cran.r-project.org/web/packages/ForestFit/index.html.
12.
When censored time-to-event data are used to map quantitative trait loci (QTL), the existence of nonsusceptible subjects entails
extra challenges. If the heterogeneous susceptibility is ignored or inappropriately handled, we may either fail to detect
the responsible genetic factors or find spuriously significant locations. In this article, an interval mapping method based
on parametric mixture cure models is proposed, which takes nonsusceptible subjects into consideration. The proposed model
can be used to detect the QTL that are responsible for differential susceptibility and/or time-to-event trait distribution.
In particular, we propose a likelihood-based testing procedure with genome-wide significance levels calculated using a resampling
method. The performance of the proposed method and the importance of considering the heterogeneous susceptibility are demonstrated
by simulation studies and an application to survival data from an experiment on mice infected with Listeria monocytogenes.
13.
This article studies some properties of a mixture periodically correlated n-variate vector autoregressive (MPVAR) time series model, which extends the mixture time-invariant parameter n-vector autoregressive (MVAR) model recently studied by Fong et al. (2007). Our main contributions are, on the one hand, the second-moment periodic stationarity condition for an n-variate MPVARS(n; K; 2, …, 2) model, together with a closed form for the second moment, and, on the other hand, the estimation, via the Expectation-Maximization (EM) algorithm, of the coefficient matrices and the error variance matrix.
14.
Hunt (1996) implemented the finite mixture model approach to clustering in a program called MULTIMIX. The program is designed to cluster multivariate data that have categorical and continuous variables and that possibly contain missing values. This paper describes the approach taken to design MULTIMIX and how some of the statistical problems were dealt with. As an example, the program is used to cluster a large medical dataset.
15.
Weibull mixture models are widely used in a variety of fields for modeling phenomena caused by heterogeneous sources. We focus on circumstances in which original observations are not available, and instead the data come in the form of a grouping of the original observations. We illustrate an EM algorithm for fitting Weibull mixture models to grouped data and propose a bootstrap likelihood ratio test (LRT) for determining the number of subpopulations in a mixture model. The effectiveness of the LRT methods is investigated via simulation. We illustrate the utility of these methods by applying them to two grouped data applications.
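To make the grouped-data setting concrete, the sketch below evaluates the log-likelihood of a Weibull mixture when only bin counts are observed: each bin contributes its count times the log of the mixture probability of falling in that bin. This is a generic multinomial-style grouped likelihood with the constant term omitted, not the paper's EM algorithm or its bootstrap LRT. The function names, the shape/scale parameterization, and the argument layout are assumptions made for illustration.

```python
import math

def weibull_cdf(x, shape, scale):
    """Weibull CDF F(x) = 1 - exp(-(x / scale) ** shape) for x > 0."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def grouped_loglik(edges, counts, weights, shapes, scales):
    """Log-likelihood of binned data under a Weibull mixture.

    edges   : increasing bin boundaries, len(counts) + 1 values
    counts  : number of observations in each bin
    weights : mixing proportions (assumed to sum to 1)
    The multinomial constant is omitted because it does not depend
    on the parameters.
    """
    loglik = 0.0
    for j, n_j in enumerate(counts):
        lo, hi = edges[j], edges[j + 1]
        # Mixture probability of the bin (lo, hi].
        p_j = sum(
            w * (weibull_cdf(hi, k, s) - weibull_cdf(lo, k, s))
            for w, k, s in zip(weights, shapes, scales)
        )
        if n_j > 0:
            loglik += n_j * math.log(p_j)
    return loglik
```

Because the Weibull CDF has a closed form, the bin probabilities need no numerical integration. A likelihood ratio statistic for k versus k + 1 components would compare two maximized values of this function.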
16.
In this paper we discuss graphical models for mixed types of continuous and discrete variables with incomplete data. We use a set of hyperedges to represent an observed data pattern. A hyperedge is a set of variables observed for a group of individuals. In a mixed graph with two types of vertices and two types of edges, dots and circles represent discrete and continuous variables respectively. A normal graph represents a graphical model and a hypergraph represents an observed data pattern. In terms of the mixed graph, we discuss decomposition of mixed graphical models with incomplete data, and we present a partial imputation method which can be used in the EM algorithm and the Gibbs sampler to speed their convergence. For a given mixed graphical model and an observed data pattern, we try to decompose a large graph into several small ones so that the original likelihood can be factored into a product of likelihoods with distinct parameters for small graphs. For the case that a graph cannot be decomposed due to its observed data pattern, we can impute missing data partially so that the graph can be decomposed.
17.
We revisit the problem of estimating the proportion π of true null hypotheses where a large scale of parallel hypothesis tests are performed independently. While the proportion is a quantity of interest in its own right in applications, the problem has arisen in assessing or controlling an overall false discovery rate. On the basis of a Bayes interpretation of the problem, the marginal distribution of the p-value is modeled in a mixture of the uniform distribution (null) and a non-uniform distribution (alternative), so that the parameter π of interest is characterized as the mixing proportion of the uniform component on the mixture. In this article, a nonparametric exponential mixture model is proposed to fit the p-values. As an alternative approach to the convex decreasing mixture model, the exponential mixture model has the advantages of identifiability, flexibility, and regularity. A computation algorithm is developed. The new approach is applied to a leukemia gene expression data set where multiple significance tests over 3,051 genes are performed. The new estimate for π with the leukemia gene expression data appears to be about 10% lower than the other three estimates that are known to be conservative. Simulation results also show that the new estimate is usually lower and has smaller bias than the other three estimates.
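The two-component structure described in this abstract can be written out as follows. The symbol h for the alternative density is an assumption introduced here for illustration; the abstract names only π.

```latex
% Marginal density of a p-value: uniform null component plus alternative component
f(p) = \pi \cdot 1 + (1 - \pi)\, h(p), \qquad 0 \le p \le 1,
% Since h(p) \ge 0, the null proportion is bounded above by the density:
\pi \le f(p) \quad \text{for all } p \in [0, 1].
```

The bound shows why π is not identified without restrictions on h: any estimate based on the smallest value of f can only bound π from above, which is one way an estimator can end up conservative.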
18.
Manufacturers want to assess the quality and reliability of their products. Specifically, they want to know the exact number of failures from the sales transacted during a particular month. Information available today is sometimes incomplete as many companies analyze their failure data simply comparing sales for a total month from a particular department with the total number of claims registered for that given month. This information, called marginal count data, is thus incomplete as it does not give the exact number of failures of the specific products that were sold in a particular month. In this paper we discuss nonparametric estimation of the mean numbers of failures for repairable products and the failure probabilities for nonrepairable products. We present a nonhomogeneous Poisson process model for repairable products and a multinomial model and its Poisson approximation for nonrepairable products. A numerical example is given and a simulation is carried out to evaluate the proposed methods of estimating failure probabilities under a number of possible situations.
19.
Ying-zi Fu 《Communications in Statistics - Theory and Methods》2013,42(20):5918-5932
In this article, a finite mixture model of hurdle Poisson distribution with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of model parameters and mixing proportions. Specifically, missing data is assumed to be missing not at random (MNAR)/non ignorable missing (NINR) and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm efficiency, a stochastic step is incorporated into the E-step based on data augmentation, whereas the M-step is solved by the method of conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components with missing values. The considered model is a general model framework and it captures the important characteristics of count data analysis such as zero inflation/deflation, heterogeneity as well as missingness, providing us with more insight into the data feature and allowing for dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed to replace the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.
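For readers unfamiliar with the hurdle Poisson distribution named in this abstract, the sketch below evaluates its probability mass function in the standard form: a point mass at zero plus a zero-truncated Poisson for positive counts. It also evaluates a finite mixture of such components. The missing-data mechanism and the stochastic EM algorithm from the paper are not reproduced; the function and parameter names are assumptions made for illustration.

```python
import math

def hurdle_poisson_pmf(y, pi0, lam):
    """P(Y = y) under a hurdle Poisson model.

    pi0 : probability of a zero count (the hurdle)
    lam : rate of the zero-truncated Poisson used for y >= 1
    """
    if y == 0:
        return pi0
    # Poisson pmf on the log scale, then renormalize over y >= 1.
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return (1.0 - pi0) * math.exp(log_pois) / (1.0 - math.exp(-lam))

def hurdle_poisson_mixture_pmf(y, weights, pi0s, lams):
    """Finite mixture of hurdle Poisson components with the given mixing weights."""
    return sum(
        w * hurdle_poisson_pmf(y, p, l)
        for w, p, l in zip(weights, pi0s, lams)
    )
```

In this form, choosing pi0 above exp(-lam) gives more zeros than a plain Poisson with the same rate, and choosing it below gives fewer, which is how a hurdle model covers both zero inflation and zero deflation.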
20.
In this article, we consider the order estimation of autoregressive models with incomplete data using the expectation–maximization (EM) algorithm-based information criteria. The criteria take the form of a penalization of the conditional expectation of the log-likelihood. The evaluation of the penalization term generally involves numerical differentiation and matrix inversion. We introduce a simplification of the penalization term for autoregressive model selection and we propose a penalty factor based on a resampling procedure in the criteria formula. The simulation results show the improvements yielded by the proposed method when compared with the classical information criteria for model selection with incomplete data.