Similar literature
 Found 20 similar documents; search took 437 ms
1.
The zero truncated inverse Gaussian–Poisson model, obtained by first mixing the Poisson model assuming its expected value has an inverse Gaussian distribution and then truncating the model at zero, is very useful when modelling frequency count data. A Bayesian analysis based on this statistical model is implemented on the word frequency counts of various texts, and its validity is checked by exploring the posterior distribution of the Pearson errors and by implementing posterior predictive consistency checks. The analysis based on this model is useful because it allows one to use the posterior distribution of the model mixing density as an approximation of the posterior distribution of the density of the word frequencies of the vocabulary of the author, which is useful to characterize the style of that author. The posterior distribution of the expectation and of measures of the variability of that mixing distribution can be used to assess the size and diversity of his vocabulary. An alternative analysis is proposed based on the inverse Gaussian-zero truncated Poisson mixture model, which is obtained by switching the order of the mixing and the truncation stages. Even though this second model fits some of the word frequency data sets more accurately than the first model, in practice the analysis based on it is not as useful because it does not allow one to estimate the word frequency distribution of the vocabulary.
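
The difference between the two models in the abstract above can be written out as follows. This is only a sketch of the generic construction, not the paper's own notation: the symbol \psi for the inverse Gaussian mixing density and the labels P_1 and P_2 are assumptions introduced here.

```latex
% Assumed notation: \psi(\lambda) is the inverse Gaussian mixing density on \lambda > 0.
% Model 1 (zero truncated inverse Gaussian-Poisson): mix first, then truncate at zero.
P_1(X = x) = \frac{\int_0^\infty \frac{e^{-\lambda}\lambda^x}{x!}\,\psi(\lambda)\,d\lambda}
                  {1 - \int_0^\infty e^{-\lambda}\,\psi(\lambda)\,d\lambda},
\qquad x = 1, 2, \dots

% Model 2 (inverse Gaussian-zero truncated Poisson): truncate first, then mix.
P_2(X = x) = \int_0^\infty \frac{e^{-\lambda}\lambda^x}{x!\,(1 - e^{-\lambda})}\,\psi(\lambda)\,d\lambda,
\qquad x = 1, 2, \dots
```

In the first form the normalizing constant sits outside the integral, so \psi still describes the rates of the whole untruncated population; in the second it sits inside, so \psi describes only the units that were actually observed.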

2.
Abstract

In this paper we introduce the continuous tree mixture model, a mixture of undirected graphical models with tree-structured graphs that can be regarded as a nonparametric approach to multivariate analysis. We estimate its parameters, the component edge sets and the mixture proportions through a regularized maximum likelihood procedure. Our new algorithm, which uses the expectation maximization algorithm and a modified version of Kruskal's algorithm, simultaneously estimates and prunes the mixture component trees. Simulation studies indicate that this method performs better than the alternative Gaussian graphical mixture model. The proposed method is also applied to a water-level data set and is compared with the results of the Gaussian mixture model.

3.
Zero-inflated Poisson regression is a model commonly used to analyze data with excessive zeros. Although many models have been developed to fit zero-inflated data, most of them strongly depend on the special features of the individual data. For example, there is a need for new models when dealing with truncated and inflated data. In this paper, we propose a new model that is sufficiently flexible to model inflation and truncation simultaneously, and which is a mixture of a multinomial logistic and a truncated Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts. The truncated Poisson regression models the counts that are assumed to follow a truncated Poisson distribution. The performance of our proposed model is evaluated through simulation studies, and our model is found to have the smallest mean absolute error and best model fit. In the empirical example, the data are truncated with inflated values of zero and fourteen, and the results show that our model has a better fit than the other competing models.

4.
The authors study the asymptotic behaviour of the likelihood ratio statistic for testing homogeneity in the finite mixture models of a general parametric distribution family. They prove that the limiting distribution of this statistic is the squared supremum of a truncated standard Gaussian process. The autocorrelation function of the Gaussian process is explicitly presented. A re‐sampling procedure is recommended to obtain the asymptotic p‐value. Three kernel functions, normal, binomial and Poisson, are used in a simulation study which illustrates the procedure.

5.
An inverse Gaussian mixture of Poisson distributions (the P-IG distribution) is considered as a model for species abundance data. Minimum chi-square and maximum likelihood methods of estimation for the zero-truncated P-IG distribution are developed. The performance of the P-IG distribution is illustrated and discussed for several well-known sets of insect abundance data.

6.
The development of models and methods for cure rate estimation has recently burgeoned into an important subfield of survival analysis. Much of the literature focuses on the standard mixture model. Recently, process-based models have been suggested. We focus on several models based on first passage times for Wiener processes. Whitmore and others have studied these models in a variety of contexts. Lee and Whitmore (Stat Sci 21(4):501–513, 2006) give a comprehensive review of a variety of first hitting time models and briefly discuss their potential as cure rate models. In this paper, we study the Wiener process with negative drift as a possible cure rate model but the resulting defective inverse Gaussian model is found to provide a poor fit in some cases. Several possible modifications are then suggested, which improve the defective inverse Gaussian. These modifications include: the inverse Gaussian cure rate mixture model; a mixture of two inverse Gaussian models; incorporation of heterogeneity in the drift parameter; and the addition of a second absorbing barrier to the Wiener process, representing an immunity threshold. This class of process-based models is a useful alternative to the standard model and provides an improved fit compared to the standard model when applied to many of the datasets that we have studied. Implementation of this class of models is facilitated using expectation-maximization (EM) algorithms and variants thereof, including the gradient EM algorithm. Parameter estimates for each of these EM algorithms are given and the proposed models are applied to both real and simulated data, where they perform well.

7.
Abstract

This paper introduces a multiscale Gaussian convolution model of Gaussian mixture (MGC-GMM) via the convolution of the GMM and a multiscale Gaussian window function. It is found that the MGC-GMM is still a Gaussian mixture model, and its parameters can be mapped back to the parameters of the GMM. Meanwhile, the multiscale probability density function (MPDF) of the MGC-GMM can be viewed as the mathematical expectation of a random process induced by the Gaussian window function and the GMM, which can be directly estimated by the use of sample data. Based on the estimated MPDF, a novel algorithm denoted by the MGC is proposed for the selection of model and the parameter estimates of the GMM, where the component number and the means of the GMM are respectively determined by the number and the locations of the maximum points of the MPDF, and the numerical algorithms for the weight and variance parameters of the GMM are derived. The MGC is suitable for the GMM with diagonal covariance matrices. An MGC-EM algorithm is also presented for the generalized GMM, where the GMM is estimated using the EM algorithm by taking the estimates from the MGC as initial parameters of the GMM. The proposed algorithms are tested via a series of simulated sample sets from the given GMMs, and the results show that the proposed algorithms can effectively estimate the GMM.

8.
This paper presents an EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion. The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully non-parametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters may be sensitive to the specification of a parametric form for the mixing distribution. A listing of a GLIM4 algorithm for fitting the overdispersed binomial logit model is given in an appendix. A simple method is given for obtaining correct standard errors for parameter estimates when using the EM algorithm. Several examples are discussed.

9.
When Gaussian errors are inappropriate in a multivariate linear regression setting, it is often assumed that the errors are iid from a distribution that is a scale mixture of multivariate normals. Combining this robust regression model with a default prior on the unknown parameters results in a highly intractable posterior density. Fortunately, there is a simple data augmentation (DA) algorithm and a corresponding Haar PX‐DA algorithm that can be used to explore this posterior. This paper provides conditions (on the mixing density) for geometric ergodicity of the Markov chains underlying these Markov chain Monte Carlo algorithms. Letting d denote the dimension of the response, the main result shows that the DA and Haar PX‐DA Markov chains are geometrically ergodic whenever the mixing density is generalized inverse Gaussian, log‐normal, inverted Gamma (with shape parameter larger than d/2) or Fréchet (with shape parameter larger than d/2). The results also apply to certain subsets of the Gamma, F and Weibull families.

10.
Zero-inflated data are more frequent when the data represent counts. However, there are practical situations in which continuous data contain an excess of zeros. In these cases, the zero-inflated Poisson, binomial or negative binomial models are not suitable. In order to reduce this gap, we propose the zero-spiked gamma-Weibull (ZSGW) model by mixing a distribution which is degenerate at zero with the gamma-Weibull distribution, which has positive support. The model attempts to estimate simultaneously the effects of explanatory variables on the response variable and the zero-spiked. We consider a frequentist analysis and a non-parametric bootstrap for estimating the parameters of the ZSGW regression model. We derive the appropriate matrices for assessing local influence on the model parameters. We illustrate the performance of the proposed regression model by means of a real data set (copaiba oil resin production) from a study carried out at the Department of Forest Science of the Luiz de Queiroz School of Agriculture, University of São Paulo. Based on the ZSGW regression model, we determine the explanatory variables that can influence the excess of zeros of the resin oil production and identify influential observations. We also prove empirically that the proposed regression model can be superior to the zero-adjusted inverse Gaussian regression model to fit zero-inflated positive continuous data.

11.
For clustering mixed categorical and continuous data, Lawrence and Krzanowski (1996) proposed a finite mixture model in which component densities conform to the location model. In the graphical models literature the location model is known as the homogeneous Conditional Gaussian model. In this paper it is shown that their model is not identifiable without imposing additional restrictions. Specifically, for g groups and m locations, (g!)^(m-1) distinct sets of parameter values (not including permutations of the group mixing parameters) produce the same likelihood function. Excessive shrinkage of parameter estimates in a simulation experiment reported by Lawrence and Krzanowski (1996) is shown to be an artifact of the model's non-identifiability. Identifiable finite mixture models can be obtained by imposing restrictions on the conditional means of the continuous variables. These new identified models are assessed in simulation experiments. The conditional mean structure of the continuous variables in the restricted location mixture models is similar to that in the underlying variable mixture models proposed by Everitt (1988), but the restricted location mixture models are more computationally tractable.

12.
Hea-Jung Kim, Statistics, 2013, 47(3): 325-341
This article derives and studies several types of conditional correlations. The correlations are obtained by a class of two-piece scale mixture skew-normal distributions. The class is obtained by applying a set of nonlinear constraints to the bivariate scale mixture of normal distributions. The correlations of the class are invariant with respect to the choice of the scale mixing function; however, they are dependent upon the type of the nonlinear truncation. Moreover, their respective upper and lower limits are no longer 1.00 and −1.00. They are useful for the truncated data analysis, the multivariate interdependence methods (such as the principal component analysis and the factor analysis), and the random truncation modelling. Some distributional properties and the Bayesian computation of the correlations are considered when developing necessary theories and providing illustrative examples, respectively. Two applications are also given to demonstrate the usefulness of the conditional correlations in a multivariate analysis.

13.
In this paper an expression for the inverse moment of order r is given for the truncated binomial and Poisson distributions. This enables one to obtain inverse moments in a finite series. Some applications and multivariate generalizations are also given. The method also enables one to obtain relations between inverse moments and factorial moments and distributions of sums of variables.
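
To make the quantity in the abstract above concrete, here is a minimal numerical sketch of the inverse moment of order r, E[X^(-r)], for a zero-truncated Poisson variable. It uses brute-force summation of the series and is not the closed-form finite-series expression derived in the paper; the function names and the stopping rule are assumptions made for illustration.

```python
import math


def zt_poisson_pmf(x, lam):
    """P(X = x | X >= 1) for X ~ Poisson(lam), with lam > 0."""
    if x < 1:
        return 0.0
    # Work on the log scale to avoid overflow in lam**x and x!.
    log_p = -lam + x * math.log(lam) - math.lgamma(x + 1)
    # -expm1(-lam) equals 1 - exp(-lam), computed accurately for small lam.
    return math.exp(log_p) / (-math.expm1(-lam))


def inverse_moment(lam, r=1, tol=1e-12, max_terms=10_000):
    """Approximate E[X**(-r)] by summing x**(-r) * pmf(x) over x = 1, 2, ..."""
    total = 0.0
    for x in range(1, max_terms + 1):
        term = zt_poisson_pmf(x, lam) / x**r
        total += term
        # Past the mode the terms decrease, so a tiny term means a tiny tail.
        if x > lam and term < tol:
            break
    return total


if __name__ == "__main__":
    print(inverse_moment(2.0, r=1))
```

By Jensen's inequality the result should be at least 1/E[X], where E[X] = lam / (1 - exp(-lam)) for the zero-truncated Poisson, which gives a quick sanity check on the summation.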

14.
The traditional mixture model assumes that a dataset is composed of several populations of Gaussian distributions. In real life, however, data often do not fit the restrictions of normality very well. It is likely that data from a single population exhibiting either asymmetrical or heavy-tail behavior could be erroneously modeled as two populations, resulting in suboptimal decisions. To avoid these pitfalls, we generalize the mixture model using adaptive kernel density estimators. Because kernel density estimators enforce no functional form, we can adapt to non-normal asymmetric, kurtotic, and tail characteristics in each population independently. This, in effect, robustifies mixture modeling. We adapt two computational algorithms, genetic algorithm with regularized Mahalanobis distance and genetic expectation maximization algorithm, to optimize the kernel mixture model (KMM) and use results from robust estimation theory in order to data-adaptively regularize both. Finally, we likewise extend the information criterion ICOMP to score the KMM. We use these tools to simultaneously select the best mixture model and classify all observations without making any subjective decisions. The performance of the KMM is demonstrated on two medical datasets; in both cases, we recover the clinically determined group structure and substantially improve patient classification rates over the Gaussian mixture model.

15.
The barely known continuous reciprocal inverse Gaussian distribution is used in this paper to introduce the Poisson-reciprocal inverse Gaussian discrete distribution. Several of its most relevant statistical properties are examined, some of them directly inherited from the reciprocal of the inverse Gaussian distribution. Furthermore, a mixed Poisson regression model that uses the reciprocal inverse Gaussian as mixing distribution is presented. Parameter estimation in this regression model is performed via an EM type algorithm. In light of the numerical results displayed in the paper, the distributions introduced in this work are competitive with the classical negative binomial and Poisson-inverse Gaussian distributions.

16.
Count response data often exhibit departures from the assumptions of standard Poisson generalized linear models. In particular, cluster level correlation of the data and truncation at zero are two common characteristics of such data. This paper describes a random components truncated Poisson model that can be applied to clustered and zero‐truncated count data. Residual maximum likelihood method estimators for the parameters of this model are developed and their use is illustrated using a dataset of non‐zero counts of sheets with edge‐strain defects in iron sheets produced by the Mobarekeh Steel Complex, Iran. The paper also reports on a small‐scale simulation study that supports the estimation procedure.

17.
This article considers the adaptive elastic net estimator for regularized mean regression from a Bayesian perspective. Representing the Laplace distribution as a mixture of Bartlett–Fejer kernels with a Gamma mixing density, a Gibbs sampling algorithm for the adaptive elastic net is developed. By introducing slice variables, it is shown that the mixture representation provides a Gibbs sampler that can be accomplished by sampling from either a truncated normal or a truncated Gamma distribution. The proposed method is illustrated using several simulation studies and analyzing a real dataset. Both simulation studies and real data analysis indicate that the proposed approach performs well.

18.
For frequency counts, the situation of extra zeros often arises in biomedical applications. This is demonstrated with count data from a dental epidemiological study in Belo Horizonte (the Belo Horizonte caries prevention study) which evaluated various programmes for reducing caries. Extra zeros, however, violate the variance–mean relationship of the Poisson error structure. This extra-Poisson variation can easily be explained by a special mixture model, the zero-inflated Poisson (ZIP) model. On the basis of the ZIP model, a graphical device is presented which not only summarizes the mixing distribution but also provides visual information about the overall mean. This device can be exploited to evaluate and compare various groups. Ways are discussed to include covariates and to develop an extension of the conventional Poisson regression. Finally, a method to evaluate intervention effects on the basis of the ZIP regression model is described and applied to the data of the Belo Horizonte caries prevention study.
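
The statement that extra zeros violate the Poisson variance–mean relationship can be checked with a small sketch of the standard two-component ZIP model: a point mass at zero with weight pi mixed with a Poisson(lam) component. The parameter names pi and lam and the function names are assumptions made here, not notation from the paper, and the sketch does not reproduce the paper's graphical device or regression extension.

```python
import math


def zip_pmf(x, pi, lam):
    """P(X = x) under a zero-inflated Poisson with zero weight pi and rate lam."""
    poisson = math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))
    # The extra point mass contributes only at x = 0.
    return (pi if x == 0 else 0.0) + (1.0 - pi) * poisson


def zip_mean_var(pi, lam):
    """Closed-form mean and variance of the ZIP distribution."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


if __name__ == "__main__":
    mean, var = zip_mean_var(pi=0.3, lam=2.0)
    # For a plain Poisson the ratio is exactly 1; any pi > 0 pushes it above 1.
    print(mean, var, var / mean)
```

The variance-to-mean ratio is 1 + pi * lam, so any positive zero-inflation weight produces overdispersion relative to a Poisson with the same mean.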

19.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

20.
Daniel Hohmann, Statistics, 2013, 47(2): 348-362
We consider a two-component location mixture model with symmetric components, one of which is assumed to be known, the other is unknown. We show identifiability under assumptions on the tails of the characteristic function for the true underlying mixture, and also construct asymptotically normal estimates. The model is an extension of the contamination model in Bordes et al. [Semiparametric estimation of a two-component mixture model when a component is known, Scand. J. Statist. 33 (2006), pp. 733–752], and also related to a location mixture of one symmetric density as in Bordes et al. [Semiparametric estimation of a two component mixture model, Ann. Statist. 34 (2006), pp. 1204–1232]. We show by simulation that estimating the additional location parameter leads to a slight loss of efficiency as compared with the contamination model.
