首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Models incorporating “latent” variables have been commonplace in financial, social, and behavioral sciences. Factor model, the most popular latent model, explains the continuous observed variables in a smaller set of latent variables (factors) in a matter of linear relationship. However, complex data often simultaneously display asymmetric dependence, asymptotic dependence, and positive (negative) dependence between random variables, which linearity and Gaussian distributions and many other extant distributions are not capable of modeling. This article proposes a nonlinear factor model that can model the above-mentioned variable dependence features but still possesses a simple form of factor structure. The random variables, marginally distributed as unit Fréchet distributions, are decomposed into max linear functions of underlying Fréchet idiosyncratic risks, transformed from Gaussian copula, and independent shared external Fréchet risks. By allowing the random variables to share underlying (latent) pervasive risks with random impact parameters, various dependence structures are created. This innovates a new promising technique to generate families of distributions with simple interpretations. We dive in the multivariate extreme value properties of the proposed model and investigate maximum composite likelihood methods for the impact parameters of the latent risks. The estimates are shown to be consistent. The estimation schemes are illustrated on several sets of simulated data, where comparisons of performance are addressed. We employ a bootstrap method to obtain standard errors in real data analysis. Real application to financial data reveals inherent dependencies that previous work has not disclosed and demonstrates the model’s interpretability to real data. Supplementary materials for this article are available online.  相似文献   

2.
Regression models play a dominant role in analyzing several data sets arising from areas like agricultural experiment, space experiment, biological experiment, financial modeling, etc. One of the major strings in developing the regression models is the assumption of the distribution of the error terms. It is customary to consider that the error terms follow the Gaussian distribution. However, there are some drawbacks of Gaussian errors such as the distribution being mesokurtic having kurtosis three. In many practical situations the variables under study may not be having mesokurtic but they are platykurtic. Hence, to analyze these sorts of platykurtic variables, a two-variable regression model with new symmetric distributed errors is developed and analyzed. The maximum likelihood (ML) estimators of the model parameters are derived. The properties of the ML estimators with respect to the new symmetrically distributed errors are also discussed. A simulation study is carried out to compare the proposed model with that of Gaussian errors and found that the proposed model performs better when the variables are platykurtic. Some applications of the developed model are also pointed out.  相似文献   

3.
This article deals with a semisupervised learning based on naive Bayes assumption. A univariate Gaussian mixture density is used for continuous input variables whereas a histogram type density is adopted for discrete input variables. The EM algorithm is used for the computation of maximum likelihood estimators of parameters in the model when we fix the number of mixing components for each continuous input variable. We carry out a model selection for choosing a parsimonious model among various fitted models based on an information criterion. A common density method is proposed for the selection of significant input variables. Simulated and real datasets are used to illustrate the performance of the proposed method.  相似文献   

4.
Mixture separation for mixed-mode data   总被引:3,自引:0,他引:3  
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.  相似文献   

5.
6.
In the framework of model-based cluster analysis, finite mixtures of Gaussian components represent an important class of statistical models widely employed for dealing with quantitative variables. Within this class, we propose novel models in which constraints on the component-specific variance matrices allow us to define Gaussian parsimonious clustering models. Specifically, the proposed models are obtained by assuming that the variables can be partitioned into groups resulting to be conditionally independent within components, thus producing component-specific variance matrices with a block diagonal structure. This approach allows us to extend the methods for model-based cluster analysis and to make them more flexible and versatile. In this paper, Gaussian mixture models are studied under the above mentioned assumption. Identifiability conditions are proved and the model parameters are estimated through the maximum likelihood method by using the Expectation-Maximization algorithm. The Bayesian information criterion is proposed for selecting the partition of the variables into conditionally independent groups. The consistency of the use of this criterion is proved under regularity conditions. In order to examine and compare models with different partitions of the set of variables a hierarchical algorithm is suggested. A wide class of parsimonious Gaussian models is also presented by parameterizing the component-variance matrices according to their spectral decomposition. The effectiveness and usefulness of the proposed methodology are illustrated with two examples based on real datasets.  相似文献   

7.
《统计学通讯:理论与方法》2012,41(16-17):3079-3093
The paper presents an extension of a new class of multivariate latent growth models (Bianconcini and Cagnone, 2012) to allow for covariate effects on manifest, latent variables and random effects. The new class of models combines: (i) multivariate latent curves that describe the temporal behavior of the responses, and (ii) a factor model that specifies the relationship between manifest and latent variables. Based on the Generalized Linear and Latent Variable Model framework (Bartholomew and Knott, 1999), the response variables are assumed to follow different distributions of the exponential family, with item-specific linear predictors depending on both latent variables and measurement errors. A full maximum likelihood method is used to estimate all the model parameters simultaneously. Data coming from the Data WareHouse of the University of Bologna are used to illustrate the methodology.  相似文献   

8.
Parsimonious Gaussian mixture models   总被引:3,自引:0,他引:3  
Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor of analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models which are based on the mixtures of factor analyzers model are introduced and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class of models includes parsimonious models that have not previously been developed. These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.  相似文献   

9.
Abstract. Continuous proportional outcomes are collected from many practical studies, where responses are confined within the unit interval (0,1). Utilizing Barndorff‐Nielsen and Jørgensen's simplex distribution, we propose a new type of generalized linear mixed‐effects model for longitudinal proportional data, where the expected value of proportion is directly modelled through a logit function of fixed and random effects. We establish statistical inference along the lines of Breslow and Clayton's penalized quasi‐likelihood (PQL) and restricted maximum likelihood (REML) in the proposed model. We derive the PQL/REML using the high‐order multivariate Laplace approximation, which gives satisfactory estimation of the model parameters. The proposed model and inference are illustrated by simulation studies and a data example. The simulation studies conclude that the fourth order approximate PQL/REML performs satisfactorily. The data example shows that Aitchison's technique of the normal linear mixed model for logit‐transformed proportional outcomes is not robust against outliers.  相似文献   

10.
A strategy is proposed to initialize the EM algorithm in the multivariate Gaussian mixture context. It consists in randomly drawing, with a low computational cost in many situations, initial mixture parameters in an appropriate space including all possible EM trajectories. This space is simply defined by two relations between the two first empirical moments and the mixture parameters satisfied by any EM iteration. An experimental study on simulated and real data sets clearly shows that this strategy outperforms classical methods, since it has the nice property to widely explore local maxima of the likelihood function.  相似文献   

11.
In this work, we modify finite mixtures of factor analysers to provide a method for simultaneous clustering of subjects and multivariate discrete outcomes. The joint clustering is performed through a suitable reparameterization of the outcome (column)-specific parameters. We develop an expectation–maximization-type algorithm for maximum likelihood parameter estimation where the maximization step is divided into orthogonal sub-blocks that refer to row and column-specific parameters, respectively. Model performance is evaluated via a simulation study with varying sample size, number of outcomes and row/column-specific clustering (partitions). We compare the performance of our model with the performance of standard model-based biclustering approaches. The proposed method is also demonstrated on a benchmark data set where a multivariate binary response is considered.  相似文献   

12.
We propose a new type of multivariate statistical model that permits non‐Gaussian distributions as well as the inclusion of conditional independence assumptions specified by a directed acyclic graph. These models feature a specific factorisation of the likelihood that is based on pair‐copula constructions and hence involves only univariate distributions and bivariate copulas, of which some may be conditional. We demonstrate maximum‐likelihood estimation of the parameters of such models and compare them to various competing models from the literature. A simulation study investigates the effects of model misspecification and highlights the need for non‐Gaussian conditional independence models. The proposed methods are finally applied to modeling financial return data. The Canadian Journal of Statistics 40: 86–109; 2012 © 2012 Statistical Society of Canada  相似文献   

13.
A multivariate generalized Poisson regression model based on the multivariate generalized Poisson distribution is defined and studied. The regression model can be used to describe a count data with any type of dispersion. The model allows for both positive and negative correlation between any pair of the response variables. The parameters of the regression model are estimated by using the maximum likelihood method. Some test statistics are discussed, and two numerical data sets are used to illustrate the applications of the multivariate count data regression model.  相似文献   

14.
In this article, we utilize a scale mixture of Gaussian random field as a tool for modeling spatial ordered categorical data with non-Gaussian latent variables. In fact, we assume a categorical random field is created by truncating a Gaussian Log-Gaussian latent variable model to accommodate heavy tails. Since the traditional likelihood approach for the considered model involves high-dimensional integrations which are computationally intensive, the maximum likelihood estimates are obtained using a stochastic approximation expectation–maximization algorithm. For this purpose, Markov chain Monte Carlo methods are employed to draw from the posterior distribution of latent variables. A numerical example illustrates the methodology.  相似文献   

15.
Dependent multivariate count data occur in several research studies. These data can be modelled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula-based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.  相似文献   

16.
Multivariate Poisson regression with covariance structure   总被引:1,自引:0,他引:1  
In recent years the applications of multivariate Poisson models have increased, mainly because of the gradual increase in computer performance. The multivariate Poisson model used in practice is based on a common covariance term for all the pairs of variables. This is rather restrictive and does not allow for modelling the covariance structure of the data in a flexible way. In this paper we propose inference for a multivariate Poisson model with larger structure, i.e. different covariance for each pair of variables. Maximum likelihood estimation, as well as Bayesian estimation methods are proposed. Both are based on a data augmentation scheme that reflects the multivariate reduction derivation of the joint probability function. In order to enlarge the applicability of the model we allow for covariates in the specification of both the mean and the covariance parameters. Extension to models with complete structure with many multi-way covariance terms is discussed. The method is demonstrated by analyzing a real life data set.  相似文献   

17.
Elimination of a nuisance variable is often non‐trivial and may involve the evaluation of an intractable integral. One approach to evaluate these integrals is to use the Laplace approximation. This paper concentrates on a new approximation, called the partial Laplace approximation, that is useful when the integrand can be partitioned into two multiplicative disjoint functions. The technique is applied to the linear mixed model and shows that the approximate likelihood obtained can be partitioned to provide a conditional likelihood for the location parameters and a marginal likelihood for the scale parameters equivalent to restricted maximum likelihood (REML). Similarly, the partial Laplace approximation is applied to the t‐distribution to obtain an approximate REML for the scale parameter. A simulation study reveals that, in comparison to maximum likelihood, the scale parameter estimates of the t‐distribution obtained from the approximate REML show reduced bias.  相似文献   

18.
Abstract. Maximum likelihood estimation in many classical statistical problems is beset by multimodality. This article explores several variations of deterministic annealing that tend to avoid inferior modes and find the dominant mode. In Bayesian settings, annealing can be tailored to find the dominant mode of the log posterior. Our annealing algorithms involve essentially trivial changes to existing optimization algorithms built on block relaxation or the EM or MM principle. Our examples include estimation with the multivariate t distribution, Gaussian mixture models, latent class analysis, factor analysis, multidimensional scaling and a one‐way random effects model. In the numerical examples explored, the proposed annealing strategies significantly improve the chances for locating the global maximum.  相似文献   

19.
In some situations, the distribution of the error terms of a multivariate linear regression model may depart from normality. This problem has been addressed, for example, by specifying a different parametric distribution family for the error terms, such as multivariate skewed and/or heavy-tailed distributions. A new solution is proposed, which is obtained by modelling the error term distribution through a finite mixture of multi-dimensional Gaussian components. The multivariate linear regression model is studied under this assumption. Identifiability conditions are proved and maximum likelihood estimation of the model parameters is performed using the EM algorithm. The number of mixture components is chosen through model selection criteria; when this number is equal to one, the proposal results in the classical approach. The performances of the proposed approach are evaluated through Monte Carlo experiments and compared to the ones of other approaches. In conclusion, the results obtained from the analysis of a real dataset are presented.  相似文献   

20.
We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号