Similar articles
 20 similar articles found.
1.
ABSTRACT

We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised in two respects: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as the learning proceeds, rather than in a two-stage scheme as in standard model-based clustering, where model selection criteria are applied afterward, and (ii) unlike the standard EM for regression mixtures, it does not require accurate initialization. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, and confirm its benefit for practical applications.
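For orientation, the standard EM baseline that this abstract contrasts itself with can be sketched as follows. This is a minimal illustration assuming Gaussian errors and a fixed, user-supplied number of components; it is not the penalized robust algorithm proposed in the paper, and the function name `em_regression_mixture` and its parameters are hypothetical.

```python
import numpy as np

def em_regression_mixture(x, y, n_components=2, degree=1, n_iter=100, seed=0):
    """Standard EM for a Gaussian mixture of polynomial regressions (fixed K).

    Illustrative baseline only: K is fixed in advance and the result depends
    on the random initialization, which is what the abstract aims to avoid.
    """
    rng = np.random.default_rng(seed)
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    # Random soft initialization of the responsibilities (rows sum to 1).
    resp = rng.dirichlet(np.ones(n_components), size=len(y))
    log_liks = []
    for _ in range(n_iter):
        # M-step: mixing proportions, then weighted least squares per component.
        pis = resp.mean(axis=0)
        betas = np.empty((n_components, X.shape[1]))
        sigma2s = np.empty(n_components)
        for k in range(n_components):
            sw = np.sqrt(resp[:, k])
            betas[k] = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
            resid = y - X @ betas[k]
            # Small floor guards against a degenerate zero variance.
            sigma2s[k] = max((resp[:, k] * resid**2).sum() / resp[:, k].sum(), 1e-8)
        # E-step: posterior component probabilities, computed in log space.
        resid = y[:, None] - X @ betas.T  # shape (n, K)
        log_joint = (np.log(pis) - 0.5 * np.log(2 * np.pi * sigma2s)
                     - 0.5 * resid**2 / sigma2s)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        log_liks.append(log_norm.sum())  # observed-data log-likelihood
        resp = np.exp(log_joint - log_norm[:, None])
    return pis, betas, sigma2s, resp, log_liks
```

The returned `log_liks` sequence should be non-decreasing, which is a convenient sanity check on any EM implementation. The fitted parameters may still correspond to a local maximum, depending on `seed`.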

2.
Abstract

In this paper we introduce the continuous tree mixture model, a mixture of undirected graphical models with tree-structured graphs, which can be viewed as a nonparametric approach to multivariate analysis. We estimate its parameters, the component edge sets, and the mixture proportions through a regularized maximum likelihood procedure. Our new algorithm, which uses the expectation maximization algorithm and a modified version of Kruskal's algorithm, simultaneously estimates and prunes the mixture component trees. Simulation studies indicate that this method performs better than the alternative Gaussian graphical mixture model. The proposed method is also applied to a water-level data set and compared with the results of the Gaussian mixture model.

3.
Abstract

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

4.
Daniel Hohmann 《Statistics》2013,47(2):348-362
We consider a two-component location mixture model with symmetric components, one of which is assumed to be known, the other is unknown. We show identifiability under assumptions on the tails of the characteristic function for the true underlying mixture, and also construct asymptotically normal estimates. The model is an extension of the contamination model in Bordes et al. [Semiparametric estimation of a two-component mixture model when a component is known, Scand. J. Statist. 33 (2006), pp. 733–752], and also related to a location mixture of one symmetric density as in Bordes et al. [Semiparametric estimation of a two component mixture model, Ann. Statist. 34 (2006), pp. 1204–1232]. We show by simulation that estimating the additional location parameter leads to a slight loss of efficiency as compared with the contamination model.

5.
In this paper, we consider a new mixture of varying coefficient models, in which each mixture component follows a varying coefficient model and the mixing proportions and dispersion parameters are also allowed to be unknown smooth functions. We systematically study the identifiability, estimation and inference for the new mixture model. The proposed new mixture model is rather general, encompassing many mixture models as its special cases such as mixtures of linear regression models, mixtures of generalized linear models, mixtures of partially linear models and mixtures of generalized additive models, some of which are new mixture models by themselves and have not been investigated before. The new mixture of varying coefficient model is shown to be identifiable under mild conditions. We develop a local likelihood procedure and a modified expectation–maximization algorithm for the estimation of the unknown non‐parametric functions. Asymptotic normality is established for the proposed estimator. A generalized likelihood ratio test is further developed for testing whether some of the unknown functions are constants. We derive the asymptotic distribution of the proposed generalized likelihood ratio test statistics and prove that the Wilks phenomenon holds. The proposed methodology is illustrated by Monte Carlo simulations and an analysis of a CO2‐GDP data set.

6.
In this article, we propose an efficient and robust estimation for the semiparametric mixture model that is a mixture of unknown location-shifted symmetric distributions. Our estimation is derived by minimizing the profile Hellinger distance (MPHD) between the model and a nonparametric density estimate. We propose a simple and efficient algorithm to find the proposed MPHD estimation. A Monte Carlo simulation study is conducted to examine the finite sample performance of the proposed procedure and to compare it with other existing methods. Based on our empirical studies, the newly proposed procedure is very competitive with the existing methods for normal component cases and much better for non-normal component cases. More importantly, the proposed procedure is robust when the data are contaminated with outlying observations. A real data application is also provided to illustrate the proposed estimation procedure.

7.
《Econometric Reviews》2012,31(1):71-91
Abstract

This paper proposes the Bayesian semiparametric dynamic Nelson-Siegel model for estimating the density of bond yields. Specifically, we model the distribution of the yield curve factors according to an infinite Markov mixture (iMM). The model allows for time variation in the mean and covariance matrix of factors in a discrete manner, as opposed to continuous changes in these parameters as in Time Varying Parameter (TVP) models. Estimating the number of regimes endogenously using the iMM structure leads to an adaptive process that, in addition to existing regimes, can generate newly emerging regimes over time in response to changing economic conditions. The potential of the proposed framework is examined using US bond yields data. The semiparametric structure of the factors can handle various forms of non-normalities, including fat tails and nonlinear dependence between factors, using a unified approach by generating new clusters capturing these specific characteristics. We document that modeling parameter changes in a discrete manner increases the model fit as well as forecasting performance at both short and long horizons relative to models with fixed parameters as well as the TVP model with continuous parameter changes. This is mainly due to the fact that discrete changes in parameters better suit the characteristics of typical low-frequency monthly bond yields data.

8.
ABSTRACT

In some applications, the quality of a process or product is best characterized by a functional relationship between a response variable and one or more explanatory variables. Profile monitoring is used to understand and to check the stability of this relationship or curve over time. In the existing simple linear regression profile models, it is often assumed that the data follow a single mode distribution and consequently the noise of the functional relationship follows a normal distribution. However, in some applications, it is likely that the data may follow a multiple-modes distribution. In this case, it is more appropriate to assume that the data follow a mixture profile. In this study, we focus on a mixture simple linear profile model, and propose new control schemes for Phase II monitoring. The proposed methods are shown to have good performance in a simulation study.

9.
Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow normal distributions, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using the multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, for achieving a flexible trade-off between estimation robustness and efficiency. Simulation studies and an application on analysing lung growth longitudinal data showcase the efficacy of the proposed approach.

10.
Abstract

In this paper we are concerned with variable selection in finite mixtures of semiparametric regression models. This task consists of model selection for the nonparametric component and variable selection for the parametric part. Thus, a separate model selection is needed for every nonparametric component of each submodel. To overcome this computational burden, we introduce a class of variable selection procedures for finite mixtures of semiparametric regression models using a penalized approach. It is shown that the new method is consistent for variable selection. Simulations show that the proposed method performs well, improves on previous work in this area, and requires much less computing power than existing methods.

11.
Model-based classification using latent Gaussian mixture models
A novel model-based classification technique is introduced based on parsimonious Gaussian mixture models (PGMMs). PGMMs, which were introduced recently as a model-based clustering technique, arise from a generalization of the mixtures of factor analyzers model and are based on a latent Gaussian mixture model. In this paper, this mixture modelling structure is used for model-based classification and the particular area of application is food authenticity. Model-based classification is performed by jointly modelling data with known and unknown group memberships within a likelihood framework and then estimating parameters, including the unknown group memberships, within an alternating expectation-conditional maximization framework. Model selection is carried out using the Bayesian information criterion and the quality of the maximum a posteriori classifications is summarized using the misclassification rate and the adjusted Rand index. This new model-based classification technique gives excellent classification performance when applied to real food authenticity data on the chemical properties of olive oils from nine areas of Italy.

12.
The aim of this paper is to define a new family of probability density functions (MR pdf) based on multiresolution analysis theory. Each function of this family can be seen as a particular type of density mixture. The MR pdf has advantages over conventional mixtures with regard to estimation, and it is suitable for modeling a large variety of square integrable probability density functions.

13.
Mixtures of linear regression models provide a popular treatment for modeling nonlinear regression relationships. The traditional estimation of mixture of regression models is based on a Gaussian error assumption. It is well known that such an assumption is sensitive to outliers and extreme values. To overcome this issue, a new class of finite mixture of quantile regressions (FMQR) is proposed in this article. Compared with the existing Gaussian mixture regression models, the proposed FMQR model can provide a complete specification of the conditional distribution of the response variable for each component. From the likelihood point of view, the FMQR model is equivalent to the finite mixture of regression models based on errors following the asymmetric Laplace distribution (ALD), which can be regarded as an extension of the traditional mixture of regression models with normal error terms. An EM algorithm is proposed to obtain the parameter estimates of the FMQR model by exploiting a hierarchical representation of the ALD. Finally, the iterated weighted least squares estimation for each mixture component of the FMQR model is derived. Simulation studies are conducted to illustrate the finite sample performance of the estimation procedure. Analysis of an aphid data set is used to illustrate our methodologies.
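The link between quantile regression and the asymmetric Laplace distribution mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used ALD parameterization with location `mu`, scale `sigma`, and quantile level `tau`; the abstract does not state the exact parameterization used in the paper, and the function names are hypothetical.

```python
import math

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def ald_pdf(y, mu, sigma, tau):
    """Asymmetric Laplace density (assumed parameterization):
    f(y) = tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma)).
    """
    return tau * (1.0 - tau) / sigma * math.exp(-check_loss((y - mu) / sigma, tau))

def ald_neg_loglik(ys, mu, sigma, tau):
    """Negative ALD log-likelihood of a sample for a common location mu."""
    return -sum(math.log(ald_pdf(y, mu, sigma, tau)) for y in ys)
```

Under this parameterization, for fixed `sigma` the negative log-likelihood equals the sum of check losses divided by `sigma`, plus a constant that does not depend on `mu`. Maximizing the ALD likelihood in the location is therefore the same as minimizing the quantile regression objective at level `tau`.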

14.
Grouped data are commonly encountered in applications. All data from a continuous population are grouped due to rounding of the individual observations. The Bernstein polynomial model is proposed as an approximate model in this paper for estimating a univariate density function based on grouped data. The coefficients of the Bernstein polynomial, as the mixture proportions of beta distributions, can be estimated using an EM algorithm. The optimal degree of the Bernstein polynomial can be determined using a change-point estimation method. The rate of convergence of the proposed density estimate to the true density is proved to be almost parametric by an acceptance–rejection argument used for generating random numbers. The proposed method is compared with some existing methods in a simulation study and is applied to the Chicken Embryo Data.
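The representation of a Bernstein polynomial density as a mixture of beta densities can be sketched as below. The sketch assumes a density on [0, 1] of degree m with weights p_0, ..., p_m summing to 1, where component i is the Beta(i + 1, m - i + 1) density. It shows only the model form, not the EM fit for grouped data or the change-point choice of degree described in the abstract, and the function names are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on [0, 1], normalized through log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * x ** (a - 1) * (1.0 - x) ** (b - 1)

def bernstein_density(x, weights):
    """Bernstein polynomial density of degree m = len(weights) - 1.

    weights[i] is the mixture proportion of the Beta(i + 1, m - i + 1)
    component; the weights are assumed non-negative and to sum to 1.
    """
    m = len(weights) - 1
    return sum(p * beta_pdf(x, i + 1, m - i + 1) for i, p in enumerate(weights))
```

With equal weights 1 / (m + 1) the mixture reduces to the uniform density on [0, 1], because the Bernstein basis polynomials sum to one. This gives a quick check of the construction.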

15.
We propose an alternative estimation method for the semiparametric accelerated failure time mixture cure model by incorporating the profile likelihood into the M-step of the EM algorithm. In simulation studies, the proposed method performs as well as the existing methods when the censoring is light and better than the existing methods when the censoring is moderate. With regard to computational time, the proposed method runs faster than the existing methods.

16.
This article addresses the density estimation problem using a nonparametric Bayesian approach. We consider hierarchical mixture models in which the uncertainty about the mixing measure is modeled using the Dirichlet process. The main goal is to build more flexible models for density estimation. We extend in two ways the Dirichlet process normal mixture model previously introduced in the literature. First, Dirichlet mixtures of skew-normal distributions are considered; that is, in the first stage of the hierarchical model, the normal distribution is replaced by the skew-normal one. Second, we assume a skew-normal distribution as the center measure in the Dirichlet mixture of normal distributions. Some important results related to Bayesian inference in the location-scale skew-normal family are introduced. In particular, we obtain the stochastic representations for the full conditional distributions of the location and skewness parameters. The algorithm introduced by MacEachern and Müller (1998) [Estimating mixture of Dirichlet Process models. J. Computat. Graph. Statist. 7(2):223–238] is used to sample from the posterior distributions. The models are compared using simulated data sets. Finally, the well-known Old Faithful Geyser data set is analyzed using the proposed models and the Dirichlet mixture of normal distributions. The model based on the Dirichlet mixture of skew-normal distributions captured the bimodality and skewness shown in the empirical distribution of the data.

17.
Abstract

Weibull mixture models are widely used in a variety of fields for modeling phenomena caused by heterogeneous sources. We focus on circumstances in which the original observations are not available, and instead the data come in the form of a grouping of the original observations. We illustrate the EM algorithm for fitting Weibull mixture models to grouped data and propose a bootstrap likelihood ratio test (LRT) for determining the number of subpopulations in a mixture model. The effectiveness of the LRT methods is investigated via simulation. We illustrate the utility of these methods by applying them to two grouped data applications.

18.
Consider data (x_1, y_1), ..., (x_n, y_n), where each x_i may be vector valued, and the distribution of y_i given x_i is a mixture of linear regressions. This provides a generalization of mixture models which do not include covariates in the mixture formulation. This mixture of linear regressions formulation has appeared in the computer science literature under the name Hierarchical Mixtures of Experts model. This model has been considered from both frequentist and Bayesian viewpoints. We focus on the Bayesian formulation. Previously, estimation of the mixture of linear regression model has been done through straightforward Gibbs sampling with latent variables. This paper contributes to this field in three major areas. First, we provide a theoretical underpinning to the Bayesian implementation by demonstrating consistency of the posterior distribution. This demonstration is done by extending results in Barron, Schervish and Wasserman (Annals of Statistics 27: 536–561, 1999) on bracketing entropy to the regression setting. Second, we demonstrate through examples that straightforward Gibbs sampling may fail to effectively explore the posterior distribution and provide alternative algorithms that are more accurate. Third, we demonstrate the usefulness of the mixture of linear regressions framework in Bayesian robust regression. The methods described in the paper are applied to two examples.

19.
Abstract

An improved model for forecasting future volatility is proposed by merging two different computational models. The model integrates a wavelet transform with the EGARCH model: in a pre-processing step, a wavelet-based de-noising technique eliminates noise in the observed signal. The denoised signal is then fed into the EGARCH model to forecast the volatility. The predictive capability of the proposed model is compared with that of the existing EGARCH model. The results show that the hybrid model increases the accuracy of forecasting future volatility.

20.
A model-based classification technique is developed, based on mixtures of multivariate t-factor analyzers. Specifically, two related mixture models are developed and their classification efficacy studied. An AECM algorithm is used for parameter estimation, and convergence of these algorithms is determined using Aitken's acceleration. Two different techniques are proposed for model selection: the BIC and the ICL. Our classification technique is applied to data on red wine samples from Italy and to fatty acid measurements on Italian olive oils. These results are discussed and compared to more established classification techniques; under this comparison, our mixture models give excellent classification performance.
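A convergence check based on Aitken's acceleration, as mentioned in this abstract, can be sketched as follows. The sketch uses one common variant of the stopping rule (comparing the estimated asymptotic log-likelihood with the current value); the abstract does not give the exact rule or tolerance used in the paper, so the function name, the tolerance, and the choice of variant are assumptions.

```python
def aitken_converged(loglik_prev2, loglik_prev, loglik_curr, tol=1e-6):
    """Aitken-accelerated stopping rule for EM-type algorithms.

    Takes the log-likelihoods at iterations k-1, k and k+1 and returns
    (converged, l_inf), where l_inf estimates the limiting log-likelihood.
    """
    denom = loglik_prev - loglik_prev2
    if denom == 0.0:
        # No change between the two earlier iterations: treat as converged.
        return True, loglik_curr
    a = (loglik_curr - loglik_prev) / denom  # Aitken acceleration factor
    if a >= 1.0:
        # Increments are not shrinking, so no finite limit can be estimated.
        return False, float("inf")
    l_inf = loglik_prev + (loglik_curr - loglik_prev) / (1.0 - a)
    return abs(l_inf - loglik_curr) < tol, l_inf
```

When the log-likelihood increments shrink geometrically, the estimate `l_inf` recovers the limit exactly. This rule can therefore stop later than a naive check on successive differences, which tends to stop too early when convergence is slow.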
