
1.

Robust mixture model cluster analysis using adaptive kernels

J. Andrew Howe Hamparsum Bozdogan 《Journal of applied statistics》2013,40(2):320-336

The traditional mixture model assumes that a dataset is composed of several populations of Gaussian distributions. In real life, however, data often do not fit the restrictions of normality very well. It is likely that data from a single population exhibiting either asymmetrical or heavy-tail behavior could be erroneously modeled as two populations, resulting in suboptimal decisions. To avoid these pitfalls, we generalize the mixture model using adaptive kernel density estimators. Because kernel density estimators enforce no functional form, we can adapt to non-normal asymmetric, kurtotic, and tail characteristics in each population independently. This, in effect, robustifies mixture modeling. We adapt two computational algorithms, genetic algorithm with regularized Mahalanobis distance and genetic expectation maximization algorithm, to optimize the kernel mixture model (KMM) and use results from robust estimation theory in order to data-adaptively regularize both. Finally, we likewise extend the information criterion ICOMP to score the KMM. We use these tools to simultaneously select the best mixture model and classify all observations without making any subjective decisions. The performance of the KMM is demonstrated on two medical datasets; in both cases, we recover the clinically determined group structure and substantially improve patient classification rates over the Gaussian mixture model.

2.

On some mixture models based on the Birnbaum-Saunders distribution and associated inference

N. Balakrishnan Ramesh C. Gupta Debasis Kundu Víctor Leiva Antonio Sanhueza 《Journal of statistical planning and inference》2011,141(7):2175-2190

In this paper, we consider three different mixture models based on the Birnbaum-Saunders (BS) distribution, viz., (1) mixture of two different BS distributions, (2) mixture of a BS distribution and a length-biased version of another BS distribution, and (3) mixture of a BS distribution and its length-biased version. For all these models, we study their characteristics including the shape of their density and hazard rate functions. For the maximum likelihood estimation of the model parameters, we use the EM algorithm. For the purpose of illustration, we analyze two data sets related to enzyme and depressive condition problems. In the case of the enzyme data, it is shown that Model 1 provides the best fit, while for the depressive condition data, it is shown that all three models fit well, with Model 3 providing the best fit.

3.

A diagnostic tool for mixture models

《Journal of Statistical Computation and Simulation》2012,82(4):293-313

Mixture models are used in a large number of applications, yet there remain difficulties with maximum likelihood estimation. For instance, the likelihood surface for finite normal mixtures often has a large number of local maximizers, some of which do not give a good representation of the underlying features of the data. In this paper we present diagnostics that can be used to check the quality of an estimated mixture distribution. Particular attention is given to normal mixture models since they frequently arise in practice. We use the diagnostic tools for finite normal mixture problems and in the nonparametric setting where the difficult problem of determining a scale parameter for a normal mixture density estimate is considered. A large sample justification for the proposed methodology will be provided and we illustrate its implementation through several examples.

4.

In Memoriam: Bernard George Greenberg 1919-1985

James R. Abernathy Pranab K. Sen 《The American statistician》2013,67(3):183-184

The bivariate normal density with unit variance and correlation ρ is well known. We show that by integrating out ρ, the result is a function of the maximum norm. The Bayesian interpretation of this result is that if we put a uniform prior over ρ, then the marginal bivariate density depends only on the maximal magnitude of the variables. The square-shaped isodensity contour of this resulting marginal bivariate density can also be regarded as the equally weighted mixture of bivariate normal distributions over all possible correlation coefficients. This density links to the Khintchine mixture method of generating random variables. We use this method to construct the higher dimensional generalizations of this distribution. We further show that for each dimension, there is a unique multivariate density that is a differentiable function of the maximum norm and is marginally normal, and the bivariate density from the integral over ρ is its special case in two dimensions.
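
The maximum-norm claim in this abstract can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes a uniform prior on ρ over (-1, 1), uses the substitution ρ = sin θ to remove the endpoint singularity, and integrates with a plain midpoint rule. The function names `marginal_density` and `closed_form` are hypothetical, and the comparison value (1 − Φ(m))/2 is a closed form worked out for this illustration rather than quoted from the abstract.

```python
import math


def marginal_density(x, y, n_steps=2000):
    """Average the unit-variance bivariate normal density over rho ~ U(-1, 1).

    With rho = sin(theta), the integrand becomes
    exp(-q / (2 cos^2 theta)) / (4 pi) on (-pi/2, pi/2),
    where q = x^2 - 2 sin(theta) x y + y^2.
    """
    width = math.pi / n_steps
    total = 0.0
    for i in range(n_steps):
        theta = -math.pi / 2 + (i + 0.5) * width  # midpoint of each cell
        c = math.cos(theta)
        q = x * x - 2.0 * math.sin(theta) * x * y + y * y
        total += math.exp(-q / (2.0 * c * c))
    return total * width / (4.0 * math.pi)


def closed_form(m):
    """Illustrative closed form (1 - Phi(m)) / 2 for max-norm value m >= 0."""
    phi_cdf = 0.5 * (1.0 + math.erf(m / math.sqrt(2.0)))
    return 0.5 * (1.0 - phi_cdf)


if __name__ == "__main__":
    # Three points that share the same maximum norm, max(|x|, |y|) = 1.
    for point in [(1.0, 0.3), (1.0, -0.7), (0.2, 1.0)]:
        print(point, marginal_density(*point), closed_form(1.0))
```

All three points lie on the same square contour, so the printed densities should agree with each other, which is the "square-shaped isodensity contour" the abstract describes.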

5.

Simulating from irregular data: Kernel Carlo Simulation

J. Andrew Howe 《Journal of Statistical Computation and Simulation》2013,83(3):446-457

In this paper, we address the problem of simulating from a data-generating process for which the observed data do not follow a regular probability distribution. One existing method for doing this is bootstrapping, but it is incapable of interpolating between observed data. For univariate or bivariate data, in which a mixture structure can easily be identified, we could instead simulate from a Gaussian mixture model. In general, though, we would have the problem of identifying and estimating the mixture model. Instead of these, we introduce a non-parametric method for simulating datasets like this: Kernel Carlo Simulation. Our algorithm begins by using kernel density estimation to build a target probability distribution. Then, an envelope function that is guaranteed to be higher than the target distribution is created. We then use simple accept–reject sampling. Our approach is more flexible than others, can simulate intelligently across gaps in the data, and requires no subjective modelling decisions. With several univariate and multivariate examples, we show that our method returns simulated datasets that, compared with the observed data, retain the covariance structures and have distributional characteristics that are remarkably similar.
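
The three steps named in this abstract (kernel density target, envelope, accept–reject) can be sketched for the univariate case as follows. This is an illustrative simplification under stated assumptions, not the author's algorithm: the kernel is Gaussian with a fixed, user-supplied bandwidth `h`, the envelope is a constant, and proposals are uniform on the data range padded by 4h, so the far tails of the kernel estimate are truncated. The function names are hypothetical.

```python
import math
import random


def gaussian_kde(data, h):
    """Return the Gaussian kernel density estimate of `data` with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def kde_accept_reject(data, h, n_samples, rng=None):
    """Draw n_samples from the (truncated) KDE by accept-reject sampling."""
    rng = rng or random.Random(0)
    target = gaussian_kde(data, h)
    # Assumption: proposals are uniform on the padded data range.
    lo, hi = min(data) - 4.0 * h, max(data) + 4.0 * h
    # Constant envelope: an average of Gaussian kernels can never exceed
    # the peak height of a single kernel, 1 / (h * sqrt(2 * pi)).
    bound = 1.0 / (h * math.sqrt(2.0 * math.pi))
    samples = []
    while len(samples) < n_samples:
        x = rng.uniform(lo, hi)
        u = rng.uniform(0.0, bound)
        if u <= target(x):  # accept points that fall under the target curve
            samples.append(x)
    return samples


if __name__ == "__main__":
    observed = [0.0, 0.1, 5.0, 5.2]  # two clusters with a gap between them
    print(kde_accept_reject(observed, h=0.3, n_samples=5))
```

Unlike a bootstrap resample, the draws are not restricted to the observed values, which is the interpolation property the abstract emphasizes. A constant envelope is easy to justify but wasteful when the data are spread out; a tighter envelope would raise the acceptance rate.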

6.

Mixture Models and the Krätzel Integral Transform

T. Princy 《Communications in Statistics - Theory and Methods》2013,42(2):390-405

Continuous mixture Weibull models arise in many areas of science such as reliability studies, communications theory, etc. Due to their wide applicability, we introduce a class of continuous mixture Weibull models which is a combination of Weibull and generalized gamma distributions. Some characteristics of the distribution are obtained. It is seen that the Krätzel integral enters into the model naturally, and the model can therefore be called a Krätzel density. Applications of the density function related to fading channels and ultrasonic backscatter signal modeling are discussed. A real data analysis is given to illustrate the use of this distribution.

7.

Estimating the prevalence of low-lumbar spine bone mineral density in older men with or at risk for HIV infection using normal mixture models

Yungtai Lo 《Journal of applied statistics》2012,39(10):2247-2258

Bone mineral density decreases naturally as we age because existing bone tissue is reabsorbed by the body faster than new bone tissue is synthesized. When this occurs, bones lose calcium and other minerals. What is normal bone mineral density for men 50 years and older? Suitable diagnostic cutoff values for men are less well defined than for women. In this paper, we propose using normal mixture models to estimate the prevalence of low-lumbar spine bone mineral density in men 50 years and older with or at risk for human immunodeficiency virus infection when normal values of bone mineral density are not generally known. The Box–Cox power transformation is used to determine which transformation best suits normal mixture distributions. Parametric bootstrap tests are used to determine the number of mixture components and to determine whether the mixture components are homoscedastic or heteroscedastic.

8.

Bernstein conditional density estimation with application to conditional distribution and regression functions

Mohamed Belalia Taoufik Bouezmarni Alexandre Leblanc 《Journal of the Korean Statistical Society》2019,48(3):356-383

In this paper we propose a smooth nonparametric estimation for the conditional probability density function based on a Bernstein polynomial representation. Our estimator can be written as a finite mixture of beta densities with data-driven weights. Using the Bernstein estimator of the conditional density function, we derive new estimators for the distribution function and conditional mean. We establish the asymptotic properties of the proposed estimators, by proving their asymptotic normality and by providing their asymptotic bias and variance. Simulation results suggest that the proposed estimators can outperform the Nadaraya–Watson estimator and, in some specific setups, the local linear kernel estimators. Finally, we use our estimators for modeling the income in Italy, conditional on year from 1951 to 1998, and have another look at the well known Old Faithful Geyser data.
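
The phrase "finite mixture of beta densities with data-driven weights" can be made concrete with a small sketch. The code below is not the conditional estimator proposed in the paper; it shows the simpler unconditional Bernstein density estimator on (0, 1], assuming the weight of the j-th component is the fraction of observations falling in ((j-1)/k, j/k] and the component is a Beta(j, k-j+1) density. The order `k` and all function names are assumptions made for illustration.

```python
import math


def bernstein_weights(data, k):
    """Fraction of observations in each cell ((j-1)/k, j/k], j = 1..k."""
    counts = [0] * k
    for x in data:
        j = min(k, max(1, math.ceil(x * k)))  # cell index, clamped to 1..k
        counts[j - 1] += 1
    return [c / len(data) for c in counts]


def beta_pdf(x, j, k):
    """Density of Beta(j, k - j + 1) at x, for integer shape parameters."""
    return k * math.comb(k - 1, j - 1) * x ** (j - 1) * (1.0 - x) ** (k - j)


def bernstein_density(data, k):
    """Return the Bernstein density estimate as a mixture of beta densities."""
    weights = bernstein_weights(data, k)

    def density(x):
        return sum(w * beta_pdf(x, j, k) for j, w in enumerate(weights, start=1))

    return density


if __name__ == "__main__":
    sample = [0.12, 0.18, 0.25, 0.31, 0.62, 0.70, 0.74, 0.91]
    f = bernstein_density(sample, k=6)
    print([round(f(x), 3) for x in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

Because the weights are non-negative and sum to one, the estimate is itself a proper density on [0, 1]. The order k plays the role of a smoothing parameter: larger k follows the data more closely.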

9.

Bayesian density estimation using Bernstein polynomials

Sonia Petrone 《Revue canadienne de statistique》1999,27(1):105-126

We propose a Bayesian nonparametric procedure for density estimation, for data in a closed, bounded interval, say [0,1]. To this aim, we use a prior based on Bernstein polynomials. This corresponds to expressing the density of the data as a mixture of given beta densities, with random weights and a random number of components. The density estimate is then obtained as the corresponding predictive density function. Comparison with classical and Bayesian kernel estimates is provided. The proposed procedure is illustrated in an example; an MCMC algorithm for approximating the estimate is also discussed.

10.

Asymptotic Normality in Mixtures of Power Series Distributions (total citations: 1; self-citations: 0; citations by others: 1)

DANKMAR BÖHNING VALENTIN PATILEA 《Scandinavian Journal of Statistics》2005,32(1):115-131

Abstract. The problem of estimating the individual probabilities of a discrete distribution is considered. The true distribution of the independent observations is a mixture of a family of power series distributions. First, we ensure identifiability of the mixing distribution assuming mild conditions. Next, the mixing distribution is estimated by non-parametric maximum likelihood and an estimator for individual probabilities is obtained from the corresponding marginal mixture density. We establish asymptotic normality for the estimator of individual probabilities by showing that, under certain conditions, the difference between this estimator and the empirical proportions is asymptotically negligible. Our framework includes Poisson, negative binomial and logarithmic series as well as binomial mixture models. Simulations highlight the benefit in achieving normality when using the proposed marginal mixture density approach instead of the empirical one, especially for small sample sizes and/or when interest is in the tail areas. A real data example is given to illustrate the use of the methodology.

11.

Selection of suitable prior for the Bayesian mixture of a class of lifetime distributions under type-I censored datasets

Syed Mohsin Ali Kazmi Muhammad Aslam Sajid Ali Nasir Abbas 《Journal of applied statistics》2013,40(8):1639-1658

This paper studies mixtures of a class of probability density functions under a type-I censoring scheme. We model a heterogeneous population by means of a two-component mixture of the class of probability density functions. The parameters of the class of mixture density functions are estimated and compared using the Bayes estimates under the squared-error and precautionary loss functions. For computational purposes, a censored mixture dataset is simulated by probabilistic mixing, considering the particular case of the Maxwell distribution. Closed-form expressions for the Bayes estimators along with their posterior risks are derived for censored as well as complete samples. Some comparisons and properties of the estimates are presented. A real dataset is also used for illustration.

12.

Slice sampling mixture models

Maria Kalli Jim E. Griffin Stephen G. Walker 《Statistics and Computing》2011,21(1):93-105

We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (Commun. Stat., Simul. Comput. 36:45–54, 2007). This new sampler allows for the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we consider priors defined through infinite sequences of independent positive random variables. Two applications are considered: density estimation using mixture models and hazard function estimation. In each case we show how the slice efficient sampler can be applied to make inference in the models. In the mixture case, two submodels are studied in detail. The first one assumes that the positive random variables are Gamma distributed and the second assumes that they are inverse-Gaussian distributed. Both priors have two hyperparameters and we consider their effect on the prior distribution of the number of occupied clusters in a sample. Extensive computational comparisons with alternative “conditional” simulation techniques for mixture models using the standard Dirichlet process prior and our new priors are made. The properties of the new priors are illustrated on a density estimation problem.

13.

A new look at the inverse Gaussian distribution with applications to insurance and economic data

Antonio Punzo 《Journal of applied statistics》2019,46(7):1260-1287

Insurance and economic data are often positive, and we need to take into account this peculiarity in choosing a statistical model for their distribution. An example is the inverse Gaussian (IG), which is one of the most famous and considered distributions with positive support. With the aim of increasing the use of the IG distribution on insurance and economic data, we propose a convenient mode-based parameterization yielding the reparametrized IG (rIG) distribution; it allows/simplifies the use of the IG distribution in various branches of statistics, and we give some examples. In nonparametric statistics, we define a smoother based on rIG kernels. By construction, the estimator is well-defined and does not allocate probability mass to unrealistic negative values. We adopt likelihood cross-validation to select the smoothing parameter. In robust statistics, we propose the contaminated IG distribution, a heavy-tailed generalization of the rIG distribution to accommodate mild outliers. Finally, for model-based clustering and semiparametric density estimation, we present finite mixtures of rIG distributions. We use the EM algorithm to obtain maximum likelihood estimates of the parameters of the mixture and contaminated models. We use insurance data about bodily injury claims, and economic data about incomes of Italian households, to illustrate the models.

14.

Model-based clustering with non-elliptically contoured distributions

Dimitris Karlis Anais Santourian 《Statistics and Computing》2009,19(1):73-83

The majority of the existing literature on model-based clustering deals with symmetric components. In some cases, especially when dealing with skewed subpopulations, the estimate of the number of groups can be misleading; if symmetric components are assumed we need more than one component to describe an asymmetric group. Existing mixture models, based on multivariate normal distributions and multivariate t distributions, try to fit symmetric distributions, i.e. they fit symmetric clusters. In the present paper, we propose the use of finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions). Such finite mixture models start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM type algorithms are described for fitting the models. Real data examples are used to demonstrate the potential of the new model in comparison with existing ones.

15.

A graphical evaluation of logistic ridge estimator in mixture experiments

Kadri Ulas Akay 《Journal of applied statistics》2014,41(6):1217-1232

In comparison to other experimental studies, multicollinearity appears frequently in mixture experiments, a special study area of response surface methodology, due to the constraints on the components composing the mixture. In the analysis of mixture experiments by using a special generalized linear model, logistic regression model, multicollinearity causes precision problems in the maximum-likelihood logistic regression estimate. Therefore, effects due to multicollinearity can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased estimators for the estimation of the coefficients. In this paper, we suggest the use of logistic ridge regression (RR) estimator in the cases where there is multicollinearity during the analysis of mixture experiments using logistic regression. Also, for the selection of the biasing parameter, we use fraction of design space plots for evaluating the effect of the logistic RR estimator with respect to the scaled mean squared error of prediction. The suggested graphical approaches are illustrated on the tumor incidence data set.

16.

Bayesian nonparametric estimation of a copula

Juan Wu Xue Wang Stephen G. Walker 《Journal of Statistical Computation and Simulation》2015,85(1):103-116

A copula can fully characterize the dependence of multiple variables. The purpose of this paper is to provide a Bayesian nonparametric approach to the estimation of a copula, and we do this by mixing over a class of parametric copulas. In particular, we show that any bivariate copula density can be arbitrarily accurately approximated by an infinite mixture of Gaussian copula density functions. The model can be estimated by Markov Chain Monte Carlo methods and the model is demonstrated on both simulated and real data sets.

17.

Robust finite mixture modeling of multivariate unrestricted skew-normal generalized hyperbolic distributions

Maleki Mohsen Wraith Darren Arellano-Valle Reinaldo B. 《Statistics and Computing》2019,29(3):415-428

In this paper, we introduce an unrestricted skew-normal generalized hyperbolic (SUNGH) distribution for use in finite mixture modeling or clustering problems. The SUNGH is a broad class of flexible distributions that includes various other well-known asymmetric and symmetric families such as the scale mixtures of skew-normal, the skew-normal generalized hyperbolic and its corresponding symmetric versions. The class of distributions provides a much needed unified framework where the choice of the best fitting distribution can proceed quite naturally through either parameter estimation or by placing constraints on specific parameters and assessing through model choice criteria. The class has several desirable properties, including an analytically tractable density and ease of computation for simulation and estimation of parameters. We illustrate the flexibility of the proposed class of distributions in a mixture modeling context using a Bayesian framework and assess the performance using simulated and real data.


18.

Semiparametric Likelihood Based Method for Goodness of Fit Tests and Estimation in Upgraded Mixture Models

Jing Qin 《Scandinavian Journal of Statistics》1998,25(4):681-691

We use Owen's (1988, 1990) empirical likelihood method in upgraded mixture models. Two groups of independent observations are available. One is z_1, ..., z_n, which is observed directly from a distribution F(z). The other one is x_1, ..., x_m, which is observed indirectly from F(z), where the x_i have density ∫ p(x | z) dF(z) and p(x | z) is a conditional density function. We are interested in testing H_0: p(x | z) = p(x | z; θ), for some specified smooth density function. A semiparametric likelihood ratio based statistic is proposed and it is shown that it converges to a chi-squared distribution. This is a simple method for doing goodness of fit tests, especially when x is a discrete variable with finitely many values. In addition, we discuss estimation of θ and F(z) when H_0 is true. The connection between upgraded mixture models and general estimating equations is pointed out.

19.

Bayesian temporal density estimation with autoregressive species sampling models

Youngin Jo Seongil Jo Yung-Seop Lee Jaeyong Lee 《Journal of the Korean Statistical Society》2018,47(3):248-262

We propose a novel Bayesian nonparametric (BNP) model, which is built on a class of species sampling models, for estimating density functions of temporal data. In particular, we introduce species sampling mixture models with temporal dependence. To accommodate temporal dependence, we define dependent species sampling models by modeling random support points and weights through an autoregressive model, and then we construct the mixture models based on the collection of these dependent species sampling models. We propose an algorithm to generate posterior samples and present simulation studies to compare the performance of the proposed models with competitors that are based on Dirichlet process mixture models. We apply our method to the estimation of densities for apartment prices in Seoul, the closing price of the Korea Composite Stock Price Index (KOSPI), and climate variables (daily maximum temperature and precipitation) around the Korean peninsula.

20.

Robust mixture modelling using the t distribution (total citations: 2; self-citations: 0; citations by others: 2)

Peel D. McLachlan G. J. 《Statistics and Computing》2000,10(4):339-348

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
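
One way to see why t components are more robust than normal components is to look at the per-observation weights that arise in the E-step when a t distribution is fitted. The sketch below is an illustration under stated assumptions, not the ECM algorithm of the paper: it covers a single univariate component with known location `mu`, scale `sigma`, and degrees of freedom `nu`, and uses the standard weight (nu + 1) / (nu + d^2), where d is the standardized distance of the observation from the location. The function name `t_weights` is hypothetical.

```python
def t_weights(xs, mu, sigma, nu):
    """E-step style weights for one univariate t component.

    Each weight is (nu + 1) / (nu + d^2), with d = (x - mu) / sigma.
    Observations far from mu receive small weights, so they contribute
    little to the weighted updates of location and scale.
    """
    weights = []
    for x in xs:
        d2 = ((x - mu) / sigma) ** 2  # squared standardized distance
        weights.append((nu + 1.0) / (nu + d2))
    return weights


if __name__ == "__main__":
    # The last observation is an outlier relative to mu = 0, sigma = 1.
    print(t_weights([0.0, 1.0, 10.0], mu=0.0, sigma=1.0, nu=4.0))
```

With a normal component every observation would carry weight one in the mean update; here the outlier at 10 is strongly downweighted. As nu grows, all weights approach one and the normal case is recovered.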