Similar Documents
20 similar documents retrieved.
1.
The K-means algorithm and the normal mixture model method are two common clustering methods. The K-means algorithm is a popular heuristic approach which gives reasonable clustering results if the component clusters are ball-shaped. Currently, there are no analytical results for this algorithm if the component distributions deviate from the ball shape. This paper analytically studies how the K-means algorithm changes its classification rule as the normal component distributions become more elongated under the homoscedastic assumption, and compares this rule with the Bayes rule from the mixture model method. We show that the classification rules of both methods are linear, but that the slopes of the two classification lines change in opposite directions as the component distributions become more elongated. The classification performance of the K-means algorithm is then compared to that of the mixture model method via simulation. The comparison, which is limited to two clusters, shows that the K-means algorithm consistently provides poor classification performance as the component distributions become more elongated, while the mixture model method can potentially, but not necessarily, take advantage of this change and provide much better classification performance.
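The setting described above can be reproduced in a few lines. The following is a minimal sketch (not the paper's analysis): Lloyd's K-means algorithm in NumPy, applied to two homoscedastic normal components elongated along one axis. The separation, covariance, and the deterministic extreme-point initialization are all illustrative choices.

```python
import numpy as np

def kmeans_two_clusters(X, iters=50):
    """Lloyd's algorithm for K = 2 with a deterministic extreme-along-x
    initialization (an illustrative choice, not a general-purpose one)."""
    centers = np.vstack([X[X[:, 0].argmin()], X[X[:, 0].argmax()]])
    for _ in range(iters):
        # assign each point to the nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as its cluster mean
        centers = np.vstack([X[labels == 0].mean(axis=0),
                             X[labels == 1].mean(axis=0)])
    return labels, centers

# two homoscedastic normal components, elongated along the y-axis
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.0], [0.0, 9.0]])
X = np.vstack([rng.multivariate_normal([-3.0, 0.0], cov, size=200),
               rng.multivariate_normal([3.0, 0.0], cov, size=200)])
labels, centers = kmeans_two_clusters(X)
```

Because K-means uses Euclidean distance, its implied classification boundary here is the perpendicular bisector of the two centers, regardless of the elongation of the components.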

2.
Many study designs yield a variety of outcomes from each subject clustered within an experimental unit. When these outcomes are of mixed data types, it is challenging to jointly model the effects of covariates on the responses using traditional methods. In this paper, we develop a Bayesian approach for a joint regression model of the different outcome variables and show that the full conditional posterior distributions obtained under the model assumptions allow for estimation of the posterior distributions using a Gibbs sampling algorithm.
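The abstract does not spell out the model, so as a generic illustration of Gibbs sampling from full conditionals, here is a sketch for a bivariate normal target, where both full conditionals are known in closed form; the correlation value is an arbitrary choice.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, seed=0):
    """Gibbs sampling for (x, y) ~ N(0, [[1, rho], [rho, 1]]).
    Each full conditional is N(rho * other, 1 - rho**2)."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # alternate draws from the two full conditionals
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        draws[t] = (x, y)
    return draws

draws = gibbs_bivariate_normal(rho=0.8)
```

The chain's draws recover the target's means and correlation; the same alternating-conditional mechanism underlies the joint mixed-outcome sampler described in the abstract.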

3.
A Bayesian approach to dynamic Tobit models
This paper develops a posterior simulation method for a dynamic Tobit model. The major obstacle in such a problem lies in the high-dimensional integrals in the likelihood function, induced by dependence among the censored observations. The primary contribution of this study is a practical and efficient sampling scheme for the conditional posterior distributions of the censored (i.e., unobserved) data, so that the Gibbs sampler with the data augmentation algorithm can be applied successfully. The substantial differences between this approach and some existing methods are highlighted. The proposed simulation method is investigated by means of a Monte Carlo study and applied to a regression model of Japanese exports of passenger cars to the U.S. subject to a non-tariff trade barrier.
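The data-augmentation idea can be sketched for a simpler static censored-normal (Tobit-type) model: impute the censored values from a truncated normal, then redraw the mean from its full conditional. This is a hypothetical toy version, not the paper's dynamic model; the flat prior and unit variance are simplifying assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def tobit_gibbs(y_obs, censored, n_iter=2000, seed=0):
    """Data augmentation for y_i ~ N(mu, 1), left-censored at 0.
    A static sketch of the idea; the paper treats the dynamic case."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    mu = y_obs.mean()
    y = y_obs.astype(float).copy()
    mus = np.empty(n_iter)
    for t in range(n_iter):
        # impute censored observations from N(mu, 1) truncated to (-inf, 0]
        b = 0.0 - mu  # standardized upper bound (scale = 1)
        y[censored] = truncnorm.rvs(-np.inf, b, loc=mu, scale=1.0,
                                    size=censored.sum(), random_state=rng)
        # flat prior on mu: full conditional is N(mean(y), 1/n)
        mu = rng.normal(y.mean(), 1.0 / np.sqrt(n))
        mus[t] = mu
    return mus

# simulate left-censored data with true mu = 0.5
rng = np.random.default_rng(1)
latent = rng.normal(0.5, 1.0, size=500)
censored = latent <= 0.0
y_obs = np.where(censored, 0.0, latent)
mus = tobit_gibbs(y_obs, censored)
```

In the dynamic case the censored draws are no longer independent across time, which is exactly the dependence the paper's sampling scheme is designed to handle.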


5.
We present a maximum likelihood estimation procedure for the multivariate frailty model. The estimation is based on a Monte Carlo EM algorithm. The expectation step is approximated by averaging over random samples drawn from the posterior distribution of the frailties using rejection sampling. The maximization step reduces to a standard partial likelihood maximization. We also propose a simple rule, based on the relative change in the parameter estimates, for choosing the sample size in each iteration and a stopping time for the algorithm. An important new feature is that absolute convergence of the algorithm is obtained through sample-size determination and an efficient sampling technique. The method is illustrated using a rat carcinogenesis dataset and data on the vase lifetimes of cut roses. The estimation results are compared with approximate inference based on penalized partial likelihood in these two examples. Unlike penalized partial likelihood estimation, the proposed full maximum likelihood estimation method accounts for all the uncertainty when estimating standard errors for the parameters.

6.
The problem of simulating from distributions with intractable normalizing constants has received much attention in the recent literature. In this article, we propose an asymptotic algorithm, the so-called double Metropolis–Hastings (MH) sampler, for tackling this problem. Unlike other auxiliary variable algorithms, the double MH sampler removes the need for exact sampling, the auxiliary variables being generated using MH kernels, and thus can be applied to a wide range of problems for which exact sampling is not available. For problems for which exact sampling is available, it typically produces results as accurate as those of the exchange algorithm, while using much less CPU time. The new method is illustrated on various spatial models.

7.
Non-random sampling is a source of bias in empirical research. It is common for the outcomes of interest (e.g. the wage distribution) to be skewed in the source population. Sometimes, the outcomes are further subjected to sample selection, which is a type of missing data, resulting in partial observability. Thus, methods based on complete cases for skewed data are inadequate for the analysis of such data, and a general sample selection model is required. Heckman proposed a full maximum likelihood estimation method under the normality assumption for sample selection problems, and parametric and non-parametric extensions have since been proposed. We generalize the Heckman selection model to allow for underlying skew-normal distributions. Finite-sample performance of the maximum likelihood estimator of the model is studied via simulation. Applications illustrate the strength of the model in capturing spurious skewness in bounded scores, and in modelling data where a logarithm transformation could not mitigate the effect of inherent skewness in the outcome variable.

8.
The estimation of percentage defectives using a normal sampling plan is not appropriate when the assumption of normality is violated. In this paper, we propose a sampling plan based on a more general symmetric family of distributions, with the parameters estimated using the modified maximum likelihood (MML) procedure introduced by Tiku and Suresh. This sampling plan works well for most symmetric non-normal distributions. A numerical study has also been carried out to show the superiority of the proposed plan.

9.
This paper focuses on estimating the number of species and the number of abundant species in a specific geographic region and, consequently, on drawing inferences about the number of rare species. The word 'species' is generic, referring to any objects in a population that can be categorized. In the areas of biology, ecology, literature, etc., species frequency distributions are usually severely skewed, in which case the population contains a few very abundant species and many rare ones. To model such a situation, we develop an asymmetric multinomial-Dirichlet probability model using species frequency data. Posterior distributions on the number of species and the number of abundant species are obtained, and posterior inferences are drawn using MCMC simulations. Simulations are used to demonstrate and evaluate the developed methodology. We apply the method to a DNA segment data set and a butterfly data set. Comparisons among different approaches to inferring the number of species are also discussed.

10.
In this paper, we consider an effective Bayesian inference for the censored Student-t linear regression model, which is a robust alternative to the usual censored normal linear regression model. Based on the mixture representation of the Student-t distribution, we propose a non-iterative Bayesian sampling procedure to obtain approximately independent and identically distributed samples from the observed posterior distributions, in contrast to iterative Markov chain Monte Carlo algorithms. We conduct model selection and influence analysis using the posterior samples to choose the best-fitting model and to detect latent outliers. We illustrate the performance of the procedure through simulation studies, and finally apply it to two real data sets: insulation life data with right censoring and wage rate data with left censoring.

11.
This article describes a convenient method of selecting Metropolis–Hastings proposal distributions for multinomial logit models. There are two key ideas involved. The first is that multinomial logit models have a latent variable representation similar to that exploited by Albert and Chib (J Am Stat Assoc 88:669–679, 1993) for probit regression. Augmenting the latent variables replaces the multinomial logit likelihood function with the complete data likelihood for a linear model with extreme value errors. While no conjugate prior is available for this model, a least squares estimate of the parameters is easily obtained. The asymptotic sampling distribution of the least squares estimate is Gaussian with known variance. The second key idea is to generate a Metropolis–Hastings proposal distribution by conditioning on the estimator instead of the full data set. The resulting sampler has many of the benefits of so-called tailored or approximation Metropolis–Hastings samplers. However, because the proposal distributions are available in closed form, they can be implemented without numerical methods for exploring the posterior distribution. The algorithm is geometrically ergodic, its computational burden is minor, and it requires minimal user input. Improvements to the sampler's mixing rate are investigated. The algorithm is also applied to partial credit models describing ordinal item response data from the 1998 National Assessment of Educational Progress. Its applications to hierarchical models and Poisson regression are briefly discussed.

12.
Very often in psychometric research, as in educational assessment, it is necessary to analyze item responses from clustered respondents. The multiple group item response theory (IRT) model proposed by Bock and Zimowski [12] provides a useful framework for analyzing such data. In this model, the selected groups of respondents are of specific interest, so group-specific population distributions need to be defined. The usual assumption for parameter estimation in this model, namely that the latent traits are random variables following different symmetric normal distributions, has been questioned in many works in the IRT literature; when this assumption does not hold, misleading inference can result. In this paper, we assume that the latent traits for each group follow different skew-normal distributions under the centered parameterization, and we call the resulting model the skew multiple group IRT model. This modeling extends the work of Azevedo et al. [4], Bazán et al. [11] and Bock and Zimowski [12] concerning the latent trait distribution. Our approach ensures that the model is identifiable. We propose two Markov chain Monte Carlo (MCMC) algorithms for parameter estimation and compare their convergence behavior. A simulation study was performed to evaluate parameter recovery for the proposed model and the selected algorithm. Results reveal that the proposed algorithm properly recovers all model parameters. Furthermore, we analyzed a real data set which exhibits asymmetry in the latent trait distributions. The results obtained using our approach confirmed the presence of negative asymmetry for some latent trait distributions.

13.
Recently, Beh and Farver investigated and evaluated three non-iterative procedures for estimating the linear-by-linear parameter of an ordinal log-linear model. The study demonstrated that these non-iterative techniques provide estimates that are, for most types of contingency tables, statistically indistinguishable from estimates obtained using Newton's unidimensional algorithm. Here we show how two of these techniques are related via the Box–Cox transformation. We also show that, by using this transformation, accurate non-iterative estimates are achievable even when a contingency table contains sampling zeros.

14.
Consider the exchangeable Bayesian hierarchical model in which observations yi are independently distributed from sampling densities with unknown means, the means µi are a random sample from a distribution g, and the parameters of g are assigned a known distribution h. A simple algorithm is presented for summarizing the posterior distribution based on Gibbs sampling and the Metropolis algorithm. The software program Matlab is used to implement the algorithm and provide a graphical output analysis. A binomial example is used to illustrate the modeling flexibility afforded by this algorithm. Methods of model checking and extensions to hierarchical regression modeling are discussed.
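A toy version of this Gibbs-plus-Metropolis scheme for binomial data can be sketched as follows: conjugate Gibbs draws for the success probabilities, combined with a random-walk Metropolis step for the prior mean m, where p_i ~ Beta(mK, (1-m)K) and K is held fixed. The model, prior, and tuning constants are illustrative assumptions, not the abstract's exact setup.

```python
import numpy as np
from scipy.special import betaln

def hier_binom_sampler(y, n, K=20.0, n_iter=3000, seed=0):
    """Gibbs for the p_i plus random-walk Metropolis for the prior mean m.
    A hypothetical toy hierarchy with y_i ~ Binom(n_i, p_i),
    p_i ~ Beta(m*K, (1-m)*K), m ~ Uniform(0, 1), K fixed."""
    rng = np.random.default_rng(seed)
    m = 0.5
    ms = np.empty(n_iter)

    def log_cond_m(m, p):
        # full conditional of m (up to a constant) under the uniform prior
        if not (0.0 < m < 1.0):
            return -np.inf
        a, b = m * K, (1.0 - m) * K
        return np.sum((a - 1) * np.log(p) + (b - 1) * np.log(1 - p)
                      - betaln(a, b))

    p = (y + 0.5) / (n + 1.0)
    for t in range(n_iter):
        # Gibbs: p_i | y_i, m ~ Beta(mK + y_i, (1-m)K + n_i - y_i)
        p = rng.beta(m * K + y, (1.0 - m) * K + n - y)
        # Metropolis: random-walk proposal for m on (0, 1)
        m_prop = m + 0.05 * rng.standard_normal()
        if np.log(rng.uniform()) < log_cond_m(m_prop, p) - log_cond_m(m, p):
            m = m_prop
        ms[t] = m
    return ms

# simulated counts with a common success probability around 0.3
rng = np.random.default_rng(2)
n = np.full(30, 50)
y = rng.binomial(n, 0.3)
ms = hier_binom_sampler(y, n)
```

The conjugate step handles the µi-analogues exactly, while the Metropolis step handles the non-conjugate hyperparameter, mirroring the hybrid strategy the abstract describes.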

15.
This paper provides a practical simulation-based Bayesian analysis of parameter-driven models for time series of Poisson data with an AR(1) latent process. The posterior distribution is simulated by a Gibbs sampling algorithm, and the full conditional posterior distributions of the unknown variables in the model are given in forms convenient for Gibbs sampling. The case with missing observations is also discussed. The methods are applied to real polio data from 1970 to 1983.

16.
The problem of interest is to estimate the home run ability of 12 great major league players. The usual career home run statistics are the total number of home runs hit and the overall rate at which the players hit them. The observed rate provides a point estimate for a player's “true” rate of hitting a home run. However, this point estimate is incomplete in that it ignores sampling errors, it includes seasons where the player has unusually good or poor performances, and it ignores the general pattern of performance of a player over his career. The observed rate statistic also does not distinguish between the peak and career performance of a given player. Given the random effects model of West (1985), one can detect aberrant seasons and estimate parameters of interest by the inspection of various posterior distributions. Posterior moments of interest are easily computed by the application of the Gibbs sampling algorithm (Gelfand and Smith 1990). A player's career performance is modeled using a log-linear model, and peak and career home run measures for the 12 players are estimated.

17.
The authors offer a unified method extending traditional spatial dependence models with normally distributed error terms to a new class of spatial models based on the biparametric exponential family of distributions. Joint modeling of the mean and variance (or precision) parameters, including spatial correlation, is proposed within this family of distributions. The proposed models are applied to analyzing Colombian land concentration, assuming that the variable of interest follows normal, gamma, and beta distributions. In all cases, the models were fitted using Bayesian methodology with a Markov chain Monte Carlo (MCMC) algorithm for sampling from the joint posterior distribution of the model parameters.

18.
This paper considers a class of densities formed by taking the product of nonnegative polynomials and normal densities. These densities provide a rich class of distributions that can be used in modelling when faced with non-normal characteristics such as skewness and multimodality. In this paper, we address inferential and computational issues arising in the practical implementation of this parametric family in the context of the linear model. Exact results are recorded for the conditional analysis of location-scale models, and an importance sampling algorithm is developed for the implementation of a conditional analysis for the general linear model when using polynomial-normal distributions for the error.
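As an illustration of importance sampling for a polynomial-normal density, the sketch below estimates E[X²] under the target f(x) ∝ (1 + x²)φ(x), where φ is the standard normal density, using a wider normal proposal with self-normalized weights. For this particular target the exact answer is 2. The target and proposal are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def importance_mean_x2(n=200_000, seed=0):
    """Self-normalized importance sampling for E[X^2] under the
    polynomial-normal target f(x) ∝ (1 + x^2) * phi(x)."""
    rng = np.random.default_rng(seed)
    s = 2.0  # proposal std; wider than the target's tails
    x = rng.normal(0.0, s, size=n)
    log_phi = -0.5 * x**2                    # unnormalized N(0,1) log-density
    log_q = -0.5 * (x / s) ** 2 - np.log(s)  # proposal log-density (const. dropped)
    # weights w ∝ (1 + x^2) * phi(x) / q(x); constants cancel on normalization
    log_w = np.log1p(x**2) + log_phi - log_q
    w = np.exp(log_w - log_w.max())
    return np.sum(w * x**2) / np.sum(w)

est = importance_mean_x2()
```

Since E_φ[X²] = 1 and E_φ[X⁴] = 3, the target's second moment is (1 + 3)/(1 + 1) = 2, which the estimate recovers.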

19.
In this paper, we examine a nonlinear regression (NLR) model with homoscedastic errors which follow a flexible class of two-piece distributions based on the scale mixtures of normal (TP-SMN) family. The objective of using this family is to develop a robust NLR model. The TP-SMN is a rich class of distributions covering symmetric/asymmetric and light-/heavy-tailed distributions, and is an alternative to the well-known scale mixtures of skew-normal (SMSN) family studied by Branco and Dey [35]. A key feature of this study is the use of a new hierarchical representation of the family to obtain maximum likelihood estimates of the model parameters via an EM-type algorithm. The performance of the proposed robust model is demonstrated using simulated and real datasets and compared to other well-known NLR models.

20.
This paper is primarily concerned with sampling from the Fisher–Bingham distribution, and we describe a slice sampling algorithm for doing this. A by-product of this task is an infinite mixture representation of the Fisher–Bingham distribution, with the mixing distributions based on the Dirichlet distribution. Finite numerical approximations are considered, and a sampling algorithm based on a finite mixture approximation is compared with the slice sampling algorithm.
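A generic univariate slice sampler with stepping-out (in the style of Neal, 2003) conveys the basic mechanism; the Fisher–Bingham case in the paper is more involved. The sketch below targets a standard normal via its unnormalized log-density, an illustrative choice.

```python
import numpy as np

def slice_sample(logf, x0, n_iter=5000, w=1.0, seed=0):
    """Univariate slice sampler with stepping-out and shrinkage.
    logf is the unnormalized log-density of the target."""
    rng = np.random.default_rng(seed)
    x = x0
    out = np.empty(n_iter)
    for t in range(n_iter):
        # draw an auxiliary height under the (log) density
        logy = logf(x) + np.log(rng.uniform())
        # step out an interval containing the slice {x' : logf(x') > logy}
        left = x - w * rng.uniform()
        right = left + w
        while logf(left) > logy:
            left -= w
        while logf(right) > logy:
            right += w
        # shrink the interval until a point inside the slice is found
        while True:
            xp = rng.uniform(left, right)
            if logf(xp) > logy:
                x = xp
                break
            if xp < x:
                left = xp
            else:
                right = xp
        out[t] = x
    return out

# toy target: standard normal (unnormalized log-density)
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0)
```

The appeal of the approach, as in the paper, is that only an unnormalized density is needed, so intractable normalizing constants never enter the algorithm.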
