Similar literature
 20 similar documents found (search time: 15 ms)
1.
The zero-truncated inverse Gaussian–Poisson model, obtained by first mixing the Poisson model assuming its expected value has an inverse Gaussian distribution and then truncating the model at zero, is very useful when modelling frequency count data. A Bayesian analysis based on this statistical model is implemented on the word frequency counts of various texts, and its validity is checked by exploring the posterior distribution of the Pearson errors and by implementing posterior predictive consistency checks. The analysis based on this model is useful because it allows one to use the posterior distribution of the model mixing density as an approximation of the posterior distribution of the density of the word frequencies of the vocabulary of the author, which is useful to characterize the style of that author. The posterior distribution of the expectation and of measures of the variability of that mixing distribution can be used to assess the size and diversity of the author's vocabulary. An alternative analysis is proposed based on the inverse Gaussian-zero truncated Poisson mixture model, which is obtained by switching the order of the mixing and the truncation stages. Even though this second model fits some of the word frequency data sets more accurately than the first model, in practice the analysis based on it is not as useful because it does not allow one to estimate the word frequency distribution of the vocabulary.
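
As a hedged sketch of the distinction drawn in this abstract, the two constructions can be written for a generic mixing density. The symbols \psi, \lambda and x are notation assumed here for illustration, not taken from the abstract, and \psi stands for the inverse Gaussian density in the models described.

```latex
% Model 1: mix first, then truncate at zero (truncated mixture)
P_1(X = x) = \frac{\int_0^\infty \frac{e^{-\lambda}\lambda^x}{x!}\,\psi(\lambda)\,d\lambda}
                  {1 - \int_0^\infty e^{-\lambda}\,\psi(\lambda)\,d\lambda},
\qquad x = 1, 2, \dots

% Model 2: truncate first, then mix (mixture of zero-truncated Poissons)
P_2(X = x) = \int_0^\infty \frac{e^{-\lambda}\lambda^x}{x!\,\bigl(1 - e^{-\lambda}\bigr)}\,\psi(\lambda)\,d\lambda,
\qquad x = 1, 2, \dots
```

In the first form, \psi describes the rates of all words in the vocabulary, including those that happen not to occur in the text; this appears to be why only the first model lets the mixing density stand in for the vocabulary's word frequency distribution.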

2.
The analysis of word frequency count data can be very useful in authorship attribution problems. Zero-truncated generalized inverse Gaussian–Poisson mixture models are very helpful in the analysis of these kinds of data because their model-mixing density estimates can be used as estimates of the density of the word frequencies of the vocabulary. It is found that this model provides excellent fits for the word frequency counts of very long texts, where the truncated inverse Gaussian–Poisson special case fails because it does not allow for the large degree of over-dispersion in the data. The role played by the three parameters of this truncated GIG-Poisson model is also explored. Our second goal is to compare the fit of the truncated GIG-Poisson mixture model with the fit of the model that results from switching the order of the mixing and truncation stages. A heuristic interpretation of the mixing distribution estimates obtained under this alternative GIG-truncated Poisson mixture model is also provided.

3.
Clustered multinomial data with random cluster sizes commonly appear in health, environmental and ecological studies. Traditional approaches for analyzing clustered multinomial data rely on two assumptions: that cluster sizes are fixed, and that cluster sizes are positive. Randomness of the cluster sizes may be the determinant of the within-cluster correlation and between-cluster variation. We propose a baseline-category mixed model for clustered multinomial data with random cluster sizes based on Poisson mixed models. Our orthodox best linear unbiased predictor approach to this model depends only on the moment structure of unobserved distribution-free random effects. Our approach also consolidates the marginal and conditional modeling interpretations. Unlike the traditional methods, our approach can accommodate both random and zero cluster sizes. Two real-life multinomial data examples, crime data and food contamination data, are used to illustrate our proposed methodology.

4.
The author extends to the Bayesian nonparametric context the multinomial goodness‐of‐fit tests due to Cressie & Read (1984). Her approach is suitable when the model of interest is a discrete distribution. She provides an explicit form for the tests, which are based on power‐divergence measures between a prior Dirichlet process that is highly concentrated around the model of interest and the corresponding posterior Dirichlet process. In addition to providing interesting special cases and useful approximations, she discusses calibration and the choice of test through examples.

5.
The purpose of this paper is to build a model for aggregate losses, which constitutes a crucial step in evaluating premiums for health insurance systems. It aims at obtaining, using Bayesian methodology, the predictive distribution of the aggregate loss within each age class of insured persons over the time horizon involved in planning. The proposed model is a generalization of the collective risk model, a commonly used model for analysing the risk of an insurance system. Aggregate loss prediction is based on past information on size of loss, number of losses and size of population at risk. In modelling the frequency and severity of losses, the number of losses is assumed to follow a negative binomial distribution, individual loss sizes are independent and identically distributed exponential random variables, while the number of insured persons in a finite number of possible age groups is assumed to follow the multinomial distribution. Prediction of aggregate losses is based on the Gibbs sampling algorithm, which incorporates the missing data approach.
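
The frequency and severity assumptions in this abstract can be illustrated with a small forward simulation. This is only a sketch under fixed, made-up parameters (`r`, `p`, `mean_loss` and the function names are hypothetical); it does not reproduce the paper's Bayesian predictive distribution or its Gibbs sampler, and it ignores the multinomial age-group component.

```python
import random


def simulate_aggregate_loss(r, p, mean_loss, rng):
    """Draw one aggregate loss S = X_1 + ... + X_N.

    N is negative binomial: the number of failures observed before the
    r-th success, with success probability p on each trial.
    Each X_i is exponential with mean `mean_loss`.
    All parameter values are illustrative assumptions.
    """
    successes = 0
    n_losses = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            n_losses += 1
    # expovariate takes the rate, i.e. 1 / mean
    return sum(rng.expovariate(1.0 / mean_loss) for _ in range(n_losses))


def simulate_many(n_sims, r=3, p=0.5, mean_loss=100.0, seed=0):
    """Simulate `n_sims` aggregate losses with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_aggregate_loss(r, p, mean_loss, rng) for _ in range(n_sims)]


if __name__ == "__main__":
    draws = simulate_many(20000)
    # Theoretical mean is E[N] * E[X] = r * (1 - p) / p * mean_loss = 300
    print(sum(draws) / len(draws))
```

With these hypothetical values the simulated mean should sit near 300; a Bayesian treatment would additionally average over posterior draws of the parameters rather than fixing them.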

6.
Categorical data frequently arise in applications in the Social Sciences. In such applications, the class of log-linear models, based on either a Poisson or (product) multinomial response distribution, is a flexible model class for inference and prediction. In this paper we consider the Bayesian analysis of both Poisson and multinomial log-linear models. It is often convenient to model multinomial or product multinomial data as observations of independent Poisson variables. For multinomial data, Lindley (1964) [20] showed that this approach leads to valid Bayesian posterior inferences when the prior density for the Poisson cell means factorises in a particular way. We develop this result to provide a general framework for the analysis of multinomial or product multinomial data using a Poisson log-linear model. Valid finite population inferences are also available, which can be particularly important in modelling social data. We then focus particular attention on multivariate normal prior distributions for the log-linear model parameters. Here, an improper prior distribution for certain Poisson model parameters is required for valid multinomial analysis, and we derive conditions under which the resulting posterior distribution is proper. We also consider the construction of prior distributions across models, and for model parameters, when uncertainty exists about the appropriate form of the model. We present classes of Poisson and multinomial models, invariant under certain natural groups of permutations of the cells. We demonstrate that, if prior belief concerning the model parameters is also invariant, as is the case in a ‘reference’ analysis, then the choice of prior distribution is considerably restricted. The analysis of multivariate categorical data in the form of a contingency table is considered in detail. We illustrate the methods with two examples.
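
The convenience of treating multinomial data as independent Poisson counts rests on a standard conditioning identity, sketched below. The symbols Y_i, \mu_i, k and n are notation assumed here for illustration and are not the paper's own.

```latex
Y_i \overset{\text{ind}}{\sim} \mathrm{Poisson}(\mu_i),\quad i = 1,\dots,k
\quad\Longrightarrow\quad
(Y_1,\dots,Y_k) \,\Big|\, \sum_{i=1}^{k} Y_i = n
\;\sim\;
\mathrm{Multinomial}\!\left(n;\ \frac{\mu_1}{\mu_+},\dots,\frac{\mu_k}{\mu_+}\right),
\qquad \mu_+ = \sum_{i=1}^{k} \mu_i .
```

The identity concerns the sampling model only; whether posterior inferences also agree depends on the prior, which is the condition the abstract attributes to Lindley (1964).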

7.
The objective of this article is to propose a method of exploring the mechanism of expectation formation based on qualitative survey data. The survey data are regarded as a sample from a multinomial distribution whose parameters are time-variant functions of inflation expectations. The parameters are estimated using a Bayesian recursive approach, which is a generalization of the Kalman filtering technique. For illustrative purposes, the method is applied to Japanese data. One notable finding from the empirical analysis is that the expectation formation process of Japanese enterprises has varied greatly over time.

8.
Studies producing longitudinal multinomial data arise in several subject areas. This article suggests a Bayesian approach to the analysis of such data. Rather than infusing a latent model structure, we develop a prior distribution for the multinomial parameters which reflects the longitudinal nature of the observations. This distribution is constructed by modifying the prior that posits independent Dirichlet distributions for the multinomial parameters across time. Posterior analysis, which is implemented using Monte Carlo methods, can then be used to assess the temporal behaviour of the multinomial parameters underlying the observed data. We test this methodology on simulated data, opinion polling data, and data from a study concerning the development of moral reasoning.

9.
The randomized response technique is an effective survey method designed to elicit sensitive information while ensuring the privacy of the respondents. In this article, we present some new results on the randomized response model in situations wherein one or two response variables are assumed to follow a multinomial distribution. For a single sensitive question, we use the well-known Hopkins randomization device to derive estimates, both under the assumption of truthful and untruthful responses, and present a technique for making pairwise comparisons. When there are two sensitive questions of interest, we derive a Pearson product moment correlation estimator based on the multinomial model assumption. This estimator may be used to quantify the linear relationship between two variables when multinomial response data are observed according to a randomized-response protocol.

10.
We propose a new set of test statistics to examine the association between two ordinal categorical variables X and Y after adjusting for continuous and/or categorical covariates Z. Our approach first fits multinomial (e.g., proportional odds) models of X and Y, separately, on Z. For each subject, we then compute the conditional distributions of X and Y given Z. If there is no relationship between X and Y after adjusting for Z, then these conditional distributions will be independent, and the observed value of (X, Y) for a subject is expected to follow the product distribution of these conditional distributions. We consider two simple ways of testing the null of conditional independence, both of which treat X and Y equally, in the sense that they do not require specifying an outcome and a predictor variable. The first approach adds these product distributions across all subjects to obtain the expected distribution of (X, Y) under the null and then contrasts it with the observed unconditional distribution of (X, Y). Our second approach computes "residuals" from the two multinomial models and then tests for correlation between these residuals; we define a new individual-level residual for models with ordinal outcomes. We present methods for computing p-values using either the empirical or asymptotic distributions of our test statistics. Through simulations, we demonstrate that our test statistics perform well in terms of power and Type I error rate when compared to proportional odds models which treat X as either a continuous or categorical predictor. We apply our methods to data from a study of visual impairment in children and to a study of cervical abnormalities in human immunodeficiency virus (HIV)-infected women. Supplemental materials for the article are available online.

11.
In this work, the multinomial mixture model is studied through a maximum likelihood approach. The convergence of the maximum likelihood estimator to a set with characteristics of interest is shown. A method to select the number of mixture components is developed based on the form of the maximum likelihood estimator. A simulation study is then carried out to verify its behavior. Finally, two applications of multinomial mixtures to real data are presented.

12.
A Bayesian approach is utilized to test for periodicity in a dichotomous time series. Dichotomous data arise in a variety of circumstances when a variable takes on only two possible values. Conjugate and noninformative priors are considered as well as a hierarchical Bayes approach; the latter is considered the superior Bayes methodology. The situation of stochastic period lengths is also discussed. The generalization to the multinomial model is investigated to allow for the case that a variable takes on more than two possible values. In all cases decisions are made based on a Bayes factor. The proposed procedures are demonstrated on earthquake data in the central Virginia seismic zone.

13.
In assessing the area under the ROC curve for the accuracy of a diagnostic test, it is imperative to detect and locate multiple abnormalities per image. This approach takes that into account by adopting a statistical model that allows for correlation between the reader scores of several regions of interest (ROI).

The ROI method of partitioning the image is taken. The readers give a score to each ROI in the image and the statistical model takes into account the correlation between the scores of the ROIs of an image in estimating test accuracy. The test accuracy is given by Pr[Y > Z] + (1/2)Pr[Y = Z], where Y is an ordinal diagnostic measurement of an affected ROI, and Z is the diagnostic measurement of an unaffected ROI. This way of measuring test accuracy is equivalent to the area under the ROC curve. The model parameters are those of a multinomial distribution, and a Bayesian method of inference based on that distribution is adopted for estimating the test accuracy.

Using a multinomial model for the test results, a Bayesian method based on the predictive distribution of future diagnostic scores is employed to find the test accuracy. By resampling from the posterior distribution of the model parameters, samples from the posterior distribution of test accuracy are also generated. Using these samples, the posterior mean, standard deviation, and credible intervals are calculated in order to estimate the area under the ROC curve. This approach is illustrated by estimating the area under the ROC curve for a study of the diagnostic accuracy of magnetic resonance angiography for diagnosis of arterial atherosclerotic stenosis. A generalization to multiple readers and/or modalities is proposed.

A Bayesian way to estimate test accuracy is easy to perform with standard software packages and has the advantage of employing the efficient inclusion of information from prior related imaging studies.
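
The accuracy measure Pr[Y > Z] + (1/2)Pr[Y = Z] and the idea of sampling it from a posterior can be sketched as follows. This is a simplified illustration, not the paper's model: it assumes affected and unaffected ROI scores are independent multinomials with Dirichlet priors, so it ignores the within-image correlation the abstract emphasises. The function names, the uniform prior weight and the example counts are all hypothetical.

```python
import random


def auc_from_probs(p_affected, q_unaffected):
    """Return Pr[Y > Z] + 0.5 * Pr[Y = Z] for independent ordinal scores.

    p_affected[i] is Pr[Y = i] and q_unaffected[j] is Pr[Z = j],
    with categories ordered from lowest to highest score.
    """
    total = 0.0
    for i, p in enumerate(p_affected):
        for j, q in enumerate(q_unaffected):
            if i > j:
                total += p * q
            elif i == j:
                total += 0.5 * p * q
    return total


def sample_dirichlet(alpha, rng):
    """Sample from Dirichlet(alpha) by normalising independent gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(gammas)
    return [g / s for g in gammas]


def posterior_auc_draws(counts_affected, counts_unaffected,
                        n_draws=2000, prior=1.0, seed=0):
    """Draw AUC values from the posterior under independent Dirichlet priors.

    `prior` is an assumed symmetric Dirichlet weight added to every cell.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p = sample_dirichlet([c + prior for c in counts_affected], rng)
        q = sample_dirichlet([c + prior for c in counts_unaffected], rng)
        draws.append(auc_from_probs(p, q))
    return draws


if __name__ == "__main__":
    # Made-up score counts on a three-point ordinal scale
    draws = posterior_auc_draws([1, 2, 7], [7, 2, 1])
    print(sum(draws) / len(draws))  # posterior mean of the AUC
```

The posterior mean, standard deviation and credible interval mentioned in the abstract would be computed from such draws; the draws here come from a deliberately simpler model.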

14.
Statistical analyses of recurrent event data have typically been based on the missing at random assumption. One implication of this is that, if data are collected only when patients are on their randomized treatment, the resulting de jure estimator of treatment effect corresponds to the situation in which the patients adhere to this regime throughout the study. For confirmatory analysis of clinical trials, sensitivity analyses are required to investigate alternative de facto estimands that depart from this assumption. Recent publications have described the use of multiple imputation methods based on pattern mixture models for continuous outcomes, where imputation for the missing data for one treatment arm (e.g. the active arm) is based on the statistical behaviour of outcomes in another arm (e.g. the placebo arm). This has been referred to as controlled imputation or reference‐based imputation. In this paper, we use the negative multinomial distribution to apply this approach to analyses of recurrent events and other similar outcomes. The methods are illustrated by a trial in severe asthma where the primary endpoint was rate of exacerbations and the primary analysis was based on the negative binomial model. Copyright © 2014 John Wiley & Sons, Ltd.

15.
Numerical methods are needed to obtain maximum-likelihood estimates (MLEs) in many problems. Computation time can be an issue for some likelihoods even with modern computing power. We consider one such problem where the assumed model is a random-clumped multinomial distribution. We compute MLEs for this model in parallel using the Toolkit for Advanced Optimization software library. The computations are performed on a distributed-memory cluster with low latency interconnect. We demonstrate that for larger problems, scaling the number of processes improves wall clock time significantly. An illustrative example shows how parallel MLE computation can be useful in a large data analysis. Our experience with a direct numerical approach indicates that more substantial gains may be obtained by making use of the specific structure of the random-clumped model.

16.
A Bayesian method is proposed for estimating the cell probabilities of several multinomial distributions. Parameters of different distributions are taken to be a priori exchangeable. The prior specification is based upon mixtures of a hierarchical distribution, referred to as the multivariate “Dirichlet-Dirichlet” distribution. The analysis is facilitated by a multinomial approximation relating to the multinomial-Dirichlet distribution. The posterior estimates depend upon measures of entropy for the various distributions and shrink the individual observed proportions towards values obtained by pooling the data across the distributions. As well as incorporating prior information, they are particularly useful when some of the cell frequencies are zero. We use them to investigate a numerical classification of males of various vocations, according to cause of death.
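
The shrinkage behaviour described here, including its usefulness for zero cells, can be illustrated with a much simpler conjugate calculation. This sketch is not the paper's "Dirichlet-Dirichlet" procedure: it assumes a plain Dirichlet prior centred on the pooled proportions, and the function name and the prior weight `kappa` are hypothetical.

```python
def shrunken_proportions(counts, kappa=10.0):
    """Shrink each row's observed proportions towards the pooled proportions.

    `counts` is a list of rows, one row of cell counts per multinomial
    distribution. Each row gets a Dirichlet(kappa * pooled) prior, so the
    posterior mean for cell j is (count_j + kappa * pooled_j) / (n + kappa).
    `kappa` is an assumed prior weight: larger values shrink harder.
    """
    n_cells = len(counts[0])
    grand_total = sum(sum(row) for row in counts)
    pooled = [sum(row[j] for row in counts) / grand_total for j in range(n_cells)]
    estimates = []
    for row in counts:
        n = sum(row)
        estimates.append(
            [(row[j] + kappa * pooled[j]) / (n + kappa) for j in range(n_cells)]
        )
    return estimates


if __name__ == "__main__":
    # Two distributions with a zero cell each; pooled proportions are 0.5, 0.5
    print(shrunken_proportions([[0, 10], [10, 0]], kappa=10.0))
```

In this toy example the zero cells receive an estimate of 0.25 instead of 0, which is the kind of behaviour the abstract points to when cell frequencies are zero.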

17.
The zero-inflated Poisson (ZIP) distribution is widely used for modeling a count data set when the frequency of zeros is higher than the one expected under the Poisson distribution. There are many methods for making inferences about the inflation parameter in ZIP models, e.g. methods for testing the Poisson distribution (the inflation parameter is zero) against the ZIP distribution (the inflation parameter is positive). Most of these methods are based on the maximum likelihood estimators, which do not have an explicit expression. However, the estimators obtained by the method of moments are sufficiently powerful and are easy to obtain and implement. In this paper, we propose an approach based on the method of moments for making inferences about the inflation parameter in the ZIP distribution. Our method is also compared to some recent methods via a simulation study and it is illustrated by an example.
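
To show why moment estimators are explicit here, the sketch below uses the textbook ZIP moments, mean (1 - pi) * lambda and variance (1 - pi) * lambda * (1 + pi * lambda), and solves them for pi and lambda. It is an illustration of the general idea only; the paper's own inference procedure may differ, and the function name and the fallback for non-overdispersed data are assumptions.

```python
import statistics


def zip_moment_estimates(data):
    """Method-of-moments estimates (pi, lam) for a zero-inflated Poisson.

    Matching the sample mean m and variance v to the ZIP moments gives
        lam = m + v / m - 1
        pi  = (v - m) / (v + m**2 - m)
    If the data are not overdispersed (v <= m), this sketch falls back to
    a plain Poisson fit with pi = 0, which is an assumed convention.
    """
    m = statistics.mean(data)
    v = statistics.pvariance(data)
    if m <= 0 or v <= m:
        return 0.0, m
    lam = m + v / m - 1.0
    pi = (v - m) / (v + m * m - m)
    return pi, lam


if __name__ == "__main__":
    pi_hat, lam_hat = zip_moment_estimates([0, 0, 0, 0, 2, 2, 4, 4])
    print(pi_hat, lam_hat)
```

A test of the Poisson model against ZIP would then ask whether the estimated inflation parameter differs from zero by more than sampling error, which requires a variance for the estimator that this sketch does not provide.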

18.
We propose a new type of stochastic ordering which imposes a monotone tendency in differences between one multinomial probability and a known standard one. An estimation procedure is proposed for the constrained maximum likelihood estimate, and then the asymptotic null distribution is derived for the likelihood ratio test statistic for testing equality of two multinomial distributions against the new stochastic ordering. An alternative test based on the Neyman modified minimum chi-square estimator is also discussed. These tests are illustrated with a set of heart disease data.

19.
The author considers studies with multiple dependent primary endpoints. Testing hypotheses with multiple primary endpoints may require unmanageably large populations. Composite endpoints consisting of several binary events may be used to reduce a trial to a manageable size. The primary difficulties with composite endpoints are that different endpoints may have different clinical importance and that higher‐frequency variables may overwhelm effects of smaller, but equally important, primary outcomes. To compensate for these inconsistencies, we weight each type of event, and the total number of weighted events is counted. To reflect the mutual dependency of primary endpoints and to make the weighting method effective in small clinical trials, we use the Bayesian approach. We assume a multinomial distribution of multiple endpoints with Dirichlet priors and apply the Bayesian test of noninferiority to the calculation of weighting parameters. We use composite endpoints to test hypotheses of superiority in single‐arm and two‐arm clinical trials. The composite endpoints have a beta distribution. We illustrate this technique with an example. The results provide a statistical procedure for creating composite endpoints. Published 2013. This article is a U.S. Government work and is in the public domain in the USA.

20.
Classification error can lead to substantial biases in the estimation of gross flows from longitudinal data. We propose a method to adjust flow estimates for bias, based on fitting separate multinomial logistic models to the classification error probabilities and the true state transition probabilities using values of auxiliary variables. Our approach has the advantages that it does not require external information on misclassification rates, it permits the identification of factors that are related to misclassification and true transitions and it does not assume independence between classification errors at successive points in time. Constraining the prediction of the stocks to agree with the observed stocks protects against model misspecification. We apply the approach to data on women from the Panel Study of Income Dynamics with three categories of labour force status. The model fitted is shown to have interpretable coefficient estimates and to provide a good fit. Simulation results indicate good performance of the model in predicting the true flows and robustness against departures from the model postulated.
