Similar Documents
20 similar documents found (search time: 15 ms).
1.
Categorical data frequently arise in applications in the Social Sciences. In such applications, the class of log-linear models, based on either a Poisson or (product) multinomial response distribution, is a flexible model class for inference and prediction. In this paper we consider the Bayesian analysis of both Poisson and multinomial log-linear models. It is often convenient to model multinomial or product multinomial data as observations of independent Poisson variables. For multinomial data, Lindley (1964) [20] showed that this approach leads to valid Bayesian posterior inferences when the prior density for the Poisson cell means factorises in a particular way. We develop this result to provide a general framework for the analysis of multinomial or product multinomial data using a Poisson log-linear model. Valid finite population inferences are also available, which can be particularly important in modelling social data. We then focus particular attention on multivariate normal prior distributions for the log-linear model parameters. Here, an improper prior distribution for certain Poisson model parameters is required for valid multinomial analysis, and we derive conditions under which the resulting posterior distribution is proper. We also consider the construction of prior distributions across models, and for model parameters, when uncertainty exists about the appropriate form of the model. We present classes of Poisson and multinomial models, invariant under certain natural groups of permutations of the cells. We demonstrate that, if prior belief concerning the model parameters is also invariant, as is the case in a ‘reference’ analysis, then the choice of prior distribution is considerably restricted. The analysis of multivariate categorical data in the form of a contingency table is considered in detail. We illustrate the methods with two examples.
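The Poisson–multinomial connection exploited above can be checked directly: conditional on their total, independent Poisson counts follow a multinomial distribution with probabilities proportional to the Poisson means. A minimal numerical sketch (the cell means and counts are illustrative, not from the paper):

```python
import math

# Independent Poisson cell counts, conditioned on their total, are
# multinomial with probabilities proportional to the Poisson means.
mu = [2.0, 5.0, 3.0]            # hypothetical Poisson cell means
counts = [1, 4, 2]              # hypothetical observed cell counts
N = sum(counts)

def pois_pmf(k, m):
    return math.exp(-m) * m ** k / math.factorial(k)

# P(counts | total = N) under independent Poissons
joint = math.prod(pois_pmf(k, m) for k, m in zip(counts, mu))
total = pois_pmf(N, sum(mu))
cond = joint / total

# Multinomial pmf with p_i = mu_i / sum(mu)
p = [m / sum(mu) for m in mu]
multinom = (math.factorial(N)
            / math.prod(math.factorial(k) for k in counts)
            * math.prod(pi ** k for pi, k in zip(p, counts)))

assert abs(cond - multinom) < 1e-12   # the two pmfs agree exactly
```

The identity holds for any cell means and counts; it is what makes a Poisson log-linear model a valid route to multinomial inference under the factorised prior.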

2.
In this article we consider the sample size determination problem in the context of robust Bayesian parameter estimation of the Bernoulli model. Following a robust approach, we consider classes of conjugate Beta prior distributions for the unknown parameter. We assume that inference is robust if posterior quantities of interest (such as point estimates and limits of credible intervals) do not change too much as the prior varies in the selected classes of priors. For the sample size problem, we consider criteria based on predictive distributions of lower bound, upper bound and range of the posterior quantity of interest. The sample size is selected so that, before observing the data, one is confident to observe a small value for the posterior range and, depending on design goals, a large (small) value of the lower (upper) bound of the quantity of interest. We also discuss relationships with and comparison to non-robust and non-informative Bayesian methods.
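The flavour of the robust criterion can be sketched with a simplified stand-in: instead of the paper's predictive criteria on credible-interval bounds, we track the worst-case spread of the posterior mean over a small, hypothetical class of Beta priors, and increase n until that spread falls below a tolerance. The prior class and tolerance are illustrative assumptions:

```python
# Hypothetical class of Beta(a, b) priors for the Bernoulli parameter.
prior_class = [(1, 1), (2, 2), (1, 3), (3, 1)]

def worst_case_range(n):
    """Largest spread of the posterior mean over the prior class,
    maximised over all possible data x (number of successes)."""
    worst = 0.0
    for x in range(n + 1):
        means = [(a + x) / (a + b + n) for a, b in prior_class]
        worst = max(worst, max(means) - min(means))
    return worst

def min_sample_size(tol):
    """Smallest n whose worst-case posterior-mean spread is <= tol."""
    n = 1
    while worst_case_range(n) > tol:
        n += 1
    return n

n_star = min_sample_size(0.03)
```

As the abstract suggests, robustness is purchased with sample size: the spread shrinks roughly like 1/n, so halving the tolerance about doubles the required n.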

3.
Bayesian methods are often used to reduce the sample sizes and/or increase the power of clinical trials. The right choice of the prior distribution is a critical step in Bayesian modeling. If the prior is not completely specified, historical data may be used to estimate it. In the empirical Bayesian analysis, the resulting prior can be used to produce the posterior distribution. In this paper, we describe a Bayesian Poisson model with a conjugate Gamma prior. The parameters of the Gamma distribution are estimated in the empirical Bayesian framework under two estimation schemes. The straightforward numerical search for the maximum likelihood (ML) solution using the marginal negative binomial distribution is occasionally infeasible. We propose a simplification to the maximization procedure. The Markov Chain Monte Carlo method is used to create a set of Poisson parameters from the historical count data. These Poisson parameters are used to uniquely define the Gamma likelihood function. Easily computable approximation formulae may be used to find the ML estimates for the parameters of the Gamma distribution. For the sample size calculations, the ML solution is replaced by its upper confidence limit to reflect an incomplete exchangeability of historical trials as opposed to current studies. The exchangeability is measured by the confidence interval for the historical rate of the events. With this prior, the formula for the sample size calculation is completely defined. Published in 2009 by John Wiley & Sons, Ltd.
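A sketch of the empirical-Bayes step, using one standard closed-form approximation to the Gamma ML estimates (not necessarily the paper's exact formulae). The "historical" Poisson rates are simulated here rather than drawn by MCMC from historical trials:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "historical" Poisson rates; in the paper these would be
# MCMC draws of the Poisson parameters from historical count data.
true_shape, true_rate = 4.0, 2.0
lam = rng.gamma(true_shape, 1 / true_rate, size=5000)

# A well-known closed-form approximation to the Gamma ML estimates,
# based on s = log(mean) - mean(log):
s = np.log(lam.mean()) - np.log(lam).mean()
shape_hat = (3 - s + np.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
rate_hat = shape_hat / lam.mean()
```

With a large set of rates the approximation recovers the generating shape and rate closely; the paper then replaces the point estimate by an upper confidence limit to discount the historical information.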

4.
Cui, Ruifei; Groot, Perry; Heskes, Tom. Statistics and Computing (2019) 29(2): 311–333

We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the ‘Rank PC’ algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. When the data are missing completely at random (MCAR), we provide an error bound on the accuracy of ‘Rank PC’ and show its high-dimensional consistency. However, when the data are missing at random (MAR), ‘Rank PC’ fails dramatically. Therefore, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data that still works correctly under MAR. These samples are translated into an average correlation matrix and an effective sample size, resulting in the ‘Copula PC’ algorithm for incomplete data. A simulation study shows that: (1) ‘Copula PC’ estimates a more accurate correlation matrix and causal structure than ‘Rank PC’ under MCAR and, even more so, under MAR and (2) the usage of the effective sample size significantly improves the performance of ‘Rank PC’ and ‘Copula PC.’ We illustrate our methods on two real-world datasets: riboflavin production data and chronic fatigue syndrome data.
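The building block of ‘Rank PC’ for incomplete data — rank correlations over pairwise-complete observations, with per-pair complete-case counts standing in (crudely) for the effective sample sizes used in the conditional-independence tests — might look like this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

def ranks(v):
    """Ranks 1..n of a vector with no ties."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r

def pairwise_spearman(X):
    """Spearman correlation matrix from pairwise-complete observations,
    plus the per-pair complete-case counts (a crude stand-in for the
    paper's effective sample sizes)."""
    p = X.shape[1]
    R = np.eye(p)
    N = np.full((p, p), X.shape[0])
    for i in range(p):
        for j in range(i + 1, p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            N[i, j] = N[j, i] = ok.sum()
            ri, rj = ranks(X[ok, i]), ranks(X[ok, j])
            R[i, j] = R[j, i] = np.corrcoef(ri, rj)[0, 1]
    return R, N

# Toy data: columns 0 and 1 correlated, MCAR missingness at rate 0.2.
X = rng.normal(size=(300, 3))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * rng.normal(size=300)
X[rng.uniform(size=(300, 3)) < 0.2] = np.nan
R, N = pairwise_spearman(X)
```

Each conditional-independence test in the PC algorithm would then use the Fisher z-transform of the relevant (partial) correlation with the corresponding pairwise count in place of the full sample size.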


5.
For the model considered by Chaturvedi, Pandey and Gupta (1991), two classes of sequential procedures are developed to construct confidence regions (which may be interval, ellipsoidal or spherical) of ‘pre-assigned width and coverage probability’ for the parameters of interest and for the minimum risk point estimation (taking loss to be quadratic plus linear cost of sampling) of the nuisance parameter. Second-order approximations are derived for the expected sample size, coverage probability and ‘regret’ associated with the two classes of sequential procedures. A simple and direct method of obtaining the asymptotic distribution of the stopping time is provided. By means of examples, it is illustrated that several estimation problems can be tackled with the help of the proposed classes of sequential procedures.

6.
The theoretical foundation for a number of model selection criteria is established in the context of inhomogeneous point processes and under various asymptotic settings: infill, increasing domain and combinations of these. For inhomogeneous Poisson processes we consider Akaike's information criterion and the Bayesian information criterion, and in particular we identify the point process analogue of ‘sample size’ needed for the Bayesian information criterion. Considering general inhomogeneous point processes we derive new composite likelihood and composite Bayesian information criteria for selecting a regression model for the intensity function. The proposed model selection criteria are evaluated using simulations of Poisson processes and cluster point processes.

7.

There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that of a classified feature which has known class label. Hence in the case where the absence of class labels does not depend on the data, the expected error rate of a classifier formed from the classified and unclassified features in a partially classified sample is greater than that if the sample were completely classified. We propose to treat the labels of the unclassified features as missing data and to introduce a framework for their missingness as in the pioneering work of Rubin (Biometrika 63:581–592, 1976) for missingness in incomplete data analysis. An examination of several partially classified data sets in the literature suggests that the unclassified features are not occurring at random in the feature space, but rather tend to be concentrated in regions of relatively high entropy. It suggests that the missingness of the labels of the features can be modelled by representing the conditional probability of a missing label for a feature via the logistic model with covariate depending on the entropy of the feature or an appropriate proxy for it. We consider here the case of two normal classes with a common covariance matrix where for computational convenience the square of the discriminant function is used as the covariate in the logistic model in place of the negative log entropy. Rather paradoxically, we show that the classifier so formed from the partially classified sample may have smaller expected error rate than that if the sample were completely classified.


8.
The object of this paper is to explain the role played by the catchability and sampling in the Bayesian estimation of k, the unknown number of classes in a multinomial population. It is shown that the posterior distribution of k shifts towards larger values as the capture probabilities of the classes become more unequal, that it shifts upwards with the number of classes observed in the sample, and downwards as the sample size increases. Moreover, it is shown that the posterior mean of k is consistent.
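The qualitative mechanism — more unequal capture probabilities mean fewer classes are seen, so more unobserved classes remain plausible — can be illustrated by simulation. This shows only the sampling effect, not the Bayesian posterior itself; the class count, sample size, and probability vectors are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, reps = 20, 30, 2000

def mean_observed(probs):
    """Average number of distinct classes seen in a multinomial sample."""
    counts = rng.multinomial(n, probs, size=reps)
    return (counts > 0).sum(axis=1).mean()

equal = np.full(k, 1 / k)                       # equal catchability
unequal = np.arange(1, k + 1, dtype=float)      # strongly unequal
unequal /= unequal.sum()

obs_eq = mean_observed(equal)
obs_un = mean_observed(unequal)
```

With equal capture probabilities roughly 16 of the 20 classes are seen on average; under the unequal probabilities noticeably fewer are, which is the sampling fact behind the posterior behaviour described in the abstract.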

9.
Latent class analysis (LCA) has been found to have important applications in social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting data. In this study, the non-response mainly included ‘contingency questions’ and real ‘missing data’. The primary objective of this study was to evaluate the effects of some potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size and the number of items are also significant. From our simulation results, the impact of missing data and contingency questions can be amended by increasing the sample size or the number of items.

10.
The choice of prior distributions for the variances can be important and quite difficult in Bayesian hierarchical and variance component models. For situations where little prior information is available, a ‘noninformative’ type prior is usually chosen. ‘Noninformative’ priors have been discussed by many authors and used in many contexts. However, care must be taken using these prior distributions as many are improper and thus, can lead to improper posterior distributions. Additionally, in small samples, these priors can be ‘informative’. In this paper, we investigate a proper ‘vague’ prior, the uniform shrinkage prior (Strawderman 1971; Christiansen & Morris 1997). We discuss its properties and show how posterior distributions for common hierarchical models using this prior lead to proper posterior distributions. We also illustrate the attractive frequentist properties of this prior for a normal hierarchical model including testing and estimation. To conclude, we generalize this prior to the multivariate situation of a covariance matrix.
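The properness claim is easy to check numerically: a uniform prior on the shrinkage factor B = z0/(z0 + τ²) induces the density z0/(z0 + τ²)² on the variance component τ², which, unlike the improper 1/τ² prior, integrates to one. Here z0 is an illustrative scale constant:

```python
import numpy as np

z0 = 2.5                                   # illustrative scale constant
tau2 = np.linspace(0.0, 5000.0, 1_000_001)  # truncate the infinite tail
dens = z0 / (z0 + tau2) ** 2               # induced density on tau^2

# Trapezoidal rule; the analytic integral up to T is 1 - z0/(z0 + T),
# so the total mass approaches 1 as T grows.
mass = float(np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(tau2)))
```

The numerical mass is about 0.9995 on this truncated range, matching the closed form 1 − z0/(z0 + 5000) and confirming that the prior is proper.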

11.
The generalized Poisson distribution (GPD), studied by many researchers and containing two parameters θ and λ, has been found to fit data sets arising in biological, ecological, social and marketing fields very well. Consul and Shoukri (1985) have shown that for negative values of λ the GPD gets truncated and the model becomes deficient; however, the truncation error becomes less than 0.0005 if the minimum number of non-zero probability classes is ≥ 4 for all values of θ and λ, and the GPD model can be safely used in all such cases. The problem of admissible maximum likelihood (ML) estimation when the sample mean is larger than the sample variance is considered in this paper, which complements the earlier work of Consul and Shoukri (1984) on the existence of unique ML estimators of θ and λ when the sample mean is smaller than or equal to the sample variance.

12.
This study investigates the small sample powers of several tests designed against ordered location alternatives in randomized block experiments. The results are intended to aid the researcher in the selection process. Toward this end the small sample powers of three classes of rank tests — tests based on ‘within-blocks’ rankings (W-tests), ‘among-blocks’ rankings (A-tests), and ‘ranking after alignment’ within blocks (RAA-tests) — are compared and contrasted with the asymptotic properties given by Pirie (1974) as well as with the empirical powers of competing parametric procedures.

13.
This paper provides a practical simulation-based Bayesian analysis of parameter-driven models for time series Poisson data with the AR(1) latent process. The posterior distribution is simulated by a Gibbs sampling algorithm. Full conditional posterior distributions of unknown variables in the model are given in convenient forms for the Gibbs sampling algorithm. The case with missing observations is also discussed. The methods are applied to real polio data from 1970 to 1983.
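A compressed sketch of such a sampler, using Metropolis-within-Gibbs rather than the paper's exact full conditionals; for brevity the latent mean is fixed at zero, a flat prior is placed on the initial state, and the autoregressive parameter is crudely truncated to the stationary region:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate count data from the parameter-driven model:
#   x_t = phi * x_{t-1} + N(0, sig2),   y_t ~ Poisson(exp(x_t))
T, phi_true, sig_true = 150, 0.7, 0.3
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + sig_true * rng.normal()
y = rng.poisson(np.exp(x))

phi_s, sig2_s = 0.5, 1.0
xs = np.log(y + 0.5)                 # crude latent initialisation
draws = []
for it in range(500):
    for t in range(T):               # single-site MH update of x_t
        def logpost(v):
            lp = y[t] * v - np.exp(v)            # Poisson term
            if t > 0:
                lp -= (v - phi_s * xs[t - 1]) ** 2 / (2 * sig2_s)
            if t < T - 1:
                lp -= (xs[t + 1] - phi_s * v) ** 2 / (2 * sig2_s)
            return lp
        prop = xs[t] + 0.4 * rng.normal()
        if np.log(rng.uniform()) < logpost(prop) - logpost(xs[t]):
            xs[t] = prop
    # conjugate normal update for phi (flat prior, crude truncation)
    sxx = np.sum(xs[:-1] ** 2)
    phi_s = np.clip(np.sum(xs[:-1] * xs[1:]) / sxx
                    + np.sqrt(sig2_s / sxx) * rng.normal(), -0.99, 0.99)
    # conjugate inverse-gamma update for sig2 (prior 1/sig2)
    resid = xs[1:] - phi_s * xs[:-1]
    sig2_s = np.sum(resid ** 2) / rng.chisquare(T - 1)
    draws.append(float(phi_s))
```

After a burn-in the draws of phi concentrate around positive values, reflecting the autocorrelation in the latent process; the paper's convenient full conditionals replace the random-walk step used here.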

14.
A sample of n subjects is observed in each of two states, S1 and S2. In each state, a subject is in one of two conditions, X or Y. Thus, a subject may be recorded as showing a change if its condition in the two states is ‘Y,X’ or ‘X,Y’ and, otherwise, the condition is unchanged. We consider a Bayesian test of the null hypothesis that the probability of an ‘X,Y’ change exceeds that of a ‘Y,X’ change by an amount k0. That is, we develop the posterior distribution of k, the difference between the two probabilities, and reject the null hypothesis if k0 lies outside the appropriate posterior probability interval. The performance of the method is assessed by Monte Carlo and other numerical studies, and brief tables of exact critical values are presented.
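One simple way to realise such a test is a Dirichlet posterior on the four cell probabilities, with the posterior of the difference obtained by Monte Carlo; the counts, flat prior, and hypothesised difference of 0 below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(5)

# Cell counts for the pairs (condition in S1, condition in S2):
#                   XX  XY  YX  YY        (hypothetical data)
counts = np.array([30, 25, 10, 35])
k0 = 0.0                                  # hypothesised difference

# Dirichlet(1,1,1,1) prior -> Dirichlet posterior on cell probabilities
post = rng.dirichlet(counts + 1, size=20000)
k = post[:, 1] - post[:, 2]               # P('X,Y') - P('Y,X')

lo, hi = np.quantile(k, [0.025, 0.975])   # 95% posterior interval
reject = not (lo <= k0 <= hi)
```

Here 25 of 100 subjects changed ‘X,Y’ against 10 ‘Y,X’, so the 95% interval for the difference excludes zero and the null is rejected, a Bayesian analogue of a McNemar-type comparison.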

15.
A two-sample test statistic for detecting shifts in location is developed for a broad range of underlying distributions using adaptive techniques. The test statistic is a linear rank statistic which uses a simple modification of the Wilcoxon test; the scores are Winsorized ranks, where the upper and lower Winsorizing proportions are estimated in the first stage of the adaptive procedure using sample measures of the distribution's skewness and tailweight. An empirical relationship between the Winsorizing proportions and the sample skewness and tailweight allows for a ‘continuous’ adaptation of the test statistic to the data. The test has good asymptotic properties, and the small sample results are compared with other popular parametric, nonparametric, and two-stage tests using Monte Carlo methods. Based on these results, the proposed test procedure is recommended for moderate and larger sample sizes.
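A sketch of the test statistic with fixed (rather than adaptively estimated) Winsorizing proportions; in the adaptive procedure the proportions would come from sample skewness and tailweight measures:

```python
import numpy as np

rng = np.random.default_rng(6)

def winsorized_rank_stat(x, y, lower=0.1, upper=0.1):
    """Standardised Wilcoxon-type linear rank statistic whose scores
    are Winsorized combined-sample ranks."""
    z = np.concatenate([x, y])
    n = len(z)
    ranks = np.empty(n)
    ranks[np.argsort(z)] = np.arange(1, n + 1)
    lo = np.ceil(lower * n)              # Winsorize the extreme ranks
    hi = n - np.floor(upper * n)
    scores = np.clip(ranks, lo, hi)
    stat = scores[: len(x)].sum()        # score sum of the first sample
    # moments under the permutation null
    mean = len(x) * scores.mean()
    var = (len(x) * len(y) / (n * (n - 1))) \
        * ((scores - scores.mean()) ** 2).sum()
    return (stat - mean) / np.sqrt(var)

x = rng.normal(0.0, 1.0, 60)
y = rng.normal(1.0, 1.0, 60)             # shifted second sample
z_shift = winsorized_rank_stat(x, y)
z_back = winsorized_rank_stat(y, x)      # roles swapped
```

With a unit location shift the standardised statistic is strongly negative (the first sample carries the low ranks); swapping the samples flips its sign, as the permutation-null standardisation requires.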

16.
In this paper non-parametric tests for homogeneity of several populations against location-type alternatives are proposed. For this, all possible subsamples of fixed size are drawn from each sample and their maxima and minima are computed. One class of tests is obtained using these subsample minima, whereas the other class involves the subsample maxima. Tests belonging to these two classes have been compared with many of the presently available tests in terms of their Pitman asymptotic relative efficiency. Some members of the proposed classes of tests prove to be robust in terms of efficiency.

17.
The problem of sample size determination in the context of Bayesian analysis is considered. For the familiar and practically important parameter of a geometric distribution with a beta prior, three different Bayesian approaches based on the highest posterior density intervals are discussed. A computer program handles all computational complexities and is available upon request.
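One of the simpler HPD-based approaches can be sketched as follows: pick the smallest n whose preposterior (prior-predictive) average HPD length falls below a target. The beta prior, target length, grid-based HPD search, and failures-before-success parametrisation are illustrative choices, not necessarily the paper's:

```python
import math
import numpy as np

rng = np.random.default_rng(7)
a, b = 2.0, 2.0                      # Beta prior for the geometric p

def hpd_length(a_post, b_post, cover=0.95, grid=4001):
    """Approximate HPD interval length of a Beta(a_post, b_post)
    density, found by thresholding the density on a grid."""
    p = np.linspace(1e-6, 1 - 1e-6, grid)
    logc = (math.lgamma(a_post + b_post)
            - math.lgamma(a_post) - math.lgamma(b_post))
    dens = np.exp(logc + (a_post - 1) * np.log(p)
                  + (b_post - 1) * np.log1p(-p))
    order = np.argsort(dens)[::-1]            # highest density first
    w = dens / dens.sum()
    cut = np.searchsorted(np.cumsum(w[order]), cover) + 1
    keep = order[:cut]                        # grid points in the HPD set
    return (keep.max() - keep.min()) / (grid - 1)

def expected_hpd_length(n, reps=200):
    """Preposterior average: p from the prior, data from the model."""
    total = 0.0
    for _ in range(reps):
        p = rng.beta(a, b)
        xsum = int(np.sum(rng.geometric(p, size=n) - 1))  # failures
        total += hpd_length(a + n, b + xsum)
    return total / reps

n = 10
while expected_hpd_length(n) > 0.25:          # target HPD length
    n += 5
```

The average HPD length shrinks roughly like n^{-1/2}, so the loop terminates quickly; tightening the target length drives the required sample size up accordingly.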

18.
For models with random effects or missing data, the likelihood function is sometimes intractable analytically but amenable to Monte Carlo approximation. To get a good approximation, the parameter value that drives the simulations should be sufficiently close to the maximum likelihood estimate (MLE) which unfortunately is unknown. Introducing a working prior distribution, we express the likelihood function as a posterior expectation and approximate it using posterior simulations. If the sample size is large, the sample information is likely to outweigh the prior specification and the posterior simulations will be concentrated around the MLE automatically, leading to good approximation of the likelihood near the MLE. For smaller samples, we propose to use the current posterior as the next prior distribution to make the posterior simulations closer to the MLE and hence improve the likelihood approximation. By using the technique of data duplication, we can simulate from the sharpened posterior distribution without actually updating the prior distribution. The suggested method works well in several test cases. A more complex example involving censored spatial data is also discussed.

19.
For two independent non-homogeneous Poisson processes with unknown intensities we propose a test of the hypothesis that the ratio of the intensities is constant against the alternative that it is increasing on (0,t]. The existing test procedures for testing such relative trends are based on conditioning on the number of failures observed in (0,t] from the two processes. Our test is unconditional and is based on the original time-truncated data, which enables us to have meaningful asymptotics. We obtain the asymptotic null distribution (as t becomes large) of the proposed test statistic and show that the proposed test is consistent against several large classes of alternatives. It was observed by Park and Kim (IEEE Trans. Reliab. 40 (1), 1992, 107–111) that it is difficult to distinguish between the power-law and log-linear processes for certain parameter values. We show that our test is consistent for such alternatives also.

20.
A general framework is proposed for joint modelling of mixed correlated ordinal and continuous responses with missing values for responses, where the missing mechanism for both kinds of responses is also considered. Considering the posterior distribution of unknowns given all available information, a Markov Chain Monte Carlo sampling algorithm via WinBUGS is used for estimating the posterior distribution of the parameters. For sensitivity analysis, to investigate the perturbation from missing at random to not missing at random, it is shown how one can use some elements of the covariance structure. These elements associate responses and their missing mechanisms. The influence of small perturbations of these elements on posterior displacement and posterior estimates is also studied. The model is illustrated using data from a foreign language achievement study.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号