Similar Articles
20 similar articles found (search time: 0 ms)
1.
Cell counts in contingency tables can be smoothed using log-linear models. Recently, sampling-based methods such as Markov chain Monte Carlo (MCMC) have been introduced, making it possible to sample from posterior distributions. The novelty of the approach presented here is that all conditional distributions can be specified directly, so that straightforward Gibbs sampling is possible. Thus, the model is constructed in a way that makes burn-in and checking convergence a relatively minor issue. The emphasis of this paper is on smoothing cell counts in contingency tables, and not so much on estimation of regression parameters. Therefore, the prior distribution consists of two stages: a normal nonconjugate prior at the first stage, and a vague prior for hyperparameters at the second stage. The smoothed counts tend to compromise between the observed data and a log-linear model. The methods are demonstrated with a sparse data table taken from a multi-center clinical trial. The research for the first author was supported by the Brain Pool program of the Korean Federation of Science and Technology Societies. The research for the second author was partially supported by KOSEF through the Statistical Research Center for Complex Systems at Seoul National University.
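The compromise between observed counts and a log-linear fit that this abstract describes can be sketched without the full Gibbs sampler. The following is a minimal illustration only: the table is hypothetical, and a hand-picked weight w stands in for the weighting that the paper's hierarchical prior would produce.

```python
# Observed sparse 2x3 contingency table (hypothetical data).
obs = [[5, 0, 2],
       [1, 4, 3]]

N = sum(sum(row) for row in obs)
row_tot = [sum(row) for row in obs]
col_tot = [sum(obs[i][j] for i in range(len(obs))) for j in range(len(obs[0]))]

# Fitted counts under the independence log-linear model.
fit = [[row_tot[i] * col_tot[j] / N for j in range(len(obs[0]))]
       for i in range(len(obs))]

# Smoothed counts: a convex compromise between data and model.
# w is fixed by hand here, purely for illustration; the paper lets
# the posterior determine how far counts shrink toward the model.
w = 0.7
smooth = [[w * obs[i][j] + (1 - w) * fit[i][j] for j in range(len(obs[0]))]
          for i in range(len(obs))]
```

Note that the zero cell receives a positive smoothed count, which is the practical point of smoothing sparse tables.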

2.
This paper suggests a Bayesian approach to the reconstruction of a 2 × 2 contingency table where some of the observations are only partially categorized and others are fully categorized. In contrast, most previous Bayesian and non-Bayesian analyses of the partially categorized data problem have been concerned with estimation of the parameters that generated the data. We show in an example that estimates may not be extremely sensitive to the weight placed on prior information relative to the sample data.

3.
4.
Measures of association are often used to describe the relationship between row and column variables in two-dimensional contingency tables. It is not uncommon in biomedical research to categorize continuous variables to obtain a two-dimensional table. In these situations it is desirable that the measure of association not be too sensitive to changes in the number of categories or to the choice of cut points. To accomplish this objective we attempt to find a measure of association that closely approximates the corresponding measure of association for the underlying distribution. Measures that are close to the underlying measure for various table sizes and cut points are called stable measures.

5.
A representation based on sums and differences of terms of the form 2n log n (the lnn function) is introduced to express likelihood-ratio chi-square test statistics in contingency table analysis. This provides a concise, explicit form for partitioning chi-square statistics in accordance with hierarchical models. The lnn representation gives students insights into the construction of test statistics, and assists in relating identical forms under differing model sets. Hierarchies are presented for independence and equi-probability in two-way tables, for symmetry in correlated square tables, for independence-and-homogeneity of two-way responses across levels of a factor, and for mutual independence in three-way tables, along with relevant partitions of chi-square.
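The lnn representation can be checked numerically. A small sketch with hypothetical counts, verifying that the independence G-squared statistic for a two-way table equals a signed sum of lnn terms over cells, margins, and the grand total:

```python
import math

def lnn(x):
    """The lnn function: 2 * x * log(x), with lnn(0) = 0."""
    return 2.0 * x * math.log(x) if x > 0 else 0.0

# Two-way table (hypothetical counts).
n = [[10, 20], [30, 40]]
r = [sum(row) for row in n]                              # row totals
c = [sum(n[i][j] for i in range(2)) for j in range(2)]   # column totals
N = sum(r)

# G^2 for independence, written directly ...
G2_direct = sum(2 * n[i][j] * math.log(n[i][j] / (r[i] * c[j] / N))
                for i in range(2) for j in range(2))

# ... and as sums and differences of lnn terms.
G2_lnn = (sum(lnn(n[i][j]) for i in range(2) for j in range(2))
          - sum(lnn(ri) for ri in r)
          - sum(lnn(cj) for cj in c)
          + lnn(N))
```

The two expressions agree term by term after expanding the logarithm, which is what makes the lnn form convenient for partitioning.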

6.
In this paper we present a simulation and graphics-based model checking and model comparison methodology for the Bayesian analysis of contingency tables. We illustrate the approach by testing the hypotheses of independence and symmetry on complete and incomplete simulated tables.

7.
The correspondence analysis (CA) method appears to be an effective tool for analysis of interrelations between rows and columns in two-way contingency data. A discrete version of the method, box clustering, is developed in the paper using an approximation version of the CA model extended to the case when CA factor values are required to be Boolean. Several properties of the proposed SEFIT-BOX algorithm are proved to facilitate interpretation of its output. It is also shown that two known partitioning algorithms (applied within row or column sets only) could be considered as locally optimal algorithms for fitting the model, and extensions of these algorithms to a simultaneous row and column partitioning problem are proposed.

8.
This paper extends an analysis of variance for categorical data (CATANOVA) procedure to multidimensional contingency tables involving several factors and a response variable measured on a nominal scale. Using an appropriate measure of total variation for multinomial data, partial and multiple association measures are developed as R2 quantities which parallel the analogous statistics in multiple linear regression for quantitative data. In addition, test statistics are derived in terms of these R2 criteria. Finally, this CATANOVA approach is illustrated within the context of a three-way contingency table from a multicenter clinical trial.
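The Gini-based measure of variation underlying CATANOVA can be illustrated in the one-factor case (the paper's multi-factor extension is not reproduced here). A sketch with a hypothetical 2×3 table, using Light and Margolin's total and within-group sums to form an R2-type quantity:

```python
# Rows = factor levels, columns = nominal response categories (hypothetical).
tab = [[20, 10, 5],
       [8, 15, 12]]

n_i = [sum(row) for row in tab]                                      # group sizes
n_j = [sum(tab[i][j] for i in range(len(tab))) for j in range(len(tab[0]))]
N = sum(n_i)

# Gini-based total and within-group variation (Light-Margolin CATANOVA).
TSS = N / 2 - sum(nj ** 2 for nj in n_j) / (2 * N)
WSS = sum(ni / 2 - sum(tab[i][j] ** 2 for j in range(len(tab[0]))) / (2 * ni)
          for i, ni in enumerate(n_i))
BSS = TSS - WSS

# R2-type association measure, paralleling multiple linear regression.
R2 = BSS / TSS
```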

9.
In this article, a Bayesian approach is proposed for the estimation of log odds ratios and intraclass correlations over a two-way contingency table, including intraclass correlated cells. Required likelihood functions of log odds ratios are obtained, and determination of prior structures is discussed. Hypothesis testing for log odds ratios and intraclass correlations by using the posterior simulations is outlined. Because the proposed approach includes no asymptotic theory, it is useful for the estimation and hypothesis testing of log odds ratios in the presence of certain intraclass correlation patterns. A family health status and limitations data set is analyzed by using the proposed approach in order to figure out the impact of intraclass correlations on the estimates and hypothesis tests of log odds ratios. Although intraclass correlations are small in the data set, we find that even small intraclass correlations can significantly affect the estimates and test results, and our approach is useful for the estimation and testing of log odds ratios in the presence of intraclass correlations.
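Setting aside the paper's intraclass-correlation structure, posterior simulation of a log odds ratio can be sketched with a plain Dirichlet posterior on the 2×2 cell probabilities (hypothetical counts, uniform prior, standard library only):

```python
import math
import random

random.seed(1)

# 2x2 table (hypothetical): exposure by disease status.
n11, n12, n21, n22 = 30, 10, 15, 25

def dirichlet_sample(alphas):
    """Sample from a Dirichlet via independent gammas (stdlib only)."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

# Posterior under a uniform Dirichlet(1,1,1,1) prior on cell probabilities.
draws = []
for _ in range(4000):
    p11, p12, p21, p22 = dirichlet_sample([n11 + 1, n12 + 1, n21 + 1, n22 + 1])
    draws.append(math.log((p11 * p22) / (p12 * p21)))

# Equal-tailed 95% credible interval for the log odds ratio.
draws.sort()
ci = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
```

A credible interval excluding zero plays the role of the posterior-simulation hypothesis test outlined in the abstract.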

10.
For high-dimensional data, it is a tedious task to determine anomalies such as outliers. We present a novel outlier detection method for high-dimensional contingency tables. We use the class of decomposable graphical models to model the relationship among the variables of interest, which can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed-form expression of the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the use of the LRT outlier detection framework on genetic data modeled by Chow–Liu trees.
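As a crude stand-in for the decomposable-model LRT (not reproduced here), the underlying idea of flagging entries that deviate from a fitted model can be sketched with standardized Pearson residuals under independence, on a hypothetical table with one aberrant cell:

```python
import math

# Hypothetical 3x3 table with an aberrant cell at (0, 2).
tab = [[40, 30, 2],
       [35, 28, 30],
       [30, 25, 60]]

R, C = len(tab), len(tab[0])
row = [sum(r) for r in tab]
col = [sum(tab[i][j] for i in range(R)) for j in range(C)]
N = sum(row)

# Standardized Pearson residuals under the independence model;
# cells with |z| > 3 are flagged as candidate outliers.
outliers = []
for i in range(R):
    for j in range(C):
        e = row[i] * col[j] / N
        z = (tab[i][j] - e) / math.sqrt(e * (1 - row[i] / N) * (1 - col[j] / N))
        if abs(z) > 3:
            outliers.append((i, j))
```

One aberrant cell can distort the fitted margins enough to drag neighbouring cells over the threshold, which is one motivation for the paper's exact, model-based treatment.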

11.
12.
In the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for strokes, we apply Bayesian model averaging to the selection of variables in Cox proportional hazard models. We use an extension of the leaps-and-bounds algorithm for locating the models that are to be averaged over and make available S-PLUS software to implement the methods. Bayesian model averaging provides a posterior probability that each variable belongs in the model, a more directly interpretable measure of variable importance than a P-value. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable and do not account for model uncertainty. We introduce the partial predictive score to evaluate predictive performance. For the Cardiovascular Health Study, Bayesian model averaging predictively outperforms standard model selection and does a better job of assessing who is at high risk for a stroke.

13.
The multinomial logistic regression model (MLRM) can be interpreted as a natural extension of the binomial model with logit link function to situations where the response variable can have three or more possible outcomes. In addition, when the categories of the response variable are nominal, the MLRM can be expressed in terms of two or more logistic models and analyzed in both frequentist and Bayesian approaches. However, few discussions of post-modeling diagnostics for categorical data models are found in the literature, and they mainly use Bayesian inference. The objective of this work is to present classical and Bayesian diagnostic measures for categorical data models. These measures are applied to a dataset on the status of patients undergoing kidney transplantation.

14.
In 1991 Marsh and co-workers made the case for a sample of anonymized records (SAR) from the 1991 census of population. The case was accepted by the Office for National Statistics (then the Office of Population Censuses and Surveys) and a request was made by the Economic and Social Research Council to purchase the SARs. Two files were released for Great Britain: a 2% sample of individuals and a 1% sample of households. Subsequently similar samples were released for Northern Ireland. Since their release, the files have been heavily used for research and there has been no known breach of confidentiality. There is a considerable demand for similar files from the 2001 census, with specific requests for a larger sample size and lower population threshold for the individual SAR. This paper reassesses the analysis of Marsh and co-workers of the risk of identification of an individual or household in a sample of microdata from the 1991 census and also uses alternative ways of assessing risks with the 1991 SARs. The results of both the reassessment and the new analyses are reassuring and allow us to take the 1991 SARs as a baseline against which to assess proposals for changes to the size and structure of samples from the 2001 census.

15.
The paper establishes a correspondence between statistical disclosure control and forensic statistics regarding their common use of the concept of 'probability of identification'. The paper then seeks to investigate what lessons for disclosure control can be learnt from the forensic identification literature. The main lesson that is considered is that disclosure risk assessment cannot, in general, ignore the search method that is employed by an intruder seeking to achieve disclosure. The effects of using several search methods are considered. Through consideration of the plausibility of assumptions and 'worst case' approaches, the paper suggests how the impact of search method can be handled. The paper focuses on foundations of disclosure risk assessment, providing some justification for some modelling assumptions underlying some existing record level measures of disclosure risk. The paper illustrates the effects of using various search methods in a numerical example based on microdata from a sample from the 2001 UK census.

16.
Standard methods for analyzing binomial regression data rely on asymptotic inferences. Bayesian methods can be performed using simple computations, and they apply for any sample size. We provide a relatively complete discussion of Bayesian inferences for binomial regression with emphasis on inferences for the probability of "success." Furthermore, we illustrate diagnostic tools, perform model selection among nonnested models, and examine the sensitivity of the Bayesian methods.
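The simple, any-sample-size computations the abstract mentions can be illustrated in the intercept-only special case, where a conjugate Beta prior yields the posterior for the probability of "success" directly (hypothetical data; the paper itself treats full binomial regression):

```python
import random

random.seed(0)

# Hypothetical data: y successes out of n trials.
y, n = 17, 40

# Beta(a, b) prior; conjugacy gives a Beta(a + y, b + n - y) posterior.
a, b = 1.0, 1.0
post_a, post_b = a + y, b + n - y

post_mean = post_a / (post_a + post_b)

# Posterior simulation for an equal-tailed 95% credible interval.
draws = sorted(random.betavariate(post_a, post_b) for _ in range(4000))
ci = (draws[int(0.025 * 4000)], draws[int(0.975 * 4000)])
```

No asymptotic approximation is involved: the interval is exact up to Monte Carlo error, whatever the sample size.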

17.
In the mid-1950s S.N. Roy and his students contributed two landmark articles to the contingency table literature [Roy, S.N., Kastenbaum, M.A., 1956. On the hypothesis of no "interaction" in a multiway contingency table. Ann. Math. Statist. 27, 749–757; Roy, S.N., Mitra, S.K., 1956. An introduction to some nonparametric generalizations of analysis of variance and multivariate analysis. Biometrika 43, 361–376]. The first article generalized concepts of interaction from 2×2×2 contingency tables to three-way tables of arbitrary size and to larger tables. In the second article, which is the source of our primary focus, various notions of independence were clarified for three-way contingency tables, Roy's union–intersection test was applied to construct chi-squared tests of hypotheses about the structure of such tables, and the chi-squared statistics were shown not to depend on the distinction between response and explanatory variables. This work pre-dates by many years later developments that expressed such results in the context of loglinear models. It pre-dates by a quarter century the development of graphical models. We summarize the main results in these key articles and discuss the connection between them and the later developments of loglinear modeling and of graphical modeling. We also mention ways in which these later developments have themselves been further generalized.

18.
To gain regulatory approval, a new medicine must demonstrate that its benefits outweigh any potential risks, i.e., that the benefit-risk balance is favourable towards the new medicine. For transparency and clarity of the decision, a structured and consistent approach to benefit-risk assessment that quantifies uncertainties and accounts for underlying dependencies is desirable. This paper proposes two approaches to benefit-risk evaluation, both based on the idea of joint modelling of mixed outcomes that are potentially dependent at the subject level. Using Bayesian inference, the two approaches offer interpretability and efficiency to enhance qualitative frameworks. Simulation studies show that accounting for correlation leads to a more accurate assessment of the strength of evidence to support benefit-risk profiles of interest. Several graphical approaches are proposed that can be used to communicate the benefit-risk balance to project teams. Finally, the two approaches are illustrated in a case study using real clinical trial data.

19.
This simulation study aims at investigating the performance of maximum likelihood and weighted least-squares estimation approaches in growth curve models with categorical data. The goodness-of-fit indices were compared across a number of scenarios (different trajectories, sample sizes, replications, and numbers of categories). The results show that when the number of categories and replications are small, weighted least-squares estimation leads to better goodness-of-fit indices. However, when the number of categories and replications are large, both maximum likelihood and weighted least-squares estimation result in similar fit indices.

20.
This paper presents a Bayesian technique for the estimation of a logistic regression model including variable selection. As in Ou & Penman (1989), the model is used to predict the direction of company earnings, one year ahead, from a large set of accounting variables from financial statements. To estimate the model, the paper presents a Markov chain Monte Carlo sampling scheme that includes the variable selection technique of Smith & Kohn (1996) and the non-Gaussian estimation method of Mira & Tierney (2001). The technique is applied to data for companies in the United States and Australia. The results obtained compare favourably to the technique used by Ou & Penman (1989) for both regions.
