Similar articles
 20 similar articles found
1.
This paper considers a connected Markov chain for sampling 3 × 3 × K contingency tables having fixed two-dimensional marginal totals. Such sampling arises in performing various tests of the hypothesis of no three-factor interactions. A Markov chain algorithm is a valuable tool for evaluating P-values, especially for sparse datasets where large-sample theory does not work well. To construct a connected Markov chain over high-dimensional contingency tables with fixed marginals, algebraic algorithms have been proposed. These algorithms involve computations in polynomial rings using Gröbner bases. However, algorithms based on Gröbner bases do not incorporate symmetry among variables and are very time-consuming when the contingency tables are large. We construct a minimal basis for a connected Markov chain over 3 × 3 × K contingency tables. The minimal basis is unique. Some numerical examples illustrate the practicality of our algorithms.
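For illustration only, here is a minimal sketch (assuming NumPy) of the kind of Markov-basis random walk such samplers are built on; the list of moves is supplied by the caller, and the single 2×2×2 basic move shown below is a standard textbook example, not the minimal basis constructed in the paper.

```python
import numpy as np

def markov_basis_step(table, moves, rng):
    """One step of a Markov-basis random walk over tables with fixed margins.

    `table` is an integer array; `moves` is a list of integer arrays of the
    same shape whose entries sum to zero along every fixed margin, so adding
    or subtracting a move never changes the marginal totals.  Staying put when
    a move would create a negative cell keeps the chain on the set of valid
    tables and leaves the uniform distribution over that set invariant.
    """
    move = moves[rng.integers(len(moves))]
    candidate = table + rng.choice([-1, 1]) * move
    return candidate if (candidate >= 0).all() else table

# Toy example: a single 2x2x2 "basic move" applied to a 3 x 3 x 2 table.
move = np.zeros((3, 3, 2), dtype=int)
move[0, 0, 0] = move[1, 1, 0] = move[0, 1, 1] = move[1, 0, 1] = 1
move[0, 1, 0] = move[1, 0, 0] = move[0, 0, 1] = move[1, 1, 1] = -1

rng = np.random.default_rng(0)
table = rng.integers(0, 5, size=(3, 3, 2))
table = markov_basis_step(table, [move], rng)
```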

2.
Monte Carlo methods for exact inference have received much attention recently in complete or incomplete contingency table analysis. However, conventional Markov chain Monte Carlo methods, such as the Metropolis–Hastings algorithm, and importance sampling methods sometimes perform poorly by failing to produce valid tables. In this paper, we apply an adaptive Monte Carlo algorithm, the stochastic approximation Monte Carlo algorithm (SAMC; Liang, Liu, & Carroll, 2007), to exact goodness-of-fit tests of models for complete or incomplete contingency tables containing structural zero cells. The numerical results favor our method in terms of the quality of the estimates.

3.
We consider conditional exact tests of factor effects in design of experiments for discrete response variables. Similarly to the analysis of contingency tables, Markov chain Monte Carlo methods can be used to perform exact tests, especially when large-sample approximations of the null distributions are poor and the enumeration of the conditional sample space is infeasible. In order to construct a connected Markov chain over the appropriate sample space, one approach is to compute a Markov basis. Theoretically, a Markov basis can be characterized as a generator of a well-specified toric ideal in a polynomial ring and can be computed with computational algebra software. However, the computation of a Markov basis sometimes becomes infeasible, even for problems of moderate size. In the present article, we obtain a closed-form expression of minimal Markov bases for the main effect models of 2^(p-1) fractional factorial designs of resolution p.

4.
We consider a likelihood ratio test of independence for large two-way contingency tables having both structural (non-random) and sampling (random) zeros in many cells. The solution of this problem is not available using standard likelihood ratio tests. One way to bypass this problem is to remove the structural zeros from the table and implement a test on the remaining cells that incorporates the randomness in the sampling zeros; the resulting test is a test of quasi-independence of the two categorical variables. This test is based only on the positive counts in the contingency table and is valid when there is at least one sampling (random) zero. The proposed (likelihood ratio) test is an alternative to the commonly used ad hoc procedure of converting zero cells to positive ones by adding a small constant. One practical advantage of our procedure is that there is no need to know whether a zero cell is a structural zero or a sampling zero. We model the positive counts using a truncated multinomial distribution. In fact, we have two truncated multinomial distributions: one for the null hypothesis of independence and the other for the unrestricted parameter space. We use Monte Carlo methods to obtain the maximum likelihood estimators of the parameters and also the p-value of our proposed test. To obtain the sampling distribution of the likelihood ratio test statistic, we use bootstrap methods. We discuss many examples, and also empirically compare the power function of the likelihood ratio test relative to those of some well-known test statistics.
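As a generic illustration of the bootstrap step (not the authors' truncated-multinomial implementation), a short sketch assuming NumPy; `simulate_null` and `statistic` are hypothetical callables the caller must provide.

```python
import numpy as np

def bootstrap_pvalue(observed_stat, simulate_null, statistic, B=1000, rng=None):
    """Parametric-bootstrap p-value: the proportion of bootstrap statistics at
    least as large as the observed one, with the usual +1 correction.

    `simulate_null(rng)` draws one data set from the model fitted under the
    null hypothesis; `statistic(data)` returns the likelihood-ratio statistic
    for a data set.  Both are placeholders supplied by the user.
    """
    rng = rng or np.random.default_rng()
    boot = np.array([statistic(simulate_null(rng)) for _ in range(B)])
    return (1 + np.sum(boot >= observed_stat)) / (B + 1)
```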

5.
A Markov chain is proposed that uses the coupling-from-the-past sampling algorithm for sampling m×n contingency tables. This method is an extension of the one proposed by Kijima and Matsui (Rand. Struct. Alg., 29:243–256, 2006). It is not polynomial, as it is based upon a recursion and includes a rejection phase, but it can be used for practical purposes on small contingency tables, as illustrated in a classical 4×4 example.

6.
Algebraic Markov Bases and MCMC for Two-Way Contingency Tables
The Diaconis–Sturmfels algorithm is a method for sampling from conditional distributions, based on the algebraic theory of toric ideals. This algorithm is applied to categorical data analysis through the notion of a Markov basis. An application of this algorithm is a non-parametric Monte Carlo approach to goodness-of-fit tests for contingency tables. In this paper, we characterize or compute the Markov bases for some log-linear models for two-way contingency tables using techniques from computational commutative algebra, namely Gröbner bases. This applies to a large set of cases including independence, quasi-independence, symmetry and quasi-symmetry. Three examples of quasi-symmetry and quasi-independence from Fingleton (Models of Category Counts, Cambridge University Press, Cambridge, 1984) and Agresti (An Introduction to Categorical Data Analysis, Wiley, New York, 1996) illustrate the practical applicability and the relevance of this algebraic methodology.
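For concreteness, a minimal sketch of the basic move for the two-way independence model (the ±1 pattern on a randomly chosen 2×2 subtable, which leaves every row and column total unchanged); in an exact test one would add a Metropolis step targeting the conditional (hypergeometric) distribution, which is omitted here.

```python
import numpy as np

def independence_move(table, rng):
    """One basic move for the independence model of an I x J table: a +/-1
    pattern on a random 2 x 2 subtable, which preserves all row and column
    totals.  The move is rejected if it would create a negative cell."""
    I, J = table.shape
    i1, i2 = rng.choice(I, size=2, replace=False)
    j1, j2 = rng.choice(J, size=2, replace=False)
    s = rng.choice([-1, 1])
    candidate = table.copy()
    candidate[i1, j1] += s
    candidate[i2, j2] += s
    candidate[i1, j2] -= s
    candidate[i2, j1] -= s
    return candidate if (candidate >= 0).all() else table

rng = np.random.default_rng(1)
t = np.array([[3, 1, 4], [1, 5, 9], [2, 6, 5]])
t2 = independence_move(t, rng)
assert (t2.sum(axis=0) == t.sum(axis=0)).all()
assert (t2.sum(axis=1) == t.sum(axis=1)).all()
```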

7.

This article provides an improvement of the network algorithm for calculating the exact p-value of the generalized Fisher's exact test in two-way contingency tables. We give a new exact upper bound and an approximate upper bound for the maximization problems encountered in the network algorithm. The approximate bound has some very desirable computational properties, and its meaning is elucidated from the viewpoint of differential geometry. Our proposed procedure performs well regardless of the pattern of the marginal totals of the data.

8.
Importance sampling and Markov chain Monte Carlo methods have been used in exact inference for contingency tables for a long time; however, their performance is not always satisfactory. In this paper, we propose a stochastic approximation Monte Carlo importance sampling (SAMCIS) method for tackling this problem. SAMCIS is a combination of adaptive Markov chain Monte Carlo and importance sampling, which employs the stochastic approximation Monte Carlo algorithm (Liang et al., J. Am. Stat. Assoc., 102(477):305–320, 2007) to draw samples from an enlarged reference set with a known Markov basis. Compared to the existing importance sampling and Markov chain Monte Carlo methods, SAMCIS has a few advantages, such as fast convergence, ergodicity, and the ability to achieve a desired proportion of valid tables. The numerical results indicate that SAMCIS can outperform the existing importance sampling and Markov chain Monte Carlo methods: it can produce much more accurate estimates in much shorter CPU time than the existing methods, especially for tables with high degrees of freedom.

9.
We consider conditional exact tests of factor effects in designed experiments for discrete response variables. Similarly to the analysis of contingency tables, a Markov chain Monte Carlo method can be used for performing exact tests, when large-sample approximations are poor and the enumeration of the conditional sample space is infeasible. For designed experiments with a single observation for each run, we formulate log-linear or logistic models and consider a connected Markov chain over an appropriate sample space. In particular, we investigate fractional factorial designs with 2^(p-q) runs, noting correspondences to the models for 2^(p-q) contingency tables.

10.
We develop a Markov chain Monte Carlo algorithm, based on 'stochastic search variable selection' (George and McCulloch, 1993), for identifying promising log-linear models. The method may be used in the analysis of multi-way contingency tables where the set of plausible models is very large.

11.
A new technique for the detection of outliers in contingency tables is introduced, where outliers are unusual cell counts with respect to classical loglinear Poisson models. Subsets of cell counts called minimal patterns are defined, corresponding to non-singular design matrices and leading to potentially uncontaminated maximum-likelihood estimates of the model parameters and thereby the expected cell counts. A criterion to easily produce minimal patterns in the two-way case under independence is derived, based on the analysis of the positions of the chosen cells. A simulation study and a couple of real-data examples are presented to illustrate the performance of the newly developed outlier identification algorithm, and to compare it with other existing methods.
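To illustrate the non-singularity requirement in the simplest setting, a hedged sketch for the two-way independence model; the corner-point coding and the function names are assumptions for illustration, and the paper's own criterion for producing minimal patterns is not reproduced.

```python
import numpy as np

def independence_design_matrix(cells, I, J):
    """Design-matrix rows for the chosen cells (i, j) of an I x J table under
    the log-linear independence model, using corner-point coding: an intercept,
    I - 1 row-effect columns and J - 1 column-effect columns."""
    rows = []
    for i, j in cells:
        row = [1.0]
        row += [1.0 if i == r else 0.0 for r in range(1, I)]   # row effects
        row += [1.0 if j == c else 0.0 for c in range(1, J)]   # column effects
        rows.append(row)
    return np.array(rows)

def is_minimal_pattern(cells, I, J):
    """A candidate cell set identifies all parameters iff its design matrix has
    full column rank; a minimal pattern uses exactly I + J - 1 cells."""
    X = independence_design_matrix(cells, I, J)
    return X.shape[0] == I + J - 1 and np.linalg.matrix_rank(X) == I + J - 1

print(is_minimal_pattern([(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)], 3, 3))  # True
```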

12.

This paper extends the classical methods of analysis of a two-way contingency table to the fuzzy environment for two cases: (1) when the available sample of observations is reported as imprecise data, and (2) when we prefer to categorize the variables using linguistic terms rather than crisp quantities. For this purpose, the α-cuts approach is used to extend the usual concepts of the test statistic and p-value to a fuzzy test statistic and a fuzzy p-value. In addition, some measures of association are extended to fuzzy versions in order to evaluate the dependence in such contingency tables. Some practical examples are provided to explain the applicability of the proposed methods in real-world problems.

13.
Power-divergence test statistics have been considered for testing linear-by-linear association in two-way contingency tables. These test statistics have been compared based on a designed simulation study and asymptotic results for 2 × 2, 2 × 3, and 3 × 3 tables. According to the results, there are test statistics with better properties than the well-known likelihood ratio test statistic for small and moderate samples.
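For reference, a small sketch of the Cressie–Read power-divergence family (the function name and edge-case handling are illustrative choices, not the paper's own implementation).

```python
import numpy as np

def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic
    2 / (lam * (lam + 1)) * sum( O * ((O / E)**lam - 1) ).

    lam = 1 gives Pearson's X^2; the limit lam -> 0 gives the likelihood-ratio
    statistic G^2 = 2 * sum( O * log(O / E) ).  Negative lam assumes strictly
    positive observed counts.
    """
    O = np.asarray(observed, dtype=float)
    E = np.asarray(expected, dtype=float)
    if np.isclose(lam, 0.0):
        mask = O > 0                      # 0 * log(0) is taken as 0
        return 2.0 * np.sum(O[mask] * np.log(O[mask] / E[mask]))
    return (2.0 / (lam * (lam + 1.0))) * np.sum(O * ((O / E) ** lam - 1.0))
```

scipy.stats.power_divergence computes the same family via its lambda_ argument and can serve as a cross-check.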

14.
A representation of sums and differences of the form 2n log n, the lnn function, is introduced to express likelihood-ratio chi-square test statistics in contingency table analysis. This is a concise explicit form to display when partitioning chi-square statistics in accordance with hierarchical models. The lnn representation gives students insights into the construction of test statistics, and assists in relating identical forms under differing model sets. Hierarchies are presented for independence and equi-probability in two-way tables, for symmetry in correlated square tables, for independence-and-homogeneity of two-way responses across levels of a factor, and for mutual independence in three-way tables, along with relevant partitions of chi-square.
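To illustrate the 2n log n representation in the simplest case, a short sketch (assuming NumPy) of G^2 for independence in a two-way table, written as a signed combination of 2n log n terms over the cells, the margins and the grand total; the function names are for illustration only.

```python
import numpy as np

def two_n_log_n(x):
    """Sum of 2 * n * log(n) over the entries of x, with 0 * log(0) taken as 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    pos = x[x > 0]
    return 2.0 * np.sum(pos * np.log(pos))

def g2_independence(table):
    """Likelihood-ratio statistic G^2 for independence, expanded into
    2*n*log(n) terms: cells - row totals - column totals + grand total."""
    n = np.asarray(table, dtype=float)
    return (two_n_log_n(n)                # cell counts
            - two_n_log_n(n.sum(axis=1))  # row totals
            - two_n_log_n(n.sum(axis=0))  # column totals
            + two_n_log_n(n.sum()))       # grand total

print(round(g2_independence([[10, 20], [30, 40]]), 4))
```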

15.
Popular rank-2 and rank-3 models for two-way tables have geometrical properties which can be used as diagnostic keys in screening for an appropriate model. Row and column levels of two-way tables are represented by points in two- or three-dimensional space, whereupon collinearity and coplanarity of row and column points provide diagnostic keys for informal model choice. Coordinates are obtained from a factorization of the two-way table Y into the matrix product UV^T. The rows of U then contain row-point coordinates and the rows of V column-point coordinates. Illustrations of applications of diagnostic biplots in the literature were restricted to data from chemistry and physics with little or no noise. In plant breeding, two-way tables containing substantial amounts of noise regularly arise in the form of genotype by environment tables. To investigate the usefulness of diagnostic biplots for model screening for genotype by environment tables, data tables were generated from a range of two-way models under the addition of various amounts of noise. Chances for correct diagnosis of the generating model depended on the type of model. Diagnostic biplots on their own do not seem to provide a sufficient means for model selection for genotype by environment tables, but in combination with other methods they certainly can provide extra insight into the structure of the data.
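A minimal sketch of obtaining such a factorization with a truncated SVD; splitting the singular values evenly between row and column points is one common scaling convention and is assumed here, not taken from the paper.

```python
import numpy as np

def biplot_coordinates(Y, rank=2):
    """Factor a two-way table Y as U @ V.T using a truncated SVD.

    Rows of U are row-point coordinates and rows of V are column-point
    coordinates; each side absorbs the square root of the singular values.
    """
    u, s, vt = np.linalg.svd(np.asarray(Y, dtype=float), full_matrices=False)
    U = u[:, :rank] * np.sqrt(s[:rank])
    V = vt[:rank].T * np.sqrt(s[:rank])
    return U, V

Y = np.outer([1.0, 2.0, 3.0], [2.0, 1.0, 4.0])   # a rank-1 toy table
U, V = biplot_coordinates(Y, rank=2)
assert np.allclose(U @ V.T, Y)
```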

16.
Bayesian models for relative archaeological chronology building
For many years, archaeologists have postulated that the numbers of various artefact types found within excavated features should give insight about their relative dates of deposition even when stratigraphic information is not present. A typical data set used in such studies can be reported as a cross-classification table (often called an abundance matrix or, equivalently, a contingency table) of excavated features against artefact types. Each entry of the table represents the number of a particular artefact type found in a particular archaeological feature. Methodologies for attempting to identify temporal sequences on the basis of such data are commonly referred to as seriation techniques. Several different procedures for seriation, including both parametric and non-parametric statistics, have been used in an attempt to reconstruct relative chronological orders on the basis of such contingency tables. We develop some possible model-based approaches that might be used to aid in relative archaeological chronology building. We use the recently developed Markov chain Monte Carlo method based on Langevin diffusions to fit some of the models proposed. Predictive Bayesian model choice techniques are then employed to ascertain which of the models that we develop are most plausible. We analyse two data sets taken from the literature on archaeological seriation.

17.

The display of data by means of contingency tables is used in different approaches to statistical inference, for example, to address the test of homogeneity of independent multinomial distributions. We develop a Bayesian procedure to test simple null hypotheses versus bilateral alternatives in contingency tables. Given independent samples from two binomial distributions and taking a mixed prior distribution, we calculate the posterior probability that the proportion of successes in the first population is the same as in the second. This posterior probability is compared with the p-value of the classical method, obtaining a reconciliation between the classical and Bayesian results. The obtained results are generalized to r × s tables.
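A hedged sketch of one standard mixed-prior construction for the 2 × 2 case: a point mass pi0 on p1 = p2 with a common Beta(a, b) prior under the null, and independent Beta(a, b) priors otherwise. The paper's exact prior may differ; the function name and defaults are assumptions.

```python
import numpy as np
from scipy.special import betaln

def posterior_prob_equal(x1, n1, x2, n2, pi0=0.5, a=1.0, b=1.0):
    """Posterior probability that two binomial proportions are equal under a
    mixed prior.  The binomial coefficients cancel in the Bayes factor, so
    only Beta-function ratios are needed."""
    # Log marginal likelihoods under H0 (common p) and H1 (independent p1, p2).
    log_m0 = betaln(a + x1 + x2, b + n1 + n2 - x1 - x2) - betaln(a, b)
    log_m1 = (betaln(a + x1, b + n1 - x1) + betaln(a + x2, b + n2 - x2)
              - 2 * betaln(a, b))
    log_bf = log_m0 - log_m1                      # Bayes factor for H0 vs H1
    odds = (pi0 / (1 - pi0)) * np.exp(log_bf)     # posterior odds of H0
    return odds / (1 + odds)

print(posterior_prob_equal(18, 30, 12, 30))
```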

18.
In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risk of disclosure of sensitive information while maintaining the analytic utility of the data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table, since the margins support statistical inferences for a large set of parametric tests and models. Yet not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. We describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies by using tools from computational algebra. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to the traditionally released synthetic tables.

19.
Kappa and B assess agreement between two observers independently classifying N units into k categories. We study their behavior under zero cells in the contingency table and unbalanced, asymmetric marginal distributions. Zero cells arise when a cross-classification is never endorsed by both observers; biased marginal distributions occur when some categories are preferred differently between the observers. Simulations studied the distributions of the unweighted and weighted statistics for k = 4, under fixed proportions of diagonal agreement and different off-diagonal patterns, with various sample sizes and under various zero-cell-count scenarios. Marginal distributions were first uniform and homogeneous, and then unbalanced and asymmetric. Results for the unweighted kappa and B statistics were comparable to the work of Muñoz and Bangdiwala, even with zero cells. Slightly increased variation was observed as the sample size decreased. Weighted statistics did show greater variation as the number of zero cells increased, with weighted kappa increasing substantially more than weighted B. Under biased marginal distributions, weighted kappa with Cicchetti weights was higher than with squared weights. Both statistics for observer agreement behaved well under zero cells. The weighted B was less variable than the weighted kappa under similar circumstances and different weights. In general, B's performance and graphical interpretation make it preferable to kappa under the studied scenarios.
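For reference, a small sketch of unweighted and weighted kappa from a k × k agreement table (the B statistic is omitted); the squared-distance weights shown are one common choice, and the function name is illustrative.

```python
import numpy as np

def weighted_kappa(table, weights=None):
    """Cohen's kappa for a k x k agreement table.  Pass a disagreement-weight
    matrix (0 on the diagonal, larger for worse disagreement) for weighted
    kappa; with no weights the usual unweighted kappa is returned."""
    n = np.asarray(table, dtype=float)
    p = n / n.sum()
    if weights is None:                           # 0/1 disagreement weights
        weights = 1.0 - np.eye(p.shape[0])
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))
    observed_dis = np.sum(weights * p)            # weighted observed disagreement
    expected_dis = np.sum(weights * expected)     # weighted chance disagreement
    return 1.0 - observed_dis / expected_dis

# Quadratic (squared-distance) disagreement weights for a 4-category scale.
idx = np.arange(4)
sq_weights = (idx[:, None] - idx[None, :]) ** 2

table = np.array([[20, 5, 0, 0],
                  [3, 15, 4, 1],
                  [0, 4, 18, 2],
                  [1, 0, 3, 24]])
print(weighted_kappa(table))               # unweighted kappa
print(weighted_kappa(table, sq_weights))   # quadratically weighted kappa
```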

20.
Graphical methods have previously been proposed for studying the cell contributions to the chi-square statistic in two-way contingency tables. Clustering techniques are suggested for analyzing the differences among the frequency distributions of either the columns or the rows of the contingency tables. Modifications are proposed to the other methods: first, varying the widths of bars according to the expected cell counts, and then the gamma probability plot of the individual chi-square terms.
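An illustrative sketch of the two ingredients, assuming NumPy and SciPy: per-cell chi-square contributions under independence, and gamma quantiles to plot them against. The gamma shape and scale used here correspond to a chi-square(1) reference and are placeholder choices, not the authors' convention.

```python
import numpy as np
from scipy import stats

def cell_chisq_contributions(table):
    """Per-cell Pearson chi-square contributions (O - E)^2 / E under independence."""
    n = np.asarray(table, dtype=float)
    expected = np.outer(n.sum(axis=1), n.sum(axis=0)) / n.sum()
    return (n - expected) ** 2 / expected

def gamma_plot_points(contributions, shape=0.5, scale=2.0):
    """Ordered contributions paired with gamma quantiles for a probability plot."""
    c = np.sort(np.ravel(contributions))
    probs = (np.arange(1, len(c) + 1) - 0.5) / len(c)
    return stats.gamma.ppf(probs, a=shape, scale=scale), c

table = [[12, 3, 7], [5, 15, 2], [9, 4, 11]]
q, c = gamma_plot_points(cell_chisq_contributions(table))
# Plotting q against c (e.g., with matplotlib) gives the gamma probability plot.
```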
