
1.

Time series analysis of categorical data using auto-mutual information

Atanu Biswas Apratim Guha 《Journal of statistical planning and inference》2009

Despite its importance, the modeling of categorical time series data has received little attention in the recent past. In this paper, we present a framework based on Pegram's [An autoregressive model for multilag Markov chains. Journal of Applied Probability 17, 350–362] operator, which was originally proposed only to construct discrete AR(p) processes. We extend Pegram's operator to accommodate categorical processes with ARMA representations. We observe that the concept of correlation is not always suitable for categorical data. As a sensible alternative, we use the concept of mutual information, and introduce auto-mutual information to define the time series process of categorical data. Some model selection and inferential aspects are also discussed. We implement the developed methodologies to analyze a time series data set on infant sleep status.
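As a rough illustration of the idea in this abstract, the sketch below computes a plug-in estimate of mutual information between a categorical series and its lagged copy. It assumes that "auto-mutual information at lag k" means the mutual information between X_t and X_{t+k}; the paper may define or normalise the quantity differently, and the function name `auto_mutual_information` is hypothetical.

```python
from collections import Counter
from math import log


def auto_mutual_information(series, lag):
    """Plug-in estimate of the mutual information (in nats) between
    X_t and X_{t+lag} for a categorical sequence.

    Assumption: this is only an illustrative estimator, not necessarily
    the definition used by the paper.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")

    # Pair every observation with the one `lag` steps later.
    pairs = list(zip(series[:-lag], series[lag:]))
    n = len(pairs)

    joint = Counter(pairs)                  # counts of (x_t, x_{t+lag})
    left = Counter(a for a, _ in pairs)     # marginal counts of x_t
    right = Counter(b for _, b in pairs)    # marginal counts of x_{t+lag}

    # Sum over observed cells: p(a, b) * log(p(a, b) / (p(a) * p(b))).
    return sum(
        (c / n) * log(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )
```

A constant series gives a value of zero, while a strictly alternating two-state series gives a lag-1 value close to log 2, since each state then determines the next one.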

2.

The minimum discrimination information approach in analyzing categorical data

D.V Gokhale S. Kullback 《Communications in Statistics - Theory and Methods》2013,42(10):987-1005

A brief review of the minimum discrimination information (MDI) approach in analyzing categorical data is presented in a question-answer format. An example is given to bring out situations in which the MDI approach is more useful. No new results are proved.

3.

Reliability models for categorical data

Max R. Mickey Claude O. Archer 《Communications in Statistics - Theory and Methods》2013,42(15):1851-1869

An assumed hypothetical consensus category corresponding to a case being classified provides a basis for assessment of reliability of judges. Equivalent judges are characterised by the joint probability distribution of the judge assignment and the consensus category. Estimates of the conditional probabilities of judge assignment given consensus category and of consensus category given judge assignments are indices of reliability. All parameters can be estimated if data include classifications of a number of cases by 3 or more judges. Restrictive assumptions are imposed to obtain models for data from classifications by two judges. Maximum likelihood estimation is discussed and illustrated by example for the 3 or more judges case.

4.

Latent variable techniques for categorical data

Lancaster Gillian Green Mick 《Statistics and Computing》2002,12(2):153-161

Two useful statistical methods for generating a latent variable are described and extended to incorporate polytomous data and additional covariates. Item response analysis is not well-known outside its area of application, mainly because the procedures to fit the models are computer intensive and not routinely available within general statistical software packages. The linear score technique is less computer intensive, straightforward to implement and has been proposed as a good approximation to item response analysis. Both methods have been implemented in the standard statistical software package GLIM 4.0, and are compared to determine their effectiveness.

5.

Gabi Gayer Omer Yaffe 《Econometric Reviews》2019,38(3):263-278

In a large variety of applications, the data for a variable we wish to explain are ordered and categorical. In this paper, we present a new similarity-based model for this scenario and investigate its properties. We establish that the process is ψ-mixing and strictly stationary and derive the explicit form of the autocorrelation function in some special cases. Consistency and asymptotic normality of the maximum likelihood estimator of the model’s parameters are proven. A simulation study supports our findings. The results are applied to the Netflix data set, comprised of a survey on users’ grading of movies.

6.

Bayesian inference for categorical data analysis (Total citations: 1; self-citations: 0; citations by others: 1)

Alan Agresti David B. Hitchcock 《Statistical Methods and Applications》2005,14(3):297-330

This article surveys Bayesian methods for categorical data analysis, with primary emphasis on contingency table analysis. Early innovations were proposed by Good (1953, 1956, 1965) for smoothing proportions in contingency tables and by Lindley (1964) for inference about odds ratios. These approaches primarily used conjugate beta and Dirichlet priors. Altham (1969, 1971) presented Bayesian analogs of small-sample frequentist tests for 2 x 2 tables using such priors. An alternative approach using normal priors for logits received considerable attention in the 1970s by Leonard and others (e.g., Leonard 1972). Adopted usually in a hierarchical form, the logit-normal approach allows greater flexibility and scope for generalization. The 1970s also saw considerable interest in loglinear modeling. The advent of modern computational methods since the mid-1980s has led to a growing literature on fully Bayesian analyses with models for categorical data, with main emphasis on generalized linear models such as logistic regression for binary and multi-category response variables.

7.

Population-averaged and subject-specific approaches for clustered categorical data

《Journal of Statistical Computation and Simulation》2012,82(1-3):231-253

Modeling clustered categorical data based on extensions of generalized linear model theory has received much attention in recent years. The rapidly increasing number of approaches suitable for categorical data in which clusters are uncorrelated, but correlations exist within a cluster, has caused uncertainty among applied scientists as to their respective merits and demerits. Upon centering estimation around solving an unbiased estimating function for mean parameters and estimation of covariance parameters describing within-cluster or among-cluster heterogeneity, many approaches can easily be related. This contribution describes a series of algorithms and their implementation in detail, based on a classification of inferential procedures for clustered data.

8.

Bayesian analysis of correlated mixed categorical data by incorporating historical prior information

Ming-Hui Chen 《Communications in Statistics - Theory and Methods》2013,42(6):1341-1361

In this article, we develop statistical models for analysis of correlated mixed categorical (binary and ordinal) response data arising in medical and epidemiologic studies. There is evidence in the literature to suggest that models including correlation structure can lead to substantial improvement in precision of estimation or are more appropriate (accurate). We use a very rich class of scale mixture of multivariate normal (SMMVN) link functions to accommodate heavy tailed distributions. In order to incorporate available historical information, we propose a unified prior elicitation scheme based on SMMVN-link models. Further, simulation-based techniques are developed to assess model adequacy. Finally, a real data example from prostate cancer studies is used to illustrate the proposed methodologies.

9.

CATANOVA for two-way cross classified categorical data

R. Lombardo I. Camminatiello 《Statistics》2013,47(1):57-71

In this article we develop an extension of categorical analysis of variance for one response and two factors, based on a partitioning of a measure of predictability for three-way contingency tables, known as Gray and Williams's index. First, the decomposition of this multiple measure of association into partial association measures is shown. Finally, for ordinal-scale variables, we propose an extension of this decomposition using a particular set of orthogonal polynomials.

10.

On least squares estimation for categorical data

Robert P. Clickner 《Communications in Statistics - Theory and Methods》2013,42(11):1059-1064

We consider a multinomial distribution in which the cell probabilities are known arbitrary functions of a vector parameter θ. It is desired to estimate θ by least squares. Three variations of the least squares approach are investigated, and each is found to be equivalent, in the very strong sense of being algebraically identical, to one of the following estimation procedures: maximum likelihood, minimum χ² and minimum modified χ². Two of these results also apply to the multiple hypergeometric distribution.

11.

Randomized response multivariate designs for categorical data

Patrick D. Bourke 《Communications in Statistics - Theory and Methods》2013,42(25):2889-2901

Three approaches to multivariate estimation for categorical data using randomized response (RR) are described. In the first approach, practical only for 2×2 contingency tables, a multi-proportions design is used. In the second approach, a separate RR trial is used for each variate and it is noted that the multivariate design matrix of conditional probabilities is given by the Kronecker product of the univariate design matrices of each trial, provided that the trials are independent of each other in a certain sense. The third approach requires only a single randomization and thus may be viewed as the use of vector response. Finally, a special-purpose bivariate design is presented.
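The Kronecker-product statement in the second approach can be illustrated with a small sketch. The helper name `kronecker` and the example design matrices are hypothetical; the sketch assumes each design matrix stores P(reported category | true category) with columns indexed by the true category, which is a convention chosen here and not taken from the paper.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Block (i, j) of the result equals a[i][j] * b.
    """
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


# Hypothetical univariate RR design matrices for two binary variates.
# Entry [r][t] = P(report r | true category t); each column sums to 1.
design_1 = [[0.8, 0.2],
            [0.2, 0.8]]
design_2 = [[0.7, 0.3],
            [0.3, 0.7]]

# Under independent trials, the bivariate design matrix is the
# Kronecker product of the two univariate design matrices.
design_12 = kronecker(design_1, design_2)
```

Each column of `design_12` is again a probability distribution over the four joint reported categories, because the column sums of a Kronecker product are the products of the factors' column sums.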

12.

Applied categorical and count data analysis

Isaac Dialsingh 《Journal of applied statistics》2014,41(4)


13.

Generalized Cochran-Mantel-Haenszel test statistics for correlated categorical data

Jie Zhang Dennis D. Boos 《Communications in Statistics - Theory and Methods》2013,42(8):1813-1837

Three new test statistics are introduced for correlated categorical data in stratified R×C tables. They are similar in form to the standard generalized Cochran-Mantel-Haenszel statistics but modified to handle correlated outcomes. Two of these statistics are asymptotically valid in both many-strata (sparse data) and large-strata limiting models. The third one is designed specifically for the many-strata case but is valid even with a small number of strata. This latter statistic is also appropriate when strata are assumed to be random.

14.

Two simple measures of variability for categorical data

Erindi Allaj 《Journal of applied statistics》2018,45(8):1497-1516

This paper proposes two new variability measures for categorical data. The first variability measure is obtained as one minus the square root of the sum of the squares of the relative frequencies of the different categories. The second measure is obtained by standardizing the first measure. The measures proposed are functions of the variability measure proposed by Gini [Variabilità e Mutabilità: Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche, C. Cuppini, Bologna, 1912] and approximate the coefficient of nominal variation introduced by Kvålseth [Coefficients of variation for nominal and ordinal categorical data, Percept. Motor Skills 80 (1995), pp. 843–847] when the number of categories increases. Different mathematical properties of the proposed variability measures are studied and analyzed. Several examples illustrate how the variability measures can be interpreted and used in practice.
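The first measure is fully specified by the abstract and can be sketched directly. The second function is only an assumption: it takes "standardizing" to mean dividing by the largest value the first measure can attain with K categories, namely 1 - 1/sqrt(K) under equal relative frequencies; the paper's exact standardization may differ. Both function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def variability(data):
    """One minus the square root of the sum of squared relative
    frequencies of the observed categories."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    return 1.0 - sqrt(sum((c / n) ** 2 for c in counts.values()))


def standardized_variability(data, num_categories):
    """Assumed standardization: divide by the maximum of `variability`
    over distributions on `num_categories` categories, which is attained
    when all relative frequencies equal 1 / num_categories."""
    if num_categories < 2:
        raise ValueError("num_categories must be at least 2")
    max_value = 1.0 - 1.0 / sqrt(num_categories)
    return variability(data) / max_value
```

Under this assumption, a sample concentrated in a single category scores 0 on both measures, and a sample spread evenly over all K categories scores 1 on the standardized measure.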

15.

Disaggregated spatial modelling for areal unit categorical data

Tassone EC Miranda ML Gelfand AE 《Journal of the Royal Statistical Society. Series C, Applied statistics》2010,59(1):175-190

Summary. We consider joint spatial modelling of areal multivariate categorical data assuming a multiway contingency table for the variables, modelled by using a log-linear model, and connected across units by using spatial random effects. With no distinction regarding whether variables are response or explanatory, we do not limit inference to conditional probabilities, as in customary spatial logistic regression. With joint probabilities we can calculate arbitrary marginal and conditional probabilities without having to refit models to investigate different hypotheses. Flexible aggregation allows us to investigate subgroups of interest; flexible conditioning enables not only the study of outcomes given risk factors but also retrospective study of risk factors given outcomes. A benefit of joint spatial modelling is the opportunity to reveal disparities in health in a richer fashion, e.g. across space for any particular group of cells, across groups of cells at a particular location, and, hence, potential space–group interaction. We illustrate with an analysis of birth records for the state of North Carolina and compare with spatial logistic regression.

16.

Bayesian estimation methods for categorical data with misclassifications

Zhi Geng Chooichiro Asano 《Communications in Statistics - Theory and Methods》2013,42(8):2935-2954

This article considers Bayesian estimation methods for categorical data with misclassifications. To adjust for misclassification, double sampling schemes are utilized. Observations are represented in a contingency table categorized by error-free categorical variables and error-prone categorical variables. Posterior means of probabilities in cells are considered as estimates. In some cases, the posterior means can be calculated exactly. However, in some cases, the exact calculation may be too difficult to perform, but we can easily use the expectation-maximization (EM) algorithm to obtain approximate posterior means.

17.

A model for comparisons with ordered categorical data

Stephen L. Meeks Ralph B. D'Agostino 《Communications in Statistics - Theory and Methods》2013,42(8):895-906

A model developed by Andrich for ordered categorical data is extended to develop tests for treatment effects with paired or matched samples. In particular, this includes analysis for pre-post studies and crossover designs. Some advantages of this model are that it allows for misclassification of subjects, yields reasonable conditional requirements for exact analysis, admits a normal approximation that is good for all but the smallest of sample sizes, and is relatively simple mathematically. Furthermore, the tests derived are logical extensions of tests for unordered categories.

18.

Effective directed tests for models with ordered categorical data

Arthur Cohen David Madigan Harold B. Sackrowitz 《Australian & New Zealand Journal of Statistics》2003,45(3):285-300

This paper offers a new method for testing one‐sided hypotheses in discrete multivariate data models. One‐sided alternatives mean that there are restrictions on the multidimensional parameter space. The focus is on models dealing with ordered categorical data. In particular, applications are concerned with R×C contingency tables. The method has advantages over other general approaches. All tests are exact in the sense that no large sample theory or large sample distribution theory is required. Testing is unconditional although its execution is done conditionally, section by section, where a section is determined by marginal totals. This eliminates any potential nuisance parameter issues. The power of the tests is more robust than the power of the typical linear tests often recommended. Furthermore, computer programs are available to carry out the tests efficiently regardless of the sample sizes or the order of the contingency tables. Both censored data and uncensored data models are discussed.

19.

Univariate and multivariate categorical data analysis for block designs

R.P. Bhargava 《Communications in Statistics - Theory and Methods》2013,42(11):1209-1231

Analysis for univariate and multivariate categorical data in block designs is given and illustrated through examples. The univariate analysis compares the treatments on the basis of their pooled frequency distributions (pooled over blocks). The test statistic used is called Q after Cochran (1950). The large sample null distribution of Q is a chi-square. Analysis of p-variate categorical data (kth variable having c_k classes, k=1,...,p) can be done by treating it as a univariate categorical problem with [d] classes. Very often [d] is large in relation to the size of the experiment. This makes the expected frequencies for some of the cells very small, making the univariate method inapplicable. In these circumstances it may be reasonable to compare the treatments on the basis of marginal distributions up to the mth dimension, 1[d] , which is given in this paper. This method is also illustrated for missing observations.

20.

A pattern-mixture odds ratio model for incomplete categorical data

Bart Michiels Geert Molenberghs Stuart R. Lipsitz 《Communications in Statistics - Theory and Methods》2013,42(12):2843-2869

Most models for incomplete data are formulated within the selection model framework. Pattern-mixture models are increasingly seen as a viable alternative, both from an interpretational as well as from a computational point of view (Little 1993, Hogan and Laird 1997, Ekholm and Skinner 1998). Whereas most applications are either for continuous normally distributed data or for simplified categorical settings such as contingency tables, we show how a multivariate odds ratio model (Molenberghs and Lesaffre 1994, 1998) can be used to fit pattern-mixture models to repeated binary outcomes with continuous covariates. Apart from point estimation, useful methods for interval estimation are presented and data from a clinical study are analyzed to illustrate the methods.