Similar literature
 Found 20 similar documents; search time 31 ms
1.
This paper proposes two new variability measures for categorical data. The first variability measure is obtained as one minus the square root of the sum of the squares of the relative frequencies of the different categories. The second measure is obtained by standardizing the first measure. The measures proposed are functions of the variability measure proposed by Gini [Variabilità e Mutabilità. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche, C. Cuppini, Bologna, 1912] and approximate the coefficient of nominal variation introduced by Kvålseth [Coefficients of variation for nominal and ordinal categorical data, Percept. Motor Skills 80 (1995), pp. 843–847] when the number of categories increases. Different mathematical properties of the proposed variability measures are studied and analyzed. Several examples illustrate how the variability measures can be interpreted and used in practice.
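
The first measure described in this abstract can be sketched in a few lines of Python. The function names are hypothetical. The standardization shown (dividing by 1 − 1/√K, the value the first measure takes at the uniform distribution over K categories) is an assumption about how the second measure is obtained; the abstract does not spell it out.

```python
import math
from collections import Counter

def variability(observations):
    """One minus the square root of the sum of squared relative frequencies."""
    counts = Counter(observations)
    n = sum(counts.values())
    return 1.0 - math.sqrt(sum((c / n) ** 2 for c in counts.values()))

def standardized_variability(observations, n_categories):
    """Assumed standardization: divide by the maximum value 1 - 1/sqrt(K)."""
    if n_categories < 2:
        raise ValueError("need at least two possible categories")
    return variability(observations) / (1.0 - 1.0 / math.sqrt(n_categories))

# Toy data, for illustration only.
sample = ["a", "a", "b", "c"]
print(variability(sample))
print(standardized_variability(sample, n_categories=3))
```

Because Gini's measure is one minus the sum of squared relative frequencies, the first function is equivalent to 1 − √(1 − G), which is one way to read the statement that the proposed measures are functions of Gini's.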

2.
The analysis of time-indexed categorical data is important in many fields, e.g., in telecommunication network monitoring, manufacturing process control, ecology, etc. Primary interest is in detecting and measuring serial associations and dependencies in such data. For cardinal time series analysis, autocorrelation is a convenient and informative measure of serial association. Yet, for categorical time series analysis an analogous convenient measure and corresponding concepts of weak stationarity have not been provided. For two categorical variables, several ways of measuring association have been suggested. This paper reviews such measures and investigates their properties in a serial context. We discuss concepts of weak stationarity of a categorical time series, in particular of stationarity in association measures. Serial association and weak stationarity are studied in the class of discrete ARMA processes introduced by Jacobs and Lewis (J. Time Ser. Anal. 4(1):19–36, 1983). "An intrinsic feature of a time series is that, typically, adjacent observations are dependent. The nature of this dependence among observations of a time series is of considerable practical interest. Time series analysis is concerned with techniques for the analysis of this dependence." (Box et al. 1994, p. 1)

3.
Despite its importance, the modeling of categorical time series data has received little attention in the recent past. In this paper, we present a framework based on Pegram's [An autoregressive model for multilag Markov chains. Journal of Applied Probability 17, 350–362] operator, which was originally proposed only to construct discrete AR(p) processes. We extend Pegram's operator to accommodate categorical processes with ARMA representations. We observe that the concept of correlation is not always suitable for categorical data. As a sensible alternative, we use the concept of mutual information, and introduce auto-mutual information to define the time series process of categorical data. Some model selection and inferential aspects are also discussed. We apply the developed methodology to a time series data set on infant sleep status.
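
As a rough illustration of the auto-mutual information idea mentioned in this abstract, the sketch below computes a plug-in estimate of the mutual information between a categorical series and its own lagged copy. The function name, the use of natural logarithms, and the toy series are assumptions; the paper's own definition and estimator may differ.

```python
import math
from collections import Counter

def auto_mutual_information(series, lag):
    """Plug-in estimate of the mutual information between X_t and X_{t-lag}, in nats."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    # Pair each observation with the one `lag` steps earlier.
    pairs = list(zip(series[lag:], series[:-lag]))
    n = len(pairs)
    joint = Counter(pairs)
    current = Counter(x for x, _ in pairs)
    lagged = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * math.log(c * n / (current[x] * lagged[y]))
    return mi

# Toy series, for illustration only.
toy = list("AABBAABBAABB")
print(auto_mutual_information(toy, lag=1))
print(auto_mutual_information(toy, lag=2))
```

Unlike autocorrelation, this quantity needs no numeric coding of the categories: it is zero when the empirical joint distribution of the lagged pair factorizes and positive otherwise.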

4.
The multiple non-symmetric correspondence analysis (MNSCA) is a useful technique for analyzing a two-way contingency table. In more complex cases, there is more than one predictor variable. In this paper, the MNSCA, along with the decomposition of the Gray–Williams Tau index into main effects and an interaction term, is used to analyze a contingency table with two predictor categorical variables and an ordinal response variable. The Multiple-Tau index is a measure of association that contains both main effects and an interaction term. The main effects represent the change in the response variables due to the change in the levels/categories of the predictor variables, considering the effects of their addition, while the interaction effect represents the combined effect of the predictor categorical variables on the ordinal response variable. Moreover, for ordinal scale variables, we propose a further decomposition in order to check the existence of power components by using Emerson's orthogonal polynomials.

5.
6.
Using a multivariate latent variable approach, this article proposes some new general models to analyze correlated bounded continuous and categorical (nominal and/or ordinal) responses with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses, motivated by the analysis of data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that some continuous responses are replaced by bounded continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. Sensitivity analysis is also performed to study the influence of small perturbations of the parameters of the missing mechanisms of the model on the maximal normal curvature. The proposed models are applied to two data sets: BMI, Steatosis and Osteoporosis data and Tehran household expenditure budgets.

7.
This paper extends an analysis of variance for categorical data (CATANOVA) procedure to multidimensional contingency tables involving several factors and a response variable measured on a nominal scale. Using an appropriate measure of total variation for multinomial data, partial and multiple association measures are developed as R² quantities which parallel the analogous statistics in multiple linear regression for quantitative data. In addition, test statistics are derived in terms of these R² criteria. Finally, this CATANOVA approach is illustrated within the context of a three-way contingency table from a multicenter clinical trial.
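
To make the R² idea concrete, here is a minimal sketch for the simplest case of a single factor, assuming the total variation is the Gini-type measure commonly used in one-way CATANOVA. The function name and the table layout are hypothetical, and the multidimensional partial and multiple measures developed in the paper are not reproduced here.

```python
def catanova_r2(table):
    """R^2 = BSS / TSS for a nominal response (rows) and one factor (columns).

    table[i][j] is the count of response category i in factor group j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Total variation of the response, ignoring the factor.
    tss = n / 2 - sum(r * r for r in row_totals) / (2 * n)
    # Within-group variation, summed over the non-empty factor groups.
    wss = n / 2 - sum(
        sum(row[j] ** 2 for row in table) / (2 * col_totals[j])
        for j in range(len(col_totals))
        if col_totals[j] > 0
    )
    if tss == 0:
        raise ValueError("response shows no variation; R^2 is undefined")
    return (tss - wss) / tss

# Toy table, for illustration only.
print(catanova_r2([[30, 10], [10, 30]]))
```

As in linear regression, the ratio is 0 when the factor groups share the same response distribution and 1 when the factor determines the response category exactly.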

8.
Nonparametric predictive inference (NPI) is a powerful frequentist statistical framework based only on an exchangeability assumption for future and past observations, made possible by the use of lower and upper probabilities. In this article, NPI is presented for ordinal data, which are categorical data with an ordering of the categories. The method uses a latent variable representation of the observations and categories on the real line. Lower and upper probabilities for events involving the next observation are presented, and briefly compared to NPI for non-ordered categorical data. As an application, the comparison of multiple groups of ordinal data is presented.

9.
In this second part of the paper, reproducibility of discrete ordinal and nominal outcomes is addressed. The first part deals with continuous outcomes, concentrating on intraclass correlation (ρ) in the context of one‐way analysis of variance. For categorical data, the focus has generally not been on a meaningful population parameter such as ρ. However, intraclass correlation has been defined for discrete ordinal data, ρc, and for nominal data, κI. Therefore, a unified approach to reproducibility is proposed. The relevance of these parameters is outlined. Estimation and inferential procedures for ρc and κI are reviewed, together with worked examples. Topics related to reproducibility that are not addressed in either this or the previous paper are highlighted. Considerations for designing reproducibility studies and for interpreting their results are provided. Copyright © 2004 John Wiley & Sons, Ltd.

10.
Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic in the correspondence analysis context when a two-way contingency table includes at least one ordinal categorical variable. The aim of this paper, considering a contingency table with two ordinal categorical variables, is to show a decomposition of Taguchi's index into linear, quadratic and higher-order components. This decomposition has been developed using Emerson's orthogonal polynomials. Moreover, two case studies are analyzed to illustrate the methodology.

11.
The multinomial logistic regression model (MLRM) can be interpreted as a natural extension of the binomial model with logit link function to situations where the response variable can have three or more possible outcomes. In addition, when the categories of the response variable are nominal, the MLRM can be expressed in terms of two or more logistic models and analyzed in both frequentist and Bayesian approaches. However, few discussions about post-modeling in categorical data models are found in the literature, and they mainly use Bayesian inference. The objective of this work is to present classic and Bayesian diagnostic measures for categorical data models. These measures are applied to a dataset (status) of patients undergoing kidney transplantation.

12.
This study was motivated by the question of which type of confidence interval (CI) one should use to summarize the sample variance of Goodman and Kruskal's coefficient gamma. In a Monte-Carlo study, we investigated the coverage and computation time of the Goodman–Kruskal CI, the Cliff-consistent CI, the profile likelihood CI, and the score CI for Goodman and Kruskal's gamma, under several conditions. The choice for Goodman and Kruskal's gamma was based on results of Woods [Consistent small-sample variances for six gamma-family measures of ordinal association. Multivar Behav Res. 2009;44:525–551], who found relatively poor coverage for gamma for very small samples compared to other ordinal association measures. The profile likelihood CI and the score CI had the best coverage, close to the nominal value, but those CIs could often not be computed for sparse tables. The coverage of the Goodman–Kruskal CI and the Cliff-consistent CI was often poor. Computation time was fast to reasonably fast for all types of CI.
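
For readers unfamiliar with the coefficient itself, the sketch below computes the point estimate of Goodman and Kruskal's gamma, (C − D)/(C + D), from a contingency table with ordered rows and columns, where C and D are the numbers of concordant and discordant pairs. The function name and table layout are hypothetical, and none of the four confidence intervals compared in the study is implemented here.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a table whose rows and columns are both in increasing order.

    table[i][j] is the count of observations at row level i and column level j.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a strictly higher row.
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Toy table, for illustration only.
print(goodman_kruskal_gamma([[1, 2], [3, 4]]))
```

Pairs tied on either variable are ignored, which is why sparse tables with few untied pairs can make gamma, and intervals built around it, unstable.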

13.
Likelihood-based marginalized models using random effects have become popular for analyzing longitudinal categorical data. These models permit direct interpretation of marginal mean parameters and characterize the serial dependence of longitudinal outcomes using random effects [12,22]. In this paper, we propose a model that expands the use of previous models to accommodate longitudinal nominal data. Random effects using a new covariance matrix with a Kronecker product composition are used to explain serial and categorical dependence. A quasi-Newton algorithm is developed for estimation. The proposed methods are illustrated with a real data set and compared with other standard methods.

14.
In this paper, we study a nonparametric additive regression model suitable for a wide range of time series applications. Our model includes a periodic component, a deterministic time trend, various component functions of stochastic explanatory variables, and an AR(p) error process that accounts for serial correlation in the regression error. We propose an estimation procedure for the nonparametric component functions and the parameters of the error process based on smooth backfitting and quasi-maximum likelihood methods. Our theory establishes convergence rates and the asymptotic normality of our estimators. Moreover, we are able to derive an oracle‐type result for the estimators of the AR parameters: under fairly mild conditions, the limiting distribution of our parameter estimators is the same as when the nonparametric component functions are known. Finally, we illustrate our estimation procedure by applying it to a sample of climate and ozone data collected on the Antarctic Peninsula.

15.
Summary. In many areas of pharmaceutical research, there has been increasing use of categorical data and more specifically ordinal responses. In many cases, complex models are required to account for different types of dependences among the responses. The clinical trial that is considered here involved patients who were required to remain in a particular state to enable the doctors to examine their heart. The aim of this trial was to study the relationship between the dose of the drug administered and the time that was spent by the patient in the state permitting examination. The patient's state was measured every second by a continuous Doppler signal which was categorized by the doctors into one of four ordered categories. Hence, the response consisted of repeated ordinal series. These series were of different lengths because the drug effect wore off faster (or slower) on certain patients depending on the drug dose administered and the infusion rate, and therefore the length of drug administration. A general method for generating new ordinal distributions is presented which is sufficiently flexible to handle unbalanced ordinal repeated measurements. It consists of obtaining a cumulative mixture distribution from a Laplace transform and introducing into it the integrated intensity of a binary logistic, continuation ratio or proportional odds model. Then, a multivariate distribution is constructed by a procedure that is similar to the updating process of the Kalman filter. Several types of history dependences are proposed.

16.
Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic for performing correspondence analysis when a two-way contingency table consists of one ordinal categorical variable. This article will explore the development of correspondence analysis using a decomposition of Taguchi's statistic.

17.
This paper reviews existing measures of association for nominal categorical variables, and presents some alternative measures of association, which are symmetric and flexible enough to find partial and multiple association.

18.
It is quite common that raters may need to classify a sample of subjects on a categorical scale. Perfect agreement can rarely be observed partly because of different perceptions about the meanings of the category labels between raters and partly because of factors such as intrarater variability. Usually, category indistinguishability occurs between adjacent categories. In this article, we propose a simple log-linear model combining ordinal scale information and category distinguishability between ordinal categories for modelling agreement between two raters. For the proposed model, no score assignment is required to the ordinal categories. An algorithm and statistical properties will be provided.

19.
For the analysis of square contingency tables with nominal categories, Tomizawa and coworkers have considered measures that represent the degree of departure from symmetry. This paper proposes a measure that represents the degree of asymmetry for square contingency tables with ordered categories (instead of those with nominal categories). The measure proposed is expressed using the Cressie–Read power-divergence or Patil–Taillie diversity index, defined for the cumulative probabilities that an observation falls in row (column) category i or below and column (row) category j (> i) or above. The measure depends on the order of listing the categories. It should be useful for comparing the degree of asymmetry in several tables with ordered categories. The relationship between the measure and the normal distribution is shown.

20.
In several sciences, especially when dealing with performance evaluation, complex testing problems may arise due in particular to the presence of multidimensional categorical data. In such cases the application of nonparametric methods can represent a reasonable approach. In this paper, we consider the problem of testing whether a “treatment” is stochastically larger than a “control” when univariate and multivariate ordinal categorical data are present. We propose a solution based on the nonparametric combination of dependent permutation tests (Pesarin in Multivariate permutation test with application to biostatistics. Wiley, Chichester, 2001), on variable transformation, and on tests on moments. The solution requires the transformation of categorical response variables into numeric variables and the breaking up of the original problem’s hypotheses into partial sub-hypotheses regarding the moments of the transformed variables. This type of problem is considered to be almost impossible to analyze within likelihood ratio tests, especially in the multivariate case (Wang in J Am Stat Assoc 91:1676–1683, 1996). A comparative simulation study is also presented along with an application example.
