Similar documents
1.
Cohen's kappa coefficient is traditionally used to quantify the degree of agreement between two raters on a nominal scale. Correlated kappas occur in many settings (e.g., repeated agreement by raters on the same individuals, concordance between diagnostic tests and a gold standard) and often need to be compared. While different techniques are now available to model correlated κ coefficients, they are generally not easy to implement in practice. The present paper describes a simple alternative method based on the bootstrap for comparing correlated kappa coefficients. The method is illustrated by examples and its type I error studied using simulations. The method is also compared with the generalized estimating equations of the second order and the weighted least-squares methods.
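The bootstrap comparison sketched in this abstract is simple to prototype. Below is a minimal illustration, assuming simulated data rather than the paper's examples: two binary tests are rated against the same gold standard, and subjects are resampled jointly so that the correlation between the two kappas is preserved.

```python
import numpy as np

def cohen_kappa(x, y, k=2):
    """Cohen's kappa for two nominal ratings coded 0..k-1."""
    n = len(x)
    table = np.zeros((k, k))
    for a, b in zip(x, y):
        table[a, b] += 1
    p = table / n
    po = np.trace(p)                      # observed agreement
    pe = p.sum(axis=1) @ p.sum(axis=0)    # chance agreement
    return (po - pe) / (1 - pe)

rng = np.random.default_rng(0)
n = 200
gold = rng.integers(0, 2, n)
test1 = np.where(rng.random(n) < 0.9, gold, 1 - gold)  # ~90% concordant with gold
test2 = np.where(rng.random(n) < 0.8, gold, 1 - gold)  # ~80% concordant with gold

# Bootstrap the difference of the two correlated kappas by resampling subjects:
# each resample keeps a subject's gold, test1, and test2 values together.
diffs = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    diffs.append(cohen_kappa(gold[idx], test1[idx]) -
                 cohen_kappa(gold[idx], test2[idx]))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for kappa1 - kappa2: [{lo:.3f}, {hi:.3f}]")
```

A percentile interval excluding zero would suggest the two tests differ in chance-corrected agreement with the gold standard.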

2.
Using local kappa coefficients, we develop a method to assess the agreement between two discrete survival times that are measured on the same subject by different raters or methods. We model the marginal distributions for the two event times and local kappa coefficients in terms of covariates. An estimating equation is used for modeling the marginal distributions and a pseudo-likelihood procedure is used to estimate the parameters in the kappa model. The performance of the estimation procedure is examined through simulations. The proposed method can be extended to multivariate discrete survival distributions.  相似文献   

3.
4.
The Cohen kappa is probably the most widely used measure of agreement. Interest usually centres on measuring the degree of agreement or disagreement between two raters in a square contingency table. Modeling the agreement provides more information on the pattern of agreement than summarizing it with a single kappa coefficient. The disagreement models proposed in the literature, however, are designed for nominal scales. In this paper, a symmetric disagreement plus uniform association model is proposed for ordinal agreement data; it combines a disagreement model with a uniform association model, with the aim of separating the association from the disagreement. The proposed model is applied to real uterine cancer data.

5.
The weighted kappa coefficient of a binary diagnostic test is a measure of the beyond-chance agreement between the diagnostic test and the gold standard, and it allows us to assess and compare the performance of binary diagnostic tests. In the presence of partial disease verification, the comparison of the weighted kappa coefficients of two or more binary diagnostic tests cannot ignore the individuals with an unknown disease status, since the resulting estimators would be affected by verification bias. In this article, we propose a global hypothesis test based on the chi-square distribution to simultaneously compare the weighted kappa coefficients when, in the presence of partial disease verification, the missing-data mechanism is ignorable. Simulation experiments have been carried out to study the type I error and the power of the global hypothesis test. The results have been applied to the diagnosis of coronary disease.

6.
It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal, and the kappa statistic is commonly used as the measure of agreement. The statistic is, however, highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics, such as the proportion of concordance, the maximum attainable kappa, and the prevalence- and bias-adjusted kappa (PABAK), should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be interpreted in the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
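The companion statistics named here are all cheap to compute from the agreement table. A sketch (function and variable names are mine, not from the paper):

```python
import numpy as np

def agreement_summary(table):
    """Companion statistics for a k x k agreement table
    (rows: rater 1, cols: rater 2): observed agreement, Cohen's kappa,
    the maximum kappa attainable given the marginals, and PABAK."""
    p = np.asarray(table, float)
    p = p / p.sum()
    po = np.trace(p)
    rows, cols = p.sum(axis=1), p.sum(axis=0)
    pe = rows @ cols
    kappa = (po - pe) / (1 - pe)
    po_max = np.minimum(rows, cols).sum()  # best diagonal achievable with these marginals
    kappa_max = (po_max - pe) / (1 - pe)
    k = p.shape[0]
    pabak = (k * po - 1) / (k - 1)         # kappa under assumed uniform marginals
    return {"po": po, "kappa": kappa, "kappa_max": kappa_max, "pabak": pabak}

# A table with skewed marginals: high raw agreement, but kappa is deflated
stats = agreement_summary([[85, 5], [5, 5]])
print(stats)
```

On this example, raw agreement is 0.90, yet kappa is only about 0.44 because of the skewed marginals; PABAK, which assumes uniform marginals, is 0.80.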

7.
Cohen’s kappa, a special case of the weighted kappa, is a chance‐corrected index used extensively to quantify inter‐rater agreement in validation and reliability studies. In this paper, it is shown that in inter‐rater agreement for 2 × 2 tables, for two raters having the same number of opposite ratings, the weighted kappa, Cohen’s kappa, and the Peirce, Yule, Maxwell and Pilliner, and Fleiss indices are identical. This implies that the weights in the weighted kappa are less important under such assumptions. Equivalently, it is shown that for two partitions of the same data set, resulting from two clustering algorithms having the same number of clusters with equal cluster sizes, these similarity indices are identical. Hence, an important characterisation is formulated relating equal numbers of clusters with the same cluster sizes to the presence/absence of a trait in a reliability study. Two numerical examples that exemplify the implication of this relationship are presented.
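The coincidence of these indices is easy to verify numerically for two of them. In the sketch below (table counts are invented), Cohen's kappa is compared with Scott's pi, to which Fleiss's kappa reduces for two raters: when the two off-diagonal counts are equal, the marginals coincide and the indices agree exactly.

```python
import numpy as np

def cohen_kappa(t):
    p = np.asarray(t, float); p = p / p.sum()
    po = np.trace(p)
    pe = p.sum(axis=1) @ p.sum(axis=0)   # product of each rater's own marginals
    return (po - pe) / (1 - pe)

def scott_pi(t):
    p = np.asarray(t, float); p = p / p.sum()
    po = np.trace(p)
    m = (p.sum(axis=1) + p.sum(axis=0)) / 2   # pooled marginals
    pe = m @ m
    return (po - pe) / (1 - pe)

# Equal numbers of opposite ratings (both off-diagonal counts are 7):
# the marginals coincide, so the indices are identical.
t_equal = [[40, 7], [7, 46]]
print(cohen_kappa(t_equal), scott_pi(t_equal))

# Unequal opposite ratings: the indices differ.
t_unequal = [[40, 12], [2, 46]]
print(cohen_kappa(t_unequal), scott_pi(t_unequal))
```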

8.
Cohen's kappa statistic is the conventional method, widely used to measure agreement between two categorical responses. In this article, we develop a fixed-effects model of Cohen's kappa for bivariate multinomial data which reduces to Cohen's kappa under certain conditions and can hence be considered a generalization of the conventional Cohen's kappa. The method can also easily be adapted as a generalization of Cohen's weighted kappa. Properties of the proposed method are provided. Large-sample performance is investigated through bootstrap simulation studies, followed by two illustrative examples.

9.
10.
Agreement measures are designed to assess consistency between different instruments rating measurements of interest. When the individual responses are correlated with multilevel structure of nestings and clusters, traditional approaches are not readily available to estimate the inter- and intra-agreement for such complex multilevel settings. Our research stems from conformity evaluation between optometric devices with measurements on both eyes, equality tests of agreement in high myopic status between monozygous twins and dizygous twins, and assessment of reliability for different pathologists in dysplasia. In this paper, we focus on applying a Bayesian hierarchical correlation model incorporating adjustment for explanatory variables and nesting correlation structures to assess the inter- and intra-agreement through correlations of random effects for various sources. This Bayesian generalized linear mixed-effects model (GLMM) is further compared with the approximate intra-class correlation coefficients and kappa measures by the traditional Cohen’s kappa statistic and the generalized estimating equations (GEE) approach. The results of comparison studies reveal that the Bayesian GLMM provides a reliable and stable procedure in estimating inter- and intra-agreement simultaneously after adjusting for covariates and correlation structures, in marked contrast to Cohen’s kappa and the GEE approach.

11.
The accelerated failure time (AFT) model is an important regression tool to study the association between failure time and covariates. In this paper, we propose a robust weighted generalized M (GM) estimation for the AFT model with right-censored data by appropriately using the Kaplan–Meier weights in the GM–type objective function to estimate the regression coefficients and scale parameter simultaneously. This estimation method is computationally simple and can be implemented with existing software. Asymptotic properties including the root-n consistency and asymptotic normality are established for the resulting estimator under suitable conditions. We further show that the method can be readily extended to handle a class of nonlinear AFT models. Simulation results demonstrate satisfactory finite sample performance of the proposed estimator. The practical utility of the method is illustrated by a real data example.
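The Kaplan–Meier weights that enter the GM-type objective function can be computed directly from the censored sample. A hedged sketch of the standard product-limit (Stute-type) weights, which is how I read the construction; names are illustrative:

```python
import numpy as np

def km_weights(time, delta):
    """Kaplan-Meier (Stute-type) weights for right-censored data.
    delta[i] = 1 if time[i] is an observed failure, 0 if censored.
    Weights are returned in the original (unsorted) order."""
    time = np.asarray(time, float)
    order = np.argsort(time, kind="stable")
    d = np.asarray(delta, float)[order]
    n = len(d)
    w = np.zeros(n)
    surv = 1.0                       # running product-limit survival factor
    for i in range(n):
        w[i] = d[i] * surv / (n - i)
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    out = np.zeros(n)
    out[order] = w                   # map back to input order
    return out

print(km_weights([3, 1, 4, 2], [1, 1, 0, 1]))
```

Censored observations receive zero weight, and with no censoring every observation gets weight 1/n, recovering ordinary (unweighted) least squares on the log failure times.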

12.
13.
Agreement among raters is an important issue in medicine, as well as in education and psychology. The agreement between two raters on a nominal or ordinal rating scale has been investigated in many articles, and the multi-rater case with normally distributed ratings has also been explored at length. However, there is a lack of research on multiple raters using an ordinal rating scale. In this simulation study, several methods for analyzing rater agreement were compared, focusing on the special case of multiple raters using a bounded ordinal rating scale. The proposed agreement methods were compared across three main ordinal data simulation settings (normal, skewed and shifted data) and were also applied to a real data set from dermatology. The simulation results showed that Kendall's W and the mean gamma greatly overestimated agreement in data sets with shifts between raters. ICC4 for bounded data should be avoided in agreement studies with rating scales of fewer than 5 categories, where this method greatly overestimated the simulated agreement. The difference in bias for all methods under study, except the mean gamma and Kendall's W, decreased as the rating scale increased. The bias of ICC3 was consistent and small for nearly all simulation settings except the low-agreement setting in the shifted data set. Researchers should be careful in selecting agreement methods, especially if shifts in ratings between raters exist, and may wish to apply more than one method before drawing conclusions.
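Kendall's W, one of the methods compared, can be sketched as follows (no correction for ties is applied, so this simple version slightly understates W for tied ordinal ratings):

```python
import numpy as np

def avg_ranks(x):
    """Ranks 1..n, with ties assigned their average rank."""
    x = np.asarray(x, float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):          # average the ranks within each tie group
        m = x == v
        ranks[m] = ranks[m].mean()
    return ranks

def kendalls_w(ratings):
    """Kendall's W for an m-raters x n-subjects array (no tie correction)."""
    r = np.vstack([avg_ranks(row) for row in ratings])
    m, n = r.shape
    col_sums = r.sum(axis=0)
    s = ((col_sums - m * (n + 1) / 2) ** 2).sum()   # deviation of rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three raters scoring five subjects on a bounded ordinal scale
ratings = np.array([[1, 2, 3, 4, 5],
                    [2, 1, 3, 5, 4],
                    [1, 3, 2, 4, 5]])
print("W =", kendalls_w(ratings))
```

W equals 1 when all raters rank the subjects identically and approaches 0 under no agreement.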

14.
Scott’s pi and Cohen’s kappa are widely used for assessing the degree of agreement between two raters with binary outcomes. However, many authors have pointed out their paradoxical behavior, which stems from dependence on the prevalence of the trait under study. To overcome this limitation, Gwet [Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology 61(1):29–48] proposed an alternative and more stable agreement coefficient referred to as the AC1. In this article, we discuss likelihood-based inference for the AC1 in the case of two raters with binary outcomes. The construction of confidence intervals is the main focus; hypothesis testing and sample size estimation are also presented.
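For two raters with binary outcomes, AC1 replaces kappa's chance term with 2q(1−q), where q is the mean prevalence across the two raters. A sketch (the table is invented) contrasting its stability with kappa on a high-prevalence table:

```python
import numpy as np

def gwet_ac1(t):
    """Gwet's AC1 for a 2x2 table of binary ratings (rows: rater 1, cols: rater 2)."""
    p = np.asarray(t, float); p = p / p.sum()
    po = np.trace(p)
    q = (p.sum(axis=1)[0] + p.sum(axis=0)[0]) / 2  # mean prevalence of category 0
    pe = 2 * q * (1 - q)                           # AC1 chance-agreement term
    return (po - pe) / (1 - pe)

def cohen_kappa(t):
    p = np.asarray(t, float); p = p / p.sum()
    po = np.trace(p)
    pe = p.sum(axis=1) @ p.sum(axis=0)
    return (po - pe) / (1 - pe)

# High-prevalence table: 90% raw agreement, yet kappa is notoriously low
t = [[85, 5], [5, 5]]
print("kappa =", cohen_kappa(t), "AC1 =", gwet_ac1(t))
```

On this table, raw agreement is 0.90; kappa is dragged down to about 0.44 by the skewed prevalence, while AC1 stays near 0.88, illustrating the stability the abstract refers to.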

15.
Predictor importance in applied regression modeling provides key operational tools for managers and decision-makers. The paper considers estimation of predictors' importance in regression using measures introduced in works by Gibson and R. Johnson (GJ), then modified by Green, Carroll, and DeSarbo, and developed further by J. Johnson (JJ). These importance indices are based on an orthonormal decomposition of the data matrix, and the work shows how to improve this approximation. Using predictor importance, the regression coefficients can also be adjusted to reach the best data fit while remaining meaningful and interpretable. The results are compared with Shapley value regression (SVR), which is robust to multicollinearity but computationally demanding. They show that the JJ index is good for importance estimation, but the GJ index outperforms it when both predictor importance and regression coefficients are needed; hence the GJ index can be used in place of the more computationally intensive SVR. The considered approach is easy to compute, which makes it very useful in practical regression modeling and analysis, especially for big data.

16.
A concept of adaptive least squares polynomials is introduced for modelling time series data. A recursion algorithm for updating the coefficients of the adaptive polynomial (of a fixed degree) is derived. This concept assumes that the weights are such that (i) the importance of the data values, in terms of their weights relative to each other, stays fixed, and (ii) they satisfy the update property, i.e., the polynomial does not change if a new data value is a polynomial extrapolate. Closed-form results are provided for exponential weights as a special case, as they are shown to possess the update property when used with polynomials.

The concept of adaptive polynomials is similar to the linear adaptive prediction provided by the Kalman filter or the Least Mean Square algorithm of Widrow and Hoff. They can be useful in interpolating, tracking and analyzing nonstationary data.
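As a batch (non-recursive) illustration of the exponential-weight idea, the sketch below fits a fixed-degree polynomial with weights that decay geometrically with age, so the newest observations dominate; the forgetting factor and the use of `np.polyfit` are my own choices, not the paper's recursion.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(50, dtype=float)
y = 0.05 * t**2 - t + 3 + rng.normal(0, 0.5, t.size)  # slowly varying signal + noise

lam = 0.9                         # forgetting factor in (0, 1]
w = lam ** (t[-1] - t)            # exponential weights: newest point gets weight 1

# np.polyfit multiplies residuals by its w argument before squaring,
# so passing sqrt(w) applies least-squares weights w to the squared residuals.
coef = np.polyfit(t, y, deg=2, w=np.sqrt(w))
print("fitted quadratic coefficients:", coef)
```

A true recursive update, as in the paper, would refresh `coef` in O(degree²) work per new sample instead of refitting the whole window.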

17.
Category Distinguishability and Observer Agreement
It is common in the medical, biological, and social sciences for the categories into which an object is classified not to have a fully objective definition. Theoretically speaking, the categories are therefore not completely distinguishable. The practical extent of their distinguishability can be measured when two expert observers classify the same sample of objects. It is shown, under reasonable assumptions, that the matrix of joint classification probabilities is quasi-symmetric and that its symmetric component is non-negative definite. The degree of distinguishability between two categories is defined and used to give a measure of overall category distinguishability. It is argued that the kappa measure of observer agreement is unsatisfactory as a measure of overall category distinguishability.
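As a sketch, a degree of distinguishability for a pair of categories can be computed from the joint classification probabilities; the odds-ratio-based definition below (δij = 1 − πij·πji / (πii·πjj)) is an assumption on my part, and the table is invented.

```python
import numpy as np

def distinguishability(p, i, j):
    """Degree of distinguishability between categories i and j from a joint
    classification probability matrix p.
    Assumed definition: delta_ij = 1 - (p_ij * p_ji) / (p_ii * p_jj),
    i.e. one minus the odds ratio of confusing i and j."""
    p = np.asarray(p, float)
    p = p / p.sum()                 # normalize counts to probabilities
    return 1 - (p[i, j] * p[j, i]) / (p[i, i] * p[j, j])

# Two observers never confuse categories 0 and 2, but often confuse 0 and 1
table = np.array([[30, 10, 0],
                  [8, 25, 2],
                  [0, 3, 22]])
print(distinguishability(table, 0, 2))  # 1.0: fully distinguishable
print(distinguishability(table, 0, 1))  # below 1: partly confusable
```

Under this definition, delta is 1 when the two categories are never confused and 0 when confusion is as likely as consistent classification.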

18.
19.
The authors describe a model‐based kappa statistic for binary classifications which is interpretable in the same manner as Scott's pi and Cohen's kappa, yet does not suffer from the same flaws. They compare this statistic with the data‐driven and population‐based forms of Scott's pi in a population‐based setting where many raters and subjects are involved, and inference regarding the underlying diagnostic procedure is of interest. The authors show that Cohen's kappa and Scott's pi seriously underestimate agreement between experts classifying subjects for a rare disease; in contrast, the new statistic is robust to changes in prevalence. The performance of the three statistics is illustrated with simulations and prostate cancer data.

20.
Sensitivity and specificity are classic parameters to assess the performance of a binary diagnostic test. Another useful parameter to measure the performance of a binary test is the weighted kappa coefficient, which is a measure of the classificatory agreement between the binary test and the gold standard. Various confidence intervals are proposed for the weighted kappa coefficient when the binary test and the gold standard are applied to all of the patients in a random sample. The results have been applied to the diagnosis of coronary artery disease.
