Similar Articles
20 similar articles found (search time: 31 ms)
3.
Agreement measures are designed to assess consistency between different instruments rating measurements of interest. When the individual responses are correlated within a multilevel structure of nestings and clusters, traditional approaches are not readily available for estimating inter- and intra-agreement in such complex settings. Our research stems from conformity evaluation between optometric devices with measurements on both eyes, equality tests of agreement in high-myopia status between monozygotic and dizygotic twins, and assessment of reliability among pathologists rating dysplasia. In this paper, we focus on applying a Bayesian hierarchical correlation model, incorporating adjustment for explanatory variables and nested correlation structures, to assess inter- and intra-agreement through the correlations of random effects from various sources. This Bayesian generalized linear mixed-effects model (GLMM) is further compared with approximate intra-class correlation coefficients and with kappa measures from the traditional Cohen's kappa statistic and the generalized estimating equations (GEE) approach. The comparison studies reveal that the Bayesian GLMM provides a reliable and stable procedure for estimating inter- and intra-agreement simultaneously after adjusting for covariates and correlation structures, in marked contrast to Cohen's kappa and the GEE approach.
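As a point of reference for the classical comparator mentioned in this abstract, here is a minimal sketch of Cohen's kappa on paired device ratings using scikit-learn; the data are hypothetical and the paper's Bayesian GLMM and GEE machinery is not reproduced:

```python
# Minimal sketch: classical Cohen's kappa as a baseline agreement measure.
# The ratings below are hypothetical; the paper's Bayesian hierarchical
# model for multilevel/nested data is not reproduced here.
from sklearn.metrics import cohen_kappa_score

device_a = [1, 0, 1, 1, 0, 1, 0, 1]  # binary ratings from one instrument
device_b = [1, 0, 1, 0, 0, 1, 0, 1]  # ratings from a second instrument

print(f"Cohen's kappa: {cohen_kappa_score(device_a, device_b):.3f}")
```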

4.
Kappa and B assess agreement between two observers independently classifying N units into k categories. We study their behavior under zero cells in the contingency table and under unbalanced, asymmetric marginal distributions. Zero cells arise when a cross-classification is never endorsed by both observers; biased marginal distributions occur when the observers prefer some categories differently. Simulations studied the distributions of the unweighted and weighted statistics for k = 4 under fixed proportions of diagonal agreement and different off-diagonal patterns, with various sample sizes and various zero-cell-count scenarios. Marginal distributions were first uniform and homogeneous, then unbalanced and asymmetric. Results for the unweighted kappa and B statistics were comparable to the work of Muñoz and Bangdiwala, even with zero cells. A slight increase in variation was observed as the sample size decreased. The weighted statistics did show greater variation as the number of zero cells increased, with weighted kappa increasing substantially more than weighted B. Under biased marginal distributions, weighted kappa was higher with Cicchetti weights than with squared weights. Both statistics behaved well under zero cells. The weighted B was less variable than the weighted kappa under similar circumstances and different weights. In general, B's performance and graphical interpretation make it preferable to kappa under the studied scenarios.
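For concreteness, a sketch of both statistics computed from a k × k contingency table, using the standard definitions (kappa as chance-corrected agreement; Bangdiwala's B as the sum of squared diagonal counts over the sum of products of matching marginals); the example table is hypothetical:

```python
import numpy as np

def kappa_and_B(table):
    """Unweighted Cohen's kappa and Bangdiwala's B from a k x k table.
    Sketch based on the standard definitions; zero cells need no special
    handling here."""
    n = np.asarray(table, dtype=float)
    N = n.sum()
    row, col = n.sum(axis=1), n.sum(axis=0)
    po = np.trace(n) / N           # observed agreement
    pe = (row @ col) / N ** 2      # chance-expected agreement
    kappa = (po - pe) / (1 - pe)
    B = (np.diag(n) ** 2).sum() / (row * col).sum()
    return kappa, B

table = [[20, 2, 0, 0],   # 4 x 4 table with zero cells,
         [3, 15, 1, 0],   # as in the simulation setting above
         [0, 2, 18, 2],
         [0, 0, 1, 16]]
print(kappa_and_B(table))
```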

5.
The kappa coefficient is a widely used measure for assessing agreement on a nominal scale. Weighted kappa is an extension of Cohen's kappa that is commonly used for measuring agreement on an ordinal scale. In this article, it is shown that weighted kappa can be computed as a function of unweighted kappas, namely the kappa coefficients of the smaller contingency tables obtained by merging categories.
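For reference, with cell proportions p_ij, marginals p_i· and p_·j, and agreement weights w_ij (w_ii = 1), weighted kappa in its standard form is

$$\kappa_w = \frac{\sum_{i,j} w_{ij}\, p_{ij} - \sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}{1 - \sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}},$$

and the article's result expresses this quantity through the unweighted kappas of the collapsed tables.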

6.
It is customary to use two groups of indices to evaluate a diagnostic method with a binary outcome: validity indices requiring a standard rater (sensitivity, specificity, and positive or negative predictive values) and reliability indices (positive, negative, and overall agreement) that do not require one. However, none of these classic indices is chance-corrected, and this may distort the analysis of the problem (especially in comparative studies). One way of chance-correcting these indices is the Delta model (an alternative to the Kappa model), but this requires a computer program to work out the calculations. This paper gives an asymptotic version of the Delta model, allowing simple expressions to be obtained for the estimator of each of the above-mentioned chance-corrected indices, as well as for its standard error.
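The chance correction referred to here follows the generic kappa-type scheme: for a raw index I with chance-expected value I_e, the corrected index is

$$I_c = \frac{I - I_e}{1 - I_e},$$

with the Delta model supplying an alternative chance model to kappa's for computing I_e.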

7.
Cohen's kappa coefficient is traditionally used to quantify the degree of agreement between two raters on a nominal scale. Correlated kappas occur in many settings (e.g., repeated agreement by raters on the same individuals, or concordance between diagnostic tests and a gold standard) and often need to be compared. While different techniques are now available to model correlated kappa coefficients, they are generally not easy to implement in practice. The present paper describes a simple alternative method, based on the bootstrap, for comparing correlated kappa coefficients. The method is illustrated with examples and its type I error is studied using simulations. The method is also compared with second-order generalized estimating equations and with weighted least-squares methods.
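A minimal sketch of the paired-bootstrap idea (resampling subjects so that the two kappas stay correlated); the data and accuracy levels are hypothetical, and the paper's exact procedure may differ in detail:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

def bootstrap_kappa_diff(gold, test1, test2, n_boot=2000):
    """Percentile bootstrap CI for the difference of two correlated kappas,
    each measuring agreement of a test with the same gold standard."""
    gold, test1, test2 = map(np.asarray, (gold, test1, test2))
    n = len(gold)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample subjects, keeping pairs intact
        diffs[b] = (cohen_kappa_score(gold[idx], test1[idx])
                    - cohen_kappa_score(gold[idx], test2[idx]))
    return np.percentile(diffs, [2.5, 97.5])

gold = rng.integers(0, 2, 100)
test1 = np.where(rng.random(100) < 0.85, gold, 1 - gold)  # more accurate test
test2 = np.where(rng.random(100) < 0.70, gold, 1 - gold)  # less accurate test
print(bootstrap_kappa_diff(gold, test1, test2))
```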

8.
Scott's pi and Cohen's kappa are widely used for assessing the degree of agreement between two raters with binary outcomes. However, many authors have pointed out their paradoxical behavior, which comes from their dependence on the prevalence of the trait under study. To overcome this limitation, Gwet [Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology 61(1):29–48] proposed an alternative and more stable agreement coefficient referred to as the AC1. In this article, we discuss likelihood-based inference for the AC1 in the case of two raters with binary outcomes. The construction of confidence intervals is the main focus; hypothesis testing and sample size estimation are also presented.
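A sketch of the AC1 point estimate for two raters with binary outcomes, using Gwet's chance-agreement term (for two categories, p_e = 2π(1 − π), with π the mean of the raters' marginal probabilities of a positive rating); the example ratings are hypothetical:

```python
import numpy as np

def gwet_ac1(r1, r2):
    """Gwet's AC1 for two raters with binary (0/1) ratings."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    pa = np.mean(r1 == r2)                # observed agreement
    pi = (np.mean(r1) + np.mean(r2)) / 2  # mean marginal prob. of a "1"
    pe = 2 * pi * (1 - pi)                # AC1 chance agreement
    return (pa - pe) / (1 - pe)

rater1 = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
rater2 = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
print(f"AC1 = {gwet_ac1(rater1, rater2):.3f}")
```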

9.
Sensitivity and specificity are the classic parameters for assessing the performance of a binary diagnostic test. Another useful performance parameter is the weighted kappa coefficient, a measure of the classificatory agreement between the binary test and the gold standard. Various confidence intervals are proposed for the weighted kappa coefficient when the binary test and the gold standard are applied to all patients in a random sample. The results are applied to the diagnosis of coronary artery disease.
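One common parameterization of this coefficient in the binary-test literature (attributed to Kraemer), with prevalence p, q = 1 − p, Youden index Y = Se + Sp − 1, probability of a positive test Q = p·Se + q(1 − Sp), and weighting index c, is

$$\kappa(c) = \frac{p\,q\,Y}{c\,p\,(1 - Q) + (1 - c)\,q\,Q},$$

so that κ(0) and κ(1) reduce to the chance-corrected positive and negative predictive values.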

10.
Calculation of a confidence interval for the intraclass correlation to assess inter-rater reliability is problematic when the number of raters is small and the rater effect is not negligible. Intervals produced by existing methods are uninformative: the lower bound is often close to zero, even in cases where the reliability is good and the sample size is large. In this paper, we show that this problem is unavoidable without extra assumptions, and we propose two new approaches. The first assumes that the raters are sufficiently trained and is related to a sensitivity analysis. The second is based on a model with a fixed rater effect. Using either approach, we obtain conservative and informative confidence intervals even from samples with only two raters. We illustrate our point with data on the development of neuromotor functions in children and adolescents.
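For context, the usual estimand behind such intervals is the two-way random-effects intraclass correlation; with subject, rater, and error variance components it reads

$$\mathrm{ICC} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_r^2 + \sigma_e^2},$$

and with only a few raters the rater component σ_r² is estimated very imprecisely, which is what drives the near-zero lower bounds described above.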

11.
The authors describe a model-based kappa statistic for binary classifications that is interpretable in the same manner as Scott's pi and Cohen's kappa, yet does not suffer from the same flaws. They compare this statistic with the data-driven and population-based forms of Scott's pi in a population-based setting where many raters and subjects are involved and inference regarding the underlying diagnostic procedure is of interest. The authors show that Cohen's kappa and Scott's pi seriously underestimate agreement between experts classifying subjects for a rare disease; in contrast, the new statistic is robust to changes in prevalence. The performance of the three statistics is illustrated with simulations and prostate cancer data.

12.
Using local kappa coefficients, we develop a method to assess the agreement between two discrete survival times that are measured on the same subject by different raters or methods. We model the marginal distributions of the two event times and the local kappa coefficients in terms of covariates. An estimating equation is used to model the marginal distributions, and a pseudo-likelihood procedure is used to estimate the parameters of the kappa model. The performance of the estimation procedure is examined through simulations. The proposed method can be extended to multivariate discrete survival distributions.

13.
The asymptotic normal approximation to the distribution of the estimated measure κ̂ for evaluating agreement between two raters has been shown to perform poorly for small sample sizes when the true kappa is nonzero. This paper examines the effect of skewness corrections and transformations of κ̂ on the attained confidence levels. Small-sample simulations demonstrate the improvement in the agreement between the desired and actual levels of confidence intervals and hypothesis tests that incorporate these corrections.

14.
This article investigates the effects of the number of clusters, cluster size, and correction for chance agreement on the distributions of two similarity indices, the Jaccard and Rand indices. Skewness and kurtosis are calculated for the two indices and their chance-corrected forms, then compared with those of the normal distribution. Three clustering algorithms are implemented: complete linkage, Ward, and K-means. Data were randomly generated from bivariate normal distributions with specified means and variance-covariance matrices. A three-way ANOVA is performed to assess the significance of the design factors, using the skewness and kurtosis of the indices as responses. Test statistics for skewness and kurtosis and the observed power are calculated. Simulation results showed that, independently of the clustering algorithm or similarity index used, the interaction cluster size × number of clusters and the main effects of cluster size and number of clusters were always significant for both skewness and kurtosis. For skewness, the three-way interaction cluster size × correction × number of clusters was not significant under Ward's method for either index, while it was significant for the Jaccard index only under the complete linkage and K-means algorithms. The correction for chance agreement was significant for the skewness and kurtosis of the Rand and Jaccard indices when the complete linkage method was used. Hence, such design factors must be taken into consideration when studying the distributions of such indices.
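A sketch of the quantities involved, using scikit-learn's pair-counting utilities for the Rand index, its chance-corrected (adjusted) form, and the Jaccard index, plus SciPy's skewness and kurtosis as ANOVA responses; the labelings and the simulated index values are hypothetical stand-ins:

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.metrics import rand_score, adjusted_rand_score
from sklearn.metrics.cluster import pair_confusion_matrix

def jaccard_index(labels_a, labels_b):
    """Pair-counting Jaccard index: n11 / (n11 + n10 + n01)."""
    (n00, n01), (n10, n11) = pair_confusion_matrix(labels_a, labels_b)
    return n11 / (n11 + n10 + n01)

a = [0, 0, 1, 1, 2, 2]  # clustering from one algorithm
b = [0, 0, 1, 2, 2, 2]  # clustering from another
print(rand_score(a, b), adjusted_rand_score(a, b), jaccard_index(a, b))

# Shape of an index's distribution across simulated replicates:
vals = np.random.default_rng(1).beta(8, 2, size=500)  # stand-in index values
print(skew(vals), kurtosis(vals))
```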

15.
The weighted kappa coefficient of a binary diagnostic test (BDT) is a measure of the performance of a BDT, and is a function of the sensitivity and specificity of the diagnostic test, the disease prevalence, and the weighting index. The weighting index represents the relative loss between the false positives and the false negatives. In this study, we propose a new measure of the performance of a BDT: the average kappa coefficient. This parameter is the average of the weighted kappa coefficients over the weighting index, and therefore does not depend on it. We study three asymptotic confidence intervals (CIs) for the average kappa coefficient, Wald, logit, and bias-corrected bootstrap, and carry out simulation experiments to study the asymptotic coverage of each of the three CIs. We have written a program in R, called 'akcbdt', to estimate the average kappa coefficient of a BDT; this program is available as supplementary material. The results are applied to two examples.
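A numerical illustration of the average kappa coefficient as the mean of κ(c) over c ∈ (0, 1), using the weighted-kappa parameterization shown under item 9 above; this is not the authors' 'akcbdt' R program, and the inputs are hypothetical:

```python
from scipy.integrate import quad

def average_kappa(se, sp, p):
    """Average of kappa(c) = p*q*Y / (c*p*(1-Q) + (1-c)*q*Q) over c in (0,1)."""
    q = 1 - p
    Y = se + sp - 1            # Youden index
    Q = p * se + q * (1 - sp)  # probability of a positive test
    kappa_c = lambda c: p * q * Y / (c * p * (1 - Q) + (1 - c) * q * Q)
    value, _ = quad(kappa_c, 0, 1)  # integrating over (0,1) gives the average
    return value

print(average_kappa(se=0.90, sp=0.85, p=0.20))
```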

16.
The weighted kappa coefficient of a binary diagnostic test is a measure of the beyond-chance agreement between the diagnostic test and the gold standard, and it allows us to assess and compare the performance of binary diagnostic tests. In the presence of partial disease verification, the comparison of the weighted kappa coefficients of two or more binary diagnostic tests cannot be carried out by ignoring the individuals with an unknown disease status, since the resulting estimators would be affected by verification bias. In this article, we propose a global hypothesis test, based on the chi-square distribution, to simultaneously compare the weighted kappa coefficients when, in the presence of partial disease verification, the missing-data mechanism is ignorable. Simulation experiments were carried out to study the type I error and the power of the global hypothesis test. The results are applied to the diagnosis of coronary disease.

17.
The optimality of equal versus unequal cluster sizes in the context of multilevel intervention studies is examined. A Monte Carlo study assesses to what degree asymptotic results on optimality hold for realistic sample sizes and for different estimation methods. The relative D-criterion comparing equal versus unequal cluster sizes almost always exceeded 85%, implying that the loss of information due to unequal cluster sizes can be compensated for by increasing the number of clusters by 18% (since 1/0.85 ≈ 1.18). The simulation results are in line with the asymptotic results, showing that, for realistic sample sizes and various estimation methods, the asymptotic results can be used in planning multilevel intervention studies.

18.
The case–control design for assessing the accuracy of a binary diagnostic test (BDT) is very frequent in clinical practice. This design consists of applying the diagnostic test to all individuals in a sample of those who have the disease and in another sample of those who do not; the sensitivity of the diagnostic test is estimated from the case sample and the specificity from the control sample. Another parameter used to assess the performance of a BDT is the weighted kappa coefficient, which depends on the sensitivity and specificity of the diagnostic test, the disease prevalence, and the weighting index. In this article, confidence intervals are studied for the weighted kappa coefficient under a case–control design, and a method is proposed to calculate the sample sizes needed to estimate this parameter. The results are applied to a real example.

19.
Category Distinguishability and Observer Agreement
It is common in the medical, biological, and social sciences for the categories into which an object is classified not to have a fully objective definition. Theoretically speaking, the categories are therefore not completely distinguishable. The practical extent of their distinguishability can be measured when two expert observers classify the same sample of objects. It is shown, under reasonable assumptions, that the matrix of joint classification probabilities is quasi-symmetric and that its symmetric component is non-negative definite. The degree of distinguishability between two categories is defined and is used to give a measure of overall category distinguishability. It is argued that the kappa measure of observer agreement is unsatisfactory as a measure of overall category distinguishability.
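For reference, quasi-symmetry of the joint classification probabilities corresponds to the standard log-linear form

$$\log \mu_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}, \qquad \lambda_{ij} = \lambda_{ji},$$

i.e., the association component is symmetric while the two observers' margins are free to differ.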

20.
Cohen's kappa is probably the most widely used measure of agreement. Interest usually centers on measuring the degree of agreement or disagreement between two raters in a square contingency table. Modeling the agreement provides more information on its pattern than summarizing it with a kappa coefficient. Moreover, the disagreement models proposed in the literature are designed for nominal scales. In this paper, disagreement and uniform association components are combined into a new model for ordinal-scale agreement data: a symmetric disagreement plus uniform association model that aims to separate the association from the disagreement. The proposed model is applied to real uterine cancer data.
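The uniform-association component referred to here is Goodman's classical model for ordinal tables: with fixed row and column scores u_i and v_j,

$$\log \mu_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \beta\, u_i v_j,$$

to which the proposed model adds symmetric disagreement parameters; the exact combined form is specific to the paper.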
