Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
The weighted kappa coefficient of a binary diagnostic test is a measure of the beyond-chance agreement between the diagnostic test and the gold standard, and it allows us to assess and compare the performance of binary diagnostic tests. In the presence of partial disease verification, the weighted kappa coefficients of two or more binary diagnostic tests cannot be compared by ignoring the individuals with an unknown disease status, since the estimators obtained would be affected by verification bias. In this article, we propose a global hypothesis test based on the chi-square distribution to simultaneously compare the weighted kappa coefficients when, in the presence of partial disease verification, the missing data mechanism is ignorable. Simulation experiments have been carried out to study the type I error and the power of the global hypothesis test. The results have been applied to the diagnosis of coronary disease.

2.
The weighted kappa coefficient of a binary diagnostic test (BDT) is a measure of the performance of a BDT, and is a function of the sensitivity and the specificity of the diagnostic test, of the disease prevalence and of the weighting index. The weighting index represents the relative loss between the false positives and the false negatives. In this study, we propose a new measure of performance of a BDT: the average kappa coefficient. This parameter is the average function of the weighted kappa coefficients and does not depend on the weighting index. We have studied three asymptotic confidence intervals (CIs) for the average kappa coefficient, Wald, logit and bias-corrected bootstrap, and we carried out some simulation experiments to study the asymptotic coverage of each of the three CIs. We have written a program in R, called ‘akcbdt’, to estimate the average kappa coefficient of a BDT. This program is available as supplementary material. The results were applied to two examples.
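
As a rough illustration of the dependence described in the abstract above, the following Python sketch evaluates the weighted kappa coefficient from the sensitivity, the specificity, the prevalence and the weighting index. The closed form is the one commonly given in the literature on binary diagnostic tests and is not quoted from this abstract; the function name `weighted_kappa` and the numeric inputs are illustrative assumptions, and the sketch does not attempt the average kappa coefficient or the 'akcbdt' program.

```python
def weighted_kappa(se, sp, prev, c):
    """Weighted kappa coefficient of a binary diagnostic test.

    Assumed form: kappa(c) = p*q*Y / (p*(1-Q)*c + q*Q*(1-c)), where
    p = prevalence, q = 1 - p, Y = se + sp - 1 (Youden index) and
    Q = p*se + q*(1-sp) (probability of a positive test result).
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("weighting index c must lie in [0, 1]")
    p, q = prev, 1.0 - prev
    youden = se + sp - 1.0
    pos = p * se + q * (1.0 - sp)
    return p * q * youden / (p * (1.0 - pos) * c + q * pos * (1.0 - c))


# Illustrative values only (not taken from any of the listed studies).
for c in (0.1, 0.5, 0.9):
    print(c, round(weighted_kappa(0.9, 0.8, 0.3, c), 4))
```

Under this form, a weighting index of 0.5 gives the same value as Cohen's kappa computed from the corresponding 2x2 table of test result against disease status.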

3.
The accuracy of a binary diagnostic test is usually measured in terms of its sensitivity and its specificity, or through positive and negative predictive values. Another way to describe the validity of a binary diagnostic test is the risk of error and the kappa coefficient of the risk of error. The risk of error is the average loss that is caused when incorrectly classifying a non-diseased or a diseased patient, and the kappa coefficient of the risk of error is a measure of the agreement between the diagnostic test and the gold standard. In the presence of partial verification of the disease, the disease status of some patients is unknown, and therefore the evaluation of a diagnostic test cannot be carried out through the traditional method. In this paper, we have deduced the maximum likelihood estimators and variances of the risk of error and of the kappa coefficient of the risk of error in the presence of partial verification of the disease. Simulation experiments have been carried out to study the effect of the verification probabilities on the coverage of the confidence interval of the kappa coefficient.

4.
The case–control design is very frequently used in clinical practice to assess the accuracy of a binary diagnostic test (BDT). This design consists of applying the diagnostic test to all of the individuals in a sample of those who have the disease and in another sample of those who do not have the disease. The sensitivity of the diagnostic test is estimated from the case sample and the specificity is estimated from the control sample. Another parameter which is used to assess the performance of a BDT is the weighted kappa coefficient. The weighted kappa coefficient depends on the sensitivity and specificity of the diagnostic test, on the disease prevalence and on the weighting index. In this article, confidence intervals are studied for the weighted kappa coefficient subject to a case–control design and a method is proposed to calculate the sample sizes to estimate this parameter. The results obtained were applied to a real example.

5.
The kappa coefficient is a widely used measure for assessing agreement on a nominal scale. Weighted kappa is an extension of Cohen's kappa that is commonly used for measuring agreement on an ordinal scale. In this article, it is shown that weighted kappa can be computed as a function of unweighted kappas. The latter coefficients are kappa coefficients that correspond to smaller contingency tables that are obtained by merging categories.
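
The abstract above does not state the function it derives. One special case that can be checked directly is linear weights: the disagreement weight |i - j| counts the cutpoints that separate categories i and j, so linearly weighted kappa equals an average of the 2x2 kappas obtained by merging the ordered categories at each cutpoint, weighted by their chance-expected disagreement. The Python sketch below verifies this numerically; the function names and the 3x3 table of counts are hypothetical, and the article's own result may be more general.

```python
def merged_2x2_parts(table, cut):
    """Observed and chance-expected disagreement after merging the ordered
    categories into two groups: indices < cut versus indices >= cut."""
    n = sum(map(sum, table))
    k = len(table)
    row_low = sum(table[i][j] for i in range(cut) for j in range(k)) / n
    col_low = sum(table[i][j] for i in range(k) for j in range(cut)) / n
    both_low = sum(table[i][j] for i in range(cut) for j in range(cut)) / n
    observed = row_low + col_low - 2 * both_low  # raters land in different groups
    expected = row_low * (1 - col_low) + (1 - row_low) * col_low
    return observed, expected


def linear_weighted_kappa(table):
    """Cohen's weighted kappa with linear disagreement weights |i - j|."""
    n = sum(map(sum, table))
    k = len(table)
    rows = [sum(table[i]) / n for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    observed = sum(abs(i - j) * table[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(abs(i - j) * rows[i] * cols[j]
                   for i in range(k) for j in range(k))
    return 1 - observed / expected


def kappa_from_merged_tables(table):
    """Average of the merged 2x2 kappas, weighted by expected disagreement."""
    parts = [merged_2x2_parts(table, cut) for cut in range(1, len(table))]
    kappas = [1 - obs / exp for obs, exp in parts]
    weights = [exp for _, exp in parts]
    return sum(w * kap for w, kap in zip(weights, kappas)) / sum(weights)


# Hypothetical ratings of two raters on a three-category ordinal scale.
ratings = [[20, 5, 1],
           [4, 15, 6],
           [2, 3, 14]]
print(linear_weighted_kappa(ratings), kappa_from_merged_tables(ratings))
```

The two printed values agree up to floating-point rounding, because both reduce to one minus the ratio of summed observed to summed expected disagreement.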

6.
The comparison of the accuracy of two binary diagnostic tests has traditionally required knowledge of the disease status in all of the patients in the sample via the application of a gold standard. In practice, the gold standard is not always applied to all patients in a sample, and the problem of partial verification of the disease arises. The accuracy of a binary diagnostic test can be measured in terms of positive and negative predictive values, which represent the accuracy of a diagnostic test when it is applied to a cohort of patients. In this paper, we deduce the maximum likelihood estimators of predictive values (PVs) of two binary diagnostic tests, and the hypothesis tests to compare these measures when, in the presence of partial disease verification, the verification process only depends on the results of the two diagnostic tests. The effect of verification bias on the naïve estimators of PVs of two diagnostic tests is studied, and simulation experiments are performed in order to investigate the small sample behaviour of hypothesis tests. The hypothesis tests which we have deduced can be applied when all of the patients are verified with the gold standard. The results obtained have been applied to the diagnosis of coronary stenosis.

7.
Cohen's kappa coefficient is traditionally used to quantify the degree of agreement between two raters on a nominal scale. Correlated kappas occur in many settings (e.g., repeated agreement by raters on the same individuals, concordance between diagnostic tests and a gold standard) and often need to be compared. While different techniques are now available to model correlated κ coefficients, they are generally not easy to implement in practice. The present paper describes a simple alternative method based on the bootstrap for comparing correlated kappa coefficients. The method is illustrated by examples and its type I error studied using simulations. The method is also compared with the generalized estimating equations of the second order and the weighted least-squares methods.

8.
The assessment of a binary diagnostic test requires knowledge of the disease status of all the patients in the sample through the application of a gold standard. In practice, the gold standard is not always applied to all of the patients, which leads to the problem of partial verification of the disease. When the accuracy of the diagnostic test is assessed using only those patients whose disease status has been verified using the gold standard, the estimators obtained in this way, known as Naïve estimators, may be biased. In this study, we obtain the explicit expressions of the bias of the Naïve estimators of sensitivity and specificity of a binary diagnostic test. We also carry out simulation experiments in order to study the effect of the verification probabilities on the Naïve estimators of sensitivity and specificity.

9.
Measurement error is a commonly addressed problem in psychometrics and the behavioral sciences, particularly where gold standard data either do not exist or are too expensive. The Bayesian approach can be utilized to adjust for the bias that results from measurement error in tests. Bayesian methods offer other practical advantages for the analysis of epidemiological data including the possibility of incorporating relevant prior scientific information and the ability to make inferences that do not rely on large sample assumptions. In this paper we consider a logistic regression model where both the response and a binary covariate are subject to misclassification. We assume both a continuous measure and a binary diagnostic test are available for the response variable but no gold standard test is assumed available. We consider a fully Bayesian analysis that affords such adjustments, accounting for the sources of error and correcting estimates of the regression parameters. Based on the results from our example and simulations, the models that account for misclassification produce more statistically significant results than the models that ignore misclassification. A real data example on math disorders is considered.

10.
Verification bias may occur when the test results of some subjects are not verified by a gold standard. The correction for this bias can be made using different approaches depending on whether the missing gold standard test results are missing at random or not. Some of these approaches for binary test and gold standard results include the correction method by Begg and Greenes, lower and upper limits for diagnostic measurements by Zhou, the logistic regression method, the multiple imputation method, and neural networks. In this study, all these approaches are compared using real and simulated data under different conditions.
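
Of the approaches named in the abstract above, the Begg and Greenes correction is simple enough to sketch. The Python example below contrasts it with the naive estimators that use verified subjects only. It assumes a single binary test and verification that depends only on the test result; the function names, argument names and counts are hypothetical and are not taken from the study.

```python
def naive_estimates(s1, r1, s0, r0):
    """Sensitivity and specificity computed from verified subjects only."""
    return s1 / (s1 + s0), r0 / (r1 + r0)


def begg_greenes(s1, r1, u1, s0, r0, u0):
    """Begg-Greenes corrected sensitivity and specificity.

    s = verified diseased, r = verified non-diseased, u = unverified;
    suffix 1 = test positive, suffix 0 = test negative.
    Assumes verification depends only on the test result.
    """
    n1, n0 = s1 + r1 + u1, s0 + r0 + u0
    dis_pos = n1 * s1 / (s1 + r1)  # estimated diseased among test positives
    dis_neg = n0 * s0 / (s0 + r0)  # estimated diseased among test negatives
    se = dis_pos / (dis_pos + dis_neg)
    sp = (n0 - dis_neg) / ((n1 - dis_pos) + (n0 - dis_neg))
    return se, sp


# Hypothetical counts: every test positive is verified,
# but only 50 of the 200 test negatives are.
print(naive_estimates(80, 20, 10, 40))
print(begg_greenes(80, 20, 0, 10, 40, 150))
```

With these counts the naive sensitivity is about 0.89 while the corrected value is about 0.67, which shows the direction of the bias when test negatives are verified less often. When nobody is left unverified, the two sets of estimates coincide.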

11.
Bayesian sample size estimation for equivalence and non-inferiority tests for diagnostic methods is considered. The goal of the study is to test whether a new screening test of interest is equivalent to, or not inferior to, the reference test, which may or may not be a gold standard. Sample sizes are chosen by the model performance criteria of average posterior variance, length and coverage probability. In the absence of a gold standard, sample sizes are evaluated by the ratio of marginal probabilities of the two screening tests, whereas in the presence of a gold standard, sample sizes are evaluated by the measures of sensitivity and specificity.

12.
The Cohen kappa is probably the most widely used measure of agreement. The degree of agreement or disagreement between two raters in square contingency tables is often of interest. Modeling the agreement provides more information on the pattern of the agreement than summarizing the agreement by a kappa coefficient. Additionally, the disagreement models in the literature are proposed for nominal scales. In this paper, disagreement and uniform association models are combined into a new model for ordinal scale agreement data: a symmetric disagreement plus uniform association model that aims to separate the association from the disagreement. The proposed model is applied to real uterine cancer data.

13.
Cohen's kappa statistic is the conventional method that is used widely in measuring agreement between two responses when they are categorical. In this article, we develop a fixed-effects modeling of Cohen's kappa for bivariate multinomial data which reduces to Cohen's kappa under certain conditions and hence can be considered as a generalization of the conventional Cohen's kappa. Also, this method can easily be adapted as a generalization of Cohen's weighted kappa. Properties of the proposed method are provided. Large sample performance is investigated through bootstrap simulation studies followed by two illustrative examples.

14.
Estimated associations between an outcome variable and misclassified covariates tend to be biased when methods of estimation that ignore the classification error are applied. Available methods to account for misclassification often require the use of a validation sample (i.e. a gold standard). In practice, however, such a gold standard may be unavailable or impractical. We propose a Bayesian approach to adjust for misclassification in a binary covariate in the random effect logistic model when a gold standard is not available. This Markov Chain Monte Carlo (MCMC) approach uses two imperfect measures of a dichotomous exposure under the assumptions of conditional independence and non-differential misclassification. A simulated numerical example and a real clinical example are given to illustrate the proposed approach. Our results suggest that the estimated log odds of inpatient care and the corresponding standard deviation are much larger in our proposed method compared with the models ignoring misclassification. Ignoring misclassification produces downwardly biased estimates and underestimates uncertainty.

15.
Kappa and B assess agreement between two observers independently classifying N units into k categories. We study their behavior under zero cells in the contingency table and unbalanced asymmetric marginal distributions. Zero cells arise when a cross-classification is never endorsed by both observers; biased marginal distributions occur when some categories are preferred differently between the observers. Simulations studied the distributions of the unweighted and weighted statistics for k=4, under fixed proportions of diagonal agreement and different off-diagonal patterns, with various sample sizes, and under various zero cell count scenarios. Marginal distributions were first uniform and homogeneous, and then unbalanced asymmetric distributions. Results for unweighted kappa and B statistics were comparable to the work of Muñoz and Bangdiwala, even with zero cells. A slightly increased variation was observed as the sample size decreased. Weighted statistics did show greater variation as the number of zero cells increased, with weighted kappa increasing substantially more than weighted B. Under biased marginal distributions, weighted kappa with Cicchetti weights was higher than with squared weights. Both statistics for observer agreement behaved well under zero cells. The weighted B was less variable than the weighted kappa under similar circumstances and different weights. In general, B's performance and graphical interpretation make it preferable to kappa under the studied scenarios.

16.
Logistic-normal models can be applied for analysis of longitudinal binary data. The aim of this article is to propose a goodness-of-fit test using nonparametric smoothing techniques for checking the adequacy of logistic-normal models. Moreover, the leave-one-out cross-validation method for selecting the suitable bandwidth is developed. The quadratic form of the proposed test statistic based on smoothing residuals provides a global measure for checking the model with categorical and continuous covariates. The formulae of expectation and variance of the proposed statistics are derived, and their asymptotic distribution is approximated by a scaled chi-squared distribution. The power performance of the proposed test for detecting the interaction term or the squared term of continuous covariates is examined by simulation studies. A longitudinal dataset is utilized to illustrate the application of the proposed test.

17.
18.
The study of the dependence between two medical diagnostic tests is an important issue in health research since it can modify the diagnosis and, therefore, the decision regarding a therapeutic treatment for an individual. In many practical situations, the diagnostic procedure includes the use of two tests, with outcomes on a continuous scale. For final classification, usually there is an additional “gold standard” or reference test. Considering binary test responses, we usually assume independence between tests or a joint binary structure for dependence. In this article, we introduce a simulation study assuming two dependent dichotomized tests using two copula function dependence structures in the presence or absence of verification bias. We compare the test parameter estimators obtained under copula structure dependence with those obtained assuming binary dependence or assuming independent tests.

19.
Spearman's rank correlation coefficient is not entirely suitable for measuring the correlation between two rankings in some applications because it treats all ranks equally. In 2000, Blest proposed an alternative measure of correlation that gives more importance to higher ranks but has some drawbacks. This paper proposes a weighted rank measure of correlation that weights the distance between two ranks using a linear function of those ranks, giving more importance to higher ranks than lower ones. It analyses its distribution and provides a table of critical values to test whether a given value of the coefficient is significantly different from zero. The paper also summarizes a number of applications for which the new measure is more suitable than Spearman's.

20.
The authors describe a model‐based kappa statistic for binary classifications which is interpretable in the same manner as Scott's pi and Cohen's kappa, yet does not suffer from the same flaws. They compare this statistic with the data‐driven and population‐based forms of Scott's pi in a population‐based setting where many raters and subjects are involved, and inference regarding the underlying diagnostic procedure is of interest. The authors show that Cohen's kappa and Scott's pi seriously underestimate agreement between experts classifying subjects for a rare disease; in contrast, the new statistic is robust to changes in prevalence. The performance of the three statistics is illustrated with simulations and prostate cancer data.
