Similar Articles
20 similar articles found.
1.
The weighted kappa coefficient of a binary diagnostic test is a measure of the beyond-chance agreement between the diagnostic test and the gold standard, and allows us to assess and compare the performance of binary diagnostic tests. In the presence of partial disease verification, the comparison of the weighted kappa coefficients of two or more binary diagnostic tests cannot be carried out by simply ignoring the individuals with an unknown disease status, since the estimators obtained in this way would be affected by verification bias. In this article, we propose a global hypothesis test based on the chi-square distribution to simultaneously compare the weighted kappa coefficients when, in the presence of partial disease verification, the missing-data mechanism is ignorable. Simulation experiments have been carried out to study the type I error and the power of the global hypothesis test. The results have been applied to the diagnosis of coronary disease.
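As a rough illustration of the parameter being compared, the weighted kappa coefficient of a binary diagnostic test can be written in terms of sensitivity, specificity, disease prevalence and the weighting index. The sketch below uses a Kraemer-type formula common in this literature; the exact parametrisation used in the paper may differ.

```python
def weighted_kappa(se, sp, p, c):
    """Weighted kappa of a binary diagnostic test.
    se, sp : sensitivity and specificity
    p      : disease prevalence
    c      : weighting index (relative loss of false positives vs. false negatives)
    """
    y = se + sp - 1.0                      # Youden index
    q = p * se + (1.0 - p) * (1.0 - sp)    # P(test positive)
    denom = c * (1.0 - p) * q + (1.0 - c) * p * (1.0 - q)
    return p * (1.0 - p) * y / denom

# At c = 1 this reduces to the "quality of a positive test" p*y/q,
# and at c = 0 to the "quality of a negative test" (1-p)*y/(1-q).
print(round(weighted_kappa(0.8, 0.9, 0.3, 0.5), 3))  # ≈ 0.634
```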

2.
In the presence of partial disease verification, the comparison of the accuracy of binary diagnostic tests cannot be carried out through the paired comparison of the diagnostic tests applying McNemar's test, since for a subsample of patients the disease status is unknown. In this study, we have deduced the maximum likelihood estimators for the sensitivities and specificities of multiple binary diagnostic tests and we have studied various joint hypothesis tests based on the chi-square distribution to compare simultaneously the accuracy of these binary diagnostic tests when for some patients in the sample the disease status is unknown. Simulation experiments were carried out to study the type I error and the power of each hypothesis test deduced. The results obtained were applied to the diagnosis of coronary stenosis.

3.
The comparison of the accuracy of two binary diagnostic tests has traditionally required knowledge of the disease status of all of the patients in the sample via the application of a gold standard. In practice, the gold standard is not always applied to all patients in a sample, and the problem of partial verification of the disease arises. The accuracy of a binary diagnostic test can be measured in terms of positive and negative predictive values, which represent the accuracy of a diagnostic test when it is applied to a cohort of patients. In this paper, we deduce the maximum likelihood estimators of the predictive values (PVs) of two binary diagnostic tests, and the hypothesis tests to compare these measures when, in the presence of partial disease verification, the verification process depends only on the results of the two diagnostic tests. The effect of verification bias on the naïve estimators of the PVs of two diagnostic tests is studied, and simulation experiments are performed to investigate the small-sample behaviour of the hypothesis tests. The hypothesis tests deduced here can also be applied when all of the patients are verified with the gold standard. The results obtained have been applied to the diagnosis of coronary stenosis.
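For reference, the predictive values discussed above follow from sensitivity, specificity and prevalence by Bayes' theorem. The numbers below are purely illustrative:

```python
def predictive_values(se, sp, p):
    """Positive and negative predictive values of a binary test from its
    sensitivity (se), specificity (sp) and the disease prevalence (p)."""
    ppv = p * se / (p * se + (1 - p) * (1 - sp))
    npv = (1 - p) * sp / ((1 - p) * sp + p * (1 - se))
    return ppv, npv

# Illustrative values: a fairly accurate test at low prevalence still has
# a modest positive predictive value.
ppv, npv = predictive_values(0.9, 0.8, 0.1)
print(round(ppv, 3), round(npv, 3))  # 0.333 0.986
```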

4.
The assessment of a binary diagnostic test requires knowledge of the disease status of all the patients in the sample through the application of a gold standard. In practice, the gold standard is not always applied to all of the patients, which leads to the problem of partial verification of the disease. When the accuracy of the diagnostic test is assessed using only those patients whose disease status has been verified with the gold standard, the estimators obtained in this way, known as naïve estimators, may be biased. In this study, we obtain explicit expressions for the bias of the naïve estimators of the sensitivity and specificity of a binary diagnostic test. We also carry out simulation experiments to study the effect of the verification probabilities on the naïve estimators of sensitivity and specificity.
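The inflation of the naïve sensitivity estimator is easy to reproduce in a toy simulation. All parameter values below are hypothetical, and verification is assumed to depend only on the test result:

```python
import random

random.seed(42)

# Hypothetical scenario: Se = 0.8 in the diseased, and the gold standard is
# applied with probability 0.9 after a positive test but only 0.2 after a
# negative one.
se, lam_pos, lam_neg = 0.8, 0.9, 0.2
n_diseased = 50_000

tp_v = fn_v = 0  # verified diseased subjects, split by test result
for _ in range(n_diseased):
    t = random.random() < se                           # test result
    if random.random() < (lam_pos if t else lam_neg):  # verification
        if t:
            tp_v += 1
        else:
            fn_v += 1

naive_se = tp_v / (tp_v + fn_v)  # computed on the verified subjects only
# Analytically, the naive estimator converges to
# se*lam_pos / (se*lam_pos + (1-se)*lam_neg) = 0.72/0.76 ≈ 0.947.
print(f"true Se = {se}, naive Se = {naive_se:.3f}")
```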

5.
The accuracy of a binary diagnostic test is usually measured in terms of its sensitivity and its specificity. Other measures of the performance of a diagnostic test are the positive and negative likelihood ratios, which quantify the increase in knowledge about the presence of the disease through the application of a diagnostic test, and which depend on the sensitivity and specificity of the diagnostic test. In this article, we construct an asymptotic hypothesis test to simultaneously compare the positive and negative likelihood ratios of two or more diagnostic tests in unpaired designs. The hypothesis test is based on the logarithmic transformation of the likelihood ratios and on the chi-square distribution. Simulation experiments have been carried out to study the type I error and the power of the constructed hypothesis test when comparing two and three binary diagnostic tests. The method has been extended to the case of multiple multi-level diagnostic tests.
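The likelihood ratios themselves, and the logarithmic transformation the test is built on, are straightforward to compute from a 2×2 table. The confidence interval below uses a standard delta-method standard error on the log scale; it is a sketch, not the authors' exact test statistic:

```python
import math

def likelihood_ratios(tp, fn, fp, tn, z=1.96):
    """LR+ and LR- of a binary test versus the gold standard, plus a
    Wald-type confidence interval for LR+ on the log scale."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    lr_pos = se / (1 - sp)        # positive likelihood ratio
    lr_neg = (1 - se) / sp        # negative likelihood ratio
    # Delta-method standard error of log(LR+).
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    ci = (math.exp(math.log(lr_pos) - z * se_log),
          math.exp(math.log(lr_pos) + z * se_log))
    return lr_pos, lr_neg, ci

lr_pos, lr_neg, ci = likelihood_ratios(tp=90, fn=10, fp=20, tn=80)
print(lr_pos, lr_neg, ci)  # LR+ = 4.5, LR- = 0.125
```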

6.
The weighted kappa coefficient of a binary diagnostic test (BDT) is a measure of the performance of a BDT, and is a function of the sensitivity and the specificity of the diagnostic test, of the disease prevalence and of the weighting index. The weighting index represents the relative loss between the false positives and the false negatives. In this study, we propose a new measure of the performance of a BDT: the average kappa coefficient. This parameter is the average of the weighted kappa coefficient over the weighting index and therefore does not depend on it. We have studied three asymptotic confidence intervals (CIs) for the average kappa coefficient, Wald, logit and bias-corrected bootstrap, and we carried out simulation experiments to study the asymptotic coverage of each of the three CIs. We have written a program in R, called ‘akcbdt’, to estimate the average kappa coefficient of a BDT. This program is available as supplementary material. The results were applied to two examples.
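Averaging over the weighting index can be sketched numerically. The snippet reuses a Kraemer-type weighted-kappa formula and a simple midpoint rule; it is an illustration, not the authors' 'akcbdt' program:

```python
def weighted_kappa(se, sp, p, c):
    # Kraemer-type weighted kappa of a binary diagnostic test.
    y = se + sp - 1.0                       # Youden index
    q = p * se + (1.0 - p) * (1.0 - sp)     # P(test positive)
    return p * (1.0 - p) * y / (c * (1.0 - p) * q + (1.0 - c) * p * (1.0 - q))

def average_kappa(se, sp, p, m=10_000):
    # Midpoint-rule approximation of the integral of kappa(c) over c in (0, 1).
    return sum(weighted_kappa(se, sp, p, (k + 0.5) / m) for k in range(m)) / m

print(round(average_kappa(0.9, 0.8, 0.3), 3))  # ≈ 0.646
```

By construction the average lies between the two extreme quality indices kappa(0) and kappa(1), and when sensitivity equals specificity at prevalence 0.5 it coincides with Cohen's kappa.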

7.
Sensitivity and specificity are classic parameters to assess the performance of a binary diagnostic test. Another useful parameter to measure the performance of a binary test is the weighted kappa coefficient, which is a measure of the classificatory agreement between the binary test and the gold standard. Various confidence intervals are proposed for the weighted kappa coefficient when the binary test and the gold standard are applied to all of the patients in a random sample. The results have been applied to the diagnosis of coronary artery disease.

8.
The case–control design for assessing the accuracy of a binary diagnostic test (BDT) is very common in clinical practice. This design consists of applying the diagnostic test to all of the individuals in a sample of those who have the disease and in another sample of those who do not have the disease. The sensitivity of the diagnostic test is estimated from the case sample and the specificity from the control sample. Another parameter used to assess the performance of a BDT is the weighted kappa coefficient, which depends on the sensitivity and specificity of the diagnostic test, on the disease prevalence and on the weighting index. In this article, confidence intervals are studied for the weighted kappa coefficient subject to a case–control design, and a method is proposed to calculate the sample sizes needed to estimate this parameter. The results obtained were applied to a real example.

9.
The study of the dependence between two medical diagnostic tests is an important issue in health research since it can modify the diagnosis and, therefore, the decision regarding a therapeutic treatment for an individual. In many practical situations, the diagnostic procedure includes the use of two tests, with outcomes on a continuous scale. For final classification, usually there is an additional “gold standard” or reference test. Considering binary test responses, we usually assume independence between tests or a joint binary structure for dependence. In this article, we introduce a simulation study assuming two dependent dichotomized tests using two copula function dependence structures in the presence or absence of verification bias. We compare the test parameter estimators obtained under copula structure dependence with those obtained assuming binary dependence or assuming independent tests.
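A minimal version of such a simulation draws a dependent pair from a copula and then dichotomises both continuous test measurements. The copula family (Clayton), its parameter and the cut-offs below are illustrative choices, not the authors':

```python
import random

def clayton_pair(theta, rng):
    """Draw (u, v) from a Clayton copula via conditional inversion."""
    u, w = rng.random(), rng.random()
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v

rng = random.Random(0)
theta, n = 3.0, 20_000
pairs = [clayton_pair(theta, rng) for _ in range(n)]

# Dichotomise both measurements at their (theoretical) median; under
# independence P(both below the median) would be 0.25, but the copula
# dependence pushes it well above that.
both_low = sum(u < 0.5 and v < 0.5 for u, v in pairs) / n
print(round(both_low, 3))
```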

10.
In this article, we use a latent class model (LCM) with prevalence modeled as a function of covariates to assess diagnostic test accuracy in situations where the true disease status is not observed, but observations on three or more conditionally independent diagnostic tests are available. A fast Monte Carlo expectation–maximization (MCEM) algorithm with binary (disease) diagnostic data is implemented to estimate the parameters of interest, namely the sensitivity, the specificity, and the prevalence of the disease as a function of covariates. To obtain standard errors for confidence interval construction, the missing information principle is applied to adjust the information matrix estimates. We compare the adjusted information matrix-based standard error estimates with bootstrap standard error estimates, both obtained using the fast MCEM algorithm, through an extensive Monte Carlo study. Simulation demonstrates that the adjusted information matrix approach estimates the standard errors similarly to the bootstrap method under certain scenarios, and that the bootstrap percentile intervals have satisfactory coverage probabilities. We then apply the LCM analysis to a real data set of 122 subjects from a Gynecologic Oncology Group study of significant cervical lesion diagnosis in women with atypical glandular cells of undetermined significance, to compare the diagnostic accuracy of a histology-based evaluation, a carbonic anhydrase-IX biomarker-based test and a human papillomavirus DNA test.

11.
In diagnostic medicine, the receiver operating characteristic (ROC) surface is one of the established tools for assessing the accuracy of a diagnostic test in discriminating three disease states, and the volume under the ROC surface has served as a summary index of diagnostic accuracy. In practice, the selection for definitive disease examination may be based on initial test measurements, which induces verification bias in the assessment. We propose a non-parametric likelihood-based approach to construct the empirical ROC surface in the presence of differential verification, and to estimate the volume under the ROC surface. Estimators of the standard deviation are derived by both the Fisher information and the jackknife method, and their relative accuracy is evaluated in an extensive simulation study. The methodology is further extended to incorporate discrete baseline covariates in the selection process, and to compare the accuracy of a pair of diagnostic tests. We apply the proposed method to compare the diagnostic accuracy of the mini-mental state examination and clinical evaluation of dementia in discriminating three disease states of Alzheimer's disease.

12.
In studies to assess the accuracy of a screening test, often definitive disease assessment is too invasive or expensive to be ascertained on all the study subjects. Although it may be more ethical or cost effective to ascertain the true disease status with a higher rate in study subjects where the screening test or additional information is suggestive of disease, estimates of accuracy can be biased in a study with such a design. This bias is known as verification bias. Verification bias correction methods that accommodate screening tests with binary or ordinal responses have been developed; however, no verification bias correction methods exist for tests with continuous results. We propose and compare imputation and reweighting bias-corrected estimators of true and false positive rates, receiver operating characteristic curves and area under the receiver operating characteristic curve for continuous tests. Distribution theory and simulation studies are used to compare the proposed estimators with respect to bias, relative efficiency and robustness to model misspecification. The bias correction estimators proposed are applied to data from a study of screening tests for neonatal hearing loss.

13.
The paper describes a method of estimating the performance of a multiple-screening test where those who test negatively do not have their true disease status determined. The methodology is motivated by a data set on 49,927 subjects who were given K = 6 binary tests for bowel cancer. A complicating factor is that individuals may have polyps in the bowel, a condition that the screening test is not designed to detect but which may be worth diagnosing. The methodology is based on a multinomial logit model for Pr(S | R6), the probability distribution of patient status S (healthy, polyps or diseased) conditional on the results R6 from the six binary tests. An advantage of the methodology described is that the modelling is data-driven. In particular, we require no assumptions about correlation within subjects, the relative sensitivity of the K tests or the conditional independence of the tests. The model leads to simple estimates of the trade-off between different errors as the number of tests is varied, presented graphically by using receiver operating characteristic curves. Finally, the model allows us to estimate better protocols for assigning subjects to the disease group, as well as the gains in accuracy from these protocols.

14.
The ROC (receiver operating characteristic) curve is frequently used for describing effectiveness of a diagnostic marker or test. Classical estimation of the ROC curve uses independent identically distributed samples taken randomly from the healthy and diseased populations. Frequently not all subjects undergo a definitive gold standard assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased (verification bias). In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve adjusted for covariates (ROC regression) under verification bias. We develop the estimator's asymptotic distribution and examine its finite sample size properties via a simulation study. We apply this procedure to fingerstick postprandial blood glucose measurement data adjusting for age.

15.
Often in longitudinal data arising from epidemiologic studies, measurement error in covariates and/or classification errors in binary responses may be present. The goal of the present work is to develop a random effects logistic regression model that corrects for the classification errors in binary responses and/or the measurement error in covariates. The analysis is carried out under a Bayesian setup. A simulation study reveals the effect of ignoring measurement error and/or classification errors on the estimates of the regression coefficients.

16.
Scott’s pi and Cohen’s kappa are widely used for assessing the degree of agreement between two raters with binary outcomes. However, many authors have pointed out their paradoxical behavior, which comes from their dependence on the prevalence of the trait under study. To overcome this limitation, Gwet [Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology 61(1):29–48] proposed an alternative and more stable agreement coefficient referred to as AC1. In this article, we discuss likelihood-based inference for AC1 in the case of two raters with binary outcomes. The construction of confidence intervals is mainly discussed; in addition, hypothesis testing and sample size estimation are also presented.
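The paradox, and AC1's correction for it, can be seen on a small two-rater table. The table below is hypothetical; for two categories, AC1's chance-agreement term is 2π(1−π), with π the mean of the two raters' marginal proportions:

```python
def kappa_and_ac1(a, b, c, d):
    """Cohen's kappa and Gwet's AC1 for a 2x2 two-rater table
    [[a, b], [c, d]] (a = both positive, d = both negative)."""
    n = a + b + c + d
    po = (a + d) / n          # observed agreement
    p1 = (a + b) / n          # rater 1 marginal proportion positive
    p2 = (a + c) / n          # rater 2 marginal proportion positive
    pe_k = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (po - pe_k) / (1 - pe_k)
    pi = (p1 + p2) / 2
    pe_g = 2 * pi * (1 - pi)  # AC1 chance-agreement term
    ac1 = (po - pe_g) / (1 - pe_g)
    return kappa, ac1

# High-prevalence table: 90% raw agreement, yet kappa is dragged down
# by the skewed marginals while AC1 stays close to the raw agreement.
k, g = kappa_and_ac1(80, 5, 5, 10)
print(round(k, 3), round(g, 3))  # 0.608 0.866
```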

17.
Verification bias may occur when the test results of not all subjects are verified using a gold standard. The correction for this bias can be made using different approaches, depending on whether the missing gold standard results are random or not. Approaches for binary test and gold standard results include the correction method of Begg and Greenes, Zhou's lower and upper limits for diagnostic measures, the logistic regression method, the multiple imputation method, and neural networks. In this study, all of these approaches are compared using real and simulated data under different conditions.
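Of the approaches listed, the Begg and Greenes correction is the simplest to sketch: under missingness at random with verification depending only on the test result, the verified subjects estimate P(D | T) and the full sample supplies P(T). The counts below are hypothetical:

```python
def begg_greenes(n_pos, n_neg, v_pos, d_pos, v_neg, d_neg):
    """Begg-Greenes corrected sensitivity and specificity.
    n_pos, n_neg : all subjects with positive / negative test results
    v_pos, v_neg : verified subjects in each test-result group
    d_pos, d_neg : verified subjects found diseased in each group"""
    p_d_pos = d_pos / v_pos   # P(D+ | T+), estimable on the verified
    p_d_neg = d_neg / v_neg   # P(D+ | T-)
    se = n_pos * p_d_pos / (n_pos * p_d_pos + n_neg * p_d_neg)
    sp = n_neg * (1 - p_d_neg) / (n_neg * (1 - p_d_neg) + n_pos * (1 - p_d_pos))
    return se, sp

# Hypothetical study: 300 test-positives (all verified, 240 diseased),
# 700 test-negatives (only 140 verified, 20 of them diseased).
se, sp = begg_greenes(300, 700, 300, 240, 140, 20)
print(round(se, 3), round(sp, 3))  # 0.706 0.909
```

With full verification the correction reduces to the ordinary (naive) estimators, as it should.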

18.
Cohen's kappa coefficient is traditionally used to quantify the degree of agreement between two raters on a nominal scale. Correlated kappas occur in many settings (e.g., repeated agreement by raters on the same individuals, or concordance between diagnostic tests and a gold standard) and often need to be compared. While different techniques are now available for modelling correlated kappa coefficients, they are generally not easy to implement in practice. The present paper describes a simple alternative method based on the bootstrap for comparing correlated kappa coefficients. The method is illustrated by examples and its type I error is studied using simulations. The method is also compared with second-order generalized estimating equations and the weighted least-squares method.
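A bare-bones version of such a bootstrap resamples whole subjects, so that the correlation between the two kappa estimates (each test against the same gold standard) is preserved. The data and settings below are illustrative:

```python
import random

def cohen_kappa(pairs):
    """Cohen's kappa for a list of (rating1, rating2) 0/1 pairs."""
    n = len(pairs)
    po = sum(x == y for x, y in pairs) / n
    p1 = sum(x for x, _ in pairs) / n
    p2 = sum(y for _, y in pairs) / n
    pe = p1 * p2 + (1 - p1) * (1 - p2)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def bootstrap_kappa_diff_ci(data, n_boot=2000, seed=7):
    """Percentile bootstrap CI for kappa(test1, gold) - kappa(test2, gold).
    `data` is a list of (test1, test2, gold) 0/1 triples; resampling whole
    triples keeps the two kappa estimates correlated, as they are in practice."""
    rng = random.Random(seed)
    diffs = sorted(
        cohen_kappa([(t1, g) for t1, _, g in s]) -
        cohen_kappa([(t2, g) for _, t2, g in s])
        for s in ([rng.choice(data) for _ in data] for _ in range(n_boot))
    )
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Illustrative data: test 1 agrees with the gold standard more often than test 2
# (kappa1 = 0.8 vs. kappa2 = 0.6 on the observed sample).
data = ([(1, 1, 1)] * 35 + [(1, 0, 1)] * 10 + [(0, 1, 1)] * 5 +
        [(0, 0, 0)] * 35 + [(0, 1, 0)] * 10 + [(1, 0, 0)] * 5)
lo, hi = bootstrap_kappa_diff_ci(data)
print(round(lo, 3), round(hi, 3))
```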

19.
This paper examines and applies methods for modelling longitudinal binary data subject to both intermittent missingness and dropout. The paper is based around the analysis of data from a study into the health impact of a sanitation programme carried out in Salvador, Brazil. Our objective was to investigate risk factors associated with the incidence and prevalence of diarrhoea in children aged up to 3 years. In total, 926 children were followed up at home twice a week from October 2000 to January 2002, and for each child the daily occurrence of diarrhoea was recorded. A challenging factor in analysing these data is the presence of between-subject heterogeneity not explained by known risk factors, combined with significant loss of observed data through either intermittent missingness (an average of 78 days per child) or dropout (21% of children). We discuss modelling strategies and show the advantages of taking an event history approach with an additive discrete-time regression model.

20.
The authors describe a model-based kappa statistic for binary classifications which is interpretable in the same manner as Scott's pi and Cohen's kappa, yet does not suffer from the same flaws. They compare this statistic with the data-driven and population-based forms of Scott's pi in a population-based setting where many raters and subjects are involved, and inference regarding the underlying diagnostic procedure is of interest. The authors show that Cohen's kappa and Scott's pi seriously underestimate agreement between experts classifying subjects for a rare disease; in contrast, the new statistic is robust to changes in prevalence. The performance of the three statistics is illustrated with simulations and prostate cancer data.
