
1.

The Effect of Verification Bias in the Naïve Estimators of Accuracy of a Binary Diagnostic Test

J. A. Roldán Nofuentes J. D. Luna del Castillo 《Communications in Statistics - Simulation and Computation》2013,42(5):959-972

The assessment of a binary diagnostic test requires knowledge of the disease status of all the patients in the sample through the application of a gold standard. In practice, the gold standard is not always applied to all of the patients, which leads to the problem of partial verification of the disease. When the accuracy of the diagnostic test is assessed using only those patients whose disease status has been verified using the gold standard, the resulting estimators, known as Naïve estimators, may be biased. In this study, we obtain the explicit expressions of the bias of the Naïve estimators of sensitivity and specificity of a binary diagnostic test. We also carry out simulation experiments in order to study the effect of the verification probabilities on the Naïve estimators of sensitivity and specificity.
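As a rough illustration of why the Naïve estimators can be biased, the sketch below computes their large-sample limits under the simplifying assumption that verification depends only on the test result. The function name and the parameters `lam_pos` and `lam_neg` (verification probabilities for test-positive and test-negative patients) are illustrative and are not taken from the paper, whose explicit bias expressions may be stated differently.

```python
def naive_limits(se, sp, lam_pos, lam_neg):
    """Large-sample limits of the Naive sensitivity and specificity.

    Assumption (not from the paper): a patient is verified with probability
    lam_pos if the test is positive and lam_neg if it is negative,
    independently of the true disease status.
    """
    # Among verified diseased patients, the share with a positive test.
    naive_se = se * lam_pos / (se * lam_pos + (1 - se) * lam_neg)
    # Among verified non-diseased patients, the share with a negative test.
    naive_sp = sp * lam_neg / (sp * lam_neg + (1 - sp) * lam_pos)
    return naive_se, naive_sp


if __name__ == "__main__":
    true_se, true_sp = 0.8, 0.9
    naive_se, naive_sp = naive_limits(true_se, true_sp, lam_pos=0.9, lam_neg=0.3)
    print(f"bias of Naive sensitivity: {naive_se - true_se:+.3f}")
    print(f"bias of Naive specificity: {naive_sp - true_sp:+.3f}")
```

When the two verification probabilities are equal, the limits coincide with the true sensitivity and specificity. When test-positive patients are verified more often, the Naïve sensitivity is pushed upwards and the Naïve specificity downwards.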

2.

Asymptotic hypothesis test to simultaneously compare the weighted kappa coefficients of multiple binary diagnostic tests in the presence of ignorable missing data

《Journal of Statistical Computation and Simulation》2012,82(2):273-289

The weighted kappa coefficient of a binary diagnostic test is a measure of the beyond-chance agreement between the diagnostic test and the gold standard, and it allows us to assess and compare the performance of binary diagnostic tests. In the presence of partial disease verification, the comparison of the weighted kappa coefficients of two or more binary diagnostic tests cannot be carried out ignoring the individuals with an unknown disease status, since the estimators obtained would be affected by verification bias. In this article, we propose a global hypothesis test based on the chi-square distribution to simultaneously compare the weighted kappa coefficients when, in the presence of partial disease verification, the missing data mechanism is ignorable. Simulation experiments have been carried out to study the type I error and the power of the global hypothesis test. The results have been applied to the diagnosis of coronary disease.

3.

Comparison of the accuracy of multiple binary tests in the presence of partial disease verification

José Antonio Roldán Nofuentes Juan de Dios Luna del Castillo Ana Eugenia Marín Jimenez 《Journal of statistical planning and inference》2010

In the presence of partial disease verification, the comparison of the accuracy of binary diagnostic tests cannot be carried out through the paired comparison of the diagnostic tests applying McNemar's test, since for a subsample of patients the disease status is unknown. In this study, we have deduced the maximum likelihood estimators for the sensitivities and specificities of multiple binary diagnostic tests and we have studied various joint hypothesis tests based on the chi-square distribution to compare simultaneously the accuracy of these binary diagnostic tests when for some patients in the sample the disease status is unknown. Simulation experiments were carried out to study the type I error and the power of each hypothesis test deduced. The results obtained were applied to the diagnosis of coronary stenosis.

4.

Risk of Error and the Kappa Coefficient of a Binary Diagnostic Test in the Presence of Partial Verification

J. A. Roldán Nofuentes J. D. Luna Del Castillo 《Journal of applied statistics》2007,34(8):887-898

The accuracy of a binary diagnostic test is usually measured in terms of its sensitivity and its specificity, or through positive and negative predictive values. Another way to describe the validity of a binary diagnostic test is the risk of error and the kappa coefficient of the risk of error. The risk of error is the average loss that is caused when incorrectly classifying a non-diseased or a diseased patient, and the kappa coefficient of the risk of error is a measure of the agreement between the diagnostic test and the gold standard. In the presence of partial verification of the disease, the disease status of some patients is unknown, and therefore the evaluation of a diagnostic test cannot be carried out through the traditional method. In this paper, we have deduced the maximum likelihood estimators and variances of the risk of error and of the kappa coefficient of the risk of error in the presence of partial verification of the disease. Simulation experiments have been carried out to study the effect of the verification probabilities on the coverage of the confidence interval of the kappa coefficient.

5.

Confidence Intervals of Weighted Kappa Coefficient of a Binary Diagnostic Test

J. A. Roldán Nofuentes J. D. Luna del Castillo M. A. Montero Alonso 《Communications in Statistics - Simulation and Computation》2013,42(8):1562-1578

Sensitivity and specificity are classic parameters to assess the performance of a binary diagnostic test. Another useful parameter to measure the performance of a binary test is the weighted kappa coefficient, which is a measure of the classificatory agreement between the binary test and the gold standard. Various confidence intervals are proposed for the weighted kappa coefficient when the binary test and the gold standard are applied to all of the patients in a random sample. The results have been applied to the diagnosis of coronary artery disease.

6.

Dependence Between Two Diagnostic Tests with Copula Function Approach: A Simulation Study

José Rafael Tovar Jorge Alberto Achcar 《Communications in Statistics - Simulation and Computation》2013,42(2):454-475

The study of the dependence between two medical diagnostic tests is an important issue in health research since it can modify the diagnosis and, therefore, the decision regarding a therapeutic treatment for an individual. In many practical situations, the diagnostic procedure includes the use of two tests, with outcomes on a continuous scale. For final classification, usually there is an additional “gold standard” or reference test. Considering binary test responses, we usually assume independence between tests or a joint binary structure for dependence. In this article, we introduce a simulation study assuming two dependent dichotomized tests using two copula function dependence structures in the presence or absence of verification bias. We compare the test parameter estimators obtained under copula structure dependence with those obtained assuming binary dependence or assuming independent tests.

7.

Assessing accuracy of a continuous screening test in the presence of verification bias (total citations: 1; self-citations: 1; citations by others: 0)

Todd A. Alonzo Margaret Sullivan Pepe 《Journal of the Royal Statistical Society. Series C, Applied statistics》2005,54(1):173-190

Summary. In studies to assess the accuracy of a screening test, often definitive disease assessment is too invasive or expensive to be ascertained on all the study subjects. Although it may be more ethical or cost effective to ascertain the true disease status with a higher rate in study subjects where the screening test or additional information is suggestive of disease, estimates of accuracy can be biased in a study with such a design. This bias is known as verification bias. Verification bias correction methods that accommodate screening tests with binary or ordinal responses have been developed; however, no verification bias correction methods exist for tests with continuous results. We propose and compare imputation and reweighting bias-corrected estimators of true and false positive rates, receiver operating characteristic curves and area under the receiver operating characteristic curve for continuous tests. Distribution theory and simulation studies are used to compare the proposed estimators with respect to bias, relative efficiency and robustness to model misspecification. The bias correction estimators proposed are applied to data from a study of screening tests for neonatal hearing loss.

8.

Bayesian Interval Estimation for Predictive Values from Case-Control Studies

James D. Stamey Melinda M. Holt 《Communications in Statistics - Simulation and Computation》2013,42(1):101-110

Positive predictive and negative predictive values (PPV and NPV) are often used to assess the accuracy of binary diagnostic tests. Unlike sensitivity and specificity, PPV and NPV are functions of the accuracy of the test and the overall prevalence of the disease in the population. In many studies of the performance of estimators of PPV and NPV, the population prevalence is assumed known. We allow for uncertainty in the estimate of the population prevalence and, via simulation, explore the impact of deviations from the assumed value.
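The dependence of PPV and NPV on prevalence follows from Bayes' theorem. The short sketch below shows only that standard relationship; it is not the Bayesian interval procedure of the paper, and the function name and example values are illustrative.

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Same test accuracy, different assumed prevalences.
    for prev in (0.01, 0.10, 0.50):
        ppv, npv = predictive_values(se=0.9, sp=0.9, prevalence=prev)
        print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Running the loop makes the sensitivity of PPV to the assumed prevalence visible, which is why a misjudged prevalence can distort predictive-value estimates.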

9.

Asymptotic hypothesis test to compare likelihood ratios of multiple diagnostic tests in unpaired designs

Jan Luts José Antonio Roldán Nofuentes 《Journal of statistical planning and inference》2011,141(11):3578-3594

The accuracy of a binary diagnostic test is usually measured in terms of its sensitivity and its specificity. Other measures of the performance of a diagnostic test are the positive and negative likelihood ratios, which quantify the increase in knowledge about the presence of the disease through the application of a diagnostic test, and which depend on the sensitivity and specificity of the diagnostic test. In this article, we construct an asymptotic hypothesis test to simultaneously compare the positive and negative likelihood ratios of two or more diagnostic tests in unpaired designs. The hypothesis test is based on the logarithmic transformation of the likelihood ratios and on the chi-square distribution. Simulation experiments have been carried out to study the type I error and the power of the constructed hypothesis test when comparing two and three binary diagnostic tests. The method has been extended to the case of multiple multi-level diagnostic tests.
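For reference, the likelihood ratios and their logarithms can be computed from sensitivity and specificity as in the sketch below. It uses only the standard definitions LR+ = Se / (1 − Sp) and LR− = (1 − Se) / Sp; the function name is illustrative, and the hypothesis test constructed in the paper is not reproduced here.

```python
import math


def likelihood_ratios(se, sp):
    """Positive and negative likelihood ratios of a binary diagnostic test."""
    lr_pos = se / (1 - sp)   # how much a positive result raises the odds of disease
    lr_neg = (1 - se) / sp   # how much a negative result lowers the odds of disease
    return lr_pos, lr_neg


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(se=0.9, sp=0.8)
    # The log scale is the one on which the abstract says the test is built.
    print(f"LR+={lr_pos:.3f}  log(LR+)={math.log(lr_pos):.3f}")
    print(f"LR-={lr_neg:.3f}  log(LR-)={math.log(lr_neg):.3f}")
```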

10.

On implementation of the Gibbs sampler for estimating the accuracy of multiple diagnostic tests

Fabio Principato Angela Vullo Domenica Matranga 《Journal of applied statistics》2010,37(8):1335-1354

Implementation of the Gibbs sampler for estimating the accuracy of multiple binary diagnostic tests in one population has been investigated. This method, proposed by Joseph, Gyorkos and Coupal, makes use of a Bayesian approach and is used in the absence of a gold standard to estimate the prevalence, the sensitivity and specificity of medical diagnostic tests. The expressions that allow this method to be implemented for an arbitrary number of tests are given. By using the convergence diagnostics procedure of Raftery and Lewis, the relation between the number of iterations of Gibbs sampling and the precision of the estimated quantiles of the posterior distributions is derived. An example concerning a data set of gastro-esophageal reflux disease patients collected to evaluate the accuracy of the water siphon test compared with 24 h pH-monitoring, endoscopy and histology tests is presented. The main message that emerges from our analysis is that implementation of the Gibbs sampler to estimate the parameters of multiple binary diagnostic tests can be critical and convergence diagnostics are advised for this method. The factors which affect the convergence of the chains to the posterior distributions and those that influence the precision of their quantiles are analyzed.

11.

Applications of the Bootstrap in ROC Analysis

《Communications in Statistics - Simulation and Computation》2012,41(6):865-877

The problem of estimating standard errors for diagnostic accuracy measures can be challenging for many complicated models. We can address such a problem by using bootstrap methods to blunt its technical edge with resampled empirical distributions. We consider two cases where bootstrap methods can successfully improve our knowledge of the sampling variability of the diagnostic accuracy estimators. The first application is to make inference for the area under the ROC curve resulting from a functional logistic regression model, which is a sophisticated modelling device to describe the relationship between a dichotomous response and multiple covariates. We consider using this regression method to model the predictive effects of multiple independent variables on the occurrence of a disease. The accuracy measures, such as the area under the ROC curve (AUC), are developed from the functional regression. Asymptotic results for the empirical estimators are provided to facilitate inferences. The second application is to test the difference of two weighted areas under the ROC curve (WAUC) from a paired two-sample study. The correlation between the two WAUCs complicates the asymptotic distribution of the test statistic. We then employ the bootstrap methods to gain satisfactory inference results. Simulations and examples are supplied in this article to confirm the merits of the bootstrap methods.
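The general idea of bootstrapping a diagnostic accuracy measure can be sketched as follows. This is a minimal, assumed setup: it resamples the plain empirical (Mann-Whitney) AUC of a single continuous marker, not the functional logistic regression AUC or the WAUC studied in the article, and the function names and data are hypothetical.

```python
import random
import statistics


def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of P(diseased score > healthy score), ties count 1/2."""
    wins = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def bootstrap_auc_se(diseased, healthy, n_boot=500, seed=0):
    """Bootstrap standard error of the empirical AUC.

    Diseased and healthy scores are resampled separately with replacement,
    which assumes the two groups were sampled independently.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        d_star = rng.choices(diseased, k=len(diseased))
        h_star = rng.choices(healthy, k=len(healthy))
        replicates.append(empirical_auc(d_star, h_star))
    return statistics.stdev(replicates)


if __name__ == "__main__":
    diseased = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1]   # made-up marker values
    healthy = [1.0, 2.0, 1.5, 2.5, 0.7, 1.8]
    print("AUC:", empirical_auc(diseased, healthy))
    print("bootstrap SE:", bootstrap_auc_se(diseased, healthy))
```

For a paired design such as the WAUC comparison, subjects rather than individual scores would need to be resampled so that the correlation between the two measures is preserved.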

12.

Comparing accuracies of two screening tests in a two-phase study for dementia

Xiao-Hua Zhou 《Journal of the Royal Statistical Society. Series C, Applied statistics》1998,47(1):135-147

A two-phase design has been widely used in epidemiological studies of dementia. The first phase assesses a large sample with screening tests. The second, based on the screening test results and possibly on other observed patient factors, selects a subset of the study sample for a more definitive disease verification assessment. In comparing the accuracies of two screening tests in a two-phase study of dementia, inferences are commonly made from a sample of verified cases. The omission of non-verified cases can seriously bias comparison results. To correct for this bias, we derive the maximum likelihood (ML) estimators for the accuracies of two screening tests and their corresponding correlation. The p-values and confidence intervals are computed using the asymptotic normality of the ML estimators. Our method is used to compare the accuracies of two screening tests in a two-phase epidemiological study of dementia. We found that, although the sensitivities of the new and standard screening tests in detecting a diseased subject are not different, the new screening test performs better in detecting a non-diseased subject.

13.

A GEE approach to estimating accuracy and its confidence intervals for correlated data

Yaeji Lim 《Pharmaceutical statistics》2020,19(1):59-70

In this paper, we provide a method for constructing a confidence interval for accuracy in correlated observations, where one sample of patients is being rated by two or more diagnostic tests. Confidence intervals for other measures of diagnostic tests, such as sensitivity, specificity, positive predictive value, and negative predictive value, have already been developed for clustered or correlated observations using the generalized estimating equations (GEE) method. Here, we use the GEE and the delta method to construct confidence intervals for accuracy, the proportion of patients who are correctly classified. Simulation results verify that the estimated confidence intervals exhibit consistent/appropriate coverage rates.

14.

Adjusting ROC curves for covariates in the presence of verification bias

Ronen Fluss Benjamin Reiser David Faraggi 《Journal of statistical planning and inference》2012,142(1):1-11

The ROC (receiver operating characteristic) curve is frequently used for describing effectiveness of a diagnostic marker or test. Classical estimation of the ROC curve uses independent identically distributed samples taken randomly from the healthy and diseased populations. Frequently not all subjects undergo a definitive gold standard assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased (verification bias). In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve adjusted for covariates (ROC regression) under verification bias. We develop the estimator's asymptotic distribution and examine its finite sample size properties via a simulation study. We apply this procedure to fingerstick postprandial blood glucose measurement data adjusting for age.

15.

Receiver operating characteristic surfaces in the presence of verification bias

Yueh-Yun Chi Xiao-Hua Zhou 《Journal of the Royal Statistical Society. Series C, Applied statistics》2008,57(1):1-23

Summary. In diagnostic medicine, the receiver operating characteristic (ROC) surface is one of the established tools for assessing the accuracy of a diagnostic test in discriminating three disease states, and the volume under the ROC surface has served as a summary index for diagnostic accuracy. In practice, the selection for definitive disease examination may be based on initial test measurements and induces verification bias in the assessment. We propose a non-parametric likelihood-based approach to construct the empirical ROC surface in the presence of differential verification, and to estimate the volume under the ROC surface. Estimators of the standard deviation are derived by both the Fisher information and the jackknife method, and their relative accuracy is evaluated in an extensive simulation study. The methodology is further extended to incorporate discrete baseline covariates in the selection process, and to compare the accuracy of a pair of diagnostic tests. We apply the proposed method to compare the diagnostic accuracy between mini-mental state examination and clinical evaluation of dementia, in discriminating between three disease states of Alzheimer's disease.

16.

A Bayesian algorithm for sample size determination for equivalence and non-inferiority test

Jie Wang James D. Stamey 《Journal of applied statistics》2010,37(10):1749-1759

Bayesian sample size estimation for equivalence and non-inferiority tests for diagnostic methods is considered. The goal of the study is to test whether a new screening test of interest is equivalent to, or not inferior to, the reference test, which may or may not be a gold standard. Sample sizes are chosen by the model performance criteria of average posterior variance, length and coverage probability. In the absence of a gold standard, sample sizes are evaluated by the ratio of marginal probabilities of the two screening tests, whereas in the presence of a gold standard, sample sizes are evaluated by the measures of sensitivity and specificity.

17.

Verification bias on sensitivity and specificity measurements in diagnostic medicine: a comparison of some approaches used for correction

İlker Ünal H. Refik Burgut 《Journal of applied statistics》2014,41(5):1091-1104

Verification bias may occur when not all subjects' test results are verified using a gold standard. The correction for this bias can be made using different approaches depending on whether missing gold standard test results are random or not. Some of these approaches with binary test and gold standard results include the correction method by Begg and Greenes, lower and upper limits for diagnostic measurements by Zhou, the logistic regression method, the multiple imputation method, and neural networks. In this study, all these approaches are compared using real and simulated data under different conditions.
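Of the approaches listed, the Begg and Greenes correction is simple enough to sketch for a single binary test. The version below assumes that verification depends only on the test result, estimates the disease probability within each test-result group from the verified patients, and applies it to everyone in that group. The argument names and the counts in the example are hypothetical.

```python
def begg_greenes(s1, r1, u1, s0, r0, u0):
    """Bias-corrected sensitivity and specificity for a binary test.

    s1, r1, u1: test-positive patients verified diseased, verified
                non-diseased, and unverified.
    s0, r0, u0: the same three counts for test-negative patients.
    Assumption: verification depends only on the test result.
    """
    n_pos = s1 + r1 + u1
    n_neg = s0 + r0 + u0
    # Disease probability given the test result, from verified patients only.
    p_dis_pos = s1 / (s1 + r1)
    p_dis_neg = s0 / (s0 + r0)
    # Expected numbers of diseased patients in the full sample.
    dis_pos = n_pos * p_dis_pos
    dis_neg = n_neg * p_dis_neg
    se = dis_pos / (dis_pos + dis_neg)
    sp = (n_neg - dis_neg) / ((n_neg - dis_neg) + (n_pos - dis_pos))
    return se, sp


if __name__ == "__main__":
    counts = dict(s1=80, r1=10, u1=10, s0=5, r0=25, u0=170)
    se, sp = begg_greenes(**counts)
    naive_se = counts["s1"] / (counts["s1"] + counts["s0"])
    naive_sp = counts["r0"] / (counts["r0"] + counts["r1"])
    print(f"naive:     Se={naive_se:.3f}  Sp={naive_sp:.3f}")
    print(f"corrected: Se={se:.3f}  Sp={sp:.3f}")
```

When every patient is verified (`u1 = u0 = 0`), the corrected estimates reduce to the usual sensitivity and specificity computed from the 2×2 table.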

18.

Approximate confidence intervals for the weighted kappa coefficient of a binary diagnostic test subject to a case–control design

J. A. Roldán-Nofuentes R. M. Amro 《Journal of Statistical Computation and Simulation》2017,87(3):530-545

The case–control design to assess the accuracy of a binary diagnostic test (BDT) is very common in clinical practice. This design consists of applying the diagnostic test to all of the individuals in a sample of those who have the disease and in another sample of those who do not have the disease. The sensitivity of the diagnostic test is estimated from the case sample and the specificity is estimated from the control sample. Another parameter which is used to assess the performance of a BDT is the weighted kappa coefficient. The weighted kappa coefficient depends on the sensitivity and specificity of the diagnostic test, on the disease prevalence and on the weighting index. In this article, confidence intervals are studied for the weighted kappa coefficient subject to a case–control design and a method is proposed to calculate the sample sizes to estimate this parameter. The results obtained were applied to a real example.
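The dependence on these four quantities can be made concrete with the expression commonly given for the weighted kappa coefficient of a binary test, κ(c) = p q (Se + Sp − 1) / [p (1 − Q) c + q Q (1 − c)], where p is the prevalence, q = 1 − p, Q = p Se + q (1 − Sp) is the probability of a positive test, and c is the weighting index. The abstract does not state the formula, so this form is an assumption here, and the function name is illustrative.

```python
def weighted_kappa(se, sp, prevalence, c):
    """Weighted kappa coefficient of a binary diagnostic test.

    c in [0, 1] is the weighting index (relative importance given to
    false negatives versus false positives). The formula is the commonly
    cited one and is assumed here, not quoted from the article.
    """
    p = prevalence
    q = 1 - p
    prob_pos = p * se + q * (1 - sp)   # Q: probability of a positive test
    youden = se + sp - 1
    return p * q * youden / (p * (1 - prob_pos) * c + q * prob_pos * (1 - c))


if __name__ == "__main__":
    for c in (0.1, 0.5, 0.9):
        kappa = weighted_kappa(se=0.8, sp=0.9, prevalence=0.2, c=c)
        print(f"c={c:.1f}  kappa={kappa:.3f}")
```

In a case–control design the prevalence cannot be estimated from the samples themselves, so in this sketch it must be supplied from an external source.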

19.

Comparing diagnostic tests: test of hypothesis for likelihood ratios

《Journal of Statistical Computation and Simulation》2012,82(3):369-381

Likelihood ratios (LRs) are used to characterize the efficiency of diagnostic tests. In this paper, we use the classical weighted least squares (CWLS) test procedure, which was originally used for testing the homogeneity of relative risks, for comparing the LRs of two or more binary diagnostic tests. We compare the performance of this method with the relative diagnostic likelihood ratio (rDLR) method and the diagnostic likelihood ratio regression (DLRReg) approach in terms of size and power, and we observe that the performances of CWLS and rDLR are the same when used to compare two diagnostic tests, while DLRReg method has higher type I error rates and powers. We also examine the performances of the CWLS and DLRReg methods for comparing three diagnostic tests in various sample size and prevalence combinations. On the basis of Monte Carlo simulations, we conclude that all of the tests are generally conservative and have low power, especially in settings of small sample size and low prevalence.

20.

Generalized Confidence Interval Estimation for the Difference in Paired Areas Under the ROC Curves in the Absence of a Gold Standard

Feng-chen Chang Shean-ya Yeh Hsin-neng Hsieh 《Communications in Statistics - Simulation and Computation》2013,42(9):2056-2072

Receiver operating characteristic (ROC) curves can be used to assess the accuracy of tests measured on ordinal or continuous scales. The most commonly used measure for the overall diagnostic accuracy of diagnostic tests is the area under the ROC curve (AUC). A gold standard (GS) test on the true disease status is required to estimate the AUC. However, a GS test may be too expensive or infeasible. In many medical studies, the true disease status of the subjects may remain unknown. Under the normality assumption on test results from each disease group of subjects, we propose a heuristic method of estimating confidence intervals for the difference in paired AUCs of two diagnostic tests in the absence of a GS reference. This heuristic method is a three-stage method that combines the expectation-maximization (EM) algorithm, the bootstrap method, and an estimation based on asymptotic generalized pivotal quantities (GPQs) to construct generalized confidence intervals for the difference in paired AUCs in the absence of a GS. Simulation results show that the proposed interval estimation procedure yields satisfactory coverage probabilities and expected interval lengths. The numerical example using a published dataset illustrates the proposed method.
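Under the normality assumption mentioned above, the AUC of a single test has the well-known closed form AUC = Φ((μ₁ − μ₀) / √(σ₀² + σ₁²)), where the subscripts 1 and 0 denote the diseased and non-diseased groups. The sketch below evaluates only this formula with assumed, known group parameters; it does not implement the EM, bootstrap or GPQ stages of the proposed method, and the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_auc(mu_dis, sd_dis, mu_non, sd_non):
    """AUC when test results are normal in each disease group.

    Assumes larger test values indicate disease.
    """
    return normal_cdf((mu_dis - mu_non) / math.sqrt(sd_dis ** 2 + sd_non ** 2))


if __name__ == "__main__":
    # Hypothetical group parameters for two tests.
    auc_a = binormal_auc(mu_dis=2.0, sd_dis=1.0, mu_non=0.0, sd_non=1.0)
    auc_b = binormal_auc(mu_dis=1.0, sd_dis=1.0, mu_non=0.0, sd_non=1.0)
    print(f"AUC A={auc_a:.3f}  AUC B={auc_b:.3f}  difference={auc_a - auc_b:.3f}")
```

In the absence of a gold standard the group means and standard deviations are not observed directly, which is why the abstract describes an EM step to estimate them first.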