Similar Literature
20 similar documents found.
1.
Positive predictive and negative predictive values (PPV and NPV) are often used to assess the accuracy of binary diagnostic tests. Unlike sensitivity and specificity, PPV and NPV are functions of the accuracy of the test and the overall prevalence of the disease in the population. In many studies of performance of estimators of PPV and NPV the population prevalence is assumed known. We allow for uncertainty in the estimate of the population prevalence and via simulation explore the impact of deviations from the assumed value.  相似文献   
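As a rough numerical sketch of this point (not the authors' simulation design), the following Python snippet propagates uncertainty in an assumed prevalence estimate into PPV and NPV via Bayes' theorem; the sensitivity, specificity, prevalence estimate, and its standard error are all hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppv_npv(sens, spec, prev):
    """Bayes' theorem: post-test probabilities from test accuracy and prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

sens, spec = 0.90, 0.95                 # hypothetical test accuracy
prev_hat, prev_se = 0.10, 0.02          # hypothetical prevalence estimate and its SE

# Treat the prevalence as uncertain and propagate that uncertainty
prev_draws = rng.normal(prev_hat, prev_se, 10_000).clip(1e-4, 1 - 1e-4)
ppv_draws, npv_draws = ppv_npv(sens, spec, prev_draws)

print("PPV, NPV at the assumed prevalence:", ppv_npv(sens, spec, prev_hat))
print("PPV range (2.5th-97.5th pct):", np.percentile(ppv_draws, [2.5, 97.5]))
print("NPV range (2.5th-97.5th pct):", np.percentile(npv_draws, [2.5, 97.5]))
```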

2.
To assess the value of a continuous marker in predicting the risk of a disease, a graphical tool called the predictiveness curve has been proposed. It characterizes the marker's predictiveness, or capacity to risk stratify the population by displaying the distribution of risk endowed by the marker. Methods for making inference about the curve and for comparing curves in a general population have been developed. However, knowledge about a marker's performance in the general population only is not enough. Since a marker's effect on the risk model and its distribution can both differ across subpopulations, its predictiveness may vary when applied to different subpopulations. Moreover, information about the predictiveness of a marker conditional on baseline covariates is valuable for individual decision making about having the marker measured or not. Therefore, to fully realize the usefulness of a risk prediction marker, it is important to study its performance conditional on covariates. In this article, we propose semiparametric methods for estimating covariate-specific predictiveness curves for a continuous marker. Unmatched and matched case-control study designs are accommodated. We illustrate application of the methodology by evaluating serum creatinine as a predictor of risk of renal artery stenosis.  相似文献   
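For intuition, here is a minimal, unconditional (general-population) sketch of a predictiveness curve using a hypothetical marker and an assumed logistic risk model; the covariate-specific, case-control-adjusted estimation proposed in the article is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical continuous marker and an assumed (known) logistic risk model
marker = rng.normal(0.0, 1.0, 5000)
risk = 1 / (1 + np.exp(-(-2.0 + 1.5 * marker)))      # P(D = 1 | marker)

# Predictiveness curve: the v-th quantile of the risk distribution, R(v), versus v
v = np.linspace(0.01, 0.99, 99)
R_v = np.quantile(risk, v)

# A curve that stays near the overall prevalence indicates little risk
# stratification; a steep curve indicates strong stratification.
print("overall mean risk:", round(risk.mean(), 3))
print("R(0.1), R(0.5), R(0.9):", np.round(R_v[[9, 49, 89]], 3))
```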

3.
For evaluating the diagnostic accuracy of inherently continuous diagnostic tests/biomarkers, sensitivity and specificity are well-known measures, both of which depend on a diagnostic cut-off, which is usually estimated. Sensitivity (specificity) is the conditional probability of testing positive (negative) given the true disease status. However, a more relevant question is "what is the probability of having (not having) a disease if a test is positive (negative)?". Such post-test probabilities are denoted as positive predictive value (PPV) and negative predictive value (NPV). The PPV and NPV at the same estimated cut-off are correlated; hence it is desirable to make joint inference on PPV and NPV to account for such correlation. Existing inference methods for PPV and NPV focus on individual confidence intervals and were developed under a binomial distribution, assuming binary rather than continuous test results. Several approaches are proposed to estimate the joint confidence region as well as the individual confidence intervals of PPV and NPV. Simulation results indicate the proposed approaches perform well, with satisfactory coverage probabilities for normal and non-normal data, and outperform existing methods with improved coverage as well as narrower confidence intervals for PPV and NPV. The Alzheimer's Disease Neuroimaging Initiative (ADNI) data set is used to illustrate the proposed approaches and compare them with the existing methods.
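To see why PPV and NPV computed at the same estimated cut-off are correlated, here is a small bootstrap sketch with a hypothetical continuous test and an assumed prevalence; it only illustrates the correlation and is not one of the proposed joint-inference approaches.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical continuous test results
x_d = rng.normal(1.0, 1.0, 200)      # diseased
x_nd = rng.normal(0.0, 1.0, 800)     # non-diseased
prev = 0.2                           # assumed population prevalence

def ppv_npv_at_estimated_cutoff(xd, xnd, prev):
    cut = 0.5 * (xd.mean() + xnd.mean())      # a simple estimated cut-off
    sens, spec = np.mean(xd > cut), np.mean(xnd <= cut)
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Bootstrap the pair jointly: both estimates share the same estimated cut-off
pairs = np.array([
    ppv_npv_at_estimated_cutoff(rng.choice(x_d, x_d.size, replace=True),
                                rng.choice(x_nd, x_nd.size, replace=True), prev)
    for _ in range(2000)
])
print("bootstrap correlation of (PPV, NPV):", round(np.corrcoef(pairs.T)[0, 1], 3))
```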

4.
Measures of sensitivity, predictive accuracy, and agreement are currently used to evaluate the efficiency of diagnostic tests reported on dichotomous scales. This paper presents a unified approach to the evaluation of diagnostic tests in terms of generalized indices of sensitivity, misclassification, predictive accuracy and inaccuracy, classification agreement, and prediction agreement for polytomous measurement scales. It is sufficiently general to accommodate additional complications of study design factors such as multiple testing, known and unknown disease prevalence distributions, and multiple subpopulations defined by the cross-classification of independent factors. Estimation and hypothesis testing are developed within a general linear models approach to the analysis of categorical data from repeated measurement designs using weighted least squares computations. This methodology is illustrated within the context of data from a large community-based epidemiologic study of obstructive airways disease. Two diagnostic criteria for impaired lung function are compared on the basis of their generalized sensitivity and classification agreement measures. The outcomes of the tests are reported on the same three-point scale (normal, questionable, impaired) and are examined within several subpopulations determined by age and sex.

5.
The rapid increase in the number of AIDS cases during the 1980s and the spread of the disease from the high-risk groups into the general population has created widespread concern. In particular, assessing the accuracy of the screening tests used to detect antibodies to the HIV (AIDS) virus in donated blood and determining the prevalence of the disease in the population are fundamental statistical problems. Because the prevalence of AIDS varies widely by geographic region and data on the number of infected blood donors are published regularly, Bayesian methods, which utilize prior results and update them as new data become available, are quite useful. In this paper we develop a Bayesian procedure for estimating the prevalence of a rare disease, the sensitivity and specificity of the screening tests, and the predictive value of a positive or negative screening test. We apply the procedure to data on blood donors in the United States and in Canada. Our results augment those described in Gastwirth (1987) using classical methods. Indeed, we show that the inclusion of sound prior knowledge in the statistical analysis does not yield sufficiently precise estimates of the predictive value of a positive test; hence confirmatory testing is needed to obtain reliable estimates. The emphasis of the Bayesian predictive paradigm on prediction intervals for future data yields a valuable insight: we demonstrate that using such intervals might have detected a decline in the specificity of the most frequently used screening test earlier than it apparently was detected.
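A minimal Monte Carlo sketch in the same spirit (hypothetical Beta posteriors for sensitivity, specificity, and prevalence; not the authors' model or data) shows why the predictive value of a positive screening test for a rare disease remains imprecise even when test accuracy is high.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 50_000

# Hypothetical Beta posteriors for screening-test accuracy and prevalence
sens = rng.beta(181, 21, M)      # e.g. 180/200 known positives detected
spec = rng.beta(991, 11, M)      # e.g. 990/1000 known negatives cleared
prev = rng.beta(6, 9996, M)      # rare disease: roughly 5 per 10,000 donors

ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print("posterior mean PPV:", round(ppv.mean(), 3))
print("95% interval:", np.round(np.percentile(ppv, [2.5, 97.5]), 3))
# Even with good accuracy, the rarity of the disease keeps the PPV low and
# imprecise, which is why confirmatory testing is needed.
```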

6.
When a candidate predictive marker is available, but evidence on its predictive ability is not sufficiently reliable, all‐comers trials with marker stratification are frequently conducted. We propose a framework for planning and evaluating prospective testing strategies in confirmatory, phase III marker‐stratified clinical trials based on a natural assumption on heterogeneity of treatment effects across marker‐defined subpopulations, where weak rather than strong control is permitted for multiple population tests. For phase III marker‐stratified trials, it is expected that treatment efficacy is established in a particular patient population, possibly in a marker‐defined subpopulation, and that the marker accuracy is assessed when the marker is used to restrict the indication or labelling of the treatment to a marker‐based subpopulation, ie, assessment of the clinical validity of the marker. In this paper, we develop statistical testing strategies based on criteria that are explicitly designated to the marker assessment, including those examining treatment effects in marker‐negative patients. As existing and developed statistical testing strategies can assert treatment efficacy for either the overall patient population or the marker‐positive subpopulation, we also develop criteria for evaluating the operating characteristics of the statistical testing strategies based on the probabilities of asserting treatment efficacy across marker subpopulations. Numerical evaluations to compare the statistical testing strategies based on the developed criteria are provided.  相似文献   

7.
In many clinical applications, understanding when measurement of new markers is necessary to provide added accuracy to existing prediction tools could lead to more cost effective disease management. Many statistical tools for evaluating the incremental value (IncV) of the novel markers over the routine clinical risk factors have been developed in recent years. However, most existing literature focuses primarily on global assessment. Since the IncVs of new markers often vary across subgroups, it would be of great interest to identify subgroups for which the new markers are most/least useful in improving risk prediction. In this paper we provide novel statistical procedures for systematically identifying potential traditional-marker based subgroups in whom it might be beneficial to apply a new model with measurements of both the novel and traditional markers. We consider various conditional time-dependent accuracy parameters for censored failure time outcome to assess the subgroup-specific IncVs. We provide non-parametric kernel-based estimation procedures to calculate the proposed parameters. Simultaneous interval estimation procedures are provided to account for sampling variation and adjust for multiple testing. Simulation studies suggest that our proposed procedures work well in finite samples. The proposed procedures are applied to the Framingham Offspring Study to examine the added value of an inflammation marker, C-reactive protein, on top of the traditional Framingham risk score for predicting 10-year risk of cardiovascular disease.  相似文献   

8.
Multiple biomarkers are frequently observed or collected for detecting or understanding a disease. The research interest of this article is to extend tools of receiver operating characteristic (ROC) analysis from univariate marker setting to multivariate marker setting for evaluating predictive accuracy of biomarkers using a tree-based classification rule. Using an arbitrarily combined and-or classifier, an ROC function together with a weighted ROC function (WROC) and their conjugate counterparts are introduced for examining the performance of multivariate markers. Specific features of the ROC and WROC functions and other related statistics are discussed in comparison with those familiar properties for univariate marker. Nonparametric methods are developed for estimating the ROC and WROC functions, and area under curve and concordance probability. With emphasis on population average performance of markers, the proposed procedures and inferential results are useful for evaluating marker predictability based on multivariate marker measurements with different choices of markers, and for evaluating different and-or combinations in classifiers.  相似文献   
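As a toy illustration of an and-or combination rule for two markers (hypothetical bivariate data and cut-offs; not the article's estimators of the ROC and WROC functions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical bivariate marker data, both markers higher in the diseased group
m_d = rng.normal([1.0, 0.8], 1.0, size=(300, 2))      # diseased
m_nd = rng.normal([0.0, 0.0], 1.0, size=(700, 2))     # non-diseased

def positive(m, c1, c2, rule):
    """Tree-based combination: 'and' requires both markers above their cut-offs,
    'or' requires at least one of them above its cut-off."""
    hit1, hit2 = m[:, 0] > c1, m[:, 1] > c2
    return hit1 & hit2 if rule == "and" else hit1 | hit2

for rule in ("and", "or"):
    tpr = positive(m_d, 0.5, 0.4, rule).mean()        # true positive rate
    fpr = positive(m_nd, 0.5, 0.4, rule).mean()       # false positive rate
    print(f"{rule:>3}-rule: TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```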

9.
To evaluate the clinical utility of new risk markers, a crucial step is to measure their predictive accuracy with prospective studies. However, it is often infeasible to obtain marker values for all study participants. The nested case-control (NCC) design is a useful cost-effective strategy for such settings. Under the NCC design, markers are only ascertained for cases and a fraction of controls sampled randomly from the risk sets. The outcome dependent sampling generates a complex data structure and therefore a challenge for analysis. Existing methods for analyzing NCC studies focus primarily on association measures. Here, we propose a class of non-parametric estimators for commonly used accuracy measures. We derived asymptotic expansions for accuracy estimators based on both finite population and Bernoulli sampling and established asymptotic equivalence between the two. Simulation results suggest that the proposed procedures perform well in finite samples. The new procedures were illustrated with data from the Framingham Offspring study.  相似文献   

10.
In a wide variety of biomedical and clinical research studies, sample statistics from diagnostic marker measurements are presented as a means of distinguishing between two populations, such as with and without disease. Intuitively, a larger difference between the mean values of a marker for the two populations, and a smaller spread of values within each population, should lead to more reliable classification rules based on this marker. We formalize this intuitive notion by deriving practical, new, closed-form expressions for the sensitivity and specificity of three different discriminant tests defined in terms of the sample means and standard deviations of diagnostic marker measurements. The three discriminant tests evaluated are based, respectively, on the Euclidean distance and the Mahalanobis distance between means, and a likelihood ratio analysis. Expressions for the effects of measurement error are also presented. Our final expressions assume that the diagnostic markers follow independent normal distributions for the two populations, although it will be clear that other known distributions may be similarly analyzed. We then discuss applications drawn from the medical literature, although the formalism is clearly not restricted to that application.  相似文献   
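For the one-dimensional case, here is a minimal sketch of the kind of closed-form expression involved, using assumed normal parameters and a midpoint (distance-based) cut-off; the article's general expressions for the three discriminant rules and for measurement error are not reproduced.

```python
from scipy.stats import norm

# Assumed marker distributions: independent normals in the two populations
mu_d, sd_d = 2.0, 1.0        # diseased
mu_nd, sd_nd = 0.0, 1.0      # non-diseased

# Distance-based rule in one dimension: call "diseased" when the marker is
# closer to the diseased mean, i.e. exceeds the midpoint of the two means.
cut = 0.5 * (mu_d + mu_nd)

sensitivity = 1 - norm.cdf(cut, loc=mu_d, scale=sd_d)    # P(X > cut | diseased)
specificity = norm.cdf(cut, loc=mu_nd, scale=sd_nd)      # P(X <= cut | non-diseased)
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```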

11.
Suppose that we need to classify a population of subjects into several well-defined ordered risk categories for disease prevention or management with their “baseline” risk factors/markers. In this article, we present a systematic approach to identify subjects using their conventional risk factors/markers who would benefit from a new set of risk markers for more accurate classification. Specifically for each subgroup of individuals with the same conventional risk estimate, we present inference procedures for the reclassification and the corresponding correct re-categorization rates with the new markers. We then apply these new tools to analyze the data from the Cardiovascular Health Study sponsored by the US National Heart, Lung, and Blood Institute. We used Framingham risk factors plus the information of baseline anti-hypertensive drug usage to identify adult American women who may benefit from the measurement of a new blood biomarker, CRP, for better risk classification in order to intensify prevention of coronary heart disease for the subsequent 10 years.  相似文献   

12.
Assessment of circulating CD4 count change over time in HIV-infected subjects on antiretroviral therapy (ART) is a central component of disease monitoring. The increasing number of HIV-infected subjects starting therapy and the limited capacity to support CD4 count testing within resource-limited settings have fueled interest in identifying correlates of CD4 count change such as total lymphocyte count, among others. The application of modeling techniques will be essential to this endeavor due to the typically non-linear CD4 trajectory over time and the multiple input variables necessary for capturing CD4 variability. We propose a prediction based classification approach that involves first stage modeling and subsequent classification based on clinically meaningful thresholds. This approach draws on existing analytical methods described in the receiver operating characteristic curve literature while presenting an extension for handling a continuous outcome. Application of this method to an independent test sample results in greater than 98% positive predictive value for CD4 count change. The prediction algorithm is derived based on a cohort of n = 270 HIV-1 infected individuals from the Royal Free Hospital, London who were followed for up to three years from initiation of ART. A test sample comprised of n = 72 individuals from Philadelphia and followed for a similar length of time is used for validation. Results suggest that this approach may be a useful tool for prioritizing limited laboratory resources for CD4 testing after subjects start antiretroviral therapy.  相似文献   

13.
Two-stage clinical trial designs may be efficient in pharmacogenetics research when there is some but inconclusive evidence of effect modification by a genomic marker. Two-stage designs allow stopping early for efficacy or futility and can offer the additional opportunity to enrich the study population to a specific patient subgroup after an interim analysis. This study compared sample size requirements for fixed parallel group, group sequential, and adaptive selection designs with equal overall power and control of the family-wise type I error rate. The designs were evaluated across scenarios that defined the effect sizes in the marker-positive and marker-negative subgroups and the prevalence of marker-positive patients in the overall study population. Effect sizes were chosen to reflect realistic planning scenarios, where at least some effect is present in the marker-negative subgroup. In addition, scenarios were considered in which the assumed 'true' subgroup effects (i.e., the postulated effects) differed from those hypothesized at the planning stage. As expected, both two-stage designs generally required fewer patients than a fixed parallel group design, and the advantage increased as the difference between subgroups increased. The adaptive selection design added little further reduction in sample size, as compared with the group sequential design, when the postulated effect sizes were equal to those hypothesized at the planning stage. However, when the postulated effects deviated strongly in favor of enrichment, the comparative advantage of the adaptive selection design increased, which precisely reflects the adaptive nature of the design. Copyright © 2013 John Wiley & Sons, Ltd.

14.

The concordance statistic (C-statistic) is commonly used to assess the predictive performance (discriminatory ability) of a logistic regression model. Although there are several approaches for the C-statistic, their performance in quantifying the improvement in predictive accuracy due to the inclusion of novel risk factors or biomarkers in the model has been widely criticized in the literature. This paper proposes a model-based concordance-type index, CK, for use with logistic regression models. The CK and its asymptotic sampling distribution are derived following Gonen and Heller's approach for the Cox PH model for survival data, with the necessary modifications for use with binary data. Unlike the existing C-statistics for the logistic model, it quantifies the concordance probability by taking the difference in the predicted risks between two subjects in a pair rather than ranking them, and hence is able to quantify the incremental value from the new risk factor or marker. The simulation study revealed that the CK performs well when the model parameters are correctly estimated for large samples and shows greater improvement in quantifying the additional predictive value from the new risk factor or marker than the existing C-statistics. Furthermore, the illustration using three datasets supports the findings from the simulation study.
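The exact CK formula is not reproduced here. As a loosely related illustration only, the sketch below computes one plausible model-based concordance-type quantity from predicted risks alone (pairwise, using the risks themselves rather than their ranks), with hypothetical predicted risks; it should not be read as the paper's definition.

```python
import numpy as np

def model_based_concordance(risks):
    """For each pair, the model-implied probability that the higher-risk subject
    is the case given that exactly one of the pair is a case, averaged over all
    pairs. Computed from predicted risks only (no observed outcomes); this is
    one plausible 'difference in predicted risks' construction, not necessarily
    the paper's exact CK."""
    p = np.asarray(risks, dtype=float)
    total, count = 0.0, 0
    for i in range(p.size):
        for j in range(i + 1, p.size):
            hi, lo = max(p[i], p[j]), min(p[i], p[j])
            denom = hi * (1 - lo) + lo * (1 - hi)
            if denom > 0:
                total += hi * (1 - lo) / denom
                count += 1
    return total / count

# Hypothetical predicted risks from a fitted logistic regression model
print(round(model_based_concordance([0.05, 0.20, 0.35, 0.60, 0.85]), 3))
```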

15.
The ROC (receiver operating characteristic) curve is frequently used for describing effectiveness of a diagnostic marker or test. Classical estimation of the ROC curve uses independent identically distributed samples taken randomly from the healthy and diseased populations. Frequently not all subjects undergo a definitive gold standard assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased (verification bias). In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve adjusted for covariates (ROC regression) under verification bias. We develop the estimator's asymptotic distribution and examine its finite sample size properties via a simulation study. We apply this procedure to fingerstick postprandial blood glucose measurement data adjusting for age.  相似文献   

16.
Unmeasured confounding is a common problem in observational studies. This article presents simple formulae that can set the bounds of the confounding risk ratio under three standard populations of the exposed, unexposed, and total groups. The bounds are derived by considering the confounding risk ratio as a function of the prevalence of a covariate, and can be constructed using only information about either the exposure–confounder or the disease–confounder relationship. The formulae can be extended to the confounding odds ratio in case–control studies, and the confounding risk difference is discussed. The application of these formulae is demonstrated using an example in which estimation may suffer from bias due to population stratification. The formulae can help to provide a realistic picture of the potential impact of bias due to confounding.  相似文献   
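A small numerical sketch of the idea, using the standard bias-factor expression for a single binary confounder (assumed values throughout; the article's exact bounding formulae may differ):

```python
def confounding_risk_ratio(rr_cd, p1, p0):
    """Bias factor (crude RR / adjusted RR) induced by one binary confounder:
    rr_cd is the confounder-disease risk ratio, p1 and p0 are the confounder
    prevalences in the exposed and unexposed groups."""
    return (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)

# Using only the exposure-confounder relationship (p1, p0): as the unknown
# confounder-disease risk ratio grows, the bias factor is bounded by p1 / p0.
p1, p0 = 0.6, 0.3
print([round(confounding_risk_ratio(r, p1, p0), 3) for r in (2, 10, 100, 1000)])
print("limiting bound p1 / p0 =", p1 / p0)
```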

17.
With a growing interest in using non-representative samples to train prediction models for numerous outcomes it is necessary to account for the sampling design that gives rise to the data in order to assess the generalized predictive utility of a proposed prediction rule. After learning a prediction rule based on a non-uniform sample, it is of interest to estimate the rule's error rate when applied to unobserved members of the population. Efron (1986) proposed a general class of covariance penalty inflated prediction error estimators that assume the available training data are representative of the target population for which the prediction rule is to be applied. We extend Efron's estimator to the complex sample context by incorporating Horvitz–Thompson sampling weights and show that it is consistent for the true generalization error rate when applied to the underlying superpopulation. The resulting Horvitz–Thompson–Efron estimator is equivalent to dAIC, a recent extension of Akaike's information criteria to survey sampling data, but is more widely applicable. The proposed methodology is assessed with simulations and is applied to models predicting renal function obtained from the large-scale National Health and Nutrition Examination Study survey. The Canadian Journal of Statistics 48: 204–221; 2020 © 2019 Statistical Society of Canada  相似文献   
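To show only the re-weighting ingredient (not the full covariance-penalty correction), here is a minimal sketch of a Horvitz-Thompson-weighted apparent error with hypothetical data and inclusion probabilities.

```python
import numpy as np

def ht_weighted_apparent_error(y, y_hat, incl_prob):
    """Horvitz-Thompson-weighted apparent (training) error: inverse-probability
    weights re-weight a non-uniform sample toward the target superpopulation.
    Efron's optimism/covariance penalty, the other half of the estimator, is
    omitted in this sketch."""
    w = 1.0 / np.asarray(incl_prob, dtype=float)
    loss = (np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)) ** 2
    return np.sum(w * loss) / np.sum(w)

# Hypothetical outcomes, predictions, and sampling inclusion probabilities
y         = np.array([1.2, 0.4, 2.1, 1.0, 0.3])
y_hat     = np.array([1.0, 0.6, 1.7, 1.1, 0.5])
incl_prob = np.array([0.9, 0.5, 0.2, 0.2, 0.5])
print(round(ht_weighted_apparent_error(y, y_hat, incl_prob), 4))
```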

18.
The problem is to classify an individual into one of two populations based on an observation on the individual that follows a stationary Gaussian process, where the two populations correspond to two distinct time points. Plug-in likelihood ratio rules are considered using samples from the process. The distributions of the associated classification statistics are derived. For the special case in which the misclassification probabilities are equal, the effect of the dependence between the population distributions on the probability of correct classification is studied. Lower bounds and an iterative method for evaluating the optimal correlation between the populations are obtained.

19.
The use of the area under the receiver-operating characteristic, ROC, curve (AUC) as an index of diagnostic accuracy is overwhelming in fields such as biomedical science and machine learning. It seems that a larger AUC value has become synonymous with a better performance. The functional transformation of the marker values has been proposed in the specialized literature as a procedure for increasing the AUC and therefore the diagnostic accuracy. However, the classification process is based on some regions (classification subsets) which support the decision made; one subject is classified as positive if its marker is within this region and classified as negative otherwise. In this paper we study the capacity of improving the classification performance of univariate biomarkers via functional transformations and the impact of this transformation on the final classification regions based on a real-world dataset. Particularly, we consider the problem of determining the gender of a subject based on the Mode frequency of his/her voice. The shape of the cumulative distribution function of this characteristic in both the male and the female groups makes the resulting classification problem useful for illustrating the differences between having useful diagnostic rules and obtaining an optimal AUC value. Our point is that improving the AUC by means of a functional transformation can produce classification regions with no practical interpretability. We propose to improve the classification accuracy by making the selection of the classification subsets more flexible while preserving their interpretability. Besides, we provide different graphical approximations which allow us a better understanding of the classification problem.  相似文献   
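A toy example of the phenomenon discussed (hypothetical data, not the voice-frequency dataset): a non-monotone transformation can raise the empirical AUC substantially, while turning the implied classification region into "marker far from zero in either direction", whose practical meaning has to be judged separately.

```python
import numpy as np

rng = np.random.default_rng(5)

def auc(pos, neg):
    """Empirical AUC: probability that a random positive scores above a random
    negative (ties counted as one half)."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Hypothetical marker: the positive group sits in both tails, so no single
# threshold on the raw marker works well, but |x| separates the groups.
x_pos = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 0.5, 150)])
x_neg = rng.normal(0, 0.5, 300)

print("AUC of raw marker:      ", round(auc(x_pos, x_neg), 3))
print("AUC after |x| transform:", round(auc(np.abs(x_pos), np.abs(x_neg)), 3))
```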

20.
Developing new medical tests and identifying single biomarkers or panels of biomarkers with superior accuracy over existing classifiers promotes lifelong health of individuals and populations. Before a medical test can be routinely used in clinical practice, its accuracy within diseased and non-diseased populations must be rigorously evaluated. We introduce a method for sample size determination for studies designed to test hypotheses about medical test or biomarker sensitivity and specificity. We show how a sample size can be determined to guard against making type I and/or type II errors by calculating Bayes factors from multiple data sets simulated under null and/or alternative models. The approach can be implemented across a variety of study designs, including investigations into one test or two conditionally independent or dependent tests. We focus on a general setting that involves non-identifiable models for data when true disease status is unavailable due to the nonexistence of or undesirable side effects from a perfectly accurate (i.e. ‘gold standard’) test; special cases of the general method apply to identifiable models with or without gold-standard data. Calculation of Bayes factors is performed by incorporating prior information for model parameters (e.g. sensitivity, specificity, and disease prevalence) and augmenting the observed test-outcome data with unobserved latent data on disease status to facilitate Gibbs sampling from posterior distributions. We illustrate our methods using a thorough simulation study and an application to toxoplasmosis.  相似文献   

