期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Using the area under an estimated ROC curve to test the adequacy of binary predictors*

Robert P. Lieli Yu-Chin Hsu 《Journal of nonparametric statistics》2019,31(1):100-130

相似文献

2.

Bounding sample size projections for the area under a ROC curve

Jeffrey D. Blume 《Journal of statistical planning and inference》2009

Studies of diagnostic tests are often designed with the goal of estimating the area under the receiver operating characteristic curve (AUC) because the AUC is a natural summary of a test's overall diagnostic ability. However, sample size projections dealing with AUCs are very sensitive to assumptions about the variance of the empirical AUC estimator, which depends on two correlation parameters. While these correlation parameters can be estimated from the available data, in practice it is hard to find reliable estimates before the study is conducted. Here we derive achievable bounds on the projected sample size that are free of these two correlation parameters. The lower bound is the smallest sample size that would yield the desired level of precision for some model, while the upper bound is the smallest sample size that would yield the desired level of precision for all models. These bounds are important reference points when designing a single or multi-arm study; they are the absolute minimum and maximum sample size that would ever be required. When the study design includes multiple readers or interpreters of the test, we derive bounds pertaining to the average reader AUC and the ‘pooled’ or overall AUC for the population of readers. These upper bounds for multireader studies are not too conservative when several readers are involved. 相似文献

3.

Optimal nonparametric estimator of the area under ROC curve based on clustered data

Yougui Wu 《统计学通讯:理论与方法》2020,49(6):1446-1461

Abstract

In diagnostic trials, clustered data are obtained when several subunits of the same patient are observed. Intracluster correlations need to be taken into account when analyzing such clustered data. A nonparametric method has been proposed by Obuchowski (1997 Obuchowski, N. A. 1997. Nonparametric analysis of clustered ROC curve data. Biometrics 53 (2):567–78.[Crossref], [PubMed], [Web of Science ®] , [Google Scholar]) to estimate the Receiver Operating Characteristic curve area (AUC) for such clustered data. However, Obuchowski’s estimator is not efficient as it gives equal weight to all pairwise rankings within and between cluster. In this paper, we propose a more efficient nonparametric AUC estimator with two sets of optimal weights. Simulation results show that the loss of efficiency of Obuchowski’s estimator for a single AUC or the AUC difference can be substantial when there is a moderate intracluster test correlation and the cluster size is large. The efficiency gain of our weighted AUC estimator for a single AUC or the AUC difference is further illustrated using the data from a study of screening tests for neonatal hearing. 相似文献

4.

A Two-Latent-Class Model for Smoking Cessation Data with Informative Dropouts

Li Qin Lisa A. Weissfeld Changyu Shen Michele D. Levine 《统计学通讯:理论与方法》2013,42(15):2604-2619

Non ignorable missing data is a common problem in longitudinal studies. Latent class models are attractive for simplifying the modeling of missing data when the data are subject to either a monotone or intermittent missing data pattern. In our study, we propose a new two-latent-class model for categorical data with informative dropouts, dividing the observed data into two latent classes; one class in which the outcomes are deterministic and a second one in which the outcomes can be modeled using logistic regression. In the model, the latent classes connect the longitudinal responses and the missingness process under the assumption of conditional independence. Parameters are estimated by the method of maximum likelihood estimation based on the above assumptions and the tetrachoric correlation between responses within the same subject. We compare the proposed method with the shared parameter model and the weighted GEE model using the areas under the ROC curves in the simulations and the application to the smoking cessation data set. The simulation results indicate that the proposed two-latent-class model performs well under different missing procedures. The application results show that our proposed method is better than the shared parameter model and the weighted GEE model. 相似文献

5.

Regularized receiver operating characteristic-based logistic regression for grouped variable selection with composite criterion

Yang Li Chenqun Yu Yichen Qin Limin Wang Jiaxu Chen Danhui Yi 《Journal of Statistical Computation and Simulation》2015,85(13):2582-2595

It is well known that statistical classifiers trained from imbalanced data lead to low true positive rates and select inconsistent significant variables. In this article, an improved method is proposed to enhance the classification accuracy for the minority class by differentiating misclassification cost for each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach to estimate the tuning parameter, the composite criterion, and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate on prediction and a good performance on variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative example of the analysis of the suboptimal health state data in traditional Chinese medicine is discussed to show the reasonable application of the proposed method. 相似文献

6.

Estimation of the area under ROC curve with censored data

Qihua Wang Lili Yao Peng Lai 《Journal of statistical planning and inference》2009

The area under the receiver operating characteristic curve is the most commonly used summary measure of diagnostic accuracy for a continuous-scale diagnostic test. In this paper, we develop methods to estimate the area under the curve (AUC) with censored data. Based on two different integration representations of this parameter, two nonparametric estimators are defined by the “plug in” method. Both the proposed estimators are shown to be asymptotically normal based on counting process and martingale theory. A simulation study is conducted to evaluate the performances of the proposed estimators. 相似文献

7.

Variable Selection for Partially Linear Models with Randomly Censored Data

Yiping Yang Liugen Xue Weihu Cheng 《统计学通讯:模拟与计算》2013,42(8):1577-1589

This article proposes a variable selection procedure for partially linear models with right-censored data via penalized least squares. We apply the SCAD penalty to select significant variables and estimate unknown parameters simultaneously. The sampling properties for the proposed procedure are investigated. The rate of convergence and the asymptotic normality of the proposed estimators are established. Furthermore, the SCAD-penalized estimators of the nonzero coefficients are shown to have the asymptotic oracle property. In addition, an iterative algorithm is proposed to find the solution of the penalized least squares. Simulation studies are conducted to examine the finite sample performance of the proposed method. 相似文献

8.

Smooth-Threshold GEE Variable Selection in High-Dimensional Partially Linear Models with Longitudinal Data

Ruiqin Tian Liugen Xue 《统计学通讯:模拟与计算》2015,44(7):1720-1734

We consider the problem of variable selection in high-dimensional partially linear models with longitudinal data. A variable selection procedure is proposed based on the smooth-threshold generalized estimating equation (SGEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to be zero, and simultaneously estimates the nonzero regression coefficients by solving the SGEE. We establish the asymptotic properties in a high-dimensional framework where the number of covariates p_n increases as the number of clusters n increases. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure. 相似文献

9.

Group Variable Selection with Oracle Property by Weight-Fused Adaptive Elastic Net Model for Strongly Correlated Data

Guang-Hui Fu Wen-Ming Zhang Lin Dai Ying-Zi Fu 《统计学通讯:模拟与计算》2013,42(10):2468-2481

This paper is the generalization of weight-fused elastic net (Fu and Xu, 2012 Fu, G., Xu, Q. (2012). Grouping variable selection by weight fused elastic net for multi-collinear data. Communications in Statistics-Simulation and Computation 41(2):205–221.[Taylor & Francis Online], [Web of Science ®] , [Google Scholar]), which performs group variable selection by combining weight-fused LASSO(wfLasso) and elastic net (Zou and Hastie, 2005 Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(2):301–320.[Crossref], [Web of Science ®] , [Google Scholar]) penalties. In this study, the elastic net penalty is replaced by adaptive elastic net penalty (AdaEnet) (Zou and Zhang, 2009 Zou, H., Zhang, H. (2009). On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics 37(4):1733–1751.[Crossref], [PubMed], [Web of Science ®] , [Google Scholar]), and a new group variable selection algorithm with oracle property (Fan and Li, 2001 Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348–1360.[Taylor & Francis Online], [Web of Science ®] , [Google Scholar]; Zou, 2006 Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476):1418–1429.[Taylor & Francis Online], [Web of Science ®] , [Google Scholar]) is obtained. 相似文献

10.

Modeling Longitudinal Obesity Data with Intermittent Missingness Using a New Latent Variable Model

Li Qin Lisa Weissfeld Marsha D. Marcus Michele D. Levine Feng Dai 《统计学通讯:模拟与计算》2016,45(6):2018-2031

We propose a latent variable model for informative missingness in longitudinal studies which is an extension of latent dropout class model. In our model, the value of the latent variable is affected by the missingness pattern and it is also used as a covariate in modeling the longitudinal response. So the latent variable links the longitudinal response and the missingness process. In our model, the latent variable is continuous instead of categorical and we assume that it is from a normal distribution. The EM algorithm is used to obtain the estimates of the parameter we are interested in and Gauss–Hermite quadrature is used to approximate the integration of the latent variable. The standard errors of the parameter estimates can be obtained from the bootstrap method or from the inverse of the Fisher information matrix of the final marginal likelihood. Comparisons are made to the mixed model and complete-case analysis in terms of a clinical trial dataset, which is Weight Gain Prevention among Women (WGPW) study. We use the generalized Pearson residuals to assess the fit of the proposed latent variable model. 相似文献

11.

Intrinsic Priors for Model Selection Using an Encompassing Model with Applications to Censored Failure Time Data

Kim SW Sun D 《Lifetime data analysis》2000,6(3):251-269

In Bayesian model selection or testingproblems one cannot utilize standard or default noninformativepriors, since these priors are typically improper and are definedonly up to arbitrary constants. Therefore, Bayes factors andposterior probabilities are not well defined under these noninformativepriors, making Bayesian model selection and testing problemsimpossible. We derive the intrinsic Bayes factor (IBF) of Bergerand Pericchi (1996a, 1996b) for the commonly used models in reliabilityand survival analysis using an encompassing model. We also deriveproper intrinsic priors for these models, whose Bayes factors are asymptoticallyequivalent to the respective IBFs. We demonstrate our resultsin three examples. 相似文献

12.

Empirical Likelihood-Based Confidence Intervals for the Sensitivity of a Continuous-Scale Diagnostic Test with Missing Data

Binhuan Wang 《统计学通讯:理论与方法》2014,43(15):3248-3268

In a continuous-scale diagnostic test, the receiver operating characteristic (ROC) curve is useful to evaluate the range of the sensitivity at the cut-off point that yields a desired specificity. Many current studies on inference of the ROC curve focus on the complete data case. In this paper, an imputation-based profile empirical likelihood ratio for the sensitivity, which is free of bandwidth selection, is defined and shown to follow an asymptotically scaled Chi-square distribution. Two new confidence intervals are proposed for the sensitivity with missing data. Simulation studies are conducted to evaluate the finite sample performance of the proposed intervals in terms of coverage probability. A real example is used to illustrate the new methods. 相似文献

13.

On Estimating the Integrated Co-Volatility Using Noisy High-Frequency Data with Jumps

Bing-Yi Jing Cui-Xia Li 《统计学通讯:理论与方法》2013,42(21):3889-3901

In this article, we consider the estimation of covariation of two asset prices which contain jumps and microstructure noise, based on high-frequency data. We propose a realized covariance estimator, which combines pre-averaging method to remove the microstructure noise and the threshold method to reduce the jumps effect. The asymptotic properties, such as consistency and asymptotic normality, are investigated. The estimator allows very general structure of jumps, for example, infinity activity or even infinity variation. Simulation is also included to illustrate the performance of the proposed procedure. 相似文献

14.

Using BBPSO Algorithm to Estimate the Weibull Parameters with Censored Data

Fu-Kwun Wang 《统计学通讯:模拟与计算》2013,42(10):2614-2627

This article proposes the maximum likelihood estimates based on bare bones particle swarm optimization (BBPSO) algorithm for estimating the parameters of Weibull distribution with censored data, which is widely used in lifetime data analysis. This approach can produce more accuracy of the parameter estimation for the Weibull distribution. Additionally, the confidence intervals for the estimators are obtained. The simulation results show that the BB PSO algorithm outperforms the Newton–Raphson method in most cases in terms of bias, root mean square of errors, and coverage rate. Two examples are used to demonstrate the performance of the proposed approach. The results show that the maximum likelihood estimates via BBPSO algorithm perform well for estimating the Weibull parameters with censored data. 相似文献

15.

A mixed effects model for analyzing area under the curve of longitudinally measured biomarkers with missing data

Luoxi Shi Dorothy K. Hatsukami Joseph S. Koopmeiners Chap T. Le Neal L. Benowitz Eric C. Donny Xianghua Luo 《Pharmaceutical statistics》2021,20(6):1249-1264

A simple approach for analyzing longitudinally measured biomarkers is to calculate summary measures such as the area under the curve (AUC) for each individual and then compare the mean AUC between treatment groups using methods such as t test. This two-step approach is difficult to implement when there are missing data since the AUC cannot be directly calculated for individuals with missing measurements. Simple methods for dealing with missing data include the complete case analysis and imputation. A recent study showed that the estimated mean AUC difference between treatment groups based on the linear mixed model (LMM), rather than on individually calculated AUCs by simple imputation, has negligible bias under random missing assumptions and only small bias when missing is not at random. However, this model assumes the outcome to be normally distributed, which is often violated in biomarker data. In this paper, we propose to use a LMM on log-transformed biomarkers, based on which statistical inference for the ratio, rather than difference, of AUC between treatment groups is provided. The proposed method can not only handle the potential baseline imbalance in a randomized trail but also circumvent the estimation of the nuisance variance parameters in the log-normal model. The proposed model is applied to a recently completed large randomized trial studying the effect of nicotine reduction on biomarker exposure of smokers. 相似文献

16.

An Appraisal of Methods for the Analysis of Longitudinal Ordinal Response Data with Random Dropout Using a Nonhomogeneous Markov Model

Z. Rezaei Ghahroodi H. Navvabpour D. Berridge 《统计学通讯:模拟与计算》2013,42(5):1027-1048

There are many methods for analyzing longitudinal ordinal response data with random dropout. These include maximum likelihood (ML), weighted estimating equations (WEEs), and multiple imputations (MI). In this article, using a Markov model where the effect of previous response on the current response is investigated as an ordinal variable, the likelihood is partitioned to simplify the use of existing software. Simulated data, generated to present a three-period longitudinal study with random dropout, are used to compare performance of ML, WEE, and MI methods in terms of standardized bias and coverage probabilities. These estimation methods are applied to a real medical data set. 相似文献

17.

Network reliability for multipath TCP networks with a retransmission mechanism under the time constraint

Yi-Kuei Lin Chih-Li Pan Louis Cheng-Lu Yeng 《Journal of Statistical Computation and Simulation》2018,88(12):2273-2286

It is essential to reduce data latency and guarantee quality of service for modern computer networks. The emerging networking protocol, Multipath Transmission Control Protocol, can reduce data latency by transmitting data through multiple minimal paths (MPs) and ensure data integrity by the packets retransmission mechanism. The bandwidth of each edge can be considered as multi-state in computer networks because different situations, such as failures, partial failures and maintenance, exist. We evaluate network reliability for a multi-state retransmission flow network through which the data can be successfully transmitted by means of multiple MPs under the time constraint. By generating all minimal bandwidth patterns, the proposed algorithm can satisfy these requirements to calculate network reliability. An example and a practical case of the Pan-European Research and Education Network are applied to demonstrate the proposed algorithm. 相似文献