期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Bounding sample size projections for the area under a ROC curve

Jeffrey D. Blume 《Journal of statistical planning and inference》2009

Studies of diagnostic tests are often designed with the goal of estimating the area under the receiver operating characteristic curve (AUC) because the AUC is a natural summary of a test's overall diagnostic ability. However, sample size projections dealing with AUCs are very sensitive to assumptions about the variance of the empirical AUC estimator, which depends on two correlation parameters. While these correlation parameters can be estimated from the available data, in practice it is hard to find reliable estimates before the study is conducted. Here we derive achievable bounds on the projected sample size that are free of these two correlation parameters. The lower bound is the smallest sample size that would yield the desired level of precision for some model, while the upper bound is the smallest sample size that would yield the desired level of precision for all models. These bounds are important reference points when designing a single or multi-arm study; they are the absolute minimum and maximum sample size that would ever be required. When the study design includes multiple readers or interpreters of the test, we derive bounds pertaining to the average reader AUC and the ‘pooled’ or overall AUC for the population of readers. These upper bounds for multireader studies are not too conservative when several readers are involved. 相似文献

2.

A fast Monte Carlo expectation–maximization algorithm for estimation in latent class model analysis with an application to assess diagnostic accuracy for cervical neoplasia in women with atypical glandular cells

Le Kang Kathleen Darcy James Kauderer Shu-Yuan Liao 《Journal of applied statistics》2013,40(12):2699-2719

In this article, we use a latent class model (LCM) with prevalence modeled as a function of covariates to assess diagnostic test accuracy in situations where the true disease status is not observed, but observations on three or more conditionally independent diagnostic tests are available. A fast Monte Carlo expectation–maximization (MCEM) algorithm with binary (disease) diagnostic data is implemented to estimate parameters of interest; namely, sensitivity, specificity, and prevalence of the disease as a function of covariates. To obtain standard errors for confidence interval construction of estimated parameters, the missing information principle is applied to adjust information matrix estimates. We compare the adjusted information matrix-based standard error estimates with the bootstrap standard error estimates both obtained using the fast MCEM algorithm through an extensive Monte Carlo study. Simulation demonstrates that the adjusted information matrix approach estimates the standard error similarly with the bootstrap methods under certain scenarios. The bootstrap percentile intervals have satisfactory coverage probabilities. We then apply the LCM analysis to a real data set of 122 subjects from a Gynecologic Oncology Group study of significant cervical lesion diagnosis in women with atypical glandular cells of undetermined significance to compare the diagnostic accuracy of a histology-based evaluation, a carbonic anhydrase-IX biomarker-based test and a human papillomavirus DNA test. 相似文献

3.

Using the singly truncated normal distribution to analyze satellite data

Douglas J. DePriest 《统计学通讯:理论与方法》2013,42(3):263-272

This paper proposes the singly truncated normal distribution as a model for estimating radiance measurements from satellite-borne infrared sensors. These measurements are made in order to estimate sea surface temperatures which can be related to radiances. Maximum likelihood estimation is used to provide estimates for the unknown parameters. In particular, a procedure is described for estimating clear radiances in the presence of clouds and the Kolmogorov-Smirnov statistic is used to test goodness-of-fit of the measurements to the singly truncated normal distribution. Tables of quantile values of the Kolmogorov-Smirnov statistic for several values of the truncation point are generated from Monie Carlo experiment Mnally a numerical emample using satetic data is presented to illustrate the application of the procedures. 相似文献

4.

A unified approach to estimating association measures via a joint generalized linear model for paired binary data

Yinsheng Qu Ming Tan Lisa Rybicki 《统计学通讯:理论与方法》2013,42(1):143-156

Paired binary data arise frequently in biomedical studies with unique features of their own. For instance, in clinical studies involving pairs such as ears, eyes etc., often both the intrapair association parameter and the event probability are of interest. In addition, we may be interested in the dependence of the association parameter on certain covariates as well. Although various methods have been proposed to model paired binary data, this paper proposes a unified approach for estimating various intrapair measures under a generalized linear model with simultaneous maximum likelihood estimates of the marginal probabilities and the intrapair association. The methods are illustrated with a twin morbidity study. 相似文献

5.

Semiparametric estimation of employment duration models

Joel L. Horowitz George R. Neumann 《Econometric Reviews》2013,32(1):5-40

There are a variety of economic areas, such as studies of employment duration and of the durability of capital goods, in which data on important variables typically are censored. The standard techinques for estimating a model from censored data require the distributions of unobservable random components of the model to be specified a priori up to a finite set of parameters, and misspecification of these distributions usually leads to inconsistent parameter estimates. However, economic theory rarely gives guidance about distributions and the standard estimation techniques do not provide convenient methods for identifying distributions from censored data. Recently, several distribution-free or semiparametric methods for estimating censored regression models have been developed. This paper presents the results of using two such methods to estimate a model of employment duration. The paper reports the operating characteristics of the semiparametric estimators and compares the semiparametric estimates with those obtained from a standard parametric model. 相似文献

6.

Nonparametric methods for the estimation of imputation uncertainty

Akbar Heydarbeygie Nima Ahmadi 《Journal of applied statistics》2013,40(3):693-698

It is cleared in recent researches that the raising of missing values in datasets is inevitable. Imputation of missing data is one of the several methods which have been introduced to overcome this issue. Imputation techniques are trying to answer the case of missing data by covering missing values with reasonable estimates permanently. There are a lot of benefits for these procedures rather than their drawbacks. The operation of these methods has not been clarified, which means that they provide mistrust among analytical results. One approach to evaluate the outcomes of the imputation process is estimating uncertainty in the imputed data. Nonparametric methods are appropriate to estimating the uncertainty when data are not followed by any particular distribution. This paper deals with a nonparametric method for estimation and testing the significance of the imputation uncertainty, which is based on Wilcoxon test statistic, and which could be employed for estimating the precision of the imputed values created by imputation methods. This proposed procedure could be employed to judge the possibility of the imputation process for datasets, and to evaluate the influence of proper imputation methods when they are utilized to the same dataset. This proposed approach has been compared with other nonparametric resampling methods, including bootstrap and jackknife to estimate uncertainty in the imputed data under the Bayesian bootstrap imputation method. The ideas supporting the proposed method are clarified in detail, and a simulation study, which indicates how the approach has been employed in practical situations, is illustrated. 相似文献

7.

Threshold knot selection for large-scale spatial models with applications to the Deepwater Horizon disaster

Casey M. Jelsema Richard K. Kwok Shyamal D. Peddada 《Journal of Statistical Computation and Simulation》2019,89(11):2121-2137

Large spatial datasets are typically modelled through a small set of knot locations; often these locations are specified by the investigator by arbitrary criteria. Existing methods of estimating the locations of knots assume their number is known a priori, or are otherwise computationally intensive. We develop a computationally efficient method of estimating both the location and number of knots for spatial mixed effects models. Our proposed algorithm, Threshold Knot Selection (TKS), estimates knot locations by identifying clusters of large residuals and placing a knot in the centroid of those clusters. We conduct a simulation study showing TKS in relation to several comparable methods of estimating knot locations. Our case study utilizes data of particulate matter concentrations collected during the course of the response and clean-up effort from the 2010 Deepwater Horizon oil spill in the Gulf of Mexico. 相似文献

8.

Extending the Mann–Whitney–Wilcoxon rank sum test to longitudinal regression analysis

R. Chen T. Chen N. Lu H. Zhang P. Wu C. Feng 《Journal of applied statistics》2014,41(12):2658-2675

Outliers are commonly observed in psychosocial research, generally resulting in biased estimates when comparing group differences using popular mean-based models such as the analysis of variance model. Rank-based methods such as the popular Mann–Whitney–Wilcoxon (MWW) rank sum test are more effective to address such outliers. However, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal studies under missing data. In this paper, we propose a generalized MWW test for comparing multiple groups with covariates within a longitudinal data setting, by utilizing the functional response models. Inference is based on a class of U-statistics-based weighted generalized estimating equations, providing consistent and asymptotically normal estimates not only under complete but missing data as well. The proposed approach is illustrated with both real and simulated study data. 相似文献

9.

Detection and Localization in Test Accuracy: A Bayesian Perspective

Lyle D. Broemeling 《统计学通讯:理论与方法》2013,42(8):1555-1564

In assessing the area under the ROC curve for the accuracy of a diagnostic test, it is imperative to detect and locate multiple abnormalities per image. This approach takes that into account by adopting a statistical model that allows for correlation between the reader scores of several regions of interest (ROI).

The ROI method of partitioning the image is taken. The readers give a score to each ROI in the image and the statistical model takes into account the correlation between the scores of the ROI's of an image in estimating test accuracy. The test accuracy is given by Pr[Y > Z] + (1/2)Pr[Y = Z], where Y is an ordinal diagnostic measurement of an affected ROI, and Z is the diagnostic measurement of an unaffected ROI. This way of measuring test accuracy is equivalent to the area under the ROC curve. The parameters are the parameters of a multinomial distribution, then based on the multinomial distribution, a Bayesian method of inference is adopted for estimating the test accuracy.

Using a multinomial model for the test results, a Bayesian method based on the predictive distribution of future diagnostic scores is employed to find the test accuracy. By resampling from the posterior distribution of the model parameters, samples from the posterior distribution of test accuracy are also generated. Using these samples, the posterior mean, standard deviation, and credible intervals are calculated in order to estimate the area under the ROC curve. This approach is illustrated by estimating the area under the ROC curve for a study of the diagnostic accuracy of magnetic resonance angiography for diagnosis of arterial atherosclerotic stenosis. A generalization to multiple readers and/or modalities is proposed.

A Bayesian way to estimate test accuracy is easy to perform with standard software packages and has the advantage of employing the efficient inclusion of information from prior related imaging studies. 相似文献

10.

Semiparametric estimation of treatment effect with density ratio model

Cunjie Lin Yong Zhou 《统计学通讯:理论与方法》2018,47(14):3338-3359

In this study we propose a unified semiparametric approach to estimate various indices of treatment effect under the density ratio model, which connects two density functions by an exponential tilt. For each index, we construct two estimating functions based on the model and apply the generalized method of moments to improve the estimates. The estimating functions are allowed to be non smooth with respect to parameters and hence make the proposed method more flexible. We establish the asymptotic properties of the proposed estimators and illustrate the application with several simulations and two real data sets. 相似文献

11.

A step-by-step algorithm for combining diagnostic tests

Luis Mariano Esteban Gerardo Sanz Angel Borque 《Journal of applied statistics》2011,38(5):899-911

Combining data of several tests or markers for the classification of patients according to their health status for assigning better treatments is a major issue in the study of diseases such as cancer. In order to tackle this problem, several approaches have been proposed in the literature. In this paper, a step-by-step algorithm for estimating the parameters of a linear classifier that combines several measures is considered. The optimization criterion is to maximize the area under the receiver operating characteristic curve. The algorithm is applied to different simulated data sets and its performance is evaluated. Finally, the method is illustrated with a prostate cancer staging database. 相似文献

12.

Regression-based estimation of the false negative fraction when multiple negatives are unverified 总被引：1，自引：1，他引：0

Chris J. Lloyd Donald J. Frommer 《Journal of the Royal Statistical Society. Series C, Applied statistics》2004,53(4):619-631

Summary. The paper describes a method of estimating the false negative fraction of a multiple-screening test when individuals who test negatively on all K tests do not have their true disease status verified. The method proposed makes no explicit assumptions about the underlying heterogeneity of the population or about serial correlation of test results within an individual. Rather, it is based on estimating false negative fractions conditionally on observed diagnostic histories and extrapolating the observed patterns in these empirical frequencies by using logistic regression. The method is illustrated on, and motivated by, data on a multiple-screening test for bowel cancer. 相似文献

13.

Goodness‐of‐Fit based on Downsampling with Applications to Linear Drift Diffusions

JULIE L. FORMAN BO MARKUSSEN HELLE SØRENSEN 《Scandinavian Journal of Statistics》2011,38(2):288-310

Abstract. A goodness‐of‐fit test for continuous‐time models is developed that examines if the parameter estimates are consistent with another for different sampling frequencies. The test compares parameter estimates obtained from estimating functions for downsamples of the data. We prove asymptotic results for stationary and ergodic processes, and apply the downsampling test to linear drift diffusions. Simulations indicate that the test is quite powerful in detecting non‐Markovian deviations from the linear drift diffusions. 相似文献

14.

Semiparametric transformation models for multivariate panel count data with dependent observation process

Li N Park DH Sun J Kim K 《Revue canadienne de statistique》2011,39(3):458-474

This article discusses regression analysis of multivariate panel count data in which the observation process may contain relevant information about or be related to the underlying recurrent event processes of interest. Such data occur if a recurrent event study involves several related types of recurrent events and the observation scheme or process may be subject-specific. For the problem, a class of semiparametric transformation models is presented, which provides a great flexibility for modelling the effects of covariates on the recurrent event processes. For estimation of regression parameters, an estimating equation-based inference procedure is developed and the asymptotic properties of the resulting estimates are established. Also the proposed approach is evaluated by simulation studies and applied to the data arising from a skin cancer chemoprevention trial. 相似文献

15.

Robust estimation of variance components

Daniel Gervini Victor J. Yohai 《Revue canadienne de statistique》1998,26(3):419-430

New robust estimates for variance components are introduced. Two simple models are considered: the balanced one-way classification model with a random factor and the balanced mixed model with one random factor and one fixed factor. However, the method of estimation proposed can be extended to more complex models. The new method of estimation we propose is based on the relationship between the variance components and the coefficients of the least-mean-squared-error predictor between two observations of the same group. This relationship enables us to transform the problem of estimating the variance components into the problem of estimating the coefficients of a simple linear regression model. The variance-component estimators derived from the least-squares regression estimates are shown to coincide with the maximum-likelihood estimates. Robust estimates of the variance components can be obtained by replacing the least-squares estimates by robust regression estimates. In particular, a Monte Carlo study shows that for outlier-contaminated normal samples, the estimates of variance components derived from GM regression estimates and the derived test outperform other robust procedures. 相似文献

16.

Compare diagnostic tests using transformation-invariant smoothed ROC curves()

Tang L Du P Wu C 《Journal of statistical planning and inference》2010,140(11):3540-3551

Receiver operating characteristic (ROC) curve, plotting true positive rates against false positive rates as threshold varies, is an important tool for evaluating biomarkers in diagnostic medicine studies. By definition, ROC curve is monotone increasing from 0 to 1 and is invariant to any monotone transformation of test results. And it is often a curve with certain level of smoothness when test results from the diseased and non-diseased subjects follow continuous distributions. Most existing ROC curve estimation methods do not guarantee all of these properties. One of the exceptions is Du and Tang (2009) which applies certain monotone spline regression procedure to empirical ROC estimates. However, their method does not consider the inherent correlations between empirical ROC estimates. This makes the derivation of the asymptotic properties very difficult. In this paper we propose a penalized weighted least square estimation method, which incorporates the covariance between empirical ROC estimates as a weight matrix. The resulting estimator satisfies all the aforementioned properties, and we show that it is also consistent. Then a resampling approach is used to extend our method for comparisons of two or more diagnostic tests. Our simulations show a significantly improved performance over the existing method, especially for steep ROC curves. We then apply the proposed method to a cancer diagnostic study that compares several newly developed diagnostic biomarkers to a traditional one. 相似文献

17.

Estimation of regression parameters in generalized linear models for cluster correlated data with measurement error

B.C. Sutradhar J.N.K. Rao 《Revue canadienne de statistique》1996,24(2):177-192

Liang and Zeger (1986) introduced a class of estimating equations that gives consistent estimates of regression parameters and of their asymptotic variances in the class of generalized linear models for cluster correlated data. When the independent variables or covariates in such models are subject to measurement errors, the parameter estimates obtained from these estimating equations are no longer consistent. To correct for the effect of measurement errors, an estimator with smaller asymptotic bias is constructed along the lines of Stefanski (1985), assuming that the measurement error variance is either known or estimable. The asymptotic distribution of the bias-corrected estimator and a consistent estimator of its asymptotic variance are also given. The special case of a binary logistic regression model is studied in detail. For this case, methods based on conditional scores and quasilikelihood are also extended to cluster correlated data. Results of a small simulation study on the performance of the proposed estimators and associated tests of hypotheses are reported. 相似文献

18.

Estimation in an empirical bayes model for longitudinal and cross‐sectionally clustered binary data

Andreas I. Sashegyi K. Stephen Brown Patrick J. Farrell 《Revue canadienne de statistique》2000,28(1):45-63

Some studies generate data that can be grouped into clusters in more than one way. Consider for instance a smoking prevention study in which responses on smoking status are collected over several years in a cohort of students from a number of different schools. This yields longitudinal data, also cross‐sectionaliy clustered in schools. The authors present a model for analyzing binary data of this type, combining generalized estimating equations and estimation of random effects to address the longitudinal and cross‐sectional dependence, respectively. The estimation procedure for this model is discussed, as are the results of a simulation study used to investigate the properties of its estimates. An illustration using data from a smoking prevention trial is given. 相似文献

19.

Multivariate meta-analysis for data consortia,individual patient meta-analysis,and pooling projects

John Ritz Eugene Demidenko Donna Spiegelman 《Journal of statistical planning and inference》2008

We discuss maximum likelihood and estimating equations methods for combining results from multiple studies in pooling projects and data consortia using a meta-analysis model, when the multivariate estimates with their covariance matrices are available. The estimates to be combined are typically regression slopes, often from relative risk models in biomedical and epidemiologic applications. We generalize the existing univariate meta-analysis model and investigate the efficiency advantages of the multivariate methods, relative to the univariate ones. We generalize a popular univariate test for between-studies homogeneity to a multivariate test. The methods are applied to a pooled analysis of type of carotenoids in relation to lung cancer incidence from seven prospective studies. In these data, the expected gain in efficiency was evident, sometimes to a large extent. Finally, we study the finite sample properties of the estimators and compare the multivariate ones to their univariate counterparts. 相似文献

20.

Estimating probability of occurrence of the most likely multinomial event

《Journal of statistical planning and inference》1997,59(2):257-277

The problem of estimating the parameters of a multinomial population arises in discrete multivariate analysis. This paper deals with the problem of estimating the probability associated with the most likely multinomial event. We consider several estimates, such as the maximum likelihood estimate and its modifications, and a Bayes estimate. Certain mathematical properties of the estimates are shown. Empirical results are given, showing the relative performance of the estimates with respect to the mean squared error as the loss function. 相似文献