1.

Borrowing Information across Populations in Estimating Positive and Negative Predictive Values

Huang Y Fong Y Wei J Feng Z 《Journal of the Royal Statistical Society. Series C, Applied statistics》2011,60(5):633-653

A marker's capacity to predict risk of a disease depends on disease prevalence in the target population and its classification accuracy, i.e. its ability to discriminate diseased subjects from non-diseased subjects. The latter is often considered an intrinsic property of the marker; it is independent of disease prevalence and hence more likely to be similar across populations than risk prediction measures. In this paper, we are interested in evaluating the population-specific performance of a risk prediction marker in terms of positive predictive value (PPV) and negative predictive value (NPV) at given thresholds, when samples are available from the target population as well as from another population. A default strategy is to estimate PPV and NPV using samples from the target population only. However, when the marker's classification accuracy as characterized by a specific point on the receiver operating characteristic (ROC) curve is similar across populations, borrowing information across populations allows increased efficiency in estimating PPV and NPV. We develop estimators that optimally combine information across populations. We apply this methodology to a cross-sectional study where we evaluate PCA3 as a risk prediction marker for prostate cancer among subjects with or without previous negative biopsy.
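As a rough illustration of why PPV and NPV depend on prevalence while a ROC point does not, the following sketch applies Bayes' rule to a fixed (sensitivity, specificity) pair. The function name and the numeric inputs are hypothetical and are not taken from the paper; this is not the authors' combined estimator.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a dichotomized marker via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical ROC point, two hypothetical prevalences.
for prev in (0.10, 0.40):
    ppv, npv = predictive_values(0.80, 0.70, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

If the ROC point is shared across populations, only the prevalence has to be estimated from the target population, which is the intuition behind borrowing information.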

2.

A visualization method measuring the performance of biomarkers for guiding treatment decisions

Hui Yang Rui Tang Mike Hale Jing Huang 《Pharmaceutical statistics》2016,15(2):152-164

Biomarkers that predict efficacy and safety for a given drug therapy become increasingly important for treatment strategy and drug evaluation in personalized medicine. Methodology for appropriately identifying and validating such biomarkers is critically needed, although it is very challenging to develop, especially in trials of terminal diseases with survival endpoints. The marker-by-treatment predictiveness curve serves this need by visualizing the treatment effect on survival as a function of biomarker for each treatment. In this article, we propose the weighted predictiveness curve (WPC). Based on the nature of the data, it generates predictiveness curves by utilizing either parametric or nonparametric approaches. Especially for nonparametric predictiveness curves, by incorporating local assessment techniques, it requires minimum model assumptions and provides great flexibility to visualize the marker-by-treatment relationship. WPC can be used to compare biomarkers and identify the one with the highest potential impact. Equally important, by simultaneously viewing several treatment-specific predictiveness curves across the biomarker range, WPC can also guide the biomarker-based treatment regimens. Simulations representing various scenarios are employed to evaluate the performance of WPC. Application to a well-known liver cirrhosis trial sheds new light on the data and leads to discovery of novel patterns of treatment-biomarker interactions. Copyright © 2015 John Wiley & Sons, Ltd.

3.

Estimation with Interval Censored Data and Covariates

van der Laan Mark J. Hubbard Alan 《Lifetime data analysis》1997,3(1):77-91

In biostatistical applications interest often focuses on the estimation of the distribution of time T between two consecutive events. If the initial event time is observed and the subsequent event time is only known to be larger or smaller than an observed monitoring time C, then the data conforms to the well understood singly-censored current status model, also known as interval censored data, case I. Additional covariates can be used to allow for dependent censoring and to improve estimation of the marginal distribution of T. Assuming a wrong model for the conditional distribution of T, given the covariates, will lead to an inconsistent estimator of the marginal distribution. On the other hand, the nonparametric maximum likelihood estimator (NPMLE) of F_T requires splitting up the sample in several subsamples corresponding with a particular value of the covariates, computing the NPMLE for every subsample and then taking an average. With a few continuous covariates the performance of the resulting estimator is typically miserable. In van der Laan and Robins (1996) a locally efficient one-step estimator is proposed for smooth functionals of the distribution of T, assuming nothing about the conditional distribution of T, given the covariates, but assuming a model for censoring, given the covariates. The estimators are asymptotically linear if the censoring mechanism is estimated correctly. The estimator also uses an estimator of the conditional distribution of T, given the covariates. If this estimate is consistent, then the estimator is efficient and if it is inconsistent, then the estimator is still consistent and asymptotically normal. In this paper we show that the estimators can also be used to estimate the distribution function in a locally optimal way. Moreover, we show that the proposed estimator can be used to estimate the distribution based on interval censored data (T is now known to lie between two observed points) in the presence of covariates. The resulting estimator also has a known influence curve so that asymptotic confidence intervals are directly available. In particular, one can apply our proposal to the interval censored data without covariates. In Geskus (1992) the information bound for interval censored data with two uniformly distributed monitoring times at the uniform distribution for T has been computed. We show that the relative efficiency of our proposal w.r.t. this optimal bound equals 0.994, which is also reflected in finite sample simulations. Finally, the good practical performance of the estimator is shown in a simulation study. This revised version was published online in July 2006 with corrections to the Cover Date.

4.

A Profile Conditional Likelihood Approach for the Semiparametric Transformation Regression Model with Missing Covariates

Hua Yun Chen Roderick J. Little 《Lifetime data analysis》2001,7(3):207-224

We propose a profile conditional likelihood approach to handle missing covariates in the general semiparametric transformation regression model. The method estimates the marginal survival function by the Kaplan-Meier estimator, and then estimates the parameters of the survival model and the covariate distribution from a conditional likelihood, substituting the Kaplan-Meier estimator for the marginal survival function in the conditional likelihood. This method is simpler than full maximum likelihood approaches, and yields a consistent and asymptotically normally distributed estimator of the regression parameter when censoring is independent of the covariates. The estimator demonstrates very high relative efficiency in simulations. When compared with complete-case analysis, the proposed estimator can be more efficient when the missing data are missing completely at random and can correct bias when the missing data are missing at random. The potential application of the proposed method to the generalized probit model with missing continuous covariates is also outlined.

5.

Flexible modeling of conditional distributions using smooth mixtures of asymmetric student t densities

Feng Li Mattias Villani Robert Kohn 《Journal of statistical planning and inference》2010

A general model is proposed for flexibly estimating the density of a continuous response variable conditional on a possibly high-dimensional set of covariates. The model is a finite mixture of asymmetric student t densities with covariate-dependent mixture weights. The four parameters of the components, the mean, degrees of freedom, scale and skewness, are all modeled as functions of the covariates. Inference is Bayesian and the computation is carried out using Markov chain Monte Carlo simulation. To enable model parsimony, a variable selection prior is used in each set of covariates and among the covariates in the mixing weights. The model is used to analyze the distribution of daily stock market returns, and is shown to forecast the distribution of returns more accurately than other widely used models for financial data.

6.

ROC curve and covariates: extending induced methodology to the non-parametric framework

María Xosé Rodríguez-Álvarez Javier Roca-Pardiñas Carmen Cadarso-Suárez 《Statistics and Computing》2011,21(4):483-499

Continuous diagnostic tests are often used to discriminate between diseased and healthy populations. The receiver operating characteristic (ROC) curve is a widely used tool that provides a graphical visualisation of the effectiveness of such tests. The potential performance of the tests in terms of distinguishing diseased from healthy people may be strongly influenced by covariates, and a variety of regression methods for adjusting ROC curves has been developed. Until now, these methodologies have assumed that covariate effects have parametric forms, but in this paper we extend the induced methodology by allowing for arbitrary non-parametric effects of a continuous covariate. To this end, local polynomial kernel smoothers are used in the estimation procedure. Our method allows for covariate effects not only on the mean, but also on the variance of the diagnostic test. We also present a bootstrap-based method for testing for a significant covariate effect on the ROC curve. To illustrate the method, endocrine data were analysed with the aim of assessing the performance of anthropometry for predicting clusters of cardiovascular risk factors in an adult population in Galicia (NW Spain), duly adjusted for age. The proposed methodology has proved useful for providing age-specific thresholds for anthropometric measures in the Galician community.

7.

Bayesian analysis for zero-or-one inflated proportion data using quantile regression

《Journal of Statistical Computation and Simulation》2012,82(17):3579-3593

In this paper, we propose the use of Bayesian quantile regression for the analysis of proportion data. We also consider the case when the data present a zero-or-one inflation using a two-part model approach. For the latter scheme, we assume that the response variable is generated by a mixed discrete–continuous distribution with a point mass at zero or one. Quantile regression is then used to explain the conditional distribution of the continuous part between zero and one, while the mixture probability is also modelled as a function of the covariates. We check the performance of these models with two simulation studies. We illustrate the method with data about the proportion of households with access to electricity in Brazil.

8.

Local quantile regression

Vladimir Spokoiny Weining Wang Wolfgang Karl Härdle 《Journal of statistical planning and inference》2013

Quantile regression is a technique to estimate conditional quantile curves. It provides a comprehensive picture of a response contingent on explanatory variables. In a flexible modeling framework, a specific form of the conditional quantile curve is not a priori fixed. This motivates a local parametric rather than a global fixed model fitting approach. A nonparametric smoothing estimator of the conditional quantile curve requires balancing local curvature against stochastic variability. In this paper, we suggest a local model selection technique that provides an adaptive estimator of the conditional quantile regression curve at each design point. Theoretical results claim that the proposed adaptive procedure performs as well as an oracle which would minimize the local estimation risk for the problem at hand. We illustrate the performance of the procedure by an extensive simulation study and consider a couple of applications: to tail dependence analysis for the Hong Kong stock market and to analysis of the distributions of the risk factors of temperature dynamics.
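To make the idea of a local quantile fit concrete, here is a minimal sketch of a local constant quantile estimate that minimizes the check (pinball) loss over observations inside a fixed window around a design point. The fixed half-width `h` and the function names are assumptions made for illustration; the adaptive, data-driven choice of the local window described in the abstract is not implemented here.

```python
def check_loss(u, tau):
    """Quantile check loss: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def local_quantile(xs, ys, x0, tau, h):
    """Local constant tau-quantile at x0 using points with |x - x0| <= h.

    The minimizer of the summed check loss can always be taken at an
    observed response, so the search is restricted to the window's values.
    """
    window = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
    if not window:
        raise ValueError("no observations inside the window")
    candidates = sorted(window)
    return min(candidates,
               key=lambda q: sum(check_loss(y - q, tau) for y in window))


# Hypothetical data: the two far-away points do not affect the fit at x0 = 2.
xs = [0, 1, 2, 3, 4, 10, 11]
ys = [1, 2, 3, 4, 5, 50, 60]
print(local_quantile(xs, ys, x0=2, tau=0.5, h=2))
```

A small `h` tracks local curvature but uses few points, while a large `h` reduces variability at the cost of bias, which is the trade-off the abstract refers to.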

9.

Central quantile subspace

Eliana Christou 《Statistics and Computing》2020,30(3):677-695

Quantile regression (QR) is becoming increasingly popular due to its relevance in many scientific investigations. There is a great amount of work about linear and nonlinear QR models. Specifically, nonparametric estimation of the conditional quantiles has received particular attention, due to its model flexibility. However, nonparametric QR techniques are limited in the number of covariates. Dimension reduction offers a solution to this problem by considering low-dimensional smoothing without specifying any parametric or nonparametric regression relation. The existing dimension reduction techniques focus on the entire conditional distribution. We, on the other hand, turn our attention to dimension reduction techniques for conditional quantiles and introduce a new method for reducing the dimension of the predictor $\mathbf{X}$. The novelty of this paper is threefold. We start by considering a single index quantile regression model, which assumes that the conditional quantile depends on $\mathbf{X}$ through a single linear combination of the predictors, then extend to a multi-index quantile regression model, and finally, generalize the proposed methodology to any statistical functional of the conditional distribution. The performance of the methodology is demonstrated through simulation examples and real data applications. Our results suggest that this method has a good finite sample performance and often outperforms the existing methods.

10.

Nonparametric regression estimation of conditional tails: the random covariate case

Yuri Goegebeur Armelle Guillou Antoine Schorgen 《Statistics》2013,47(4):732-755

We present families of nonparametric estimators for the conditional tail index of a Pareto-type distribution in the presence of random covariates. These families are constructed from locally weighted sums of power transformations of excesses over a high threshold. The asymptotic properties of the proposed estimators are derived under some assumptions on the conditional response distribution, the weight function and the density function of the covariates. We also introduce bias-corrected versions of the estimators for the conditional tail index, and propose in this context a consistent estimator for the second-order tail parameter. The finite sample performance of some specific examples from our classes of estimators is illustrated with a small simulation experiment.

11.

Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

J. G. Ibrahim S. R. Lipsitz & M.-H. Chen 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):173-190

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

12.

Generalized accelerated failure time spatial frailty model for arbitrarily censored data

Haiming Zhou Timothy Hanson Jiajia Zhang 《Lifetime data analysis》2017,23(3):495-515

Flexible incorporation of both geographical patterning and risk effects in cancer survival models is becoming increasingly important, due in part to the recent availability of large cancer registries. Most spatial survival models stochastically order survival curves from different subpopulations. However, it is common for survival curves from two subpopulations to cross in epidemiological cancer studies and thus interpretable standard survival models cannot be used without some modification. Common fixes are the inclusion of time-varying regression effects in the proportional hazards model or fully nonparametric modeling, either of which destroys any easy interpretability from the fitted model. To address this issue, we develop a generalized accelerated failure time model which allows stratification on continuous or categorical covariates, as well as providing per-variable tests for whether stratification is necessary via novel approximate Bayes factors. The model is interpretable in terms of how median survival changes and is able to capture crossing survival curves in the presence of spatial correlation. A detailed Markov chain Monte Carlo algorithm is presented for posterior inference and a freely available function frailtyGAFT is provided to fit the model in the R package spBayesSurv. We apply our approach to a subset of the prostate cancer data gathered for Louisiana by the Surveillance, Epidemiology, and End Results program of the National Cancer Institute.

13.

Measuring and estimating the interaction between exposures on a dichotomous outcome for observational studies

Xiaoqin Wang Weimin Ye 《Journal of applied statistics》2017,44(14):2483-2498

In observational studies for the interaction between exposures on a dichotomous outcome of a certain population, usually one parameter of a regression model is used to describe the interaction, leading to one measure of the interaction. In this article we use the conditional risk of an outcome given exposures and covariates to describe the interaction and obtain five different measures of the interaction, that is, difference between the marginal risk differences, ratio of the marginal risk ratios, ratio of the marginal odds ratios, ratio of the conditional risk ratios, and ratio of the conditional odds ratios. These measures reflect different aspects of the interaction. By using only one regression model for the conditional risk, we obtain the maximum-likelihood (ML)-based point and interval estimates of these measures, which are most efficient due to the nature of ML. We use the ML estimates of the model parameters to obtain the ML estimates of these measures. We use the approximate normal distribution of the ML estimates of the model parameters to obtain approximate non-normal distributions of the ML estimates of these measures and then confidence intervals of these measures. The method can be easily implemented and is presented via a medical example.

14.

Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies

Austin PC 《Pharmaceutical statistics》2011,10(2):150-161

In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means.
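The recommended caliper can be sketched as follows: a greedy one-to-one nearest-neighbour match on the logit of the propensity score, rejecting pairs farther apart than 0.2 standard deviations. The greedy matching order, the use of the sample standard deviation over all subjects, and the function names are assumptions made for this illustration and are not details taken from the abstract.

```python
import math
from statistics import stdev


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def caliper_match(treated_ps, control_ps, caliper_sd=0.2):
    """Greedy 1:1 matching without replacement; returns (pairs, caliper)."""
    t_logit = [logit(p) for p in treated_ps]
    c_logit = [logit(p) for p in control_ps]
    # Assumption: SD taken over the logits of all subjects combined.
    caliper = caliper_sd * stdev(t_logit + c_logit)
    available = list(range(len(c_logit)))
    pairs = []
    for i, lt in enumerate(t_logit):
        if not available:
            break
        j = min(available, key=lambda k: abs(c_logit[k] - lt))
        if abs(c_logit[j] - lt) <= caliper:
            pairs.append((i, j))      # (treated index, control index)
            available.remove(j)
    return pairs, caliper


# Hypothetical propensity scores.
pairs, caliper = caliper_match([0.5, 0.9], [0.52, 0.1, 0.3])
print(pairs, round(caliper, 3))
```

Treated subjects with no control inside the caliper are left unmatched, which trades sample size for closer covariate balance.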

15.

Structured kernel quantile regression

Ja-Yong Koo Kwi Wook Park Byung Won Kim Kwang-Rae Kim 《Journal of Statistical Computation and Simulation》2013,83(1):179-190

Quantile regression can provide more useful information on the conditional distribution of a response variable given covariates, while classical regression provides information on the conditional mean alone. In this paper, we propose a structured quantile estimation methodology in a nonparametric function estimation setup. Through the functional analysis of variance decomposition, the optimization of the proposed method can be solved using a series of quadratic and linear programs. Our method automatically selects relevant covariates by adopting a lasso-type penalty. The performance of the proposed methodology is illustrated through numerical examples on both simulated and real data.

16.

Approximate Likelihoods for Generalized Linear Errors-in-variables Models

John J. Hanfelt & Kung-Yee Liang 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1997,59(3):627-637

When measurement error is present in covariates, it is well known that naïvely fitting a generalized linear model results in inconsistent inferences. Several methods have been proposed to adjust for measurement error without making undue distributional assumptions about the unobserved true covariates. Stefanski and Carroll focused on an unbiased estimating function rather than a likelihood approach. Their estimating function, known as the conditional score, exists for logistic regression models but has two problems: a poorly behaved Wald test and multiple solutions. They suggested a heuristic procedure to identify the best solution that works well in practice but has little theoretical support compared with maximum likelihood estimation. To help to resolve these problems, we propose a conditional quasi-likelihood to accompany the conditional score that provides an alternative to Wald's test and successfully identifies the consistent solution in large samples.

17.

Cure rate survival models with missing covariates: a simulation study

Renata Santana Fonseca Heleno Bolfarine 《Journal of Statistical Computation and Simulation》2013,83(1):97-113

In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing data at random situation so that the missing covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach taking into consideration the missing covariates presents much better efficiency in terms of mean square errors than the complete case situation. Effects of increasing cured fraction and censored observations are also reported. We demonstrate the proposed methodology with two real data sets. One concerns the length of time to obtain a BS degree in Statistics, and the other the time to breast cancer recurrence.

18.

Bayesian non-crossing quantile regression for regularly varying distributions

Salaheddine El Adlouni 《Journal of Statistical Computation and Simulation》2019,89(5):884-898

Quantile regression is a very important statistical tool for predictive modelling and risk assessment. For many applications, conditional quantiles at different levels are estimated separately. Consequently the monotonicity of conditional quantiles can be violated when quantile regression curves cross each other. In this paper, we propose a new Bayesian multiple quantile regression based on a heavy-tailed distribution for non-crossing. We consider a linear quantile regression model for simultaneous Bayesian estimation of multiple quantiles based on a regularly varying assumption. The numerical and competitive performance of the proposed method is illustrated by simulation.

19.

Nonparametric estimation of regression models with mixed discrete and continuous covariates by the K-nn method

Carl Green Yu Yvette Zhang 《Econometric Reviews》2017,36(1-3):205-224

In this article we consider the problem of estimating a nonparametric conditional mean function with mixed discrete and continuous covariates by the nonparametric k-nearest-neighbor (k-nn) method. We derive the asymptotic normality result of the proposed estimator and use Monte Carlo simulations to demonstrate its finite sample performance. We also provide an illustrative empirical example of our method.
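A very simplified picture of k-nn regression with one discrete and one continuous covariate is sketched below. It restricts the neighbour search to observations that share the query's discrete value and then averages the responses of the k nearest points on the continuous covariate. This exact-cell rule and the function name are assumptions chosen for brevity; the estimator studied in the article may treat the discrete covariates differently.

```python
def knn_mean(x_cont, x_disc, y, x0_cont, x0_disc, k):
    """Average y over the k nearest neighbours within the matching discrete cell."""
    cell = [(abs(xc - x0_cont), yi)
            for xc, xd, yi in zip(x_cont, x_disc, y)
            if xd == x0_disc]
    if len(cell) < k:
        raise ValueError("fewer than k observations share the discrete value")
    cell.sort(key=lambda pair: pair[0])   # nearest first on the continuous covariate
    return sum(yi for _, yi in cell[:k]) / k


# Hypothetical data: the last observation belongs to a different discrete cell.
x_cont = [0.0, 1.0, 2.0, 3.0, 1.1]
x_disc = [0, 0, 0, 0, 1]
y = [0.0, 10.0, 20.0, 30.0, 999.0]
print(knn_mean(x_cont, x_disc, y, x0_cont=1.2, x0_disc=0, k=2))
```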

20.

Estimating the Population Survival Function Using Additional Information Recorded Over Time: a Filter Based Approach

Torben Martinussen & Thomas H. Scheike 《Scandinavian Journal of Statistics》1998,25(4):621-635

Survival studies often collect information about covariates. If these covariates are believed to contain information about the life-times, they may be considered when estimating the underlying life-time distribution. We propose a non-parametric estimator which uses the recorded information about the covariates. Various forms of incomplete data, e.g. right-censored data, are allowed. The estimator is the conditional mean of the true empirical survival function given the observed history, and it is derived using a general filtering formula. Feng & Kurtz (1994) showed that the estimator is the Kaplan–Meier estimator in the case of right-censoring when using the observed life-times and censoring-times as the observed history. We take the same approach as Feng & Kurtz (1994) but in addition we incorporate the recorded information about the covariates in the observed history. Two models are considered and in both cases the Kaplan–Meier estimator is a special case of the estimator. In a simulation study the estimator is compared with the Kaplan–Meier estimator in small samples.
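For reference, the Kaplan–Meier estimator that serves as the special case and comparison baseline here can be sketched in a few lines for right-censored data. This is the standard product-limit formula only, not the filter-based estimator proposed in the article; the function name and sample data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    events[i] is 1 if the life-time was observed and 0 if right-censored.
    Returns a list of (event time, survival probability just after it).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


# Hypothetical sample: one subject censored at t = 2 and one at t = 4.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```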