Similar Articles
A total of 20 similar articles were retrieved.
1.
Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance, and be completely at random; or may occur as part of the sampling design, and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.
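As a rough illustration of the kind of analysis described (not the authors' S-plus code), the sketch below fits a weighted case-cohort Cox model with a robust sandwich variance in Python using the lifelines package. The column names, the time-constant inverse-sampling-fraction weights (a simplification of Barlow's scheme), and the 0.15 subcohort sampling fraction are all assumptions for illustration.

```python
# Hypothetical sketch of a weighted case-cohort Cox fit with a robust
# (sandwich) variance.  Column names and the 0.15 subcohort sampling
# fraction are assumptions, and the weighting is a simplified,
# time-constant version of Barlow-type weighting.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("case_cohort.csv")   # time, event, covariates, subcohort flag

sampling_fraction = 0.15              # observed subcohort sampling fraction
# Cases keep weight 1; non-case subcohort members are up-weighted by the
# inverse of the observed sampling fraction.
df["weight"] = 1.0
df.loc[(df["event"] == 0) & (df["subcohort"] == 1), "weight"] = 1.0 / sampling_fraction

cph = CoxPHFitter()
cph.fit(
    df[["time", "event", "age", "smoking", "weight"]],
    duration_col="time",
    event_col="event",
    weights_col="weight",
    robust=True,                      # sandwich variance based on the influence function
)
cph.print_summary()
```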

2.
Unmeasured confounding is a common problem in observational studies. This article presents simple formulae that bound the confounding risk ratio under three standard populations: the exposed, unexposed, and total groups. The bounds are derived by viewing the confounding risk ratio as a function of the prevalence of a covariate, and can be constructed using only information about either the exposure–confounder or the disease–confounder relationship. The formulae can be extended to the confounding odds ratio in case–control studies, and the confounding risk difference is also discussed. The application of these formulae is demonstrated using an example in which estimation may suffer from bias due to population stratification. The formulae can help to provide a realistic picture of the potential impact of bias due to confounding.
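For orientation, one standard way to write the confounding risk ratio as a function of confounder prevalence (a common parametrization under no effect modification, not necessarily the exact formulae derived in the article): for a binary confounder that multiplies disease risk by a factor gamma and has prevalence p1 among the exposed and p0 among the unexposed,

```latex
\mathrm{RR}_{\text{conf}}
  \;=\; \frac{\mathrm{RR}_{\text{crude}}}{\mathrm{RR}_{\text{adjusted}}}
  \;=\; \frac{1 + (\gamma - 1)\,p_1}{1 + (\gamma - 1)\,p_0},
```

so letting the unknown prevalences range over [0, 1] bounds the confounding risk ratio between 1/gamma and gamma.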

3.
Investigators often gather longitudinal data to assess changes in responses over time within subjects and to relate these changes to within-subject changes in predictors. Missing data are common in such studies, and predictors can be correlated with subject-specific effects. Maximum likelihood methods for generalized linear mixed models provide consistent estimates when the data are 'missing at random' (MAR) but can produce inconsistent estimates in settings where the random effects are correlated with one of the predictors. On the other hand, conditional maximum likelihood methods (and closely related maximum likelihood methods that partition covariates into between- and within-cluster components) provide consistent estimation when random effects are correlated with predictors but can produce inconsistent covariate effect estimates when data are MAR. Using theory, simulation studies, and fits to example data, this paper shows that decomposition methods using complete covariate information produce consistent estimates. In some practical cases these methods, which ostensibly require complete covariate information, actually involve only the observed covariates. These results offer an easy-to-use approach to simultaneously protect against bias from both cluster-level confounding and MAR missingness in assessments of change.
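A brief sketch of the between-/within-cluster covariate decomposition referred to above, using a hypothetical longitudinal data frame and statsmodels; the variable names ("id", "y", "x") are assumptions, and this is only a generic linear mixed model, not the paper's full procedure.

```python
# Sketch: split a time-varying covariate x into its subject (cluster) mean and
# the within-subject deviation, then fit a linear mixed model so that the
# within-subject coefficient is protected from cluster-level confounding.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("longitudinal.csv")   # hypothetical columns: id, y, x

# Between-subject component: the subject-specific mean of the observed x's.
df["x_between"] = df.groupby("id")["x"].transform("mean")
# Within-subject component: deviation of each observation from that mean.
df["x_within"] = df["x"] - df["x_between"]

model = smf.mixedlm("y ~ x_between + x_within", data=df, groups=df["id"])
result = model.fit()
print(result.summary())
```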

4.
This article proposes a simulation-based density estimation technique for time series that exploits information found in covariate data. The method can be paired with a large range of parametric models used in time series estimation. We derive asymptotic properties of the estimator and illustrate attractive finite sample properties for a range of well-known econometric and financial applications.

5.
For analyzing incidence data on diabetes and health problems, the bivariate geometric probability distribution is a natural choice, but it has remained largely unexplored due to a lack of models linking covariates with the probabilities of bivariate incidence of correlated outcomes. In this paper, bivariate geometric models are proposed for two correlated incidence outcomes. Extended generalized linear models are developed to take into account covariate dependence of the bivariate probabilities of correlated incidence outcomes for diabetes and heart disease in the elderly population. The estimation and test procedures are illustrated using the Health and Retirement Study data. Two models are presented, one based on a conditional-marginal approach and the other on the joint probability distribution with an association parameter. The joint model with an association parameter appears to be a very good choice for analyzing the covariate dependence of the joint incidence of diabetes and heart disease. Bootstrapping is performed to measure the accuracy of the estimates, and the results indicate very small bias.

6.
Covariate-informed product partition models incorporate the intuitively appealing notion that individuals or units with similar covariate values a priori have a higher probability of co-clustering than those with dissimilar covariate values. These methods have been shown to perform well when the number of covariates is relatively small. However, as the number of covariates increases, their influence on partition probabilities overwhelms any information the response may provide for clustering and often encourages partitions with either a large number of singleton clusters or one large cluster, resulting in poor model fit and poor out-of-sample prediction. The same phenomenon is observed in Bayesian nonparametric regression methods that induce a conditional distribution for the response given covariates through a joint model. In light of this, we propose two methods that calibrate the covariate-dependent partition model by capping the influence that covariates have on partition probabilities. We demonstrate the new methods' utility using simulation and two publicly available datasets.

7.
This paper considers the effects of informative two-stage cluster sampling on estimation and prediction. The aims of this article are twofold: first, to estimate the parameters of the superpopulation model for two-stage cluster sampling from a finite population, when the sampling design for both stages is informative, using maximum likelihood estimation based on the sample-likelihood function; second, to predict the finite population total and to predict the cluster-specific effects and the cluster totals for clusters in the sample and for clusters not in the sample. To achieve this we derive the sample and sample-complement distributions and the moments of the first- and second-stage measurements. We also derive the conditional sample and conditional sample-complement distributions and the moments of the cluster-specific effects given the cluster measurements. It should be noted that classical design-based inference, which consists of weighting the sample observations by the inverse of the sample selection probabilities, cannot be applied for the prediction of the cluster-specific effects for clusters not in the sample. We also give an alternative justification of the Royall [1976. The linear least squares prediction approach to two-stage sampling. Journal of the American Statistical Association 71, 657–664] predictor of the finite population total under a two-stage cluster population. Furthermore, small-area models are studied under informative sampling.

8.
In modeling defect counts collected from an established manufacturing process, there is usually a relatively large number of zeros (non-defects). Commonly used models such as the Poisson or geometric distributions can underestimate the zero-defect probability and hence make it difficult to identify significant covariate effects for improving production quality. This article introduces a flexible class of zero-inflated models which includes other familiar models, such as the zero-inflated Poisson (ZIP) model, as special cases. A Bayesian estimation method is developed as an alternative to the traditionally used maximum likelihood based methods for analyzing such data. Simulation studies show that the proposed method has better finite sample performance than the classical method, with tighter interval estimates and better coverage probabilities. A real-life data set is analyzed to illustrate the practicability of the proposed method, which is easily implemented using WinBUGS.
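A minimal PyMC sketch of a Bayesian zero-inflated Poisson regression with a single covariate, to illustrate the kind of MCMC fit the article implements in WinBUGS; the priors, variable names, simulated data, and single-covariate structure are all assumptions, and this is the plain ZIP special case rather than the article's more flexible class.

```python
# Hypothetical zero-inflated Poisson (ZIP) regression fit by MCMC.
# y: defect counts; x: a standardized process covariate (both simulated here).
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
true_mu = np.exp(0.5 + 0.8 * x)
y = rng.poisson(true_mu) * rng.binomial(1, 0.7, size=n)   # ~30% structural zeros

with pm.Model() as zip_model:
    beta0 = pm.Normal("beta0", 0.0, 2.0)
    beta1 = pm.Normal("beta1", 0.0, 2.0)
    psi = pm.Beta("psi", 1.0, 1.0)            # probability of the Poisson (non-zero-inflated) part
    mu = pm.math.exp(beta0 + beta1 * x)       # log link for the Poisson mean
    pm.ZeroInflatedPoisson("y", psi=psi, mu=mu, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=1)

print(az.summary(idata, var_names=["beta0", "beta1", "psi"]))
```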

9.
Motivated by a potential-outcomes perspective, the idea of principal stratification has been widely recognized for its relevance in settings susceptible to posttreatment selection bias, such as randomized clinical trials where treatment received can differ from treatment assigned. In one such setting, we address subtleties involved in inference for causal effects when using a key covariate to predict membership in latent principal strata. We show that when treatment received can differ from treatment assigned in both study arms, incorporating a stratum-predictive covariate can cause estimates of the "complier average causal effect" (CACE) to derive from observations in the two treatment arms with different covariate distributions. Adopting a Bayesian perspective and using Markov chain Monte Carlo for computation, we develop posterior checks that characterize the extent to which incorporating the pretreatment covariate endangers estimation of the CACE. We apply the method to analyze a clinical trial comparing two treatments for jaw fractures in which the study protocol allowed surgeons to overrule both possible randomized treatment assignments based on their clinical judgment and the data contained a key covariate (injury severity) predictive of treatment received.

10.
Finite mixture of regression (FMR) models are aimed at characterizing subpopulation heterogeneity stemming from different sets of covariates that impact different groups in a population. We address the contemporary problem of simultaneously conducting covariate selection and determining the number of mixture components from a Bayesian perspective that can incorporate prior information. We propose a Gibbs sampling algorithm with reversible jump Markov chain Monte Carlo implementation to accomplish concurrent covariate selection and mixture component determination in FMR models. Our Bayesian approach contains innovative features compared to previously developed reversible jump algorithms. In addition, we introduce component-adaptive weighted g priors for regression coefficients, and illustrate their improved performance in covariate selection. Numerical studies show that the Gibbs sampler with reversible jump implementation performs well, and that the proposed weighted priors can be superior to non-adaptive unweighted priors.

11.
In this study, we consider the problem of selecting explanatory variables of fixed effects in linear mixed models under covariate shift, which is when the values of covariates in the model for prediction differ from those in the model for observed data. We construct a variable selection criterion based on the conditional Akaike information introduced by Vaida & Blanchard (2005). We focus especially on covariate shift in small area estimation and demonstrate the usefulness of the proposed criterion. In addition, numerical performance is investigated through simulations, one of which is a design-based simulation using a real dataset of land prices. The Canadian Journal of Statistics 46: 316–335; 2018 © 2018 Statistical Society of Canada

12.
I review the use of auxiliary variables in capture-recapture models for estimation of demographic parameters (e.g. capture probability, population size, survival probability, and recruitment, emigration and immigration numbers). I focus on what has been done in current research and what still needs to be done. Typically in the literature, covariate modelling has made capture and survival probabilities functions of covariates, but there are good reasons also to make other parameters functions of covariates as well. The types of covariates considered include environmental covariates that may vary by occasion but are constant over animals, and individual animal covariates that are usually assumed constant over time. I also discuss the difficulties of using time-dependent individual animal covariates and some possible solutions. Covariates are usually assumed to be measured without error, and that may not be realistic. For closed populations, one approach to modelling heterogeneity in capture probabilities uses observable individual covariates and is thus related to the primary purpose of this paper. The now standard Huggins-Alho approach conditions on the captured animals and then uses a generalized Horvitz-Thompson estimator to estimate population size. This approach has the advantage of simplicity in that one does not have to specify a distribution for the covariates, and the disadvantage that it does not use the full likelihood to estimate population size. Alternatively, one could specify a distribution for the covariates and implement a full likelihood approach to inference to estimate the capture function, the covariate probability distribution, and the population size. The general Jolly-Seber open model enables one to estimate capture probability, population sizes, survival rates, and birth numbers. Much of the focus on modelling covariates in program MARK has been for survival and capture probability in the Cormack-Jolly-Seber model and its generalizations (including tag-return models). These models condition on the number of animals marked and released. A related, but distinct, topic is radio telemetry survival modelling that typically uses a modified Kaplan-Meier method and the Cox proportional hazards model for auxiliary variables. Recently there has been an emphasis on integration of recruitment into the likelihood, and research on how to implement covariate modelling for recruitment, and perhaps population size, is needed. The combined open and closed 'robust' design model can also benefit from covariate modelling, and some important options have already been implemented in MARK. Many models are usually fitted to one data set. This has necessitated the development of model selection criteria based on the AIC (Akaike Information Criterion) and the alternative of averaging over reasonable models. The special problems of estimating over-dispersion when covariates are included in the model, and then adjusting for over-dispersion in model selection, could benefit from further research.
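As a concrete illustration of the Huggins-Alho idea mentioned above, the following sketch fits a logistic model for per-occasion capture probability from an individual covariate using the conditional likelihood (conditioning on being captured at least once), and then plugs the fitted probabilities into a generalized Horvitz-Thompson estimator of abundance. The simulated design, covariate, and parameter values are assumptions for illustration only.

```python
# Sketch of the Huggins-Alho conditional-likelihood approach with a
# generalized Horvitz-Thompson abundance estimate.  Data are simulated.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(7)
T = 5                                   # capture occasions
N_true = 400
x_all = rng.normal(size=N_true)         # individual covariate (e.g., scaled body mass)
p_all = expit(-1.0 + 0.8 * x_all)       # per-occasion capture probability
caps = rng.binomial(T, p_all)           # number of captures per animal
seen = caps > 0                         # only ever-captured animals are observed
x, y = x_all[seen], caps[seen]

def neg_cond_loglik(beta):
    p = expit(beta[0] + beta[1] * x)
    p_star = 1.0 - (1.0 - p) ** T       # P(captured at least once)
    # Binomial log-likelihood conditioned on at least one capture
    # (binomial coefficient dropped as a constant).
    ll = y * np.log(p) + (T - y) * np.log1p(-p) - np.log(p_star)
    return -np.sum(ll)

fit = minimize(neg_cond_loglik, x0=np.zeros(2), method="BFGS")
p_hat = expit(fit.x[0] + fit.x[1] * x)
p_star_hat = 1.0 - (1.0 - p_hat) ** T
N_hat = np.sum(1.0 / p_star_hat)        # generalized Horvitz-Thompson estimate
print(f"estimated abundance: {N_hat:.1f} (true {N_true})")
```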

13.
Children exposed to mixtures of endocrine disrupting compounds such as phthalates are at high risk of experiencing significant friction in their growth and sexual maturation. This article is primarily motivated by a study that aims to assess the toxicants-modified effects of risk factors related to the hazards of early or delayed onset of puberty among children living in Mexico City. To address the hypothesis of potential nonlinear modification of covariate effects, we propose a new Cox regression model with multiple functional covariate-environment interactions, which allows covariate effects to be altered nonlinearly by mixtures of exposed toxicants. This new class of models is rather flexible and includes many existing semiparametric Cox models as special cases. To achieve efficient estimation, we develop the global partial likelihood method of inference, in which we establish key large-sample results, including estimation consistency, asymptotic normality, semiparametric efficiency and the generalized likelihood ratio test for both parameters and nonparametric functions. The proposed methodology is examined via simulation studies and applied to the analysis of the motivating data, where maternal exposures to phthalates during the third trimester of pregnancy are found to be important risk modifiers for the age of attaining the first stage of puberty. The Canadian Journal of Statistics 47: 204–221; 2019 © 2019 Statistical Society of Canada

15.
Since Dorfman's seminal work on the subject, group testing has been widely adopted in epidemiological studies. In Dorfman's context of detecting syphilis, group testing entails pooling blood samples and testing the pools, as opposed to testing individual samples. A negative pool indicates that all individuals in the pool are free of syphilis antigen, whereas a positive pool suggests that one or more individuals carry the antigen. With covariate information collected, researchers have considered regression models that allow one to estimate covariate-adjusted disease probability. We study maximum likelihood estimators of covariate effects in these regression models when the group testing response is prone to error. We show that, when compared with inference drawn from individual testing data, inference based on group testing data can be more resilient to response misclassification in terms of bias and efficiency. We provide valuable guidance on designing the group composition to alleviate the adverse effects of misclassification on statistical inference.
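To make the setup concrete, here is a small sketch of maximum likelihood estimation of covariate effects from error-prone pool-level results: with sensitivity Se and specificity Sp, a pool tests positive with probability Se*(1 - prod(1 - p_i)) + (1 - Sp)*prod(1 - p_i), where each individual probability p_i follows a logistic regression on the covariate. The pooling design, covariate, and Se/Sp values are illustrative assumptions, not the authors' code.

```python
# Sketch: covariate effects estimated from misclassification-prone group
# (pooled) testing data by maximizing the pool-level likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(3)
n, pool_size, Se, Sp = 1000, 5, 0.95, 0.98
x = rng.normal(size=n)
p = expit(-2.0 + 1.0 * x)                       # true individual-level model
status = rng.binomial(1, p)
pool_id = np.repeat(np.arange(n // pool_size), pool_size)

# A pool is truly positive if it contains at least one positive individual;
# the observed test result is then subject to misclassification.
truly_pos = np.array([status[pool_id == k].any() for k in range(n // pool_size)])
z = rng.binomial(1, np.where(truly_pos, Se, 1 - Sp))

def neg_loglik(beta):
    pi = expit(beta[0] + beta[1] * x)
    q = np.array([np.prod(1 - pi[pool_id == k]) for k in range(n // pool_size)])
    prob_pos = Se * (1 - q) + (1 - Sp) * q      # P(pool tests positive)
    return -np.sum(z * np.log(prob_pos) + (1 - z) * np.log1p(-prob_pos))

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print("estimated (intercept, slope):", fit.x)    # true values: (-2.0, 1.0)
```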

16.
In medical diagnostic testing problems, covariate-adjusted receiver operating characteristic (ROC) curves have recently been discussed for achieving the best separation between disease and control. Due to various restrictions such as cost, the availability of patients, and ethical issues, quite frequently only limited information is available. As a result, we are unlikely to have a large enough overall sample size to support reliable direct estimation of ROCs for all the underlying covariates of interest. For example, some genetic factors are less commonly observable than others. To obtain an accurate covariate-adjusted ROC estimate, novel statistical methods are needed to effectively utilize the limited information. It is therefore desirable to use indirect estimates that borrow strength by employing values of the variables of interest from neighbouring covariates. In this paper we discuss two semiparametric exponential tilting models, where the density functions from different covariate levels share a common baseline density, and the parameters in the exponential tilting component reflect the differences among the covariates. With the proposed models, the estimated covariate-adjusted ROC is much smoother and more efficient than the nonparametric counterpart that does not borrow information from neighbouring covariates. A simulation study and a real data application are reported. The Canadian Journal of Statistics 40: 569–587; 2012 © 2012 Statistical Society of Canada
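For reference, the generic form of an exponential tilting (density ratio) model of the kind described, in which the density at covariate level k is tied to a common baseline g_0; the linear tilt shown here is one common choice and not necessarily the exact specification used in the paper:

```latex
g_k(x) \;=\; g_0(x)\,\exp\!\bigl(\alpha_k + \beta_k\, x\bigr),
\qquad k = 1, \dots, K,
```

where the normalizing constants alpha_k ensure each g_k integrates to one and beta_k captures how covariate level k departs from the baseline density.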

17.
In studies that produce data with spatial structure, it is common that covariates of interest vary spatially in addition to the error. Because of this, the error and covariate are often correlated. When this occurs, it is difficult to distinguish the covariate effect from residual spatial variation. In an i.i.d. normal error setting, it is well known that this type of correlation produces biased coefficient estimates, but predictions remain unbiased. In a spatial setting, recent studies have shown that coefficient estimates remain biased, but spatial prediction has not been addressed. The purpose of this paper is to provide a more detailed study of coefficient estimation from spatial models when covariate and error are correlated and then begin a formal study regarding spatial prediction. This is carried out by investigating properties of the generalized least squares estimator and the best linear unbiased predictor when a spatial random effect and a covariate are jointly modelled. Under this setup, we demonstrate that the mean squared prediction error is possibly reduced when covariate and error are correlated.
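For context, the standard generalized least squares estimator and universal-kriging-style best linear unbiased predictor that such an investigation works with are shown below; the notation is generic and not taken from the paper:

```latex
\hat{\beta}_{\mathrm{GLS}}
  = (X^{\top}\Sigma^{-1}X)^{-1}X^{\top}\Sigma^{-1}y,
\qquad
\hat{y}(s_0)
  = x(s_0)^{\top}\hat{\beta}_{\mathrm{GLS}}
  + c_0^{\top}\Sigma^{-1}\bigl(y - X\hat{\beta}_{\mathrm{GLS}}\bigr),
```

where Sigma is the covariance matrix of the spatial error, c_0 collects the covariances between the error at the new location s_0 and the observed errors, and x(s_0) is the covariate value at s_0. This connects to the abstract's observation that the coefficient estimate can be biased when covariate and error are correlated while the mean squared prediction error may nonetheless be reduced.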

18.
Bayesian inference of a generalized Weibull stress-strength model (SSM) with more than one strength component is considered. For this problem, properly assigning priors for the reliabilities is challenging due to the presence of nuisance parameters. Matching priors, which are priors matching the posterior probabilities of certain regions with their frequentist coverage probabilities, are commonly used but difficult to derive in this problem. Instead, we apply an alternative method and derive a matching prior based on a modification of the profile likelihood. Simulation studies show that this proposed prior performs well in terms of frequentist coverage and estimation even when the sample sizes are minimal. The prior is applied to two real datasets. The Canadian Journal of Statistics 41: 83–97; 2013 © 2012 Statistical Society of Canada

19.
In survival analysis, we may encounter the following three problems: nonlinear covariate effect, variable selection and measurement error. Existing studies only address one or two of these problems. The goal of this study is to fill the knowledge gap and develop a novel approach to simultaneously address all three problems. Specifically, a partially time-varying coefficient proportional hazards model is proposed to more flexibly describe covariate effects. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularised estimation, a penalisation approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties. It can be effectively realised using an iterative algorithm. The performance of the proposed approach is assessed via simulation studies and further illustrated by application to data from an AIDS clinical trial.

20.
We derive two types of Akaike information criterion (AIC)-like model-selection formulae for the semiparametric pseudo-maximum likelihood procedure. We first adapt the arguments leading to the original AIC formula, related to empirical estimation of a certain Kullback–Leibler information distance. This gives a significantly different formula compared with the AIC, which we name the copula information criterion. However, we show that such a model-selection procedure cannot exist for copula models with densities that grow very fast near the edge of the unit cube. This problem affects most popular copula models. We then derive what we call the cross-validation copula information criterion, which exists under weak conditions and is a first-order approximation to exact cross validation. This formula is very similar to the standard AIC formula but has slightly different motivation. A brief illustration with real data is given.
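As a point of reference for the criteria discussed, the classical AIC and its misspecification-robust Takeuchi-type generalization are shown below; these are standard formulae, not the copula information criteria derived in the paper, whose corrections are not reproduced here:

```latex
\mathrm{AIC} = -2\log\hat{L} + 2p,
\qquad
\mathrm{TIC} = -2\log\hat{L} + 2\,\mathrm{tr}\!\bigl(\hat{J}^{-1}\hat{K}\bigr),
```

where L-hat is the maximized (pseudo-)likelihood, p the number of estimated parameters, J-hat estimates the expected negative Hessian of the log-likelihood, and K-hat the variance of the score; the trace term reduces to p under correct specification, recovering the AIC.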
