Similar Documents
1.
A Wald test-based approach for power and sample size calculations has been presented recently for logistic and Poisson regression models using the asymptotic normal distribution of the maximum likelihood estimator, which is applicable to tests of a single parameter. Unlike the previous procedures involving the use of score and likelihood ratio statistics, there is no simple and direct extension of this approach for tests of more than a single parameter. In this article, we present a method for computing sample size and statistical power employing the discrepancy between the noncentral and central chi-square approximations to the distribution of the Wald statistic with unrestricted and restricted parameter estimates, respectively. The distinguishing features of the proposed approach are the accommodation of tests about multiple parameters, the flexibility of covariate configurations and the generality of overall response levels within the framework of generalized linear models. The general procedure is illustrated with some special situations that have motivated this research. Monte Carlo simulation studies are conducted to assess and compare its accuracy with existing approaches under several model specifications and covariate distributions.
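A minimal sketch of the power calculation this approach implies, assuming the noncentrality parameter has already been derived from the specified model, covariate configuration and alternative parameter values (function names here are illustrative, not from the paper):

```python
# Power of a multi-df Wald test from the central vs. noncentral
# chi-square approximations; scipy provides both distributions.
from scipy.stats import chi2, ncx2

def wald_power(noncentrality, df, alpha=0.05):
    """P(reject H0) when the Wald statistic is approx. noncentral chi-square."""
    crit = chi2.ppf(1 - alpha, df)            # central chi-square critical value
    return ncx2.sf(crit, df, noncentrality)   # survival function at the cutoff

def wald_sample_size(per_obs_ncp, df, target=0.80, alpha=0.05):
    """Smallest n whose noncentrality n * per_obs_ncp reaches the target power."""
    n = 1
    while wald_power(n * per_obs_ncp, df, alpha) < target:
        n += 1
    return n

print(wald_power(10.0, df=2))   # about 0.8 for a 2-df test at alpha = 0.05
```

For a single parameter (df = 1) this reduces to the familiar Wald z-test power calculation.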

2.
A simple and standard approach for analysing multistate model data is to model all transition intensities and then compute a summary measure, such as the transition probabilities, based on this. This approach is relatively simple to implement, but it is difficult to see what the covariate effects are on the scale of interest. In this paper, we consider an alternative approach that directly models the covariate effects on transition probabilities in multistate models. Our new approach is based on binomial modelling and inverse probability of censoring weighting techniques and is very simple to implement with standard software. We show how to fit flexible regression models with possibly time-varying covariate effects.
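A rough sketch of the binomial/IPCW core of such an approach, assuming the censoring-survival estimates G_t0 have already been obtained (e.g. from a Kaplan-Meier fit to the censoring times); the data and names below are synthetic placeholders:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=n))   # covariate design matrix
y = rng.binomial(1, 0.4, size=n)          # 1 if in the target state at time t0
known = rng.random(n) > 0.2               # state at t0 observed (not censored)
G_t0 = np.full(n, 0.8)                    # toy P(censoring time > t0) estimates

# Binomial regression on the observed subjects, weighted by 1/G_t0:
w = 1.0 / G_t0[known]
fit = sm.GLM(y[known], X[known], family=sm.families.Binomial(),
             var_weights=w).fit()
print(fit.params)   # covariate effects directly on the t0 state probability
```

The point of the weighting is that the reweighted complete cases stand in for the censored ones, so the binomial fit targets the state probability itself rather than the intensities.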

3.
Various methods have been suggested in the literature to handle a missing covariate in the presence of surrogate covariates. These methods belong to one of two paradigms. In the imputation paradigm, Pepe and Fleming (1991) and Reilly and Pepe (1995) suggested filling in missing covariates using the empirical distribution of the covariate obtained from the observed data. We can proceed one step further by imputing the missing covariate using nonparametric maximum likelihood estimates (NPMLE) of the density of the covariate. Recently Murphy and Van der Vaart (1998a) showed that such an approach yields a consistent, asymptotically normal, and semiparametric efficient estimate for the logistic regression coefficient. In the weighting paradigm, Zhao and Lipsitz (1992) suggested an estimating function using completely observed records after weighting inversely by the probability of observation. An extension of this weighting approach designed to achieve the semiparametric efficiency bound is considered by Robins, Hsieh and Newey (RHN) (1995). The two ends of each paradigm (NPMLE and RHN) attain the efficiency bound and are asymptotically equivalent. However, both require a substantial amount of computation. A question arises whether and when, in practical situations, this extensive computation is worthwhile. In this paper we investigate the performance of single and multiple imputation estimates, weighting estimates, semiparametric efficient estimates, and two new imputation estimates. Simulation studies suggest that the sample size should be substantially large (e.g. n = 2000) for NPMLE and RHN to be more efficient than simpler imputation estimates. When the sample size is moderately large (n ≤ 1500), simpler imputation estimates have as small a variance as semiparametric efficient estimates.

4.
Goodness-of-fit Tests for GEE with Correlated Binary Data
The marginal logistic regression, in combination with GEE, is an increasingly important method for dealing with correlated binary data. As for independent binary data, when the number of possible combinations of the covariate values in a logistic regression model is much larger than the sample size, such as when the logistic model contains at least one continuous covariate, many existing chi-square goodness-of-fit tests either are not applicable or have serious drawbacks. In this paper two residual-based, asymptotically normal goodness-of-fit test statistics are proposed: the Pearson chi-square and an unweighted sum of residual squares. Easy-to-calculate approximations to the mean and variance of either statistic are also given. Their performance, in terms of both size and power, was satisfactory in our simulation studies. For illustration we apply them to a real data set.
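A small sketch of the two statistics themselves, ignoring the correlation-adjusted mean and variance approximations developed in the paper; y are the 0/1 outcomes and p_hat the fitted marginal probabilities:

```python
import numpy as np

def gof_statistics(y, p_hat):
    """Pearson chi-square and unweighted sum of squared residuals
    for binary outcomes y with fitted probabilities p_hat."""
    r = y - p_hat
    pearson = np.sum(r**2 / (p_hat * (1 - p_hat)))
    uss = np.sum(r**2)
    return pearson, uss

rng = np.random.default_rng(1)
p_hat = rng.uniform(0.2, 0.8, size=200)   # stand-ins for GEE fitted values
y = rng.binomial(1, p_hat)
print(gof_statistics(y, p_hat))
```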

5.
Dose response studies arise in many medical applications. Often, such studies are considered within the framework of binary-response experiments such as success-failure. In such cases, popular choices for modeling the probability of response are logistic or probit models. Design optimality has been well studied for the logistic model with a continuous covariate. A natural extension of the logistic model is to consider the presence of a qualitative classifier. In this work, we explore D-, A-, and E-optimal designs in a two-parameter, binary logistic regression model after introducing a binary, qualitative classifier with independent levels.
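A hedged sketch of evaluating the three criteria for a candidate design in the two-parameter logistic model (continuous covariate only; the qualitative classifier studied in the paper is omitted):

```python
import numpy as np

def logistic_information(theta, x, lam):
    """Fisher information of design points x with weights lam under
    logit P(y=1|x) = theta[0] + theta[1] * x."""
    eta = theta[0] + theta[1] * np.asarray(x, dtype=float)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = np.asarray(lam) * p * (1.0 - p)           # per-point information weights
    F = np.column_stack([np.ones_like(eta), np.asarray(x, dtype=float)])
    return F.T @ (w[:, None] * F)

def criteria(M):
    return {"D": np.log(np.linalg.det(M)),        # maximise
            "A": np.trace(np.linalg.inv(M)),      # minimise
            "E": np.linalg.eigvalsh(M).min()}     # maximise smallest eigenvalue

# classical two-point locally D-optimal design for theta = (0, 1)
M = logistic_information([0.0, 1.0], x=[-1.5434, 1.5434], lam=[0.5, 0.5])
print(criteria(M))
```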

6.
Covariate-informed product partition models incorporate the intuitively appealing notion that individuals or units with similar covariate values a priori have a higher probability of co-clustering than those with dissimilar covariate values. These methods have been shown to perform well when the number of covariates is relatively small. However, as the number of covariates increases, their influence on the partition probabilities overwhelms any information the response may provide for clustering, and often encourages partitions with either a large number of singleton clusters or one large cluster, resulting in poor model fit and poor out-of-sample prediction. The same phenomenon is observed in Bayesian nonparametric regression methods that induce a conditional distribution for the response given covariates through a joint model. In light of this, we propose two methods that calibrate the covariate-dependent partition model by capping the influence that covariates have on partition probabilities. We demonstrate the new methods' utility using simulation and two publicly available datasets.

7.
Case–control studies allow efficient estimation of the associations of covariates with a binary response in settings where the probability of a positive response is small. It is well known that covariate–response associations can be consistently estimated using a logistic model by acting as if the case–control (retrospective) data were prospective, and that this result does not hold for other binary regression models. However, in practice an investigator may be interested in fitting a non-logistic-link binary regression model, and this paper examines the magnitude of the bias resulting from ignoring the case–control sample design with such models. The paper presents an approximation to the magnitude of this bias in terms of the sampling rates of cases and controls, as well as simulation results showing that the bias can be substantial.
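A toy simulation of this bias under assumed data-generating values: a probit model fitted to case-control data as if it were prospective, where (unlike the logistic case) the slope does not survive the retrospective sampling:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(2)
N = 200_000
x = rng.normal(size=N)
y = rng.binomial(1, norm.cdf(-3.0 + 0.5 * x))   # rare positive response

# case-control design: keep all cases, subsample an equal number of controls
cases = np.flatnonzero(y == 1)
controls = rng.choice(np.flatnonzero(y == 0), size=len(cases), replace=False)
idx = np.concatenate([cases, controls])

fit = sm.Probit(y[idx], sm.add_constant(x[idx])).fit(disp=0)
print(fit.params)   # slope estimate drifts away from the true 0.5
```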

8.
I review the use of auxiliary variables in capture-recapture models for estimation of demographic parameters (e.g. capture probability, population size, survival probability, and recruitment, emigration and immigration numbers). I focus on what has been done in current research and what still needs to be done. Typically in the literature, covariate modelling has made capture and survival probabilities functions of covariates, but there are good reasons to make other parameters functions of covariates as well. The types of covariates considered include environmental covariates that may vary by occasion but are constant over animals, and individual animal covariates that are usually assumed constant over time. I also discuss the difficulties of using time-dependent individual animal covariates and some possible solutions. Covariates are usually assumed to be measured without error, which may not be realistic. For closed populations, one approach to modelling heterogeneity in capture probabilities uses observable individual covariates and is thus related to the primary purpose of this paper. The now-standard Huggins-Alho approach conditions on the captured animals and then uses a generalized Horvitz-Thompson estimator to estimate population size. This approach has the advantage of simplicity, in that one does not have to specify a distribution for the covariates, and the disadvantage that it does not use the full likelihood to estimate population size. Alternatively, one could specify a distribution for the covariates and implement a full likelihood approach to inference to estimate the capture function, the covariate probability distribution, and the population size. The general Jolly-Seber open model enables one to estimate capture probability, population sizes, survival rates, and birth numbers. Much of the focus on modelling covariates in program MARK has been on survival and capture probability in the Cormack-Jolly-Seber model and its generalizations (including tag-return models). These models condition on the number of animals marked and released. A related, but distinct, topic is radio-telemetry survival modelling, which typically uses a modified Kaplan-Meier method and the Cox proportional hazards model for auxiliary variables. Recently there has been an emphasis on integration of recruitment into the likelihood, and research on how to implement covariate modelling for recruitment, and perhaps population size, is needed. The combined open and closed 'robust' design model can also benefit from covariate modelling, and some important options have already been implemented in MARK. Many models are usually fitted to one data set. This has necessitated the development of model selection criteria based on the AIC (Akaike Information Criterion) and the alternative of averaging over reasonable models. The special problems of estimating over-dispersion when covariates are included in the model, and then adjusting for over-dispersion in model selection, could benefit from further research.
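As one concrete piece of this review, a minimal sketch of the generalized Horvitz-Thompson step in the Huggins-Alho approach: given fitted per-occasion capture probabilities for each captured animal, population size is estimated by summing inverse probabilities of being caught at least once (the fitted probabilities below are synthetic stand-ins for a covariate-based capture model):

```python
import numpy as np

def horvitz_thompson_N(p_occasions):
    """p_occasions: (n_captured, T) array of fitted capture probabilities.
    Returns the generalized Horvitz-Thompson estimate of population size."""
    p_ever = 1.0 - np.prod(1.0 - p_occasions, axis=1)  # P(caught at least once)
    return np.sum(1.0 / p_ever)

rng = np.random.default_rng(3)
p = rng.uniform(0.1, 0.4, size=(150, 5))   # 150 captured animals, 5 occasions
print(horvitz_thompson_N(p))
```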

9.
In this article, we propose a flexible parametric (FP) approach for adjusting for covariate measurement errors in regression that can accommodate replicated measurements on the surrogate (mismeasured) version of the unobserved true covariate on all the study subjects or on a sub-sample of the study subjects as error assessment data. We utilize the general framework of the FP approach proposed by Hossain and Gustafson in 2009 for adjusting for covariate measurement errors in regression. The FP approach is then compared with the existing non-parametric approaches when error assessment data are available on the entire sample of the study subjects (complete error assessment data) considering covariate measurement error in a multiple logistic regression model. We also developed the FP approach when error assessment data are available on a sub-sample of the study subjects (partial error assessment data) and investigated its performance using both simulated and real life data. Simulation results reveal that, in comparable situations, the FP approach performs as good as or better than the competing non-parametric approaches in eliminating the bias that arises in the estimated regression parameters due to covariate measurement errors. Also, it results in better efficiency of the estimated parameters. Finally, the FP approach is found to perform adequately well in terms of bias correction, confidence coverage, and in achieving appropriate statistical power under partial error assessment data.

10.
This paper presents an analysis of the effect of various baseball play-off configurations on the probability of advancing to the World Series. Play-off games are assumed to be independent. Several paired-comparison models are considered for modeling the probability of a home team winning a single game as a function of the winning percentages of the contestants over the course of the season. The uniform and logistic regression models are both adequate, whereas the Bradley-Terry model (modified for within-pair order effects, i.e. the home-field advantage) is not. The single-game probabilities are then used to compute the probability of winning the play-offs under various structures. The extra round of play-offs, instituted in 1994, significantly lowers the probability of the team with the best record advancing to the World Series, whereas home-field advantage and the different possible play-off draws have a minimal effect.
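As a sketch of the final computation, the probability of winning a best-of-seven series from a single-game win probability p, here held constant across games (the paper lets it vary with home-field assignment):

```python
from math import comb

def best_of_seven(p):
    """P(4 wins before 4 losses), independent games with win probability p.
    Winning in game 4 + k requires exactly 3 wins in the first 3 + k games."""
    return sum(comb(3 + k, k) * p**4 * (1 - p)**k for k in range(4))

print(best_of_seven(0.55))   # ~0.61: a modest per-game edge compounds slowly
```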

11.
For analyzing incidence data on diabetes and health problems, the bivariate geometric probability distribution is a natural choice, but it has remained largely unexplored due to a lack of models linking covariates with the probabilities of bivariate incidence of correlated outcomes. In this paper, bivariate geometric models are proposed for two correlated incidence outcomes. Extended generalized linear models are developed to take into account covariate dependence of the bivariate probabilities of correlated incidence outcomes for diabetes and heart disease in the elderly population. The estimation and test procedures are illustrated using the Health and Retirement Study data. Two models are presented, one based on a conditional-marginal approach and the other on the joint probability distribution with an association parameter. The joint model with an association parameter appears to be a very good choice for analyzing the covariate dependence of the joint incidence of diabetes and heart disease. Bootstrapping is performed to measure the accuracy of estimates, and the results indicate very small bias.

12.
"In this article we describe a logistic regression modeling approach for nonresponse in the [U.S.] Post-Enumeration Survey (PES) that has desirable theoretical properties and that has performed well in practice.... In the 1990 PES, interviews were not obtained from approximately 1.2% of households in the sample, and approximately 2.1% of the individuals in interviewed households were considered unresolved after follow-up....The missing binary enumeration statuses for these unresolved cases were replaced with probabilities estimated under a statistical model that incorporated covariate information observed for these cases. This article describes an approach to modeling missing binary outcomes when there are a large number of covariates."  相似文献   

13.
In recent years, regression models have been shown to be useful for predicting the long-term survival probabilities of patients in clinical trials. The importance of a regression model is that once the regression parameters are estimated, information about the regressed quantity is immediate. A simple estimator is proposed for the regression parameters in a model for the long-term survival rate. The proposed estimator is seen to arise from an estimating function that has the missing information principle underlying its construction. When the covariate takes values in a finite set, the proposed estimating function is equivalent to an ad hoc estimating function proposed in the literature. However, in general, the two estimating functions lead to different estimators of the regression parameter. For discrete covariates, the asymptotic covariance matrix of the proposed estimator is simple to calculate using standard techniques involving the predictable covariation process of martingale transforms. An ad hoc extension to the case of a one-dimensional continuous covariate is proposed. Simplicity and generalizability are two attractive features of the proposed approach; the latter is not enjoyed by the other estimator.

14.
We consider logistic regression with covariate measurement error. Most existing approaches require certain replicates of the error‐contaminated covariates, which may not be available in the data. We propose generalized method of moments (GMM) nonparametric correction approaches that use instrumental variables observed in a calibration subsample. The instrumental variable is related to the underlying true covariates through a general nonparametric model, and the probability of being in the calibration subsample may depend on the observed variables. We first take a simple approach adopting the inverse selection probability weighting technique using the calibration subsample. We then improve the approach based on the GMM using the whole sample. The asymptotic properties are derived, and the finite sample performance is evaluated through simulation studies and an application to a real data set.
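A bare-bones sketch of the first, simpler approach — inverse selection-probability weighting over the calibration subsample — with the instrumental-variable construction and the GMM refinement both glossed over; everything below is an illustrative assumption, not the paper's estimator:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
x_true = rng.normal(size=n)                      # covariate usable only in the
y = rng.binomial(1, 1 / (1 + np.exp(-x_true)))   # calibration subsample
pi = 0.3                                         # toy selection probability
in_cal = rng.random(n) < pi

w = np.full(in_cal.sum(), 1.0 / pi)              # inverse selection weights
fit = sm.GLM(y[in_cal], sm.add_constant(x_true[in_cal]),
             family=sm.families.Binomial(), var_weights=w).fit()
print(fit.params)
```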

15.
Semiparametric regression models with multiple covariates are commonly encountered. When there are covariates not associated with the response variable, variable selection may lead to sparser models, more lucid interpretations and more accurate estimation. In this study, we adopt a sieve approach for the estimation of nonparametric covariate effects in semiparametric regression models. We adopt a two-step iterated penalization approach for variable selection. In the first step, a mixture of the Lasso and group Lasso penalties is employed to conduct the first-round variable selection and obtain the initial estimate. In the second step, a mixture of the weighted Lasso and weighted group Lasso penalties, with weights constructed using the initial estimate, is employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when the number of unknown parameters diverges with the sample size. Numerical studies, including simulation and the analysis of a diabetes dataset, show satisfactory performance of the proposed approach.
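A condensed illustration of the two-step weighted-penalty idea using a plain adaptive Lasso (the paper's mixture with group Lasso penalties for grouped sieve coefficients is omitted), via the standard column-rescaling trick:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                 # sparse truth
y = X @ beta + rng.normal(size=n)

b0 = Lasso(alpha=0.1).fit(X, y).coef_       # step 1: initial Lasso estimate

w = 1.0 / np.maximum(np.abs(b0), 1e-8)      # step 2: weights from initial fit
fit = Lasso(alpha=0.1).fit(X / w, y)        # weighted Lasso via rescaled columns
b_final = fit.coef_ / w
print(np.flatnonzero(b_final))              # indices of selected covariates
```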

16.
This paper investigates the estimation of regression parameters and the response mean in nonlinear regression models in the presence of missing responses, where the missingness probability depends on covariates. We propose four empirical likelihood (EL)-based estimators for the regression parameters and the response mean. The resulting estimators are shown to be consistent and asymptotically normal under some general assumptions. To construct confidence regions for the regression parameters as well as the response mean, we develop four EL ratio statistics, which are proven to be asymptotically χ²-distributed. Simulation studies and an artificial data set are used to illustrate the proposed methodologies. Empirical results show that the EL method behaves better than the normal approximation method and that the coverage probabilities and average lengths depend on the selection probability function.

17.
The benefits of adjusting for baseline covariates are not as straightforward with repeated binary responses as with continuous response variables. Therefore, in this study, we compared different methods for analyzing repeated binary data through simulations when the outcome at the study endpoint is of interest. Methods compared included the chi-square test, Fisher's exact test, covariate-adjusted/unadjusted logistic regression (Adj.logit/Unadj.logit), covariate-adjusted/unadjusted generalized estimating equations (Adj.GEE/Unadj.GEE), and covariate-adjusted/unadjusted generalized linear mixed models (Adj.GLMM/Unadj.GLMM). All these methods preserved the type I error close to the nominal level. Covariate-adjusted methods improved power compared with the unadjusted methods because of the increased treatment effect estimates, especially when the correlation between the baseline and outcome was strong, even though there was an apparent increase in standard errors. Results of the chi-square test were identical to those for the unadjusted logistic regression. Fisher's exact test was the most conservative test regarding the type I error rate and also had the lowest power. Without missing data, there was no gain in using a repeated measures approach over a simple logistic regression at the final time point. Analysis of results from five phase III diabetes trials of the same compound was consistent with the simulation findings. Therefore, covariate-adjusted analysis is recommended for repeated binary data when the study endpoint is of interest.
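A compact sketch of one arm of such a simulation — endpoint-only logistic regression with and without baseline adjustment, under an assumed association between the baseline covariate and the outcome:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

def one_trial(n=200, effect=0.6, gamma=1.0):
    trt = np.repeat([0, 1], n // 2)
    base = rng.normal(size=n)                     # baseline covariate
    p = 1 / (1 + np.exp(-(-0.5 + effect * trt + gamma * base)))
    y = rng.binomial(1, p)
    unadj = sm.Logit(y, sm.add_constant(trt)).fit(disp=0)
    adj = sm.Logit(y, sm.add_constant(np.column_stack([trt, base]))).fit(disp=0)
    return unadj.pvalues[1] < 0.05, adj.pvalues[1] < 0.05

rejections = np.array([one_trial() for _ in range(500)])
print("power, unadjusted:", rejections[:, 0].mean(),
      " adjusted:", rejections[:, 1].mean())
```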

18.
There have been many approximations developed for sample sizing of a logistic regression model with a single normally-distributed stimulus. Despite this, it has been recognised that there is no consensus as to the best method. In pharmaceutical drug development, simulation provides a powerful tool to characterise the operating characteristics of complex adaptive designs and is an ideal method for determining the sample size for such a problem. In this paper, we address some issues associated with applying simulation to determine the sample size for a given power in the context of logistic regression. These include efficient methods for evaluating the convolution of a logistic function and a normal density and an efficient heuristic approach to searching for the appropriate sample size. We illustrate our approach with three case studies.
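One efficient way to evaluate the logistic-normal convolution mentioned here is Gauss-Hermite quadrature; a small sketch, assuming the stimulus is N(mu, sd²) (the paper's own method may differ):

```python
import numpy as np

def expected_response(beta0, beta1, mu, sd, nodes=40):
    """E[expit(beta0 + beta1 * X)] for X ~ N(mu, sd^2), by Gauss-Hermite
    quadrature with the change of variables x = mu + sqrt(2) * sd * t."""
    t, w = np.polynomial.hermite.hermgauss(nodes)
    x = mu + np.sqrt(2.0) * sd * t
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    return (w @ p) / np.sqrt(np.pi)

print(expected_response(0.0, 1.0, 0.0, 1.0))   # 0.5 exactly, by symmetry
```

Inside a power simulation, a quadrature like this replaces a costly inner Monte Carlo loop, which is what makes a heuristic sample-size search affordable.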

19.
This article deals with two problems concerning the probabilities of causation defined by Pearl (Causality: models, reasoning, and inference, 2nd edn, 2009, Cambridge University Press, New York), namely the probability that one observed event was a necessary (or sufficient, or both) cause of another: one problem is to derive new bounds, and the other is to provide covariate selection criteria. Tian & Pearl (Ann. Math. Artif. Intell., 28, 2000, 287–313) showed how to bound the probabilities of causation using information from experimental and observational studies, with minimal assumptions about the data-generating process, and gave identifiability conditions for these probabilities. In this article, we derive narrower bounds using covariate information that is available from those studies. In addition, we propose a conditional monotonicity assumption so as to further narrow the bounds. Moreover, we discuss the covariate selection problem from the viewpoint of estimation accuracy, and show that selecting a covariate that has a direct effect on the outcome variable cannot always improve the estimation accuracy, which is contrary to the situation in linear regression models. These results provide more accurate information for public policy, legal determination of responsibility and personal decision making.
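For orientation, a tiny sketch of the baseline Tian-Pearl bounds on the probability of necessity and sufficiency (PNS) from experimental quantities alone; the covariate-narrowed bounds derived in this article are not reproduced:

```python
def pns_bounds(p_y_do_x, p_y_do_xprime):
    """Tian-Pearl bounds on P(necessary and sufficient cause) using only
    the experimental quantities P(y | do(x)) and P(y | do(x'))."""
    lower = max(0.0, p_y_do_x - p_y_do_xprime)
    upper = min(p_y_do_x, 1.0 - p_y_do_xprime)
    return lower, upper

print(pns_bounds(0.7, 0.2))   # (0.5, 0.7)
```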
