1.

Comparison of GEE1 and GEE2 estimation applied to clustered logistic regression

《Journal of Statistical Computation and Simulation》2012,82(4):361-378

Generalized estimating equations (GEE) have become a popular method for marginal regression modelling of data that occur in clusters. Features of the GEE methodology are the use of a ‘working covariance’, an approximation to the underlying covariance, which is used to improve the efficiency in estimating the regression coefficients, and the ‘sandwich’ estimate of variance, which provides a way of consistently estimating their standard errors. These techniques have been extended to include estimating equations for the underlying correlation structure, both to improve the efficiency of the regression coefficient estimates and to provide estimates of correlations between units in a cluster, when these are of interest. If the mean structure is of primary interest, then a simpler set of equations (GEE1) can be used, whereas if the underlying covariance structure is of interest in its own right, the use of the more complex GEE2 estimating equations is often recommended. In this paper, we compare the effect of increasing the complexity of the ‘working covariances’ on the variance of the parameter estimates, as well as the mean-squared error of the ‘sandwich’ estimate of variance. We give asymptotic expressions for these variances and mean-squared error terms. We use these to study the behaviour of different variants of GEE1 and GEE2 when we change the number of clusters, the cluster size, and the within-cluster correlation. We conclude that the extra complexity of the full GEE2 approach is not usually justified if the mean structure is of primary interest.
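
As an illustration of the ‘working covariance’ and ‘sandwich’ ideas mentioned in this abstract, the following is a minimal sketch of GEE1 for a logistic mean model under the simplest working covariance (independence). It is not the paper's implementation: the function name `gee_logistic_independence` and its arguments are hypothetical, and only NumPy is assumed.

```python
import numpy as np

def gee_logistic_independence(X, y, cluster, n_iter=25, tol=1e-10):
    """Sketch: GEE1 for a logistic mean model, working-independence covariance.

    Returns the coefficient estimates and their cluster-robust
    ('sandwich') covariance matrix. All names are illustrative.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    beta = np.zeros(X.shape[1])

    # Newton iterations on the estimating equation sum_i X_i^T (y_i - mu_i) = 0,
    # which is what D_i^T V_i^{-1} (y_i - mu_i) reduces to under independence.
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        score = X.T @ (y - mu)
        bread = X.T @ (X * w[:, None])
        step = np.linalg.solve(bread, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break

    # Sandwich variance: bread^{-1} * meat * bread^{-1}, where the meat sums
    # the outer products of the per-cluster score contributions.
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    bread = X.T @ (X * w[:, None])
    meat = np.zeros_like(bread)
    for g in np.unique(cluster):
        idx = cluster == g
        u = X[idx].T @ (y[idx] - mu[idx])
        meat += np.outer(u, u)
    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv
    return beta, cov
```

Calling `beta, cov = gee_logistic_independence(X, y, cluster)` with a design matrix, a 0/1 response and a vector of cluster labels gives robust standard errors as `np.sqrt(np.diag(cov))`. Richer working covariances (exchangeable, AR(1)) or the GEE2 equations discussed in the abstract would replace the identity-like weighting used here.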

2.

Incomplete covariates data in generalized linear models

《Journal of statistical planning and inference》1999,79(2):247-258

We consider regression analysis in generalized linear models when some of the covariates are incomplete. Covariates may be incomplete because of measurement error or because they are missing for some study subjects. We assume there exists a validation sample in which the data are complete and which is a simple random subsample of the whole sample. Based on the idea of the projection-solution method in Heyde (1997, Quasi-Likelihood and its Applications: A General Approach to Optimal Parameter Estimation. Springer, New York), a class of estimating functions is proposed to estimate the regression coefficients using the whole data. This method does not need a correctly specified parametric model for the incomplete covariates to yield a consistent estimate, and it avoids the ‘curse of dimensionality’ encountered in the existing semiparametric method. Simulation results show that the finite-sample performance and efficiency of the proposed estimates are satisfactory. The approach is also computationally convenient and hence can be applied in routine data analysis.

3.

An alternative approach to the analysis of longitudinal data via generalized estimating equations

《Journal of statistical planning and inference》1997,63(1):39-54

The generalized estimating equations (GEE) introduced by Liang and Zeger (Biometrika 73 (1986) 13–22) have been widely used over the past decade to analyze longitudinal data. The method uses a generalized quasi-score function estimate for the regression coefficients, and moment estimates for the correlation parameters. Recently, Crowder (Biometrika 82 (1995) 407–410) has pointed out some pitfalls with the estimation of the correlation parameters in the GEE method. In this paper we present a new method for estimating the correlation parameters which overcomes those pitfalls. For some commonly assumed correlation structures, we obtain unique feasible estimates for the correlation parameters. Large sample properties of our estimates are also established.

4.

Doubly robust empirical likelihood inference in covariate-missing data problems

Biao Zhang 《Statistics》2016,50(5):1173-1194

Missing covariate data occurs often in regression analysis. We study methods for estimating the regression coefficients in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866] on regression analyses with missing covariates, in which they pioneered the use of two working models, the working propensity score model and the working conditional score model. A recent approach to missing covariate data analysis is the empirical likelihood method of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503], which effectively combines unbiased estimating equations. In this paper, we consider an alternative likelihood approach based on the full likelihood of the observed data. This full likelihood-based method enables us to generate estimators for the vector of the regression coefficients that are (a) asymptotically equivalent to those of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the working propensity score model is correctly specified, and (b) doubly robust, like the augmented inverse probability weighting (AIPW) estimators of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Thus, the proposed full likelihood-based estimators improve on the efficiency of the AIPW estimators when the working propensity score model is correct but the working conditional score model is possibly incorrect, and also improve on the empirical likelihood estimators of Qin, Zhang and Leung [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the reverse is true, that is, the working conditional score model is correct but the working propensity score model is possibly incorrect. In addition, we consider a regression method for estimation of the regression coefficients when the working conditional score model is correctly specified; the asymptotic variance of the resulting estimator is no greater than the semiparametric variance bound characterized by the theory of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Finally, we compare the finite-sample performance of various estimators in a simulation study.

5.

Smoothing combined estimating equations in quantile regression for longitudinal data

Chenlei Leng Weiping Zhang 《Statistics and Computing》2014,24(1):123-136

Quantile regression has become a powerful complement to the usual mean regression. A simple approach to use quantile regression in marginal analysis of longitudinal data is to assume working independence. However, this may incur potential efficiency loss. On the other hand, correctly specifying a working correlation in quantile regression can be difficult. We propose a new quantile regression model by combining multiple sets of unbiased estimating equations. This approach can account for correlations between the repeated measurements and produce more efficient estimates. Because the objective function is discrete and non-convex, we propose induced smoothing for fast and accurate computation of the parameter estimates, as well as their asymptotic covariance, using Newton-Raphson iteration. We further develop a robust quantile rank score test for hypothesis testing. We show that the resulting estimate is asymptotically normal and more efficient than the simple estimate using working independence. Extensive simulations and a real data analysis show the usefulness of the method.
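
To make the smoothing idea in this abstract concrete, here is a rough sketch of a smoothed quantile estimating function under working independence, where the indicator I(y - x'β ≤ 0) is replaced by a normal CDF so that a Jacobian exists for Newton-Raphson steps. This is an assumption-laden simplification, not the authors' combined estimating equations: the fixed bandwidth `h` and both function names are hypothetical, whereas induced smoothing proper ties the amount of smoothing to the estimator's covariance.

```python
import math
import numpy as np

# Standard normal CDF applied elementwise (NumPy itself has no erf).
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def smoothed_quantile_score(beta, X, y, tau, h):
    """Smoothed version of sum_i x_i * (tau - I(y_i - x_i'beta <= 0)).

    The indicator is replaced by Phi(-(y_i - x_i'beta) / h); h > 0 is a
    fixed, user-chosen bandwidth (an assumption of this sketch).
    """
    r = y - X @ beta
    return X.T @ (tau - 1.0 + _norm_cdf(r / h))

def smoothed_quantile_jacobian(beta, X, y, h):
    """Derivative of the smoothed score with respect to beta."""
    r = y - X @ beta
    dens = np.exp(-0.5 * (r / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
    return -(X.T @ (X * dens[:, None]))
```

A Newton-Raphson update would then take the form `beta - np.linalg.solve(J, S)` with `S` and `J` from the two functions above; in practice a step-size safeguard is usually advisable, since plain Newton steps can overshoot when few residuals lie near zero.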

6.

Bayesian variable selection in Poisson change-point regression analysis

S. Min 《Communications in Statistics - Simulation and Computation》2017,46(3):2267-2282

In this article, we develop a Bayesian variable selection method for selecting covariates in the Poisson change-point regression model with both discrete and continuous candidate covariates. Ranging from a null model with no selected covariates to a full model including all covariates, the Bayesian variable selection method searches the entire model space, estimates posterior inclusion probabilities of covariates, and obtains model-averaged estimates of the covariate coefficients, while simultaneously estimating a time-varying baseline rate due to change-points. For posterior computation, a Metropolis-Hastings within partially collapsed Gibbs sampler is developed to efficiently fit the Poisson change-point regression model with variable selection. We illustrate the proposed method using simulated and real datasets.

7.

Fitting generalized linear models to retrospectively sampled clusters with categorical responses

R.J. O'Hara Hines 《Revue canadienne de statistique》1997,25(2):159-174

We use simulations based on data on injury severity in car accidents to compare methods for the analysis of very large data sets containing clusters of individuals for which the measured response is polytomous. Retrospective sampling of clusters is used to expedite the analysis of the large data set while at the same time obtaining information about rare, but important, outcomes. An additional complication in the analysis of such data sets is that there can be two types of covariates: those which vary within a cluster and those which vary only among clusters. Weighted generalized estimating equations are developed to obtain consistent estimates of the regression coefficients in a proportional-odds model, along with a weighted robust covariance matrix to estimate the variabilities of these estimated coefficients.

8.

On maximum likelihood estimation of the semi-parametric Cox model with time-varying covariates

Mark Thackham Jun Ma 《Journal of applied statistics》2020,47(9):1511

Including time-varying covariates is a popular extension to the Cox model and a suitable approach for dealing with non-proportional hazards. However, partial likelihood (PL) estimation of this model has three shortcomings: (i) estimated regression coefficients can be less accurate in small samples with heavy censoring; (ii) the baseline hazard is not directly estimated; and (iii) a covariance matrix for both the regression coefficients and the baseline hazard is not easily produced. We address these by developing a maximum likelihood (ML) approach to jointly estimate regression coefficients and baseline hazard using a constrained optimisation ensuring the latter's non-negativity. We demonstrate asymptotic properties of these estimates, show via simulation their increased accuracy compared to PL estimates in small samples, and show that our method produces smoother baseline hazard estimates than the Breslow estimator. Finally, we apply our method to two examples, including an important real-world financial example to estimate time to default for retail home loans. We demonstrate that using our ML estimate for the baseline hazard can give much clearer corroboratory evidence of the ‘humped hazard’, whereby the risk of loan default rises to a peak and then later falls.

9.

Conditional mix-GEE models for longitudinal data with unspecified random-effects distributions

Yanchun Xing Lili Xu Zhichuan Zhu 《Communications in Statistics - Theory and Methods》2018,47(4):862-876

In longitudinal studies, the mixture generalized estimating equation (mix-GEE) was proposed to improve the efficiency of the fixed-effects estimator when the working correlation structure is misspecified. When the subject-specific effect is of interest, mixed-effects models are widely used to analyze longitudinal data. However, most of the existing approaches assume a normal distribution for the random effects, and this could affect the efficiency of the fixed-effects estimator. In this article, a conditional mixture generalized estimating equation (cmix-GEE) approach is developed that combines the advantages of mix-GEE and the conditional quadratic inference function (CQIF) method. The advantage of our new approach is that it does not require the normality assumption for random effects and can accommodate the serial correlation between observations within the same cluster. A further feature of the proposed approach is that the estimators of the regression parameters are more efficient than CQIF even if the working correlation structure is not correctly specified. In addition, according to the estimates of some mixture proportions, the true working correlation matrix can be identified. We establish the asymptotic results for the fixed-effects parameter estimators. Simulation studies were conducted to evaluate our proposed method.

10.

Variance function in regression analysis of longitudinal data using the generalized estimating equation approach

《Journal of Statistical Computation and Simulation》2012,82(12):2700-2709

Longitudinal or clustered response data arise in many applications such as biostatistics, epidemiology and environmental studies. The repeated responses cannot in general be assumed to be independent. One method of analysing such data is by using the generalized estimating equations (GEE) approach. The current GEE method for estimating regression effects in longitudinal data focuses on the modelling of the working correlation matrix assuming a known variance function. However, correct choice of the correlation structure may not necessarily improve estimation efficiency for the regression parameters if the variance function is misspecified [Wang YG, Lin X. Effects of variance-function misspecification in analysis of longitudinal data. Biometrics. 2005;61:413–421]. In this connection two problems arise: finding a correct variance function and estimating the parameters of the chosen variance function. In this paper, we study the problem of estimating the parameters of the variance function assuming that the form of the variance function is known and then the effect of a misspecified variance function on the estimates of the regression parameters. We propose a GEE approach to estimate the parameters of the variance function. This estimation approach borrows the idea of Davidian and Carroll [Variance function estimation. J Amer Statist Assoc. 1987;82:1079–1091] by solving a nonlinear regression problem where residuals are regarded as the responses and the variance function is regarded as the regression function. A limited simulation study shows that the proposed method performs at least as well as the modified pseudo-likelihood approach developed by Wang and Zhao [A modified pseudolikelihood approach for analysis of longitudinal data. Biometrics. 2007;63:681–689]. Both these methods perform better than the GEE approach.

11.

Efficiency of reduced logistic regression models

Shelley B. Bull Celia M.T. Greenwood Allan Donner 《Revue canadienne de statistique》1994,22(3):319-334

One feature of the usual polychotomous logistic regression model for categorical outcomes is that a covariate must be included in all the regression equations. If a covariate is not important in all of them, the procedure will estimate unnecessary parameters. More flexible approaches allow different subsets of covariates in different regressions. One alternative uses individualized regressions which express the polychotomous model as a series of dichotomous models. Another uses a model in which a reduced set of parameters is simultaneously estimated for all the regressions. Large-sample efficiencies of these procedures were compared in a variety of circumstances in which there was a common baseline category for the outcome and the covariates were normally distributed. For a correctly specified model, the reduced estimates were over 100% efficient for nonzero slope parameters and up to 500% efficient when the baseline frequency and the effect of interest were small. The individualized estimates could have efficiencies less than 50% when the effect of interest was large, but were also up to 130% efficient when the baseline frequency was large and the effect of interest was small. Efficiency was usually enhanced by correlation among the covariates. For an underspecified reduced model, asymptotic bias in the reduced estimates was approximately proportional to the magnitude of the omitted parameter and to the reciprocal of the baseline frequency.

12.

Statistical inference for heteroscedastic semi-varying coefficient EV models

Fanrong Zhao Weixing Song 《Communications in Statistics - Theory and Methods》2018,47(10):2432-2455

This paper proposes an estimation procedure for a class of semi-varying coefficient regression models when the covariates of the linear part are subject to measurement errors. Initial estimates for the regression and varying coefficients are first constructed by the profile least-squares procedure without input from heteroscedasticity. A bias-corrected kernel estimate for the variance function is then proposed, which in turn is used to define re-weighted bias-corrected estimates of the regression and varying coefficients. Large sample properties of the proposed estimates are thoroughly investigated. The finite-sample performance of the proposed estimates is assessed by an extensive simulation study and an application to the Boston housing data set. The simulation results show that the re-weighted bias-corrected estimates outperform the initial estimates and the naive estimates.

13.

Optimum percentile estimating equations for nonlinear random coefficient models

《Journal of statistical planning and inference》2001,97(2):275-292

In nonlinear random coefficient models, the means or variances of response variables may not exist. In such cases, commonly used estimation procedures, e.g., (extended) least-squares (LS) and quasi-likelihood methods, are not applicable. This article solves this problem by proposing an estimate based on percentile estimating equations (PEE). This method does not require full distributional assumptions and leads to efficient estimates within the class of unbiased estimating equations. By minimizing the asymptotic variance of the PEE estimates, the optimum percentile estimating equations (OPEE) are derived. Several examples including Weibull regression show the flexibility of the PEE estimates. Under certain regularity conditions, the PEE estimates are shown to be strongly consistent and asymptotically normal, and the OPEE estimates have the minimal asymptotic variance. Compared with the parametric maximum likelihood estimates (MLE), the asymptotic efficiency of the OPEE estimates is more than 98%, while the LS-type procedures can have infinite variances. When the observations have outliers or do not follow the distributions considered in model assumptions, the article shows that OPEE is more robust than the MLE, and the asymptotic efficiency in the model misspecification cases can be above 150%.

14.

Matched case–control data analyses with missing covariates

M. C. Paik & R. L. Sacco 《Journal of the Royal Statistical Society. Series C, Applied statistics》2000,49(1):145-156

We consider methods for analysing matched case–control data when some covariates (W) are completely observed but other covariates (X) are missing for some subjects. In matched case–control studies, the complete-record analysis discards completely observed subjects if none of their matching cases or controls are completely observed. We investigate an imputation estimate obtained by solving a joint estimating equation for log-odds ratios of disease and parameters in an imputation model. Imputation estimates for coefficients of W are shown to have smaller bias and mean-square error than do estimates from the complete-record analysis.

15.

NEW EFFICIENT ESTIMATION AND VARIABLE SELECTION METHODS FOR SEMIPARAMETRIC VARYING-COEFFICIENT PARTIALLY LINEAR MODELS

Kai B Li R Zou H 《Annals of statistics》2011,39(1):305-332

The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.

16.

Modelling a non-stationary BINAR(1) Poisson process

《Journal of Statistical Computation and Simulation》2012,82(15):3106-3126

Non-stationarity in bivariate time series of counts may be induced by a number of time-varying covariates affecting the bivariate responses, as a result of which the innovation terms of the individual series as well as the bivariate dependence structure become non-stationary. So far, in the existing models, the innovation terms of individual INAR(1) series and the dependence structure are assumed to be constant even though the individual time series are non-stationary. Under this assumption, the reliability of the regression and correlation estimates is questionable. Besides, the existing estimation methodologies such as the conditional maximum likelihood (CMLE) and the composite likelihood estimation are computationally intensive. To address these issues, this paper proposes a BINAR(1) model where the innovation series follow a bivariate Poisson distribution under some non-stationary distributional assumptions. The method of generalized quasi-likelihood (GQL) is used to estimate the regression effects while the serial and bivariate correlations are estimated using a robust moment estimation technique. The model and estimation method are applied to simulated data. The GQL method is also compared with the CMLE, generalized method of moments (GMM) and generalized estimating equation (GEE) approaches; simulation studies show that GQL yields more efficient estimates than GMM and equally or slightly more efficient estimates than CMLE and GEE.

17.

Modified SEE variable selection for varying coefficient instrumental variable models

《Statistical Methodology》2013

We consider the problem of variable selection for a class of varying coefficient models with instrumental variables. We focus on the case that some covariates are endogenous variables, and some auxiliary instrumental variables are available. An instrumental variable based variable selection procedure is proposed by using modified smooth-threshold estimating equations (SEEs). The proposed procedure can automatically eliminate the irrelevant covariates by setting the corresponding coefficient functions to zero, and simultaneously estimate the nonzero regression coefficients by solving the smooth-threshold estimating equations. The proposed variable selection procedure avoids the convex optimization problem, and is flexible and easy to implement. Simulation studies are carried out to assess the performance of the proposed variable selection method.

18.

Measuring and Estimating Treatment Effect on Count Outcome in Randomized Trial and Observational Studies

Li Yin 《Communications in Statistics - Theory and Methods》2013,42(5):1080-1095

When estimating the treatment effect on a count outcome in a given population, different studies use different models, resulting in non-comparable measures of treatment effect. Here we show that the marginal rate differences in these studies are comparable measures of treatment effect. We estimate the marginal rate differences by log-linear models and show that their finite-sample maximum-likelihood estimates are unbiased and highly robust with respect to effects of dispersing covariates on the outcome. We get approximate finite-sample distributions of these estimates by using the asymptotic normal distribution of estimates of the log-linear model parameters. This method can be easily applied in practice.

19.

Empirical likelihood weighted composite quantile regression with partially missing covariates

Jing Sun Yunyan Ma 《Journal of nonparametric statistics》2017,29(1):137-150

This paper develops a novel weighted composite quantile regression (CQR) method for estimation of a linear model when some covariates are missing at random and the missingness probability can be modelled parametrically. By incorporating the unbiased estimating equations of incomplete data into empirical likelihood (EL), we obtain the EL-based weights, and then re-adjust the inverse probability weighted CQR for estimating the vector of regression coefficients. Theoretical results show that the proposed method can achieve semiparametric efficiency if the selection probability function is correctly specified; the EL weighted CQR is therefore more efficient than the inverse probability weighted CQR. Besides, our algorithm is computationally simple and easy to implement. Simulation studies are conducted to examine the finite sample performance of the proposed procedures. Finally, we apply the new method to analyse the US News College data.

20.

Covariate-adjusted cost–effectiveness ratios

《Journal of statistical planning and inference》1999,75(2):291-304

We describe a method for estimating the marginal cost–effectiveness ratio (CER) of two competing treatments or intervention strategies after adjusting for covariates that may influence the primary endpoint of survival. A Cox regression model is used for modeling covariates, and estimates of both the cost and effectiveness parameters, which depend on the survival curve, are obtained from the estimated survival functions for each treatment at a specified covariate. Confidence intervals for the covariate-adjusted CER are presented.