
1.

Variable selection for recurrent event data via nonconcave penalized estimating function

Xingwei Tong Liang Zhu Jianguo Sun 《Lifetime data analysis》2009,15(2):197-215

Variable selection is an important issue in all regression analyses, and in this paper we discuss it in the context of regression analysis of recurrent event data. Recurrent event data often occur in long-term studies in which individuals may experience the events of interest more than once, and their analysis has recently attracted a great deal of attention (Andersen et al., Statistical models based on counting processes, 1993; Cook and Lawless, Biometrics 52:1311–1323, 1996, The analysis of recurrent event data, 2007; Cook et al., Biometrics 52:557–571, 1996; Lawless and Nadeau, Technometrics 37:158–168, 1995; Lin et al., J R Stat Soc B 62:711–730, 2000). However, there appear to be no established approaches to variable selection for recurrent event data. To address this, we adopt the idea behind the nonconcave penalized likelihood approach proposed in Fan and Li (J Am Stat Assoc 96:1348–1360, 2001) and develop a nonconcave penalized estimating function approach. The proposed approach selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for this process. We show that the proposed approach performs as well as the oracle procedure, in that it yields the estimates that would be obtained if the correct submodel were known. Simulation studies conducted to assess the performance of the proposed approach suggest that it works well in practical situations. The proposed methodology is illustrated using data from a chronic granulomatous disease study.
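The nonconcave penalty class of Fan and Li (2001) is best known through the SCAD penalty. As a minimal sketch (not the authors' estimating-function algorithm), the code below evaluates the SCAD penalty and its closed-form thresholding rule for a single coefficient under an orthonormal least-squares design, assuming Fan and Li's suggested value a = 3.7. The function names are hypothetical.

```python
import math


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lambda(|theta|) of Fan and Li (2001); requires a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # lasso-like (linear) near zero
    if t <= a * lam:
        # quadratic piece joining the linear and constant regions smoothly
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2  # constant: large effects are not penalized further


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD-penalized least-squares estimate of one coefficient, given its
    unpenalized estimate z, under an orthonormal design."""
    s = math.copysign(1.0, z)
    t = abs(z)
    if t <= 2 * lam:
        return s * max(t - lam, 0.0)  # soft thresholding: small effects set to 0
    if t <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)  # partial shrinkage
    return z  # no shrinkage, so large effects stay (nearly) unbiased
```

For example, `scad_threshold(0.5, 1.0)` returns 0.0 (the variable is dropped), `scad_threshold(1.5, 1.0)` shrinks 1.5 to 0.5 as the lasso would, and `scad_threshold(5.0, 1.0)` returns 5.0 unchanged; this combination of sparsity and unbiasedness for large effects is what underlies the oracle-type behaviour described in the abstract.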

2.

Penalized empirical likelihood inference for sparse additive hazards regression with a diverging number of covariates

Shanshan Wang Liming Xiang 《Statistics and Computing》2017,27(5):1347-1364

High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by applications in high-throughput genomic data analysis. In this paper, we propose a class of regularization methods, integrating both the penalized empirical likelihood and pseudoscore approaches, for variable selection and estimation in sparse and high-dimensional additive hazards regression models. When the number of covariates grows with the sample size, we establish asymptotic properties of the resulting estimator and the oracle property of the proposed method. It is shown that the proposed estimator is more efficient than that obtained from the non-concave penalized likelihood approach in the literature. Based on a penalized empirical likelihood ratio statistic, we further develop a nonparametric likelihood approach for testing linear hypotheses about the regression coefficients and, consequently, for constructing confidence regions. Simulation studies are carried out to evaluate the performance of the proposed methodology, and two real data sets are analyzed.

3.

Variable selection and estimation for multivariate panel count data via the seamless‐$L_0$ penalty

Haixiang Zhang Jianguo Sun Dehui Wang 《Revue canadienne de statistique》2013,41(2):368-385

This paper considers regression analysis of multivariate panel count data, with a focus on variable selection and estimation of significant covariate effects. For the problem, we adopt the penalized estimating equation approach, focusing on the use of the seamless‐$L_0$ penalty. The proposed approach selects variables and estimates regression coefficients simultaneously, and the asymptotic properties of the resulting estimates are established. The procedure can easily be carried out with the Newton–Raphson algorithm and is evaluated by simulation studies. It is also applied to a motivating data set arising from a skin cancer study. The Canadian Journal of Statistics 41: 368–385; 2013 © 2013 Statistical Society of Canada

4.

Semiparametric analysis of panel count data with correlated observation and follow-up times

Xin He Xingwei Tong Jianguo Sun 《Lifetime data analysis》2009,15(2):177-196

This paper discusses regression analysis of panel count data, which often arise in longitudinal studies concerning occurrence rates of certain recurrent events. Panel count data mean that each study subject is observed only at discrete time points rather than continuously. Furthermore, both observation and follow-up times can vary from subject to subject and may be correlated with the recurrent events. For inference, we propose shared frailty models and develop estimating equations for estimation of the regression parameters. The proposed estimates are consistent and asymptotically normally distributed. Their finite-sample properties are investigated through simulation, and an illustrative example from a cancer study is provided.

5.

Semiparametric transformation models for multivariate panel count data with dependent observation process

Li N Park DH Sun J Kim K 《Revue canadienne de statistique》2011,39(3):458-474

This article discusses regression analysis of multivariate panel count data in which the observation process may contain relevant information about, or be related to, the underlying recurrent event processes of interest. Such data occur if a recurrent event study involves several related types of recurrent events and the observation scheme or process may be subject-specific. For the problem, a class of semiparametric transformation models is presented, which provides great flexibility for modelling the effects of covariates on the recurrent event processes. For estimation of regression parameters, an estimating equation-based inference procedure is developed and the asymptotic properties of the resulting estimates are established. The proposed approach is also evaluated by simulation studies and applied to data arising from a skin cancer chemoprevention trial.

6.

Bayesian Approach in Nonparametric Count Regression with Binomial Kernel

Nabil Zougab Smail Adjabi Célestin C. Kokonendji 《Communications in Statistics: Simulation and Computation》2013,42(5):1052-1063

Recently, Kokonendji et al. adapted the well-known Nadaraya–Watson kernel estimator to estimate the count function m in the context of nonparametric discrete regression. They also investigated bandwidth selection using the cross-validation method. In this article, we propose a Bayesian approach to nonparametric count regression for estimating the bandwidth and the variance of the model error, which was not estimated in Kokonendji et al. The model error is assumed to be Gaussian with mean zero and variance σ². Because the Bayes estimates cannot be obtained in closed form, we use the well-known Markov chain Monte Carlo (MCMC) technique to compute them under the squared error loss function. The performance of the proposed approach and of the cross-validation method is compared through simulation and real count data.
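A minimal sketch of the Nadaraya–Watson count-regression estimator with a discrete binomial kernel may help fix ideas. It assumes the binomial associated kernel is the Binomial(x + 1, (x + h)/(x + 1)) probability mass function on {0, ..., x + 1} with bandwidth 0 < h ≤ 1 (our recollection of the Kokonendji et al. construction, not taken from this abstract), and the function names are hypothetical.

```python
from math import comb


def binomial_kernel(y: int, x: int, h: float) -> float:
    """Assumed binomial associated kernel at target x: pmf of
    Binomial(x + 1, (x + h) / (x + 1)) evaluated at y, with 0 < h <= 1."""
    n = x + 1
    if y < 0 or y > n:
        return 0.0  # outside the kernel's support {0, ..., x + 1}
    p = (x + h) / n
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def nw_count_regression(x: int, xs, ys, h: float) -> float:
    """Nadaraya-Watson estimate of m(x): kernel-weighted average of the ys."""
    weights = [binomial_kernel(xi, x, h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations fall in the kernel support at x")
    return sum(w * yi for w, yi in zip(weights, ys)) / total
```

Here the bandwidth h is fixed by hand; in the paper it is instead estimated, together with the error variance σ², by MCMC.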

7.

Efficient regression modeling for correlated and overdispersed count data

《Communications in Statistics: Theory and Methods》2012,41(24):6005-6018

The objective of this paper is to propose an efficient estimation procedure in a marginal mean regression model for longitudinal count data and to develop a hypothesis test for detecting the presence of overdispersion. We extend the matrix expansion idea of quadratic inference functions to the negative binomial regression framework, which entails accommodating both within-subject correlation and overdispersion. Theoretical and numerical results show that the proposed procedure yields an asymptotically more efficient estimator than one that ignores either the within-subject correlation or the overdispersion. When overdispersion is absent from the data, however, the proposed method may reduce estimation efficiency in practice, while a Poisson-based regression model fits the data sufficiently well. We therefore construct a hypothesis test that recommends an appropriate model for the analysis of correlated count data. Extensive simulation studies indicate that the proposed test consistently identifies the appropriate model. The proposed procedure is also applied to a transportation safety study, where the test recommends the proposed negative binomial regression model.

8.

A two-stage procedure to pool information across quantile levels in linear quantile regression

Anthony Kuk 《Journal of Statistical Computation and Simulation》2018,88(14):2852-2864

In linear quantile regression, the regression coefficients for different quantiles are typically estimated separately. Efforts to improve the efficiency of estimators are often based on assumptions of commonality among the slope coefficients. We propose instead a two-stage procedure whereby the regression coefficients are first estimated separately and then smoothed over quantile level. Due to the strong correlation between coefficient estimates at nearby quantile levels, existing bandwidth selectors will pick bandwidths that are too small. To remedy this, we use 10-fold cross-validation to determine a common bandwidth inflation factor for smoothing the intercept as well as slope estimates. Simulation results suggest that the proposed method is effective in pooling information across quantile levels, resulting in estimates that are typically more efficient than the separately obtained estimates and the interquantile shrinkage estimates derived using a fused penalty function. The usefulness of the proposed method is demonstrated in a real data example.
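The second stage of this procedure can be sketched as kernel smoothing of separately estimated coefficients across quantile levels. This is a minimal sketch under stated assumptions: the stage-1 estimates for one coefficient are taken as already computed on a grid of quantile levels, and the Gaussian-kernel Nadaraya–Watson smoother and the function name are our choices, not necessarily the paper's. The `inflation` argument stands in for the common bandwidth inflation factor that the paper selects by 10-fold cross-validation, which is not implemented here.

```python
import math


def smooth_over_quantiles(taus, estimates, bandwidth, inflation=1.0):
    """Stage 2 (sketch): smooth stage-1 coefficient estimates over quantile level.

    Each smoothed value is a Gaussian-kernel weighted average of the estimates
    at nearby quantile levels; the effective bandwidth is bandwidth * inflation.
    """
    h = bandwidth * inflation
    smoothed = []
    for t0 in taus:
        w = [math.exp(-0.5 * ((t - t0) / h) ** 2) for t in taus]
        smoothed.append(sum(wi * b for wi, b in zip(w, estimates)) / sum(w))
    return smoothed
```

Because every smoothed value is a convex combination of neighbouring estimates, an inflation factor above 1 pools information over a wider range of quantile levels, which is the intended remedy for bandwidths that are too small.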

9.

Nonparametric inference for panel count data with competing risks

E. P. Sreedevi P. G. Sankaran 《Journal of applied statistics》2021,48(16):3102

In survival and reliability studies, panel count data arise when we investigate a recurrent event process and each study subject is observed only at discrete time points. If recurrent events of several types are possible, we obtain panel count data with competing risks. Such data arise frequently from transversal studies on recurrent events in demography, epidemiology and reliability experiments where the individuals cannot be observed continuously. In the present paper, we propose an isotonic regression estimator for the cause-specific mean function of the underlying recurrent event process of competing risks panel count data. Further, a nonparametric test is proposed to compare the cause-specific mean functions of the panel count competing risks data. Asymptotic properties of the proposed estimator and test statistic are studied. A simulation study is conducted to assess the finite sample behaviour of the proposed estimator and test statistic. Finally, the procedures developed are applied to real data arising from a skin cancer chemoprevention trial.
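Isotonic least-squares fits of this kind are classically computed with the pool-adjacent-violators algorithm (PAVA), which enforces that an estimated mean function is nondecreasing in time. A minimal sketch of a generic weighted PAVA, assuming the inputs are per-time-point averages of the cause-specific cumulative counts sorted by observation time, with weights equal to the number of subjects observed at each time; this is not the authors' exact estimator, and the function name is hypothetical.

```python
def pava(values, weights=None):
    """Weighted least-squares nondecreasing fit by pool adjacent violators."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # expand pooled blocks back to one fitted value per input point
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted
```

For example, `pava([1, 3, 2, 4])` pools the violating pair (3, 2) and returns `[1.0, 2.5, 2.5, 4.0]`.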

10.

On Selecting Variables and Assessing Their Performance in Linear Discriminant Analysis

S. Ganeshanandam W.J. Krzanowski 《Australian & New Zealand Journal of Statistics》1989,31(3):433-447

Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for selection of variables, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessment of the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations, of multivariate binary as well as multivariate normal data, demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.
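The leave-one-out idea can be illustrated in its simplest setting. The sketch below assumes one fixed variable, two groups, equal priors and a common variance, in which case the linear discriminant rule allocates an observation to the group with the nearer mean; it performs no variable selection. To estimate the error of the selection procedure itself, as the abstract proposes, the selection step would have to be rerun inside the loop for each left-out observation. The function names are hypothetical.

```python
def midpoint_rule(train):
    """One-variable, two-group LDA with equal priors and common variance:
    allocate x to the group whose training mean is nearer."""
    g0 = [x for x, g in train if g == 0]
    g1 = [x for x, g in train if g == 1]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    cut = (m0 + m1) / 2  # decision boundary halfway between the means

    def classify(x):
        return 1 if (x > cut) == (m1 > m0) else 0

    return classify


def loo_error_rate(data):
    """Leave-one-out error: refit the rule without observation i, then classify i."""
    errors = 0
    for i, (x, g) in enumerate(data):
        rule = midpoint_rule(data[:i] + data[i + 1:])
        errors += rule(x) != g
    return errors / len(data)
```

With well-separated groups such as `[(0, 0), (1, 0), (2, 0), (10, 1), (11, 1), (12, 1)]` the estimate is 0.0, while overlapping groups give a positive error estimate.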

11.

Accounting for Uncertainty in Heteroscedasticity in Nonlinear Regression

Lim C Sen PK Peddada SD 《Journal of statistical planning and inference》2012,142(5):1047-1062

Toxicologists and pharmacologists often describe the toxicity of a chemical using parameters of a nonlinear regression model. Thus, estimation of the parameters of a nonlinear regression model is an important problem. The parameter estimates and their uncertainty estimates depend upon the underlying error variance structure in the model. Typically, the researcher does not know a priori whether the error variances are homoscedastic (i.e., constant across dose) or heteroscedastic (i.e., the variance is a function of dose). Motivated by this concern, in this paper we introduce an estimation procedure based on a preliminary test, which selects an appropriate estimation procedure accounting for the underlying error variance structure. Since outliers and influential observations are common in toxicological data, the proposed methodology uses M-estimators. The asymptotic properties of the preliminary test estimator are investigated; in particular, its asymptotic covariance matrix is derived. The performance of the proposed estimator is compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using a data set obtained from the National Toxicology Program.

12.

Random Effects Modeling and the Zero-Inflated Poisson Distribution

Anthea Monod 《Communications in Statistics: Theory and Methods》2014,43(4):664-680

Overdispersion due to a large proportion of zero observations is common in applications across many fields of research; we consider such scenarios in count panel (longitudinal) data. A well-known and widely implemented technique for handling such data is random effects modeling, which addresses the serial correlation inherent in panel data as well as overdispersion. To deal with the excess zeros, the zero-inflated Poisson distribution has become canonical; it relaxes the equal mean-variance specification of a traditional Poisson model and allows for the larger variance characteristic of overdispersed data. A natural proposal for count panel data with overdispersion due to excess zeros is therefore to combine these two methodologies, deriving a likelihood from the resulting conditional probability. In simulation studies, however, we find that this approach in fact poses problems of identifiability. In this article, we construct such a model, explain in full detail why a model obtained from the marriage of two classical and well-established techniques is unidentifiable, and provide results of simulation studies demonstrating this effect. A discussion of alternative methodologies to resolve the problem is provided in the conclusion.
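The way the zero-inflated Poisson distribution relaxes the Poisson mean-variance equality can be checked directly. For a ZIP with structural-zero probability π and Poisson mean λ, P(Y = 0) = π + (1 − π)e^(−λ), P(Y = k) = (1 − π)e^(−λ)λ^k/k! for k ≥ 1, the mean is (1 − π)λ and the variance is (1 − π)λ(1 + πλ), which exceeds the mean whenever 0 < π < 1. A minimal sketch (function names hypothetical; the random-effects part of the model is not shown):

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson with structural-zero probability pi."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson  # extra mass only at zero


def zip_mean_var(lam: float, pi: float):
    """Closed-form mean and variance of the ZIP distribution."""
    mean = (1 - pi) * lam
    var = (1 - pi) * lam * (1 + pi * lam)  # > mean whenever 0 < pi < 1
    return mean, var
```

For λ = 2 and π = 0.3, the mean is 1.4 while the variance is 2.24, so the excess zeros alone produce overdispersion relative to a Poisson model with the same mean.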

13.

Semiparametric partially linear varying coefficient models with panel count data

Xin He Xuenan Feng Xingwei Tong Xingqiu Zhao 《Lifetime data analysis》2017,23(3):439-466

This paper studies semiparametric regression analysis of panel count data, which arise naturally when recurrent events are considered. Such data frequently occur in medical follow-up studies and reliability experiments, for example. To explore the nonlinear interactions between covariates, we propose a class of partially linear models with possibly varying coefficients for the mean function of the counting processes with panel count data. The functional coefficients are estimated by B-spline function approximations. The estimation procedures are based on maximum pseudo-likelihood and likelihood approaches and they are easy to implement. The asymptotic properties of the resulting estimators are established, and their finite-sample performance is assessed by Monte Carlo simulation studies. We also demonstrate the value of the proposed method by the analysis of a cancer data set, where the new modeling approach provides more comprehensive information than the usual proportional mean model.

14.

An additive–multiplicative mean model for panel count data with dependent observation and dropout processes

Guanglei Yu Yang Li Liang Zhu Hui Zhao Jianguo Sun Leslie L. Robison 《Scandinavian Journal of Statistics》2019,46(2):414-431

This paper discusses regression analysis of panel count data with dependent observation and dropout processes. For the problem, a general mean model is presented that can allow both additive and multiplicative effects of covariates on the underlying point process. In addition, the proportional rates model and the accelerated failure time model are employed to describe possible covariate effects on the observation process and the dropout or follow‐up process, respectively. For estimation of regression parameters, some estimating equation‐based procedures are developed and the asymptotic properties of the proposed estimators are established. In addition, a resampling approach is proposed for estimating a covariance matrix of the proposed estimator and a model checking procedure is also provided. Results from an extensive simulation study indicate that the proposed methodology works well for practical situations, and it is applied to a motivating set of real data.

15.

Estimation and variable selection for generalised partially linear single-index models

Peng Lai Ye Tian 《Journal of nonparametric statistics》2014,26(1):171-185

In this paper, we study the problem of estimation and variable selection for generalised partially linear single-index models based on quasi-likelihood, extending existing studies on variable selection for partially linear single-index models to binary and count responses. To take into account the unit norm constraint of the index parameter, we use the ‘delete-one-component’ approach. The asymptotic normality of the estimates is demonstrated. Furthermore, the smoothly clipped absolute deviation penalty is added for variable selection of parameters both in the nonparametric part and the parametric part, and the oracle property of the variable selection procedure is shown. Finally, some simulation studies are carried out to illustrate the finite sample performance.

16.

A nonparametric time-varying coefficient model for panel count data

Huadong Zhao Wanzhu Tu 《Journal of nonparametric statistics》2018,30(3):640-661

In this research, we describe a nonparametric time-varying coefficient model for the analysis of panel count data. We extend traditional panel count data models by incorporating B-spline estimates of time-varying coefficients. We show that the proposed model can be implemented using a nonparametric maximum pseudo-likelihood method. We further examine the theoretical properties of the estimators of the model parameters. The operational characteristics of the proposed method are evaluated through a simulation study. For illustration, we analyse data from a study of childhood wheezing and describe the time-varying effect of an inflammatory marker on the risk of wheezing.
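A time-varying coefficient represented with B-splines is a linear combination β(t) = Σ_j γ_j B_j(t) of basis functions. The sketch below evaluates such a basis with the standard Cox–de Boor recursion, assuming a clamped cubic knot vector chosen purely for illustration; the knots, the coefficients γ_j and the function names are hypothetical, and the pseudo-likelihood fitting of γ_j is not shown.

```python
def bspline_basis(i: int, k: int, t, x: float) -> float:
    """Value at x of the i-th B-spline of degree k on knot vector t (Cox-de Boor)."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:  # convention: a term with zero-width support contributes 0
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def varying_coefficient(x: float, gamma, t, k: int = 3) -> float:
    """beta(x) = sum_j gamma_j * B_j(x); requires len(gamma) == len(t) - k - 1."""
    return sum(g * bspline_basis(j, k, t, x) for j, g in enumerate(gamma))
```

With knots `[0, 0, 0, 0, 0.5, 1, 1, 1, 1]` there are five cubic basis functions, and they sum to 1 at any t in [0, 1), so a constant γ reproduces a constant coefficient.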

17.

Variable selection in gamma regression models via artificial bee colony algorithm

Emre Dunder Mehmet Ali Cengiz 《Journal of applied statistics》2018,45(1):8-16

Variable selection is an important task in regression analysis. The performance of a statistical model depends heavily on the chosen subset of predictors. There are several methods for selecting the most relevant variables to construct a good model. In practice, however, the dependent variable may take positive continuous values and may not be normally distributed. In such situations, the gamma distribution is more suitable than the normal for building a regression model. This paper introduces a heuristic approach to variable selection for gamma regression models using artificial bee colony optimization. We evaluate the proposed method against classical selection methods such as backward elimination and stepwise selection. Both simulation studies and real data examples demonstrate the accuracy of our selection procedure.

18.

GEE-based zero-inflated generalized Poisson model for clustered over- or under-dispersed count data

Fatemeh Sarvi Hossein Mahjub 《Journal of Statistical Computation and Simulation》2019,89(14):2711-2732

Zero-inflated regression models such as the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models can model count data with excess zeros. The ZINB model can handle over-dispersed count data with excess zeros, and the ZIGP model can handle both over- and under-dispersed count data with excess zeros. Moreover, count data may be correlated because of the data collection procedure or a special study design; clustered sampling is one example in which correlation among subjects arises. In such situations, a marginal model using the generalized estimating equation (GEE) approach can incorporate these correlations and lead to relationships at the population level. In this study, a GEE-based zero-inflated generalized Poisson regression model is proposed to fit over- and under-dispersed clustered count data with excess zeros.

19.

Non-iterative Estimation and Variable Selection in the Single-index Quantile Regression Model

C. N. Kuruwita 《Communications in Statistics: Simulation and Computation》2016,45(10):3615-3628

A new estimation procedure is proposed for the single-index quantile regression model. Compared to existing work, this approach is non-iterative and hence computationally efficient. The proposed method not only estimates the index parameter and the link function but also selects variables simultaneously. The performance of the variable selection is enhanced by a fully adaptive penalty function motivated by the sliced inverse regression technique. Finite sample performance is studied through a simulation study that compares the proposed method with existing work under several criteria. A data analysis is given that highlights the usefulness of the proposed methodology.

20.

Variable selection of the quantile varying coefficient regression models

Weihua Zhao Riquan Zhang Yazhao Lv Jicai Liu 《Journal of the Korean Statistical Society》2013,42(3):343-358

As a useful supplement to mean regression, quantile regression is a completely distribution-free approach and is more robust to heavy-tailed random errors. In this paper, a variable selection procedure for quantile varying coefficient models is proposed by combining local polynomial smoothing with the adaptive group LASSO. With tuning parameters selected appropriately by the BIC criterion, the theoretical properties of the new procedure, including consistency in variable selection and the oracle property in estimation, are established. The finite sample performance of the newly proposed method is investigated through simulation studies and the analysis of the Boston house price data. Numerical studies confirm that the newly proposed procedure (QKLASSO) is both robust and efficient for varying coefficient models irrespective of the error distribution, making it a good alternative and a necessary supplement to the KLASSO method.
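The robustness attributed to quantile regression comes from its check (quantile) loss ρ_τ(u) = u(τ − 1{u < 0}), which replaces the squared error of mean regression. A minimal sketch, not the paper's QKLASSO procedure: minimizing the summed check loss over a constant recovers a sample quantile, which, unlike the mean, is barely affected by an outlier. The function names are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(ys, tau: float) -> float:
    """Minimize sum_i rho_tau(y_i - q) over q; a minimizer is attained at a data point."""
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, tau) for y in ys))
```

With τ = 0.5 and data `[1, 2, 3, 4, 100]`, the minimizer is 3 (the median), whereas the mean is 22.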