Analysis of massive datasets is challenging owing to limitations of computer primary memory. Composite quantile regression (CQR) is a robust and efficient estimation method. In this paper, we extend CQR to massive datasets and propose a divide-and-conquer CQR method. The basic idea is to split the entire dataset into several blocks, applying the CQR method for data in each block, and finally combining these regression results via weighted average. The proposed approach significantly reduces the required amount of primary memory, and the resulting estimate will be as efficient as if the entire data set is analysed simultaneously. Moreover, to improve the efficiency of CQR, we propose a weighted CQR estimation approach. To achieve sparsity with high-dimensional covariates, we develop a variable selection procedure to select significant parametric components and prove the method possessing the oracle property. Both simulations and data analysis are conducted to illustrate the finite sample performance of the proposed methods. 相似文献
Truncation is a known feature of bone marrow transplant (BMT) registry data, for which the survival time of a leukemia patient is left truncated by the waiting time to transplant. It was recently noted that a longer waiting time was linked to poorer survival. A straightforward solution is a Cox model on the survival time with the waiting time as both truncation variable and covariate. The Cox model should also include other recognized risk factors as covariates. In this article, we focus on estimating the distribution function of waiting time and the probability of selection under the aforementioned Cox model. 相似文献
For survival endpoints in subgroup selection, a score conversion model is often used to convert the set of biomarkers for each patient into a univariate score and using the median of the univariate scores to divide the patients into biomarker‐positive and biomarker‐negative subgroups. However, this may lead to bias in patient subgroup identification regarding the 2 issues: (1) treatment is equally effective for all patients and/or there is no subgroup difference; (2) the median value of the univariate scores as a cutoff may be inappropriate if the sizes of the 2 subgroups are differ substantially. We utilize a univariate composite score method to convert the set of patient's candidate biomarkers to a univariate response score. We propose applying the likelihood ratio test (LRT) to assess homogeneity of the sampled patients to address the first issue. In the context of identification of the subgroup of responders in adaptive design to demonstrate improvement of treatment efficacy (adaptive power), we suggest that subgroup selection is carried out if the LRT is significant. For the second issue, we utilize a likelihood‐based change‐point algorithm to find an optimal cutoff. Our simulation study shows that type I error generally is controlled, while the overall adaptive power to detect treatment effects sacrifices approximately 4.5% for the simulation designs considered by performing the LRT; furthermore, the change‐point algorithm outperforms the median cutoff considerably when the subgroup sizes differ substantially. 相似文献
This paper discusses regression analysis of panel count data with dependent observation and dropout processes. For the problem, a general mean model is presented that can allow both additive and multiplicative effects of covariates on the underlying point process. In addition, the proportional rates model and the accelerated failure time model are employed to describe possible covariate effects on the observation process and the dropout or follow‐up process, respectively. For estimation of regression parameters, some estimating equation‐based procedures are developed and the asymptotic properties of the proposed estimators are established. In addition, a resampling approach is proposed for estimating a covariance matrix of the proposed estimator and a model checking procedure is also provided. Results from an extensive simulation study indicate that the proposed methodology works well for practical situations, and it is applied to a motivating set of real data. 相似文献
Rank aggregation aims at combining rankings of a set of items assigned by a sample of rankers to generate a consensus ranking. A typical solution is to adopt a distance-based approach to minimize the sum of the distances to the observed rankings. However, this simple sum may not be appropriate when the quality of rankers varies. This happens when rankers with different backgrounds may have different cognitive levels of examining the items. In this paper, we develop a new distance-based model by allowing different weights for different rankers. Under this model, the weight associated with a ranker is used to measure his/her cognitive level of ranking of the items, and these weights are unobserved and exponentially distributed. Maximum likelihood method is used for model estimation. Extensions to the cases of incomplete rankings and mixture modeling are also discussed. Empirical applications demonstrate that the proposed model produces better rank aggregation than those generated by Borda and the unweighted distance-based models.