期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A real survival analysis application via variable selection methods for Cox's proportional hazards model

Emmanouil Androulakis Kalliopi Mylona Filia Vonta 《Journal of applied statistics》2010,37(8):1399-1406

Variable selection is fundamental to high-dimensional statistical modeling in diverse fields of sciences. In our health study, different statistical methods are applied to analyze trauma annual data, collected by 30 General Hospitals in Greece. The dataset consists of 6334 observations and 111 factors that include demographic, transport, and clinical data. The statistical methods employed in this work are the nonconcave penalized likelihood methods, Smoothly Clipped Absolute Deviation, Least Absolute Shrinkage and Selection Operator, and Hard, the maximum partial likelihood estimation method, and the best subset variable selection, adjusted to Cox's proportional hazards model and used to detect possible risk factors, which affect the length of stay in a hospital. A variety of different statistical models are considered, with respect to the combinations of factors while censored observations are present. A comparative survey reveals several differences between results and execution times of each method. Finally, we provide useful biological justification of our results. 相似文献

2.

On improving convergence rate of Bernstein polynomial density estimator

Gaku Igarashi Yoshihide Kakizawa 《Journal of nonparametric statistics》2014,26(1):61-84

This paper is concerned with the Bernstein estimator [Vitale, R.A. (1975), ‘A Bernstein Polynomial Approach to Density Function Estimation’, in Statistical Inference and Related Topics, ed. M.L. Puri, 2, New York: Academic Press, pp. 87–99] to estimate a density with support [0, 1]. One of the major contributions of this paper is an application of a multiplicative bias correction [Terrell, G.R., and Scott, D.W. (1980), ‘On Improving Convergence Rates for Nonnegative Kernel Density Estimators’, The Annals of Statistics, 8, 1160–1163], which was originally developed for the standard kernel estimator. Moreover, the renormalised multiplicative bias corrected Bernstein estimator is studied rigorously. The mean squared error (MSE) in the interior and mean integrated squared error of the resulting bias corrected Bernstein estimators as well as the additive bias corrected Bernstein estimator [Leblanc, A. (2010), ‘A Bias-reduced Approach to Density Estimation Using Bernstein Polynomials’, Journal of Nonparametric Statistics, 22, 459–475] are shown to be O(n^?8/9) when the underlying density has a fourth-order derivative, where n is the sample size. The condition under which the MSE near the boundary is O(n^?8/9) is also discussed. Finally, numerical studies based on both simulated and real data sets are presented. 相似文献

3.

Model selection with distributed SCAD penalty

Puyu Wang Yong Liang 《Journal of applied statistics》2018,45(11):1938-1955

In this paper, we focus on the feature extraction and variable selection of massive data which is divided and stored in different linked computers. Specifically, we study the distributed model selection with the Smoothly Clipped Absolute Deviation (SCAD) penalty. Based on the Alternating Direction Method of Multipliers (ADMM) algorithm, we propose distributed SCAD algorithm and prove its convergence. The results of variable selection of the distributed approach are same with the results of the non-distributed approach. Numerical studies show that our method is both effective and efficient which performs well in distributed data analysis. 相似文献

4.

A note on score statistic for grouped data

Hyo-Il Park 《Journal of the Korean Statistical Society》2009,38(4):331-337

In this paper, we consider modifying the score statistic proposed by Prentice and Gloeckler [Prentice, R. L., & Gloeckler, L. A. (1978). Regression analysis of grouped data with applications to breast cancer data. Biometrics, 34, 57–67] for the grouped data under the proportional hazards model. For this matter, we apply the likelihood method and derive the scores without re-parameterization as a discrete model. Then we illustrate the test with an example and compare the efficiency with the test of Prentice and Gloeckler’s statistic by obtaining empirical powers through simulation study. Also we discuss some possible extension and estimated variances of the score statistic as concluding remarks. 相似文献

5.

Asymptotic normality of locally modelled regression estimator for functional data

Zhiyong Zhou Zhengyan Lin 《Journal of nonparametric statistics》2016,28(1):116-131

We focus on the nonparametric regression of a scalar response on a functional explanatory variable. As an alternative to the well-known Nadaraya-Watson estimator for regression function in this framework, the locally modelled regression estimator performs very well [cf. [Barrientos-Marin, J., Ferraty, F., and Vieu, P. (2010), ‘Locally Modelled Regression and Functional Data’, Journal of Nonparametric Statistics, 22, 617–632]. In this paper, the asymptotic properties of locally modelled regression estimator for functional data are considered. The mean-squared convergence as well as asymptotic normality for the estimator are established. We also adapt the empirical likelihood method to construct the point-wise confidence intervals for the regression function and derive the Wilk's phenomenon for the empirical likelihood inference. Furthermore, a simulation study is presented to illustrate our theoretical results. 相似文献

6.

Modern variable selection for longitudinal semi-parametric models with missing data

J. Kowalski S. Hao T. Chen Y. Liang J. Liu L. Ge 《Journal of applied statistics》2018,45(14):2548-2562

Penalized methods for variable selection such as the Smoothly Clipped Absolute Deviation penalty have been increasingly applied to aid variable section in regression analysis. Much of the literature has focused on parametric models, while a few recent studies have shifted the focus and developed their applications for the popular semi-parametric, or distribution-free, generalized estimating equations (GEEs) and weighted GEE (WGEE). However, although the WGEE is composed of one main and one missing-data module, available methods only focus on the main module, with no variable selection for the missing-data module. In this paper, we develop a new approach to further extend the existing methods to enable variable selection for both modules. The approach is illustrated by both real and simulated study data. 相似文献

7.

Adaptive nonparametric estimation in the presence of dependence

Nicolas Asin 《Journal of nonparametric statistics》2017,29(4):694-730

We consider nonparametric estimation problems in the presence of dependent data, notably nonparametric regression with random design and nonparametric density estimation. The proposed estimation procedure is based on a dimension reduction. The minimax optimal rate of convergence of the estimator is derived assuming a sufficiently weak dependence characterised by fast decreasing mixing coefficients. We illustrate these results by considering classical smoothness assumptions. However, the proposed estimator requires an optimal choice of a dimension parameter depending on certain characteristics of the function of interest, which are not known in practice. The main issue addressed in our work is an adaptive choice of this dimension parameter combining model selection and Lepski's method. It is inspired by the recent work of Goldenshluger and Lepski [(2011), ‘Bandwidth Selection in Kernel Density Estimation: Oracle Inequalities and Adaptive Minimax Optimality’, The Annals of Statistics, 39, 1608–1632]. We show that this data-driven estimator can attain the lower risk bound up to a constant provided a fast decay of the mixing coefficients. 相似文献

8.

Estimating cross quantile residual ratio with left-truncated semi-competing risks data

Jing Yang Limin Peng 《Lifetime data analysis》2018,24(4):652-674

A semi-competing risks setting often arises in biomedical studies, involving both a nonterminal event and a terminal event. Cross quantile residual ratio (Yang and Peng in Biometrics 72:770–779, 2016) offers a flexible and robust perspective to study the dependency between the nonterminal and the terminal events which can shed useful scientific insight. In this paper, we propose a new nonparametric estimator of this dependence measure with left truncated semi-competing risks data. The new estimator overcomes the limitation of the existing estimator that is resulted from demanding a strong assumption on the truncation mechanism. We establish the asymptotic properties of the proposed estimator and develop inference procedures accordingly. Simulation studies suggest good finite-sample performance of the proposed method. Our proposal is illustrated via an application to Denmark diabetes registry data. 相似文献

9.

Asymptotics of bivariate penalised splines

Luo Xiao 《Journal of nonparametric statistics》2019,31(2):289-314

We study the class of bivariate penalised splines that use tensor product splines and a smoothness penalty. Similar to Claeskens, G., Krivobokova, T., and Opsomer, J.D. [(2009), ‘Asymptotic Properties of Penalised Spline Estimators’, Biometrika, 96(3), 529–544] for the univariate penalised splines, we show that, depending on the number of knots and penalty, the global asymptotic convergence rate of bivariate penalised splines is either similar to that of tensor product regression splines or to that of thin plate splines. In each scenario, the bivariate penalised splines are found rate optimal in the sense of Stone, C.J. [(12, 1982), ‘Optimal Global Rates of Convergence for Nonparametric Regression’, The Annals of Statistics, 10(4), 1040–1053] for a corresponding class of functions with appropriate smoothness. For the scenario where a small number of knots is used, we obtain expressions for the local asymptotic bias and variance and derive the point-wise and uniform asymptotic normality. The theoretical results are applicable to tensor product regression splines. 相似文献

10.

Improved local polynomial estimation in time series regression

Juliane Geller 《Journal of nonparametric statistics》2018,30(1):1-27

We propose a modification of local polynomial estimation which improves the efficiency of the conventional method when the observation errors are correlated. The procedure is based on a pre-transformation of the data as a generalization of the pre-whitening procedure introduced by Xiao et al. [(2003), ‘More Efficient Local Polynomial Estimation in Nonparametric Regression with Autocorrelated Errors’, Journal of the American Statistical Association, 98, 980–992]. While these authors assumed a linear process representation for the error process, we avoid any structural assumption. We further allow the regressors and the errors to be dependent. More importantly, we show that the inclusion of both leading and lagged variables in the approximation of the error terms outperforms the best approximation based on lagged variables only. Establishing its asymptotic distribution, we show that the proposed estimator is more efficient than the standard local polynomial estimator. As a by-product we prove a suitable version of a central limit theorem which allows us to improve the asymptotic normality result for local polynomial estimators by Masry and Fan [(1997), ‘Local Polynomial Estimation of Regression Functions for Mixing Processes’, Scandinavian Journal of Statistics, 24, 165–179]. A simulation study confirms the efficiency of our estimator on finite samples. An application to climate data also shows that our new method leads to an estimator with decreased variability. 相似文献

11.

Sure independence screening for analyzing supersaturated designs

K. Drosou 《统计学通讯:模拟与计算》2013,42(7):1979-1995

ABSTRACT

Supersaturated designs (SSDs) constitute a large class of fractional factorial designs which can be used for screening out the important factors from a large set of potentially active ones. A major advantage of these designs is that they reduce the experimental cost dramatically, but their crucial disadvantage is the confounding involved in the statistical analysis. Identification of active effects in SSDs has been the subject of much recent study. In this article we present a two-stage procedure for analyzing two-level SSDs assuming a main-effect only model, without including any interaction terms. This method combines sure independence screening (SIS) with different penalty functions; such as Smoothly Clipped Absolute Deviation (SCAD), Lasso and MC penalty achieving both the down-selection and the estimation of the significant effects, simultaneously. Insights on using the proposed methodology are provided through various simulation scenarios and several comparisons with existing approaches, such as stepwise in combination with SCAD and Dantzig Selector (DS) are presented as well. Results of the numerical study and real data analysis reveal that the proposed procedure can be considered as an advantageous tool due to its extremely good performance for identifying active factors. 相似文献

12.

Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density

Han Lin Shang 《Journal of nonparametric statistics》2014,26(3):599-615

We investigate the issue of bandwidth estimation in a functional nonparametric regression model with function-valued, continuous real-valued and discrete-valued regressors under the framework of unknown error density. Extending from the recent work of Shang (2013 Shang, H.L. (2013), ‘Bayesian Bandwidth Estimation for a Nonparametric Functional Regression Model with Unknown Error Density’, Computational Statistics &; Data Analysis, 67, 185–198. doi: 10.1016/j.csda.2013.05.006[Crossref], [Web of Science ®] , [Google Scholar]) [‘Bayesian Bandwidth Estimation for a Nonparametric Functional Regression Model with Unknown Error Density’, Computational Statistics &; Data Analysis, 67, 185–198], we approximate the unknown error density by a kernel density estimator of residuals, where the regression function is estimated by the functional Nadaraya–Watson estimator that admits mixed types of regressors. We derive a likelihood and posterior density for the bandwidth parameters under the kernel-form error density, and put forward a Bayesian bandwidth estimation approach that can simultaneously estimate the bandwidths. Simulation studies demonstrated the estimation accuracy of the regression function and error density for the proposed Bayesian approach. Illustrated by a spectroscopy data set in the food quality control, we applied the proposed Bayesian approach to select the optimal bandwidths in a functional nonparametric regression model with mixed types of regressors. 相似文献

13.

Mixture cure rate models with accelerated failures and nonparametric form of covariate effects

Tianlei Chen 《Journal of nonparametric statistics》2018,30(1):216-237

Two-component mixture cure rate model is popular in cure rate data analysis with the proportional hazards and accelerated failure time (AFT) models being the major competitors for modelling the latency component. [Wang, L., Du, P., and Liang, H. (2012), ‘Two-Component Mixture Cure Rate Model with Spline Estimated Nonparametric Components’, Biometrics, 68, 726–735] first proposed a nonparametric mixture cure rate model where the latency component assumes proportional hazards with nonparametric covariate effects in the relative risk. Here we consider a mixture cure rate model where the latency component assumes AFTs with nonparametric covariate effects in the acceleration factor. Besides the more direct physical interpretation than the proportional hazards, our model has an additional scalar parameter which adds more complication to the computational algorithm as well as the asymptotic theory. We develop a penalised EM algorithm for estimation together with confidence intervals derived from the Louis formula. Asymptotic convergence rates of the parameter estimates are established. Simulations and the application to a melanoma study shows the advantages of our new method. 相似文献

14.

Estimation of multivariate conditional-tail-expectation using Kendall's process

Elena Di Bernardino Clémentine Prieur 《Journal of nonparametric statistics》2014,26(2):241-267

This paper deals with the problem of estimating the multivariate version of the Conditional-Tail-Expectation, proposed by Di Bernardino et al. [(2013), ‘Plug-in Estimation of Level Sets in a Non-Compact Setting with Applications in Multivariable Risk Theory’, ESAIM: Probability and Statistics, (17), 236–256]. We propose a new nonparametric estimator for this multivariate risk-measure, which is essentially based on Kendall's process [Genest and Rivest, (1993), ‘Statistical Inference Procedures for Bivariate Archimedean Copulas’, Journal of American Statistical Association, 88(423), 1034–1043]. Using the central limit theorem for Kendall's process, proved by Barbe et al. [(1996), ‘On Kendall's Process’, Journal of Multivariate Analysis, 58(2), 197–229], we provide a functional central limit theorem for our estimator. We illustrate the practical properties of our nonparametric estimator on simulations and on two real test cases. We also propose a comparison study with the level sets-based estimator introduced in Di Bernardino et al. [(2013), ‘Plug-In Estimation of Level Sets in A Non-Compact Setting with Applications in Multivariable Risk Theory’, ESAIM: Probability and Statistics, (17), 236–256] and with (semi-)parametric approaches. 相似文献

15.

An inverse-probability-weighted approach to estimation of the bivariate survival function under left-truncation and right-censoring

《Journal of statistical planning and inference》2006,136(12):4365-4384

In this note, we consider estimating the bivariate survival function when both survival times are subject to random left truncation and one of the survival times is subject to random right censoring. Motivated by Satten and Datta [2001. The Kaplan–Meier estimator as an inverse-probability-of-censoring weighted average. Amer. Statist. 55, 207–210], we propose an inverse-probability-weighted (IPW) estimator. It involves simultaneous estimation of the bivariate survival function of the truncation variables and that of the censoring variable and the truncation variable of the uncensored components. We prove that (i) when there is no censoring, the IPW estimator reduces to NPMLE of van der Laan [1996a. Nonparametric estimation of the bivariate survival function with truncated data. J. Multivariate Anal. 58, 107–131] and (ii) when there is random left truncation and right censoring on only one of the components and the other component is always observed, the IPW estimator reduces to the estimator of Gijbels and Gürler [1998. Covariance function of a bivariate distribution function estimator for left truncated and right censored data. Statist. Sin. 1219–1232]. Based on Theorem 3.1 of van der Laan [1996a. Nonparametric estimation of the bivariate survival function with truncated data. J. Multivariate Anal. 58, 107–131, 1996b. Efficient estimation of the bivariate censoring model and repairing NPMLE. Ann. Statist. 24, 596–627], we prove that the IPW estimator is consistent under certain conditions. Finally, we examine the finite sample performance of the IPW estimator in some simulation studies. For the special case that censoring time is independent of truncation time, a simulation study is conducted to compare the performances of the IPW estimator against that of the estimator proposed by van der Laan [1996a. Nonparametric estimation of the bivariate survival function with truncated data. J. Multivariate Anal. 58, 107–131, 1996b. Efficient estimation of the bivariate censoring model and repairing NPMLE. Ann. Statist. 24, 596–627]. For the special case (i), a simulation study is conducted to compare the performances of the IPW estimator against that of the estimator proposed by Huang et al. (2001. Nonnparametric estimation of marginal distributions under bivariate truncation with application to testing for age-of-onset application. Statist. Sin. 11, 1047–1068). 相似文献

16.

Variable selection for recurrent event data via nonconcave penalized estimating function

Xingwei Tong Liang Zhu Jianguo Sun 《Lifetime data analysis》2009,15(2):197-215

Variable selection is an important issue in all regression analysis and in this paper, we discuss this in the context of regression analysis of recurrent event data. Recurrent event data often occur in long-term studies in which individuals may experience the events of interest more than once and their analysis has recently attracted a great deal of attention (Andersen et al., Statistical models based on counting processes, 1993; Cook and Lawless, Biometrics 52:1311–1323, 1996, The analysis of recurrent event data, 2007; Cook et al., Biometrics 52:557–571, 1996; Lawless and Nadeau, Technometrics 37:158-168, 1995; Lin et al., J R Stat Soc B 69:711–730, 2000). However, it seems that there are no established approaches to the variable selection with respect to recurrent event data. For the problem, we adopt the idea behind the nonconcave penalized likelihood approach proposed in Fan and Li (J Am Stat Assoc 96:1348–1360, 2001) and develop a nonconcave penalized estimating function approach. The proposed approach selects variables and estimates regression coefficients simultaneously and an algorithm is presented for this process. We show that the proposed approach performs as well as the oracle procedure in that it yields the estimates as if the correct submodel was known. Simulation studies are conducted for assessing the performance of the proposed approach and suggest that it works well for practical situations. The proposed methodology is illustrated by using the data from a chronic granulomatous disease study. 相似文献

17.

Optimal nonparametric estimator of the area under ROC curve based on clustered data

Yougui Wu 《统计学通讯:理论与方法》2020,49(6):1446-1461

Abstract

In diagnostic trials, clustered data are obtained when several subunits of the same patient are observed. Intracluster correlations need to be taken into account when analyzing such clustered data. A nonparametric method has been proposed by Obuchowski (1997 Obuchowski, N. A. 1997. Nonparametric analysis of clustered ROC curve data. Biometrics 53 (2):567–78.[Crossref], [PubMed], [Web of Science ®] , [Google Scholar]) to estimate the Receiver Operating Characteristic curve area (AUC) for such clustered data. However, Obuchowski’s estimator is not efficient as it gives equal weight to all pairwise rankings within and between cluster. In this paper, we propose a more efficient nonparametric AUC estimator with two sets of optimal weights. Simulation results show that the loss of efficiency of Obuchowski’s estimator for a single AUC or the AUC difference can be substantial when there is a moderate intracluster test correlation and the cluster size is large. The efficiency gain of our weighted AUC estimator for a single AUC or the AUC difference is further illustrated using the data from a study of screening tests for neonatal hearing. 相似文献

18.

Bayesian variable selection in a class of mixture models for ordinal data: a comparative study

《Journal of Statistical Computation and Simulation》2012,82(10):1926-1944

In this paper, we consider a special finite mixture model named Combination of Uniform and shifted Binomial (CUB), recently introduced in the statistical literature to analyse ordinal data expressing the preferences of raters with regards to items or services. Our aim is to develop a variable selection procedure for this model using a Bayesian approach. Bayesian methods for variable selection and model choice have become increasingly popular in recent years, due to advances in Markov chain Monte Carlo computational algorithms. Several methods have been proposed in the case of linear and generalized linear models (GLM). In this paper, we adapt to the CUB model some of these algorithms: the Kuo–Mallick method together with its ‘metropolized’ version and the Stochastic Search Variable Selection method. Several simulated examples are used to illustrate the algorithms and to compare their performance. Finally, an application to real data is introduced. 相似文献

19.

Small area estimation of the mean using non-parametric M-quantile regression: a comparison when a linear mixed model does not hold

《Journal of Statistical Computation and Simulation》2012,82(8):945-964

The demand for reliable statistics in subpopulations, when only reduced sample sizes are available, has promoted the development of small area estimation methods. In particular, an approach that is now widely used is based on the seminal work by Battese et al. [An error-components model for prediction of county crop areas using survey and satellite data, J. Am. Statist. Assoc. 83 (1988), pp. 28–36] that uses linear mixed models (MM). We investigate alternatives when a linear MM does not hold because, on one side, linearity may not be assumed and/or, on the other, normality of the random effects may not be assumed. In particular, Opsomer et al. [Nonparametric small area estimation using penalized spline regression, J. R. Statist. Soc. Ser. B 70 (2008), pp. 265–283] propose an estimator that extends the linear MM approach to the case in which a linear relationship may not be assumed using penalized splines regression. From a very different perspective, Chambers and Tzavidis [M-quantile models for small area estimation, Biometrika 93 (2006), pp. 255–268] have recently proposed an approach for small-area estimation that is based on M-quantile (MQ) regression. This allows for models robust to outliers and to distributional assumptions on the errors and the area effects. However, when the functional form of the relationship between the qth MQ and the covariates is not linear, it can lead to biased estimates of the small area parameters. Pratesi et al. [Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US, Environmetrics 19(7) (2008), pp. 687–701] apply an extended version of this approach for the estimation of the small area distribution function using a non-parametric specification of the conditional MQ of the response variable given the covariates [M. Pratesi, M.G. Ranalli, and N. Salvati, Nonparametric m-quantile regression using penalized splines, J. Nonparametric Stat. 21 (2009), pp. 287–304]. We will derive the small area estimator of the mean under this model, together with its mean-squared error estimator and compare its performance to the other estimators via simulations on both real and simulated data. 相似文献

20.

Interval estimation of the negative binomial dispersion parameter

《Journal of Statistical Computation and Simulation》2012,82(12):1993-2002

Inference concerning the negative binomial dispersion parameter, denoted by c, is important in many biological and biomedical investigations. Properties of the maximum-likelihood estimator of c and its bias-corrected version have been studied extensively, mainly, in terms of bias and efficiency [W.W. Piegorsch, Maximum likelihood estimation for the negative binomial dispersion parameter, Biometrics 46 (1990), pp. 863–867; S.J. Clark and J.N. Perry, Estimation of the negative binomial parameter κ by maximum quasi-likelihood, Biometrics 45 (1989), pp. 309–316; K.K. Saha and S.R. Paul, Bias corrected maximum likelihood estimator of the negative binomial dispersion parameter, Biometrics 61 (2005), pp. 179–185]. However, not much work has been done on the construction of confidence intervals (C.I.s) for c. The purpose of this paper is to study the behaviour of some C.I. procedures for c. We study, by simulations, three Wald type C.I. procedures based on the asymptotic distribution of the method of moments estimate (mme), the maximum-likelihood estimate (mle) and the bias-corrected mle (bcmle) [K.K. Saha and S.R. Paul, Bias corrected maximum likelihood estimator of the negative binomial dispersion parameter, Biometrics 61 (2005), pp. 179–185] of c. All three methods show serious under-coverage. We further study parametric bootstrap procedures based on these estimates of c, which significantly improve the coverage probabilities. The bootstrap C.I.s based on the mle (Boot-MLE method) and the bcmle (Boot-BCM method) have coverages that are significantly better (empirical coverage close to the nominal coverage) than the corresponding bootstrap C.I. based on the mme, especially for small sample size and highly over-dispersed data. However, simulation results on lengths of the C.I.s show evidence that all three bootstrap procedures have larger average coverage lengths. Therefore, for practical data analysis, the bootstrap C.I. Boot-MLE or Boot-BCM should be used, although Boot-MLE method seems to be preferable over the Boot-BCM method in terms of both coverage and length. Furthermore, Boot-MLE needs less computation than Boot-BCM. 相似文献