Similar documents
20 similar documents found (search time: 31 ms)
1.
We study sparse high-dimensional additive model fitting via penalization with sparsity-smoothness penalties. We review several existing algorithms that have been developed for this problem in the recent literature, highlighting the connections between them, and present some computationally efficient algorithms for fitting such models. Furthermore, under reasonable assumptions and exploiting recent results on group-LASSO-like procedures, we draw on several oracle results which yield asymptotic optimality of estimators for high-dimensional but sparse additive models. Finally, variable selection procedures are compared with some high-dimensional testing procedures available in the literature for testing the presence of additive components.
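As a rough illustration of the sparsity-smoothness idea, the following sketch fits an additive model by proximal gradient descent (ISTA) with a group-lasso penalty, one group of polynomial basis coefficients per covariate. This is a minimal stand-in for the algorithms reviewed in the paper, not any specific one; the polynomial basis, step-size rule, and penalty level are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(v, t):
    # Block soft-thresholding: the proximal operator of the group-lasso penalty.
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def fit_sparse_additive(X, y, n_basis=3, lam=0.1, n_iter=2000):
    """Proximal-gradient fit of an additive model with a group-lasso penalty,
    one group of polynomial basis coefficients per covariate (a sketch)."""
    n, p = X.shape
    Z = np.column_stack([X[:, j] ** k for j in range(p)
                         for k in range(1, n_basis + 1)])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)      # standardize the basis
    yc = y - y.mean()
    lr = 1.0 / (np.linalg.norm(Z, 2) ** 2 / n)    # 1/Lipschitz step size
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        beta -= lr * Z.T @ (Z @ beta - yc) / n    # gradient step
        for j in range(p):                        # proximal step per group
            s = slice(j * n_basis, (j + 1) * n_basis)
            beta[s] = group_soft_threshold(beta[s], lr * lam)
    return beta
```

With a response depending on only one covariate, the irrelevant groups are shrunk to (near) zero, which is the sparsity part of the penalty at work.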

2.
A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high-dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend correlation learning to marginal nonparametric learning. Our nonparametric independence screening, called NIS, is a specific member of the sure independence screening family. Several closely related variable screening procedures are proposed. Under general nonparametric models and some mild technical conditions, the proposed independence screening methods are shown to enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite-sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods.
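The marginal nonparametric learning step can be caricatured as ranking covariates by the fit of a one-dimensional regression of the response on each covariate alone. The sketch below uses a centered polynomial basis as a crude stand-in for the spline-based marginal fits in NIS; the basis degree and the number of variables kept are illustrative assumptions.

```python
import numpy as np

def nis_screen(X, y, degree=3, keep=10):
    # Rank covariates by the R^2 of a marginal polynomial regression of y on
    # each covariate alone; keep the top-ranked ones (screening step).
    n, p = X.shape
    scores = np.empty(p)
    yc = y - y.mean()
    for j in range(p):
        B = np.column_stack([X[:, j] ** k for k in range(1, degree + 1)])
        B = B - B.mean(axis=0)
        coef, *_ = np.linalg.lstsq(B, yc, rcond=None)
        scores[j] = 1.0 - np.sum((yc - B @ coef) ** 2) / np.sum(yc ** 2)
    return np.argsort(scores)[::-1][:keep]
```

Even with 50 candidate covariates and a nonlinear signal in just one of them, the marginal fits put the true variable at the top of the ranking.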

3.
Several approaches have been suggested for fitting linear regression models to censored data. These include Cox's proportional hazards models based on quasi-likelihoods. Methods of fitting based on least squares and maximum likelihood have also been proposed. The methods proposed so far all require special-purpose optimization routines. We describe an approach here which requires only a modified standard least squares routine.

We present methods for fitting a linear regression model to censored data by least squares and by the method of maximum likelihood. In the least squares method, the censored values are replaced by their expectations, and the residual sum of squares is minimized. Several variants are suggested in the ways in which the expectation is calculated. A parametric approach (assuming a normal error model) and two non-parametric approaches are described. We also present a method for solving the maximum likelihood equations in the estimation of the regression parameters in the censored regression situation. It is shown that the solutions can be obtained by a recursive algorithm which needs only a least squares routine for optimization. The suggested procedures gain considerably in computational efficiency. The Stanford Heart Transplant data are used to illustrate the various methods.
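Under the parametric normal error model, the "replace censored values by their expectations" idea can be sketched as an iterative scheme: fit, impute each right-censored response by the mean of a normal truncated below at its censoring time, and refit with ordinary least squares. Function names, the iteration count, and the stopping rule below are illustrative.

```python
import math
import numpy as np

def phi(z):   # standard normal density
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):   # standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def censored_ls(X, y, delta, n_iter=50):
    # delta[i] = 1 for an observed response, 0 for right-censored at y[i].
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    for _ in range(n_iter):
        mu = Xd @ beta
        resid = y - mu
        sigma = math.sqrt(np.mean(resid[delta == 1] ** 2)) + 1e-12
        y_work = y.copy()
        for i in np.where(delta == 0)[0]:
            a = (y[i] - mu[i]) / sigma        # standardized censoring point
            tail = 1 - Phi(a)
            if tail > 1e-12:
                # mean of a normal truncated below at the censoring time
                y_work[i] = mu[i] + sigma * phi(a) / tail
        beta, *_ = np.linalg.lstsq(Xd, y_work, rcond=None)
    return beta
```

Each pass only calls a standard least squares routine, which is the computational point the paper emphasizes.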

4.
Abstract.  We consider non-parametric additive quantile regression estimation by kernel-weighted local linear fitting. The estimator is based on localizing the characterization of quantile regression as the minimizer of the appropriate 'check function'. A backfitting algorithm and a heuristic rule for selecting the smoothing parameter are explored. We also study the estimation of average-derivative quantile regression under the additive model. The techniques are illustrated by a simulated example and a real data set.
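For intuition, the check-function characterization can be demonstrated with a deliberately naive kernel-weighted local linear fit that minimizes the weighted check loss by grid search; real implementations use iterative solvers and backfitting, and the kernel, bandwidth, and search grids here are illustrative assumptions.

```python
import numpy as np

def check_loss(u, tau):
    # Koenker-Bassett 'check function': tau*u for u >= 0, (tau-1)*u otherwise.
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def local_linear_quantile(x, y, x0, tau=0.5, h=0.3):
    # Kernel-weighted local linear quantile fit at x0: minimize the weighted
    # check loss over intercept/slope by brute-force grid search (didactic).
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)       # Gaussian kernel weights
    best, best_loss = None, np.inf
    for a in np.linspace(y.min(), y.max(), 201):  # intercept candidates
        for b in np.linspace(-3, 3, 61):          # slope candidates
            loss = np.sum(w * check_loss(y - a - b * (x - x0), tau))
            if loss < best_loss:
                best, best_loss = a, loss
    return best  # estimated tau-quantile of y at x0
```

With tau = 0.5 this recovers the local conditional median; other tau values trace out the conditional quantile curves.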

5.
Many semi-parametric and nonparametric models are used to fit nonlinear time series data, including partially linear time series models, nonparametric additive models, and semi-parametric single-index models. In this article, we focus on fitting time series data with a partially linear additive model. Combining orthogonal series approximation and adaptive sparse group LASSO regularization, we select the important variables between and within the groups simultaneously. Specifically, we propose a two-step algorithm to obtain the grouped sparse estimators. Numerical studies show that the proposed method outperforms the LASSO method in both fitting and forecasting. An empirical analysis is used to illustrate the methodology.

6.
Abstract.  Imagine we have two different samples and are interested in doing semi- or non-parametric regression analysis in each of them, possibly on the same model. In this paper, we consider the problem of testing whether a specific covariate has different impacts on the regression curve in these two samples. We compare the regression curves of different samples but are interested in specific differences instead of testing for equality of the whole regression function. Our procedure allows for random designs, different sample sizes, different variance functions, different sets of regressors with different impact functions, etc. As we use the marginal integration approach, this method can be applied to any strong, weak or latent separable model as well as to additive interaction models to compare the lower-dimensional separable components between the different samples. Thus, in the case of separable models, our procedure includes the possibility of comparing the whole regression curves, thereby avoiding the curse of dimensionality. It is shown that the bootstrap fails in theory and practice. Therefore, we propose a subsampling procedure with automatic choice of subsample size. We present a complete asymptotic theory and an extensive simulation study.

7.
Local Likelihood Estimation in Generalized Additive Models
ABSTRACT.  Generalized additive models are a popular class of multivariate non-parametric regression models, due in large part to the ease of use of the local scoring estimation algorithm. However, the theoretical properties of the local scoring estimator are poorly understood. In this article, we propose a local likelihood estimator for generalized additive models that is closely related to the local scoring estimator fitted by local polynomial regression. We derive the statistical properties of the estimator and show that it achieves the same asymptotic convergence rate as a one-dimensional local polynomial regression estimator. We also propose a wild bootstrap estimator for calculating point-wise confidence intervals for the additive component functions. The practical behaviour of the proposed estimator is illustrated through a simulation experiment.

8.
Abstract. We review and extend some statistical tools that have proved useful for analysing functional data. Functional data analysis is primarily designed for the analysis of random trajectories and infinite-dimensional data, and there exists a need for the development of adequate statistical estimation and inference techniques. While this field is in flux, some methods have proven useful. These include warping methods, functional principal component analysis, and conditioning under Gaussian assumptions for the case of sparse data. The latter is a recent development that may provide a bridge between functional and more classical longitudinal data analysis. Besides presenting a brief review of functional principal components and functional regression, we develop some concepts for estimating functional principal component scores in the sparse situation. An extension of the so-called generalized functional linear model to the case of sparse longitudinal predictors is proposed. This extension includes functional binary regression models for longitudinal data and is illustrated with data on primary biliary cirrhosis.
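For densely observed curves on a common grid, functional principal component analysis reduces to an eigendecomposition of the sample covariance matrix; the sparse case discussed above is exactly where this shortcut breaks down and the conditioning machinery is needed. A minimal dense-grid sketch (names and defaults are illustrative):

```python
import numpy as np

def fpca(curves, n_comp=2):
    # Functional PCA for densely observed curves (rows = curves, columns =
    # a common grid): eigendecompose the sample covariance across the grid.
    mean = curves.mean(axis=0)
    C = np.cov(curves - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1][:n_comp]   # leading eigenvalues first
    components = vecs[:, order]               # discretized eigenfunctions
    scores = (curves - mean) @ components     # principal component scores
    return mean, components, scores, vals[order]
```

For curves generated from two random modes of variation, the two leading components capture essentially all of the variance.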

9.
Generalized additive mixed models are proposed for overdispersed and correlated data, which arise frequently in studies involving clustered, hierarchical and spatial designs. This class of models allows flexible functional dependence of an outcome variable on covariates by using nonparametric regression, while accounting for correlation between observations by using random effects. We estimate nonparametric functions by using smoothing splines and jointly estimate smoothing parameters and variance components by using marginal quasi-likelihood. Because numerical integration is often required when maximizing the objective functions, double penalized quasi-likelihood is proposed for approximate inference. Frequentist and Bayesian inferences are compared. A key feature of the proposed method is that it allows us to make systematic inference on all model components within a unified parametric mixed model framework, and it can be easily implemented by fitting a working generalized linear mixed model using existing statistical software. A bias correction procedure is also proposed to improve the performance of double penalized quasi-likelihood for sparse data. We illustrate the method with an application to infectious disease data and we evaluate its performance through simulation.

10.
High-dimensional models, involving very many parameters estimated from a moderate amount of data, are receiving much attention from diverse research fields. Model selection is an important issue in such high-dimensional data analysis. Recent literature on the theoretical understanding of high-dimensional models covers a wide range of penalized methods, including the LASSO and SCAD. This paper presents a systematic overview of recent developments in high-dimensional statistical models. We provide a brief review of the recent development of theory, methods, and guidelines on applications of several penalized methods, including the appropriate settings in which each reviewed method can be implemented, together with its limitations and potential solutions. In particular, we provide a systematic review of the statistical theory of high-dimensional methods by considering a unified high-dimensional modeling framework together with high-level conditions. This framework includes (generalized) linear regression and quantile regression as special cases. We hope our review helps researchers in this field gain a better understanding of the area and provides useful information for future study.

11.
ABSTRACT

We propose a new semiparametric Weibull cure rate model for fitting nonlinear effects of explanatory variables on the mean, scale and cure rate parameters. The regression model is based on the generalized additive models for location, scale and shape, in which any or all distribution parameters can be modeled as parametric linear and/or nonparametric smooth functions of explanatory variables. We present methods for selecting additive terms and for model estimation and validation, with all computational code presented simply enough that any R user can fit the new model. Biases of the parameter estimates caused by erroneously specified models are investigated through Monte Carlo simulations. We illustrate the usefulness of the new model by means of two applications to real data, and we provide the computational code to fit the new regression model in the R software.

12.
Thin plate regression splines
Summary. I discuss the production of low rank smoothers for d ≥ 1 dimensional data, which can be fitted by regression or penalized regression methods. The smoothers are constructed by a simple transformation and truncation of the basis that arises from the solution of the thin plate spline smoothing problem, and are optimal in the sense that the truncation is designed to result in the minimum possible perturbation of the thin plate spline smoothing problem given the dimension of the basis used to construct the smoother. By making use of Lanczos iteration, the basis change and truncation are computationally efficient. The smoothers allow the use of approximate thin plate spline models with large data sets; avoid the problems associated with 'knot placement' that usually complicate modelling with regression splines or penalized regression splines; provide a sensible way of modelling interaction terms in generalized additive models; provide low rank approximations to generalized smoothing spline models, appropriate for use with large data sets; provide a means for incorporating smooth functions of more than one variable into non-linear models; and improve the computational efficiency of penalized likelihood models incorporating thin plate splines. Given that the approach produces spline-like models with a sparse basis, it also provides a natural way of incorporating unpenalized spline-like terms in linear and generalized linear models, and these can be treated just like any other model terms from the point of view of model selection, inference and diagnostics.
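The construction can be caricatured in one dimension, where the thin plate radial basis is |x_i - x_j|^3: eigendecompose the full basis matrix, keep a small number of leading eigenvectors (the paper uses Lanczos iteration precisely to avoid this full decomposition), and fit by penalized least squares with the polynomial part unpenalized. The rank, penalty weight, and the use of a full eigendecomposition here are simplifying assumptions, not the paper's algorithm.

```python
import numpy as np

def tprs_smooth(x, y, rank=10, lam=1e-3):
    # Low-rank smoother in the spirit of thin plate regression splines:
    # truncate the eigen-basis of the full radial basis matrix and fit by
    # penalized least squares, leaving the polynomial part unpenalized.
    n = len(x)
    E = np.abs(x[:, None] - x[None, :]) ** 3        # full 1-D radial basis
    vals, vecs = np.linalg.eigh(E)
    order = np.argsort(np.abs(vals))[::-1][:rank]   # largest-magnitude modes
    U, D = vecs[:, order], vals[order]
    T = np.column_stack([np.ones(n), x])            # unpenalized linear part
    Z = np.column_stack([T, U * D])                 # reduced-rank design
    P = np.diag(np.r_[0.0, 0.0, np.abs(D)])         # penalize spline part only
    coef = np.linalg.solve(Z.T @ Z + lam * P, Z.T @ y)
    return Z @ coef                                 # fitted values
```

Even with only a dozen basis functions, the truncated basis recovers a smooth one-cycle signal from noisy data.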

13.
We propose two new procedures based on multiple hypothesis testing for correct support estimation in high-dimensional sparse linear models. We conclusively prove that both procedures are powerful and do not require the sample size to be large. The first procedure tackles the atypical setting of ordered variable selection through an extension of a testing procedure previously developed in the context of a linear hypothesis. The second procedure is the main contribution of this paper. It enables data analysts to perform support estimation in the general high-dimensional framework of non-ordered variable selection. A thorough simulation study and applications to real datasets using the R package mht show that our non-ordered variable procedure produces excellent results in terms of correct support estimation as well as in terms of mean square error and false discovery rate, when compared to common methods such as the Lasso, the SCAD penalty, forward regression and the false discovery rate (FDR) procedure.
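One of the comparison baselines, the false discovery rate procedure, is commonly instantiated as the Benjamini-Hochberg step-up rule: reject the k smallest p-values, where k is the largest i with p_(i) <= i*q/m. A minimal sketch (the abstract does not pin down which FDR variant is used, so this is the standard one):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # Benjamini-Hochberg step-up procedure at FDR level q.
    m = len(pvals)
    order = np.argsort(pvals)
    thresh = (np.arange(1, m + 1) / m) * q          # i*q/m for i = 1..m
    below = pvals[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True                      # reject the k smallest
    return rejected
```

Note the step-up character: a p-value can be rejected even if it fails its own threshold, as long as some larger index passes.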

14.
We develop a fast variational approximation scheme for Gaussian process (GP) regression, where the spectrum of the covariance function is subjected to a sparse approximation. Our approach enables uncertainty in covariance function hyperparameters to be treated without using Monte Carlo methods and is robust to overfitting. Our article makes three contributions. First, we present a variational Bayes algorithm for fitting sparse spectrum GP regression models that uses nonconjugate variational message passing to derive fast and efficient updates. Second, we propose a novel adaptive neighbourhood technique for obtaining predictive inference that is effective in dealing with nonstationarity. Regression is performed locally at each point to be predicted, and the neighbourhood is determined using a measure based on lengthscales estimated from an initial fit. By weighting dimensions according to lengthscales, this approach downweights variables of little relevance, leading to automatic variable selection and improved prediction. Third, we introduce a technique for accelerating convergence in nonconjugate variational message passing by adapting step sizes in the direction of the natural gradient of the lower bound. Our adaptive strategy can be easily implemented and empirical results indicate significant speedups.

15.
In this paper, we consider partially linear additive models with an unknown link function, which include single-index models and additive models as special cases. We use the polynomial spline method for estimating the unknown link function as well as the component functions in the additive part. We establish that the convergence rates for all nonparametric functions are the same as in one-dimensional nonparametric regression. To obtain a faster rate for the parametric part, we need to define an appropriate 'projection' that is more complicated than the one defined previously for partially linear additive models. Compared to previous approaches, a distinct advantage of our estimation approach in implementation is that it reduces directly to estimation in the single-index model, and can thus deal with much larger dimensional problems than previous approaches for additive models with unknown link functions. Simulations and a real dataset are used to illustrate the proposed model.

16.
Summary.  Multivariate failure time data arise when data consist of clusters in which the failure times may be dependent. A popular approach to such data is the marginal proportional hazards model with estimation under the working independence assumption. In some contexts, however, it may be more reasonable to use the marginal additive hazards model. We derive asymptotic properties of the Lin and Ying estimators for the marginal additive hazards model for multivariate failure time data. Furthermore we suggest estimating equations for the regression parameters and association parameters in parametric shared frailty models with marginal additive hazards by using the Lin and Ying estimators. We give the large sample properties of the estimators arising from these estimating equations and investigate their small sample properties by Monte Carlo simulation. A real example is provided for illustration.

17.
We present a new algorithm for boosting generalized additive models for location, scale and shape (GAMLSS) that allows the incorporation of stability selection, an increasingly popular way to obtain stable sets of covariates while controlling the per-family error rate. The model is fitted repeatedly to subsampled data, and variables with high selection frequencies are extracted. To apply stability selection to boosted GAMLSS, we develop a new "noncyclical" fitting algorithm that incorporates an additional selection step of the best-fitting distribution parameter in each iteration. This new algorithm has the additional advantage that optimizing the tuning parameters of boosting is reduced from a multi-dimensional to a one-dimensional problem with vastly decreased complexity. The performance of the novel algorithm is evaluated in an extensive simulation study. We apply this new algorithm to a study estimating the abundance of common eider in Massachusetts, USA, featuring excess zeros, overdispersion, nonlinearity and spatiotemporal structures. Eider abundance is estimated via boosted GAMLSS, allowing both mean and overdispersion to be regressed on covariates. Stability selection is used to obtain a sparse set of stable predictors.
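The stability selection loop itself is simple to sketch: refit the sparse learner on random subsamples and record how often each variable is selected. Below, a plain ISTA lasso stands in for the boosted GAMLSS learner used in the paper; the subsample fraction, penalty level, and iteration counts are illustrative assumptions.

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, n_iter=500):
    # Plain ISTA solver for the lasso; enough for a stability-selection demo.
    n, p = X.shape
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)    # 1/Lipschitz step size
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta -= lr * X.T @ (X @ beta - y) / n
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return beta

def stability_selection(X, y, lam=0.1, n_sub=50, frac=0.5, seed=0):
    # Refit the sparse learner on random subsamples and record how often
    # each variable enters the model, as in stability selection.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        freq += lasso_ista(X[idx], y[idx], lam) != 0
    return freq / n_sub   # per-variable selection frequencies
```

Variables with frequencies above a chosen cutoff form the stable set; truly informative covariates are selected in essentially every subsample.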

18.
Censored median regression has proved useful for analyzing survival data in complicated situations, for example when the variance is heteroscedastic or the data contain outliers. In this paper, we study sparse estimation for censored median regression models, which is an important problem for high-dimensional survival data analysis. In particular, a new procedure is proposed that minimizes an inverse-censoring-probability-weighted least absolute deviation loss subject to the adaptive LASSO penalty, resulting in a sparse and robust median estimator. We show that, with a proper choice of the tuning parameter, the procedure can identify the underlying sparse model consistently and has the desired large-sample properties, including root-n consistency and asymptotic normality. The procedure also enjoys great advantages in computation, since its entire solution path can be obtained efficiently. Furthermore, we propose a resampling method to estimate the variance of the estimator. The performance of the procedure is illustrated by extensive simulations and two real data applications, including one involving microarray gene expression survival data.

19.
Regression analysis is one of the most commonly used techniques in statistics. When the dimension of independent variables is high, it is difficult to conduct efficient non-parametric analysis straightforwardly from the data. As an important alternative to the additive and other non-parametric models, varying-coefficient models can reduce the modelling bias and avoid the "curse of dimensionality" significantly. In addition, the coefficient functions can easily be estimated via a simple local regression. Based on local polynomial techniques, we provide the asymptotic distribution for the maximum of the normalized deviations of the estimated coefficient functions away from the true coefficient functions. Using this result and the pre-asymptotic substitution idea for estimating biases and variances, simultaneous confidence bands for the underlying coefficient functions are constructed. An important question in the varying coefficient models is whether an estimated coefficient function is statistically significantly different from zero or a constant. Based on newly derived asymptotic theory, a formal procedure is proposed for testing whether a particular parametric form fits a given data set. Simulated and real-data examples are used to illustrate our techniques.
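The "simple local regression" for a coefficient function can be sketched in the one-covariate case y = a(u)·x + noise: a kernel-weighted least squares fit around each point u0 estimates a(u0). The Gaussian kernel and bandwidth below are illustrative assumptions, and this is a local constant (rather than the local polynomial) version of the estimator.

```python
import numpy as np

def varying_coef(u, x, y, u0, h=0.1):
    # Kernel-weighted least squares estimate of a(u0) in y = a(u) * x + noise:
    # the minimizer of sum_i K((u_i - u0) / h) * (y_i - a * x_i)^2 over a.
    w = np.exp(-0.5 * ((u - u0) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * x * y) / np.sum(w * x * x)
```

Evaluating this at a grid of u0 values traces out the estimated coefficient function, which is what the confidence bands in the paper are built around.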

20.
In the present paper we find finite-dimensional spaces W of alternatives with high power for a given class of tests and non-parametric alternatives. On the orthogonal complement of W the power function is flat. These methods can be used to reduce the dimension of interesting alternatives. We also sketch a device for calculating (approximately) an alternative with maximum power of a fixed test on a given ball of certain non-parametric alternatives.

The calculations are done within different asymptotic models specified by signal detection tests. Specific tests are Kolmogorov–Smirnov-type tests, integral tests (like the Anderson–Darling test) and Rényi tests for hazard-based models. The statistical meaning and interpretation of the spaces of alternatives with high power is discussed. These alternatives belong to least favorable directions of a class of statistical functionals which are linear combinations of quantile functions. For various cases their meaning is explained for parametric submodels, in particular for location alternatives.



Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号