Similar literature
1.
There are several procedures for fitting generalized additive models, i.e. regression models for an exponential family response where the influence of each single covariate is assumed to have unknown, potentially non-linear shape. Simulated data are used to compare a smoothing parameter optimization approach for selection of smoothness and of covariates, a stepwise approach, a mixed model approach, and a procedure based on boosting techniques. In particular it is investigated how the performance of procedures is linked to the amount of information, type of response, total number of covariates, number of influential covariates, and extent of non-linearity. Measures for comparison are prediction performance, identification of influential covariates, and smoothness of fitted functions. One result is that the mixed model approach returns sparse fits with frequently over-smoothed functions, while the functions are less smooth for the boosting approach and variable selection is less strict. The other approaches are in between with respect to these measures. The boosting procedure is seen to perform very well when little information is available and/or when a large number of covariates is to be investigated. It is somewhat surprising that in scenarios with low information the fitting of a linear model, even with stepwise variable selection, has little advantage over the fitting of an additive model when the true underlying structure is linear. In cases with more information the prediction performance of all procedures is very similar. So, in difficult data situations the boosting approach can be recommended; in others the procedures can be chosen conditional on the aim of the analysis.

2.
We propose a general framework for regression models with functional response containing a potentially large number of flexible effects of functional and scalar covariates. Special emphasis is put on historical functional effects, where functional response and functional covariate are observed over the same interval and the response is only influenced by covariate values up to the current grid point. Historical functional effects are mostly used when functional response and covariate are observed on a common time interval, as they account for chronology. Our formulation allows for flexible integration limits including, e.g., lead or lag times. The functional responses can be observed on irregular curve-specific grids. Additionally, we introduce different parameterizations for historical effects and discuss identifiability issues. The models are estimated by a component-wise gradient boosting algorithm which is suitable for models with a potentially high number of covariate effects, even more than observations, and inherently does model selection. By minimizing corresponding loss functions, different features of the conditional response distribution can be modeled, including generalized and quantile regression models as special cases. The methods are implemented in the open-source R package FDboost. The methodological developments are motivated by biotechnological data on Escherichia coli fermentations, but cover a much broader model class.
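
Editorial illustration: the sketch below shows, in a much simpler setting, why component-wise gradient boosting "inherently does model selection". It assumes scalar covariates, squared-error loss and simple linear base learners, none of which is taken from the abstract; the function name `componentwise_l2_boost` and the toy data are hypothetical, and this is not the FDboost implementation.

```python
def componentwise_l2_boost(X, y, n_steps=200, step=0.1):
    """Component-wise L2 boosting with simple linear base learners.

    X is a list of rows, y a list of responses. Returns (intercept, coef).
    Illustrative sketch only; names and defaults are assumptions.
    """
    n, p = len(X), len(X[0])
    # Centre each column so every base learner is a no-intercept regression.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    sxx = [sum(row[j] ** 2 for row in Xc) for j in range(p)]

    offset = sum(y) / n
    coef = [0.0] * p
    # For squared-error loss the negative gradient is the current residual.
    resid = [yi - offset for yi in y]

    for _ in range(n_steps):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j in range(p):
            if sxx[j] == 0.0:
                continue  # constant column, nothing to fit
            sxr = sum(row[j] * e for row, e in zip(Xc, resid))
            gain = sxr * sxr / sxx[j]  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, sxr / sxx[j], gain
        if best_j is None:
            break  # residuals are orthogonal to every covariate
        # Update only the single best-fitting component, shrunk by `step`.
        coef[best_j] += step * best_beta
        resid = [e - step * best_beta * row[best_j] for row, e in zip(Xc, resid)]

    intercept = offset - sum(c * m for c, m in zip(coef, means))
    return intercept, coef


# Toy data: the response depends on the first covariate only.
X = [[float(i), float((7 * i) % 5), float((3 * i) % 4)] for i in range(20)]
y = [1.0 + 2.0 * row[0] for row in X]
intercept, coef = componentwise_l2_boost(X, y, n_steps=5)
print(intercept, coef)
```

Because each iteration updates only one component, covariates that are never chosen keep a coefficient of exactly zero when the algorithm is stopped early. In this toy run only the first coefficient moves away from zero.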

3.
Many semi-parametric and nonparametric models are used to fit nonlinear time series data. They include partially linear time series models, nonparametric additive models, and semi-parametric single index models. In this article, we focus on fitting time series data with a partially linear additive model. Combining the orthogonal series approximation and the adaptive sparse group LASSO regularization, we select the important variables between and within the groups simultaneously. Specifically, we propose a two-step algorithm to obtain the grouped sparse estimators. Numerical studies show that the proposed method outperforms the LASSO method in both fitting and forecasting. An empirical analysis is used to illustrate the methodology.

4.
We develop a hierarchical Gaussian process model for forecasting and inference of functional time series data. Unlike existing methods, our approach is especially suited for sparsely or irregularly sampled curves and for curves sampled with nonnegligible measurement error. The latent process is dynamically modeled as a functional autoregression (FAR) with Gaussian process innovations. We propose a fully nonparametric dynamic functional factor model for the dynamic innovation process, with broader applicability and improved computational efficiency over standard Gaussian process models. We prove finite-sample forecasting and interpolation optimality properties of the proposed model, which remain valid with the Gaussian assumption relaxed. An efficient Gibbs sampling algorithm is developed for estimation, inference, and forecasting, with extensions for FAR(p) models with model averaging over the lag p. Extensive simulations demonstrate substantial improvements in forecasting performance and recovery of the autoregressive surface over competing methods, especially under sparse designs. We apply the proposed methods to forecast nominal and real yield curves using daily U.S. data. Real yields are observed more sparsely than nominal yields, yet the proposed methods are highly competitive in both settings. Supplementary materials, including R code and the yield curve data, are available online.

5.
An exploratory model analysis device we call CDF knotting is introduced. It is a technique we have found useful for exploring relationships between points in the parameter space of a model and global properties of associated distribution functions. It can be used to alert the model builder to a condition we call lack of distinguishability, which is to nonlinear models what multicollinearity is to linear models. While there are simple remedial actions to deal with multicollinearity in linear models, techniques such as deleting redundant variables in those models do not have obvious parallels for nonlinear models. In some of these nonlinear situations, however, CDF knotting may lead to alternative models with fewer parameters whose distribution functions are very similar to those of the original overparameterized model. We also show how CDF knotting can be exploited as a mathematical tool for deriving limiting distributions and illustrate the technique for the 3-parameter Weibull family, obtaining limiting forms and moment ratios which correct and extend previously published results. Finally, geometric insights obtained by CDF knotting are verified relative to data fitting and estimation.

6.
Structural equation models (SEM) have been extensively used in behavioral, social, and psychological research to model relations between the latent variables and the observations. Most software packages for the fitting of SEM rely on frequentist methods. Traditional models and software are not appropriate for the analysis of dependent observations such as time-series data. In this study, a structural equation model with a time series feature is introduced. A Bayesian approach is used to solve the model with the aid of the Markov chain Monte Carlo method. Bayesian inferences as well as prediction with the proposed time series structural equation model can also reveal certain unobserved relationships among the observations. The approach is successfully employed using real Asian, American and European stock return data.

7.
The B-spline representation is a common tool to improve the fitting of smooth nonlinear functions; it offers a fit as a piecewise polynomial. The regions that define the pieces are separated by a sequence of knots. The main difficulty in this type of modeling is the choice of the number and the locations of these knots. The Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm provides a solution to simultaneously select these two parameters by considering the knots as free parameters. This algorithm belongs to the MCMC techniques that allow simulations from target distributions on spaces of varying dimension. The aim of the present investigation is to use this algorithm in the framework of the analysis of survival time, for the Cox model in particular. In that model the relation between the hazard ratio function and the covariates is assumed to be log-linear, and this assumption is too restrictive. Thus, we propose to use the RJMCMC algorithm to model the log hazard ratio function by a B-spline representation with an unknown number of knots at unknown locations. This method is illustrated with two real data sets: the Stanford heart transplant data and lung cancer survival data. Another application of the RJMCMC is selecting the significant covariates, and a simulation study is performed.
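
Editorial illustration: the sketch below evaluates a B-spline basis for a fixed set of knots using the standard Cox-de Boor recursion, to make concrete how the knots determine the piecewise polynomial. It does not implement RJMCMC or the Cox model; the function names, the cubic degree and the interval [0, 1] are assumptions chosen for the example.

```python
def bspline_basis(i, degree, knots, x):
    """Value at x of the i-th B-spline of the given degree (Cox-de Boor recursion)."""
    if degree == 0:
        # Piecewise constant: indicator of the half-open knot interval.
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    left_width = knots[i + degree] - knots[i]
    if left_width > 0.0:  # skip terms over zero-width (repeated) knot spans
        value += (x - knots[i]) / left_width * bspline_basis(i, degree - 1, knots, x)
    right_width = knots[i + degree + 1] - knots[i + 1]
    if right_width > 0.0:
        value += ((knots[i + degree + 1] - x) / right_width
                  * bspline_basis(i + 1, degree - 1, knots, x))
    return value


def design_row(x, interior_knots, degree=3, lower=0.0, upper=1.0):
    """Row of the B-spline design matrix at x, for lower <= x < upper."""
    # Boundary knots are repeated degree + 1 times (a "clamped" knot vector).
    knots = [lower] * (degree + 1) + sorted(interior_knots) + [upper] * (degree + 1)
    n_basis = len(knots) - degree - 1
    return [bspline_basis(i, degree, knots, x) for i in range(n_basis)]


row = design_row(0.45, interior_knots=[0.3, 0.6])
print(len(row), sum(row))
```

Each interior knot adds one basis function, so a sampler that inserts, deletes or moves knots changes the dimension of the coefficient vector. That is the varying-dimension setting the abstract refers to.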

8.
Dynamic regression models (also known as distributed lag models) are widely used in engineering for quality control and in economics for forecasting. In this article I propose a procedure for specifying such models in practice. The proposed procedure requires no prewhitening and can directly handle nonstationary series. Furthermore, the procedure cross-validates prior beliefs about causal relationships between variables with empirical findings to ensure the suitability of model structure. An illustrative example is given.

9.
The authors consider a novel class of nonlinear time series models based on local mixtures of regressions of exponential family models, where the covariates include functions of lags of the dependent variable. They give conditions to guarantee consistency of the maximum likelihood estimator for correctly specified models, with stationary and nonstationary predictors. They show that consistency of the maximum likelihood estimator still holds under model misspecification. They also provide probabilistic results for the proposed model when the vector of predictors contains only lags of transformations of the modeled time series. They illustrate the consistency of the maximum likelihood estimator and the probabilistic properties via Monte Carlo simulations. Finally, they present an application using real data.

10.
We present a new algorithm for boosting generalized additive models for location, scale and shape (GAMLSS) that allows the incorporation of stability selection, an increasingly popular way to obtain stable sets of covariates while controlling the per-family error rate. The model is fitted repeatedly to subsampled data, and variables with high selection frequencies are extracted. To apply stability selection to boosted GAMLSS, we develop a new “noncyclical” fitting algorithm that incorporates an additional selection step of the best-fitting distribution parameter in each iteration. This new algorithm has the additional advantage that optimizing the tuning parameters of boosting is reduced from a multi-dimensional to a one-dimensional problem with vastly decreased complexity. The performance of the novel algorithm is evaluated in an extensive simulation study. We apply this new algorithm to a study to estimate abundance of common eider in Massachusetts, USA, featuring excess zeros, overdispersion, nonlinearity and spatiotemporal structures. Eider abundance is estimated via boosted GAMLSS, allowing both mean and overdispersion to be regressed on covariates. Stability selection is used to obtain a sparse set of stable predictors.
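
Editorial illustration: the subsample, refit and count loop behind stability selection can be sketched as follows. The base selector here (keep the `q` covariates most correlated with the response) is a deliberately simple stand-in for boosted GAMLSS and is an assumption of this example, as are all function names, the half-sample size and the 0.8 threshold. The sketch does not compute the per-family error rate bound mentioned in the abstract.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation of two equal-length lists (0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_q(X, y, q):
    """Hypothetical base selector: the q covariates most correlated with y."""
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return set(sorted(range(p), key=lambda j: -scores[j])[:q])


def stability_selection(X, y, q=2, n_subsamples=100, threshold=0.8, seed=0):
    """Return (stable covariate indices, selection frequency per covariate)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # random half of the observations
        chosen = select_top_q([X[i] for i in idx], [y[i] for i in idx], q)
        for j in chosen:
            counts[j] += 1
    freq = [c / n_subsamples for c in counts]
    stable = [j for j in range(p) if freq[j] >= threshold]
    return stable, freq


# Simulated data: only covariates 0 and 1 influence the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [2 * row[0] + 2 * row[1] + data_rng.gauss(0, 0.5) for row in X]
stable, freq = stability_selection(X, y)
print(stable, freq)
```

Any selection procedure can be plugged in for `select_top_q`; only the selection frequencies across subsamples are used to decide which covariates are reported as stable.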

11.
Functional regression models that relate functional covariates to a scalar response are becoming more common due to the availability of functional data and computational advances. We introduce a functional nonlinear model with a scalar response where the true parameter curve is monotone. Using the Newton-Raphson method within a backfitting procedure, we discuss a penalized least squares criterion for fitting the functional nonlinear model with the smoothing parameter selected using generalized cross validation. Connections between a nonlinear mixed effects model and our functional nonlinear model are discussed, thereby providing an additional model fitting procedure using restricted maximum likelihood for smoothing parameter selection. Simulated relative efficiency gains provided by a monotone parameter curve estimator relative to an unconstrained parameter curve estimator are presented. In addition, we provide an application of our model with data from ozonesonde measurements of stratospheric ozone in which the measurements are biased as a function of altitude.

12.
13.
Bayesian model building techniques are developed for data with a strong time series structure and possibly exogenous explanatory variables that have strong explanatory and predictive power. The emphasis is on finding whether, when the data have a strong time series structure, there are any explanatory variables that should also be included in the model. We use a time series model that is linear in past observations and that can capture both stochastic and deterministic trend, seasonality and serial correlation. We propose the plotting of absolute predictive error against predictive standard deviation. A series of such plots is utilized to determine which of several nested and non-nested models is optimal in terms of minimizing the dispersion of the predictive distribution and restricting predictive outliers. We apply the techniques to modelling monthly counts of fatal road crashes in Australia where economic, consumption and weather variables are available and we find that three such variables should be included in addition to the time series filter. The approach leads to graphical techniques to determine strengths of relationships between the dependent variable and covariates and to detect model inadequacy as well as determining useful numerical summaries.

14.
Functional time series, whose sample elements are recorded sequentially over time, are frequently encountered as technology advances. Recent studies have shown that the analysis and forecasting of functional time series can be performed easily using functional principal component analysis and existing univariate/multivariate time series models. However, the forecasting performance of such functional time series models may be affected by the presence of outlying observations, which are very common in many scientific fields. Outliers may distort the functional time series model structure, and thus, the underlying model may produce high forecast errors. We introduce a robust forecasting technique based on weighted likelihood methodology to obtain point and interval forecasts in functional time series in the presence of outliers. The finite sample performance of the proposed method is illustrated by Monte Carlo simulations and four real-data examples. Numerical results reveal that the proposed method exhibits superior performance compared with the existing method(s).

15.
Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships, and modelling covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follows a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but focusing on the problem of clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

16.
The consistency of the model selection criterion BIC has been well and widely studied for many nonlinear regression models. However, few of these studies have considered models with lag variables as regressors and auto-correlated errors in time series settings, which are common in both linear and nonlinear time series modeling. This paper studies a dynamic semi-varying coefficient model with ARMA errors, using an approach based on spectrum analysis of time series. The consistency property of the proposed model selection criterion is established and an implementation procedure of model selection is proposed for practitioners. Simulation studies have also been conducted to numerically show the consistency property.

17.
This paper proposes and investigates a class of Markov Poisson regression models in which Poisson rate functions of covariates are conditional on unobserved states which follow a finite-state Markov chain. Features of the proposed model, estimation, inference, bootstrap confidence intervals, model selection and other implementation issues are discussed. Monte Carlo studies suggest that the proposed estimation method is accurate and reliable for single- and multiple-subject time series data; the choice of starting probabilities for the Markov process has little effect on the parameter estimates; and penalized likelihood criteria are reliable for determining the number of states. Part 2 provides applications of the proposed model.

18.
Two types of state-switching models for U.S. real output have been proposed: models that switch randomly between states and models that switch states deterministically, as in the threshold autoregressive model of Potter. These models have been justified primarily on how well they fit the sample data, yielding statistically significant estimates of the model coefficients. Here we propose a new approach to the evaluation of an estimated nonlinear time series model that provides a complement to existing methods based on in-sample fit or on out-of-sample forecasting. In this new approach, a battery of distinct nonlinearity tests is applied to the sample data, resulting in a set of p-values for rejecting the null hypothesis of a linear generating mechanism. This set of p-values is taken to be a “stylized fact” characterizing the nonlinear serial dependence in the generating mechanism of the time series. The effectiveness of an estimated nonlinear model for this time series is then evaluated in terms of the congruence between this stylized fact and a set of nonlinearity test results obtained from data simulated using the estimated model. In particular, we derive a portmanteau statistic based on this set of nonlinearity test p-values that allows us to test the proposition that a given model adequately captures the nonlinear serial dependence in the sample data. We apply the method to several estimated state-switching models of U.S. real output.

19.
This paper considers model selection and forecasting issues in two closely related models for nonstationary periodic autoregressive time series [PAR]. Periodically integrated seasonal time series [PIAR] need a periodic differencing filter to remove the stochastic trend. On the other hand, when the nonperiodic first order differencing filter can be applied, one can have a periodic model with a nonseasonal unit root [PARI]. In this paper, we discuss and evaluate two testing strategies to select between these two models. Furthermore, we compare the relative forecasting performance of each model using Monte Carlo simulations and some U.K. macroeconomic seasonal time series. One result is that forecasting with PARI models while the data generating process is a PIAR process seems to be worse than vice versa.
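
Editorial illustration: the two differencing filters contrasted in the abstract can be sketched as follows. The abstract gives no formulas, so the sketch assumes the usual textbook form of the periodic filter, y_t - alpha_s * y_{t-1} with the season-specific coefficients alpha_s multiplying to one, and the convention that the season of observation t is `t % S`. The function names and the toy series are hypothetical.

```python
import math


def first_difference(y):
    """Nonperiodic first-order differencing: y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


def periodic_difference(y, alphas):
    """Periodic differencing: y_t - alpha_s * y_{t-1}, with s = t % len(alphas).

    Assumes the season-specific coefficients multiply to one.
    """
    if not math.isclose(math.prod(alphas), 1.0):
        raise ValueError("seasonal coefficients must multiply to one")
    S = len(alphas)
    return [y[t] - alphas[t % S] * y[t - 1] for t in range(1, len(y))]


# Toy quarterly series generated as y_t = alpha_s * y_{t-1} without noise.
alphas = [2.0, 0.5, 1.0, 1.0]
y = [1.0]
for t in range(1, 12):
    y.append(alphas[t % 4] * y[-1])

print(periodic_difference(y, alphas))  # the periodic filter removes all variation
print(first_difference(y))             # the first difference leaves a seasonal pattern
```

When every coefficient equals one the periodic filter reduces to the ordinary first difference, which is why the two models are closely related and why a testing strategy is needed to choose between them.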

20.
Series hybrid models are among the most widely used hybrid models, in which a time series is assumed to be composed of two components, one linear and one nonlinear. In this paper, the performance of two types of these hybrid models is evaluated for predicting stock prices in order to introduce the more reliable series hybrid model. For this purpose, ARIMA and MLPs are selected for constructing series hybrid models. Empirical results for forecasting three benchmark data sets indicate that, despite the greater popularity of the conventional ARIMA-ANN model, the ANN-ARIMA hybrid model can overall achieve more accurate results.
