Some asymptotic results on generalized penalized spline smoothing   Cited by: 2 (self-citations: 0, citations by others: 2)
Summary. The paper discusses asymptotic properties of penalized spline smoothing if the spline basis increases with the sample size. The proof is provided in a generalized smoothing model allowing for non-normal responses. The results are extended in two ways. First, assuming the spline coefficients to be a priori normally distributed links the smoothing framework to generalized linear mixed models. We consider the asymptotic rates such that the Laplace approximation is justified and the resulting fits in the mixed model correspond to penalized spline estimates. Secondly, we make use of a fully Bayesian viewpoint by imposing an a priori distribution on all parameters and coefficients. We argue that with the postulated rates at which the spline basis dimension increases with the sample size the posterior distribution of the spline coefficients is approximately normal. The validity of this result is investigated in finite samples by comparing Markov chain Monte Carlo results with their asymptotic approximation in a simulation study.
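
As a rough illustration of the penalized spline fit that the abstract above refers to, the following sketch covers only the simplest Gaussian-response case, not the generalized (non-normal) model the paper treats. The truncated-power basis, the ridge-type penalty on the knot coefficients, and all function and variable names are assumptions made for this example; none of them is taken from the paper.

```python
import numpy as np


def truncated_power_basis(x, knots, degree=1):
    """Columns: 1, x, ..., x^degree, then (x - k)_+^degree for each knot."""
    poly = np.vander(x, degree + 1, increasing=True)
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** degree
    return np.hstack([poly, trunc])


def fit_penalized_spline(x, y, knots, lam, degree=1):
    """Minimize ||y - B a||^2 + lam * (sum of squared knot coefficients)."""
    B = truncated_power_basis(x, knots, degree)
    penalty = np.zeros(B.shape[1])
    penalty[degree + 1:] = 1.0  # the polynomial part is left unpenalized
    coef = np.linalg.solve(B.T @ B + lam * np.diag(penalty), B.T @ y)
    return coef, B @ coef


# Simulated data (hypothetical): a noisy sine curve on [0, 1].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
knots = np.linspace(0.0, 1.0, 22)[1:-1]  # 20 interior knots

# A small penalty gives a wiggly fit; a large one shrinks towards a straight line.
coef_light, fit_light = fit_penalized_spline(x, y, knots, lam=1e-4)
coef_heavy, fit_heavy = fit_penalized_spline(x, y, knots, lam=1e6)
```

Treating the penalized knot coefficients as normally distributed random effects is what connects a fit of this kind to the mixed model formulation mentioned in the abstract.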

Generalized linear mixed models (GLMMs) are widely used to analyse non-normal response data with extra-variation, but non-robust estimators are still routinely used. We propose robust methods for maximum quasi-likelihood and residual maximum quasi-likelihood estimation to limit the influence of outlying observations in GLMMs. The estimation procedure parallels the development of robust estimation methods in linear mixed models, but with adjustments in the dependent variable and the variance component. The methods proposed are applied to three data sets and a comparison is made with the nonparametric maximum likelihood approach. When applied to a set of epileptic seizure data, the methods proposed have the desired effect of limiting the influence of outlying observations on the parameter estimates. Simulation shows that one of the residual maximum quasi-likelihood proposals has a smaller bias than those of the other estimation methods. We further discuss the equivalence of two GLMM formulations when the response variable follows an exponential family. Their extensions to robust GLMMs and their comparative advantages in modelling are described. Some possible modifications of the robust GLMM estimation methods are given to provide further flexibility for applying the method.

The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized linear mixed model (GLMM). However, the previous literature has noted that PQL tends to underestimate variance components as well as regression coefficients. In this article, we show numerically that the biases of the PQL variance component estimates are systematically related to the biases of the PQL regression coefficient estimates, and that the biases of the variance component estimates increase as the random effects become more heterogeneous.

We consider two estimation schemes based on penalized quasilikelihood and quasi-pseudo-likelihood in Poisson mixed models. The asymptotic bias in regression coefficients and variance components estimated by penalized quasilikelihood (PQL) is studied for small values of the variance components. We show the PQL estimators of both regression coefficients and variance components in Poisson mixed models have a smaller order of bias compared to those for binomial data. Unbiased estimating equations based on quasi-pseudo-likelihood are proposed and are shown to yield consistent estimators under some regularity conditions. The finite sample performance of these two methods is compared through a simulation study.

Estimation and prediction in generalized linear mixed models are often hampered by intractable high dimensional integrals. This paper provides a framework to solve this intractability, using asymptotic expansions when the number of random effects is large. To that end, we first derive a modified Laplace approximation when the number of random effects is increasing at a lower rate than the sample size. Second, we propose an approximate likelihood method based on the asymptotic expansion of the log-likelihood using the modified Laplace approximation which is maximized using a quasi-Newton algorithm. Finally, we define the second order plug-in predictive density based on a similar expansion to the plug-in predictive density and show that it is a normal density. Our simulations show that in comparison to other approximations, our method has better performance. Our methods are readily applied to non-Gaussian spatial data and as an example, the analysis of the rhizoctonia root rot data is presented.

Breslow and Clayton (J Am Stat Assoc 88:9–25,1993) was, and still is, a highly influential paper mobilizing the use of generalized linear mixed models in epidemiology and a wide variety of fields. An important aspect is the feasibility in implementation through the ready availability of related software in SAS (SAS Institute, PROC GLIMMIX, SAS Institute Inc., URL , 2007), S-plus (Insightful Corporation, S-PLUS 8, Insightful Corporation, Seattle, WA, URL , 2007), and R (R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, URL , 2006) for example, facilitating its broad usage. This paper reviews background to generalized linear mixed models and the inferential techniques which have been developed for them. To provide the reader with a flavor of the utility and wide applicability of this fundamental methodology we consider a few extensions including additive models, models for zero-heavy data, and models accommodating latent clusters.

In practice, it is not uncommon for a discrete response to be related both to a functional random variable and to multiple real-valued random variables whose impact on the response is nonlinear. In this paper, we consider the generalized partial functional linear additive models (GPFLAM) and present the estimation procedure. In GPFLAM, the nonparametric functions are approximated by polynomial splines and the infinite-dimensional slope function is estimated based on the principal component basis function approximations. We obtain the estimator by maximizing the quasi-likelihood function. We investigate the finite sample properties of the estimation procedure via Monte Carlo simulation studies and illustrate our proposed model by a real data analysis.

This paper develops inference for the significance of features such as peaks and valleys observed in additive modeling through an extension of the SiZer-type methodology of Chaudhuri and Marron (1999) and Godtliebsen et al. (2002, 2004) to the case where the outcome is discrete. We consider the problem of determining the significance of features such as peaks or valleys in observed covariate effects both for the case of additive modeling where the main predictor of interest is univariate as well as the problem of studying the significance of features such as peaks, inclines, ridges and valleys when the main predictor of interest is geographical location. We work with low rank radial spline smoothers to allow the handling of sparse designs and large sample sizes. Reducing the problem to a Generalised Linear Mixed Model (GLMM) framework enables derivation of simulation-based critical value approximations and guards against the problem of multiple inferences over a range of predictor values. Such a reduction also allows for easy adjustment for confounders including those which have an unknown or complex effect on the outcome. A simulation study indicates that our method has satisfactory power. Finally, we illustrate our methodology on several data sets.

Financial data exhibit complex structures and relations and it is therefore not always possible or expedient to find a suitable parametric functional form to adequately describe the data. To overcome this problem, nonparametric techniques can be used to extract the functional process directly from the data without any a priori specification of the functional shape. We take advantage of this flexibility and use a penalized spline approach to model, over time, the implied equity risk premiums of companies that belong to a local stock exchange index. In finance and macroeconomic research it is common practice to use simple averaging techniques to aggregate the single values, thus obtaining an overview of the stock market of a country or particular groups defined by stock-specific characteristics. The objective is to obtain common patterns or dependencies from individual characteristics. A precondition here is a substantial heterogeneity of the individual stocks, because otherwise one constituent can represent the whole index and the required diversification effect fails. Hence, in this paper we explore if and how this assumption is justified. The examined stock indices are the Dow Jones Industrial Index and the German DAX 30. It turns out that the constituents of both indices show very stock-specific behaviors of their equity risk premium over time. Thus the application of these indices in, e.g., macroeconomic research seems adequate.

It is well known that in a traditional outlier-free situation, the generalized quasi-likelihood (GQL) approach [B.C. Sutradhar, On exact quasilikelihood inference in generalized linear mixed models, Sankhya: Indian J. Statist. 66 (2004), pp. 261–289] performs very well in obtaining consistent and efficient estimates of the parameters of generalized linear mixed models (GLMMs). In this paper, we first examine the effect of the presence of one or more outliers on the GQL estimation of the parameters in such GLMMs, especially in two important models, namely count and binary mixed models. The outliers appear to cause serious biases and hence inconsistency in the estimation. As a remedy, we then propose a robust GQL (RGQL) approach in order to obtain consistent estimates of the parameters in the GLMMs in the presence of one or more outliers. An extensive simulation study is conducted to examine the consistency performance of the proposed RGQL approach.

Summary. The pattern of absenteeism in the downsizing process of companies is a topic in focus in economics and social science. A general question is whether employees who are frequently absent are more likely to be selected to be laid off or in contrast whether employees to be dismissed are more likely to be absent for the remaining time of their working contract. We pursue an empirical and microeconomic investigation of these theses. We analyse longitudinal data that were collected in a German company over several years. We fit a semiparametric transition model based on a mixture Poisson distribution for the days of absenteeism per month. Prediction intervals are considered and the primary focus is on the period of downsizing. The data reveal clear evidence for the hypothesis that employees who are to be laid off are more frequently absent before leaving the company. Interestingly, though, no clear evidence is seen that employees being selected to leave the company are those with a bad absenteeism profile.

Longitudinal data frequently arise in various fields of applied sciences where individuals are measured according to some ordered variable, e.g. time. A common approach used to model such data is based on the mixed models for repeated measures. This model provides an eminently flexible approach to modeling of a wide range of mean and covariance structures. However, such models are forced into a rigidly defined class of mathematical formulas which may not be well supported by the data within the whole sequence of observations. A possible non-parametric alternative is a cubic smoothing spline, which is highly flexible and has useful smoothing properties. It can be shown that under a normality assumption, the solution of the penalized log-likelihood equation is the cubic smoothing spline, and this solution can be further expressed as a solution of the linear mixed model. It is shown here how cubic smoothing splines can be easily used in the analysis of complete and balanced data. Analysis can be greatly simplified by using the unweighted estimator studied in the paper. It is shown that if the covariance structure of random errors belongs to a certain class of matrices, the unweighted estimator is the solution to the penalized log-likelihood function. This result is new in the smoothing spline context and it is not confined to growth curve settings. The connection to mixed models is used in developing a rough test of group profiles. Numerical examples are presented to illustrate the techniques proposed.
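
For orientation, the penalized criterion behind the cubic smoothing spline can be written in its simplest textbook form as follows. This assumes independent normal errors with constant variance, which is narrower than the covariance structures the paper considers, and the symbols y_i, t_i, f and \lambda are notation chosen here, not taken from the paper.

```latex
\hat{f} \;=\; \arg\min_{f}\;
  \sum_{i=1}^{n} \bigl( y_i - f(t_i) \bigr)^{2}
  \;+\; \lambda \int \bigl( f''(t) \bigr)^{2} \, dt ,
  \qquad \lambda > 0 .
```

Under these assumptions, maximizing the penalized normal log-likelihood is equivalent to minimizing this criterion, and the minimizer over twice-differentiable functions is a natural cubic spline with knots at the distinct observation times t_i.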

Most regression problems in practice require flexible semiparametric forms of the predictor for modelling the dependence of responses on covariates. Moreover, it is often necessary to add random effects accounting for overdispersion caused by unobserved heterogeneity or for correlation in longitudinal or spatial data. We present a unified approach for Bayesian inference via Markov chain Monte Carlo simulation in generalized additive and semiparametric mixed models. Different types of covariates, such as the usual covariates with fixed effects, metrical covariates with non-linear effects, unstructured random effects, trend and seasonal components in longitudinal data and spatial covariates, are all treated within the same general framework by assigning appropriate Markov random field priors with different forms and degrees of smoothness. We applied the approach in several case-studies and consulting cases, showing that the methods are also computationally feasible in problems with many covariates and large data sets. In this paper, we choose two typical applications.

This article introduces a parametric robust way of determining the mean-variance relationship in the setting of generalized linear models. More specifically, the normal likelihood is properly amended to become asymptotically valid even if normality fails. Consequently, legitimate inference for the parametric relationship between mean and variance could be derived under model misspecification. More details are given for the scenario in which the variance is proportional to an unknown power of the mean function. The efficacy of the novel technique is demonstrated via simulations and the analysis of two real data sets.
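
The power-of-the-mean scenario mentioned in the abstract above can be written as follows. The symbols \mu_i, \phi and \theta are notation assumed here for illustration, not taken from the article.

```latex
\operatorname{Var}(Y_i) \;=\; \phi \, \mu_i^{\theta},
  \qquad \mu_i = \operatorname{E}(Y_i), \quad \phi > 0 ,
```

where \theta is the unknown power to be estimated. Familiar special cases are \theta = 0 (constant variance), \theta = 1 (Poisson-type variance) and \theta = 2 (gamma-type variance).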

The authors describe a method for assessing model inadequacy in maximum likelihood estimation of a generalized linear mixed model. They treat the latent random effects in the model as missing data and develop the influence analysis on the basis of a Q‐function which is associated with the conditional expectation of the complete‐data log‐likelihood function in the EM algorithm. They propose a procedure to detect influential observations in six model perturbation schemes. They also illustrate their methodology in a hypothetical situation and in two real cases.

Abstract. It is well known that one or more outlying points in the data may adversely affect the consistency of the quasi-likelihood or the likelihood estimators for the regression effects. Similar to the quasi-likelihood approach, the existing outlier-resistant Mallows-type quasi-likelihood (MQL) estimation approach may also produce biased regression estimators. As a remedy, by using a fully standardized score function in the MQL estimating equation, we demonstrate in this paper that the fully standardized MQL estimators are almost unbiased, ensuring their higher consistency performance. Both count and binary responses subject to one or more outliers are used in the study. The small sample as well as asymptotic results for the competitive estimators are discussed.

A generalized linear empirical Bayes model is developed for empirical Bayes analysis of several means in natural exponential families. A unified approach is presented for all natural exponential families with quadratic variance functions (the Normal, Poisson, Binomial, Gamma, and two others). The hyperparameters are estimated using the extended quasi-likelihood of Nelder and Pregibon (1987), which is easily implemented via the GLIM package. The accuracy of these estimates is developed by asymptotic approximation of the variance. Two data examples are illustrated.

We propose a flexible functional approach for modelling generalized longitudinal data and survival time using principal components. In the proposed model the longitudinal observations can be continuous or categorical data, such as Gaussian, binomial or Poisson outcomes. We generalize the traditional joint models that treat categorical data as continuous data by using some transformations, such as CD4 counts. The proposed model is data-adaptive, which does not require pre-specified functional forms for longitudinal trajectories and automatically detects characteristic patterns. The longitudinal trajectories observed with measurement error or random error are represented by flexible basis functions through a possibly nonlinear link function, combining dimension reduction techniques resulting from functional principal component (FPC) analysis. The relationship between the longitudinal process and event history is assessed using a Cox regression model. Although the proposed model inherits the flexibility of non-parametric methods, the estimation procedure based on the EM algorithm is still parametric in computation, and thus simple and easy to implement. The computation is simplified by dimension reduction for random coefficients or FPC scores. An iterative selection procedure based on the Akaike information criterion (AIC) is proposed to choose the tuning parameters, such as the knots of the spline basis and the number of FPCs, so that an appropriate degree of smoothness and fluctuation can be addressed. The effectiveness of the proposed approach is illustrated through a simulation study, followed by an application to longitudinal CD4 counts and survival data which were collected in a recent clinical trial to compare the efficiency and safety of two antiretroviral drugs.

