1.

Markov-switching generalized additive models

Roland Langrock Thomas Kneib Richard Glennie Théo Michelot 《Statistics and Computing》2017,27(1):259-270

We consider Markov-switching regression models, i.e. models for time series regression analyses where the functional relationship between covariates and response is subject to regime switching controlled by an unobservable Markov chain. Building on the powerful hidden Markov model machinery and the methods for penalized B-splines routinely used in regression analyses, we develop a framework for nonparametrically estimating the functional form of the effect of the covariates in such a regression model, assuming an additive structure of the predictor. The resulting class of Markov-switching generalized additive models is immensely flexible, and contains as special cases the common parametric Markov-switching regression models and also generalized additive and generalized linear models. The feasibility of the suggested maximum penalized likelihood approach is demonstrated by simulation. We further illustrate the approach using two real data applications, modelling (i) how sales data depend on advertising spending and (ii) how energy price in Spain depends on the Euro/Dollar exchange rate.
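The hidden Markov model machinery mentioned in this abstract can be illustrated with a scaled forward recursion for the likelihood. The following is only a minimal sketch under stated assumptions: it uses a Gaussian response and a linear predictor in each state (the parametric special case), not the penalized B-spline predictors of the paper, and all function and parameter names are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def ms_regression_loglik(x, y, betas, sds, gamma, delta):
    """Log-likelihood of a Markov-switching linear regression (illustrative only).

    betas[j] = (intercept, slope) in state j, sds[j] = error sd in state j,
    gamma[i][j] = transition probability from state i to state j,
    delta[j] = initial state probability.
    """
    n_states = len(delta)
    loglik = 0.0
    alpha = None
    for t in range(len(y)):
        # State-dependent densities of the response given the covariate.
        dens = [
            normal_pdf(y[t], betas[j][0] + betas[j][1] * x[t], sds[j])
            for j in range(n_states)
        ]
        if t == 0:
            alpha = [delta[j] * dens[j] for j in range(n_states)]
        else:
            alpha = [
                sum(alpha[i] * gamma[i][j] for i in range(n_states)) * dens[j]
                for j in range(n_states)
            ]
        # Rescale the forward variables at each step to avoid numerical underflow.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik
```

In a penalized-likelihood setting, a function of this kind would be evaluated with spline-based state-dependent predictors and a roughness penalty subtracted; that part is deliberately left out here.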

2.

Locally Efficient Semiparametric Estimators for Proportional Hazards Models with Measurement Error

Yuhang Xu Yehua Li Xiao Song 《Scandinavian Journal of Statistics》2016,43(2):558-572

We propose a new class of semiparametric estimators for proportional hazards models in the presence of measurement error in the covariates, where the baseline hazard function, the hazard function for the censoring time, and the distribution of the true covariates are considered as unknown infinite dimensional parameters. We estimate the model components by solving estimating equations based on the semiparametric efficient scores under a sequence of restricted models where the logarithms of the hazard functions are approximated by reduced rank regression splines. The proposed estimators are locally efficient in the sense that the estimators are semiparametrically efficient if the distribution of the error‐prone covariates is specified correctly and are still consistent and asymptotically normal if the distribution is misspecified. Our simulation studies show that the proposed estimators have smaller biases and variances than competing methods. We further illustrate the new method with a real application in an HIV clinical trial.

3.

A dependent Dirichlet process model for survival data with competing risks

Shi Yushu Laud Purushottam Neuner Joan 《Lifetime data analysis》2021,27(1):156-176

In this paper, we first propose a dependent Dirichlet process (DDP) model using a mixture of Weibull models with each mixture component resembling a Cox model for survival data. We then build a Dirichlet process mixture model for competing risks data without regression covariates. Next we extend this model to a DDP model for competing risks regression data by using a multiplicative covariate effect on subdistribution hazards in the mixture components. Though built on proportional hazards (or subdistribution hazards) models, the proposed nonparametric Bayesian regression models do not require the assumption of constant hazard (or subdistribution hazard) ratio. An external time-dependent covariate is also considered in the survival model. After describing the model, we discuss how both cause-specific and subdistribution hazard ratios can be estimated from the same nonparametric Bayesian model for competing risks regression. For use with the regression models proposed, we introduce an omnibus prior that is suitable when little external information is available about covariate effects. Finally we compare the models’ performance with existing methods through simulations. We also illustrate the proposed competing risks regression model with data from a breast cancer study. An R package “DPWeibull” implementing all of the proposed methods is available at CRAN.


4.

Dimensionally reduced mixtures of regression models

Angela Montanari Cinzia Viroli 《Journal of statistical planning and inference》2011,141(5):1744-1752

A mixture of regression models for multivariate observed variables which contextually involves a dimension reduction step through a linear factor model is proposed. The model estimation is performed via the EM-algorithm and a procedure to compute asymptotic standard errors for the parameter estimates is developed. The proposed approach is applied to the study of students' satisfaction towards different aspects of their school as a function of various covariates.

5.

Bayesian Semiparametric Modelling in Quantile Regression

ATHANASIOS KOTTAS MILOVAN KRNJAJIĆ 《Scandinavian Journal of Statistics》2009,36(2):297-319

Abstract. We propose a Bayesian semiparametric methodology for quantile regression modelling. In particular, working with parametric quantile regression functions, we develop Dirichlet process mixture models for the error distribution in an additive quantile regression formulation. The proposed non‐parametric prior probability models allow the shape of the error density to adapt to the data and thus provide more reliable predictive inference than models based on parametric error distributions. We consider extensions to quantile regression for data sets that include censored observations. Moreover, we employ dependent Dirichlet processes to develop quantile regression models that allow the error distribution to change non‐parametrically with the covariates. Posterior inference is implemented using Markov chain Monte Carlo methods. We assess and compare the performance of our models using both simulated and real data sets.

6.

Variable selection for semiparametric regression models with iterated penalization

Dai Y Ma S 《Journal of nonparametric statistics》2012,24(2):283-298

Semiparametric regression models with multiple covariates are commonly encountered. When there are covariates not associated with the response variable, variable selection may lead to sparser models, more lucid interpretations and more accurate estimation. In this study, we adopt a sieve approach for the estimation of nonparametric covariate effects in semiparametric regression models. We adopt a two-step iterated penalization approach for variable selection. In the first step, a mixture of the Lasso and group Lasso penalties is employed to conduct the first-round variable selection and obtain the initial estimate. In the second step, a mixture of the weighted Lasso and weighted group Lasso penalties, with weights constructed using the initial estimate, is employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when the number of unknown parameters diverges with sample size. Numerical studies, including simulation and analysis of a diabetes dataset, show satisfactory performance of the proposed approach.
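To make the "mixture of Lasso and group Lasso penalties" concrete, the sketch below evaluates such a penalty for coefficients arranged in groups. It is an illustration under assumptions, not the authors' exact formulation: the function name, the two tuning parameters and the optional weight arguments are hypothetical, and the weights default to one (the unweighted first step).

```python
import math


def mixed_lasso_penalty(groups, lam_lasso, lam_group,
                        coef_weights=None, group_weights=None):
    """Lasso plus group Lasso penalty for grouped coefficients (illustrative only).

    groups: list of coefficient lists, one list per covariate group.
    coef_weights / group_weights: optional weights, e.g. built from an
    initial estimate in a second, weighted step.
    """
    total = 0.0
    for g, coefs in enumerate(groups):
        gw = 1.0 if group_weights is None else group_weights[g]
        # Group Lasso part: Euclidean norm of the whole group.
        total += lam_group * gw * math.sqrt(sum(b * b for b in coefs))
        for j, b in enumerate(coefs):
            cw = 1.0 if coef_weights is None else coef_weights[g][j]
            # Lasso part: absolute value of each individual coefficient.
            total += lam_lasso * cw * abs(b)
    return total
```

A group whose coefficients are all zero contributes nothing to either part, which is what allows a whole covariate to drop out of the model.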

7.

Curve prediction and clustering with mixtures of Gaussian process functional regression models

J. Q. Shi B. Wang 《Statistics and Computing》2008,18(3):267-283

Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships and modelling covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follows a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but focusing on the problem of clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

8.

Parameter Orthogonality in Mixed Regression Models for Survival Data

J. L. Hutton & P. J. Solomon 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1997,59(1):125-136

The implications of parameter orthogonality for the robustness of survival regression models are considered. The question of which of the proportional hazards or the accelerated life families of models would be more appropriate for analysis is usually ignored, and the proportional hazards family is applied, particularly in medicine, for convenience. Accelerated life models have conventionally been used in reliability applications. We propose a one-parameter family mixture survival model which includes both the accelerated life and the proportional hazards models. By orthogonalizing relative to the mixture parameter, we can show that, for small effects of the covariates, the regression parameters under the alternative families agree to within a constant. This recovers a known misspecification result. We use notions of parameter orthogonality to explore robustness to other types of misspecification including misspecified base-line hazards. The results hold in the presence of censoring. We also study the important question of when proportionality matters.

9.

A semi-parametric cox’s regression model for zero-inflated left-censored time to event data

Roel Braekers Yves Grouwels 《Communications in Statistics - Theory and Methods》2013,42(7):1969-1988

Abstract

In some clinical, environmental, or economic studies, researchers are interested in a semi-continuous outcome variable which takes the value zero with a discrete probability and has a continuous distribution for the non-zero values. Due to the measuring mechanism, it is not always possible to fully observe some outcomes, and only an upper bound is recorded. We call this left-censored data and observe only the maximum of the outcome and an independent censoring variable, together with an indicator. In this article, we introduce a mixture semi-parametric regression model. We consider a parametric model to investigate the influence of covariates on the discrete probability of the value zero. For the non-zero part of the outcome, a semi-parametric Cox’s regression model is used to study the conditional hazard function. The different parameters in this mixture model are estimated using a likelihood method. Hereby the infinite dimensional baseline hazard function is estimated by a step function. As our main results, we show the identifiability and the consistency of the estimators for the different parameters in the model. We study the finite sample behaviour of the estimators through a simulation study and illustrate this model on a practical data example.

10.

Covariate adjustment and estimation of mean response in randomised trials

Jonathan W. Bartlett 《Pharmaceutical statistics》2018,17(5):648-666

Analyses of randomised trials are often based on regression models which adjust for baseline covariates, in addition to randomised group. Based on such models, one can obtain estimates of the marginal mean outcome for the population under assignment to each treatment, by averaging the model‐based predictions across the empirical distribution of the baseline covariates in the trial. We identify under what conditions such estimates are consistent, and in particular show that for canonical generalised linear models, the resulting estimates are always consistent. We show that a recently proposed variance estimator underestimates the variance of the estimator around the true marginal population mean when the baseline covariates are not fixed in repeated sampling and provide a simple adjustment to remedy this. We also describe an alternative semiparametric estimator, which is consistent even when the outcome regression model used is misspecified. The different estimators are compared through simulations and application to a recently conducted trial in asthma.
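The averaging step described in this abstract (predict each participant's outcome under every treatment, then average over the observed baseline covariates) can be sketched as follows. This is a minimal illustration that assumes an already fitted outcome model is available as a function; `marginal_means`, `logistic_predict` and the coefficient values are hypothetical and not taken from the paper, and no variance estimation is attempted.

```python
import math


def marginal_means(predict, baseline_covariates, arms=(0, 1)):
    """Average model-based predictions over the empirical covariate distribution.

    predict(arm, x) returns the fitted mean outcome for treatment `arm`
    and baseline covariate value `x` (any fitted regression model).
    """
    n = len(baseline_covariates)
    return {
        arm: sum(predict(arm, x) for x in baseline_covariates) / n
        for arm in arms
    }


def logistic_predict(arm, x, b0=-1.0, b_trt=0.8, b_x=0.5):
    """Hypothetical fitted logistic model; the coefficients are made up."""
    return 1.0 / (1.0 + math.exp(-(b0 + b_trt * arm + b_x * x)))


# Usage: marginal response probabilities under each arm for five participants.
means = marginal_means(logistic_predict, [-1.2, 0.0, 0.3, 1.1, 2.0])
risk_difference = means[1] - means[0]
```

Every participant contributes a prediction under both arms, regardless of the arm actually assigned, which is what makes the result a marginal rather than a conditional quantity.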

11.

Inferences in dynamic logit models in semi-parametric setup for repeated binary data

Nan Zheng Brajendra C. Sutradhar 《Journal of Statistical Computation and Simulation》2018,88(7):1295-1313

Binary dynamic fixed and mixed logit models are extensively studied in the literature. These models are developed to examine the effects of certain fixed covariates through a parametric regression function as a part of the models. However, there are situations where one may like to consider more covariates in the model but their direct effect is not of interest. In this paper we propose a generalization of the existing binary dynamic logit (BDL) models to the semi-parametric longitudinal setup to address this issue of additional covariates. The regression function involved in such a semi-parametric BDL model contains (i) a parametric linear regression function in some primary covariates, and (ii) a non-parametric function in certain secondary covariates. We use a simple semi-parametric conditional quasi-likelihood approach for consistent estimation of the non-parametric function, and a semi-parametric likelihood approach for the joint estimation of the main regression and dynamic dependence parameters of the model. The finite sample performance of the estimation approaches is examined through a simulation study. The asymptotic properties of the estimators are also discussed. The proposed model and the estimation approaches are illustrated by reanalysing a longitudinal infectious disease data set.

12.

Smooth expectiles for panel data using penalized splines

Linda Schulze Waltrup Göran Kauermann 《Statistics and Computing》2017,27(1):271-282

Expectile regression is a topic which has become popular in recent years. It includes ordinary mean regression as a special case but is more general as it offers the possibility to also model non-central parts of a distribution. Semi-parametric expectile models have recently been developed and it is easy to perform flexible expectile estimation with modern software like R. We extend the model class by allowing for panel observations, i.e. clustered data with repeated measurements taken on the same individual. A random (individual) effect is incorporated in the model which accounts for the dependence structure in the data. We fit expectile sheets, meaning that not a single expectile is estimated but a whole range of expectiles is estimated simultaneously. The presented model allows for multiple covariates, where a semi-parametric approach with penalized splines is pursued to fit smooth expectile curves. We apply our methods to panel data from the German Socio-Economic Panel.
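As background for the statement that expectiles include the mean as a special case, the sketch below computes a single sample expectile by iteratively reweighted averaging: observations above the current value receive weight tau, the others weight 1 - tau. This is only the covariate-free case with a hypothetical function name; it does not reproduce the penalized-spline panel model of the paper.

```python
def sample_expectile(y, tau, tol=1e-12, max_iter=1000):
    """tau-expectile of a sample via iteratively reweighted means (illustrative only).

    The expectile minimises the asymmetrically weighted squared error
    sum_i w_i * (y_i - e)**2, with w_i = tau if y_i > e and 1 - tau otherwise.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(y) / len(y)  # start from the mean, which is the 0.5-expectile
    for _ in range(max_iter):
        weights = [tau if yi > e else 1.0 - tau for yi in y]
        e_new = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e
```

With tau = 0.5 all weights are equal and the function returns the ordinary sample mean; values of tau closer to 0 or 1 move the result towards the lower or upper part of the distribution.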

13.

Joint regression modeling for missing categorical covariates in generalized linear models

Luis Carlos Pérez-Ruiz Gabriel Escarela 《Journal of applied statistics》2018,45(15):2741-2759

Missing covariate data are a common issue in generalized linear models (GLMs). A model-based procedure arising from properly specifying joint models for both the partially observed covariates and the corresponding missing indicator variables represents a sound and flexible methodology, which lends itself to maximum likelihood estimation as the likelihood function is available in computable form. In this paper, a novel model-based methodology is proposed for the regression analysis of GLMs when the partially observed covariates are categorical. Pair-copula constructions are used as graphical tools in order to facilitate the specification of the high-dimensional probability distributions of the underlying missingness components. The model parameters are estimated by maximizing the weighted log-likelihood function by using an EM algorithm. In order to compare the performance of the proposed methodology with other well-established approaches, which include complete-cases and multiple imputation, several simulation experiments of Binomial, Poisson and Normal regressions are carried out under both missing at random and non-missing at random mechanism scenarios. The methods are illustrated by modeling data from a stage III melanoma clinical trial. The results show that the methodology is rather robust and flexible, representing a competitive alternative to traditional techniques.

14.

Mixture models in measurement error problems, with reference to epidemiological studies

Sylvia Richardson Laurent Leblond Isabelle Jaussent Peter J. Green 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2002,165(3):549-566

Summary. The paper focuses on a Bayesian treatment of measurement error problems and on the question of the specification of the prior distribution of the unknown covariates. It presents a flexible semiparametric model for this distribution based on a mixture of normal distributions with an unknown number of components. Implementation of this prior model as part of a full Bayesian analysis of measurement error problems is described in classical set-ups that are encountered in epidemiological studies: logistic regression between unknown covariates and outcome, with a normal or log-normal error model and a validation group. The feasibility of this combined model is tested and its performance is demonstrated in a simulation study that includes an assessment of the influence of misspecification of the prior distribution of the unknown covariates and a comparison with the semiparametric maximum likelihood method of Roeder, Carroll and Lindsay. Finally, the methodology is illustrated on a data set on coronary heart disease and cholesterol levels in blood.

15.

Flexible competing risks regression modeling and goodness-of-fit

Scheike TH Zhang MJ 《Lifetime data analysis》2008,14(4):464-483

In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause-specific hazards. Another recent approach is to directly model the cumulative incidence by a proportional model (Fine and Gray, J Am Stat Assoc 94:496–509, 1999), and then obtain direct estimates of how covariates influence the cumulative incidence curve. We consider a simple and flexible class of regression models that is easy to fit and contains the Fine–Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy to use. The test is constructive in the sense that it shows exactly where non-proportionality is present. We illustrate our methods on bone marrow transplant data from the Center for International Blood and Marrow Transplant Research (CIBMTR). Through this data example we demonstrate the use of the flexible regression models to analyze competing risks data when non-proportionality is present in the data.

16.

Semiparametric latent variable regression models for spatiotemporal modelling of mobile source particles in the greater Boston area

Alexandros Gryparis Brent A. Coull Joel Schwartz Helen H. Suh 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(2):183-209

Summary. Traffic particle concentrations show considerable spatial variability within a metropolitan area. We consider latent variable semiparametric regression models for modelling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area. Measurements of these pollutants, which are markers of traffic particles, were obtained from several individual exposure studies that were conducted at specific household locations as well as 15 ambient monitoring sites in the area. The models allow for both flexible non-linear effects of covariates and for unexplained spatial and temporal variability in exposure. In addition, the different individual exposure studies recorded different surrogates of traffic particles, with some recording only outdoor concentrations of black or elemental carbon, some recording indoor concentrations of black carbon and others recording both indoor and outdoor concentrations of black carbon. A joint model for outdoor and indoor exposure that specifies a spatially varying latent variable provides greater spatial coverage in the area of interest. We propose a penalized spline formulation of the model that relates to generalized kriging of the latent traffic pollution variable and leads to a natural Bayesian Markov chain Monte Carlo algorithm for model fitting. We propose methods that allow us to control the degrees of freedom of the smoother in a Bayesian framework. Finally, we present results from an analysis that applies the model to data from summer and winter separately.

17.

Local Linear Regression in Proportional Hazards Model with Censored Data

Xiaobing Zhao Xianyi Wu 《Communications in Statistics - Theory and Methods》2013,42(15):2761-2776

In this article we study the method of nonparametric regression based on a transformation model, under which an unknown transformation of the survival time is nonlinearly, or even nonparametrically, related to the covariates, with various error distributions that are parametrically specified with unknown parameters. Local linear approximations and locally weighted least squares are applied to obtain estimators for the effects of covariates with censored observations. We show that the estimators are consistent and asymptotically normal. This transformation model, coupled with local linear approximation techniques, provides many alternatives to the more general proportional hazards models with nonparametric covariates.

18.

Straightforward intermediate rank tensor product smoothing in mixed models

Simon N. Wood Fabian Scheipl Julian J. Faraway 《Statistics and Computing》2013,23(3):341-360

Tensor product smooths provide the natural way of representing smooth interaction terms in regression models because they are invariant to the units in which the covariates are measured, hence avoiding the need for arbitrary decisions about relative scaling of variables. They would also be the natural way to represent smooth interactions in mixed regression models, but for the fact that the tensor product constructions proposed to date are difficult or impossible to estimate using most standard mixed modelling software. This paper proposes a new approach to the construction of tensor product smooths, which allows the smooth to be written as the sum of some fixed effects and some sets of i.i.d. Gaussian random effects: no previously published construction achieves this. Because of the simplicity of this random effects structure, our construction is useable with almost any flexible mixed modelling software, allowing smooth interaction terms to be readily incorporated into any Generalized Linear Mixed Model. To achieve the computationally convenient separation of smoothing penalties, the construction differs from previous tensor product approaches in the penalties used to control smoothness, but the penalties have the advantage over several alternative approaches of being explicitly interpretable in terms of function shape. Like all tensor product smoothing methods, our approach builds up smooth functions of several variables from marginal smooths of lower dimension, but unlike much of the previous literature we treat the general case in which the marginal smooths can be any quadratically penalized basis expansion, and there can be any number of them. We also point out that the imposition of identifiability constraints on smoothers requires more care in the mixed model setting than it would in a simple additive model setting, and show how to deal with the issue. An interesting side effect of our construction is that an ANOVA-decomposition of the smooth can be read off from the estimates, although this is not our primary focus. We were motivated to undertake this work by applied problems in the analysis of abundance survey data, and two examples of this are presented.

19.

Semiparametric lower bounds for tail index estimation

《Journal of statistical planning and inference》2006,136(3):705-729

We consider estimation of the tail index parameter from i.i.d. observations in Pareto and Weibull type models, using a local and asymptotic approach. The slowly varying function describing the non-tail behavior of the distribution is considered as an infinite dimensional nuisance parameter. Without further regularity conditions, we derive a local asymptotic normality (LAN) result for suitably chosen parametric submodels of the full semiparametric model. From this result, we immediately obtain the optimal rate of convergence of tail index parameter estimators for more specific models previously studied. On top of the optimal rate of convergence, our LAN result also gives the minimal limiting variance of estimators (regular for our parametric model) through the convolution theorem. We show that the classical Hill estimator is regular for the submodels introduced with limiting variance equal to the induced convolution theorem bound. We also discuss the Weibull model in this respect.
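For readers unfamiliar with the classical Hill estimator referred to in this abstract, a minimal sketch follows. It averages the log-excesses of the k largest observations over the (k+1)-th largest, which estimates the extreme value index, i.e. the reciprocal of the Pareto exponent. The function name and the choice of k are illustrative assumptions; the abstract does not say how k should be chosen.

```python
import math


def hill_estimator(sample, k):
    """Hill estimate of the extreme value index from the k largest observations.

    Uses the k upper order statistics relative to the (k+1)-th largest value,
    which acts as the threshold. Requires positive data above the threshold.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = xs[n - k - 1]
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    # Mean log-excess of the k largest observations over the threshold.
    return sum(math.log(xs[n - i]) - math.log(threshold)
               for i in range(1, k + 1)) / k
```

The estimate depends strongly on k: small values give high variance, large values bring in observations from outside the tail and introduce bias.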

20.

Exponentiated Weibull regression for time-to-event data

Shahedul A. Khan 《Lifetime data analysis》2018,24(2):328-354

The Weibull, log-logistic and log-normal distributions are extensively used to model time-to-event data. The Weibull family accommodates only monotone hazard rates, whereas the log-logistic and log-normal are widely used to model unimodal hazard functions. The increasing availability of lifetime data with a wide range of characteristics motivates us to develop more flexible models that accommodate both monotone and nonmonotone hazard functions. One such model is the exponentiated Weibull distribution which not only accommodates monotone hazard functions but also allows for unimodal and bathtub-shaped hazard rates. This distribution has demonstrated considerable potential in univariate analysis of time-to-event data. However, the primary focus of many studies is rather on understanding the relationship between the time to the occurrence of an event and one or more covariates. This leads to a consideration of regression models that can be formulated in different ways in survival analysis. One such strategy involves formulating models for the accelerated failure time family of distributions. The most commonly used distributions serving this purpose are the Weibull, log-logistic and log-normal distributions. In this study, we show that the exponentiated Weibull distribution is closed under the accelerated failure time family. We then formulate a regression model based on the exponentiated Weibull distribution, and develop large sample theory for statistical inference. We also describe a Bayesian approach for inference. Two comparative studies based on real and simulated data sets reveal that the exponentiated Weibull regression can be valuable in adequately describing different types of time-to-event data.
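The hazard function of the exponentiated Weibull distribution can be sketched directly from its distribution function. The parametrisation below, F(t) = (1 - exp(-(t/scale)^shape))^power, is a common one but is an assumption here, as the abstract does not state the notation used in the paper; the function and parameter names are hypothetical.

```python
import math


def ew_cdf(t, shape, scale, power):
    """Distribution function F(t) = (1 - exp(-(t/scale)**shape))**power, for t > 0."""
    return (1.0 - math.exp(-((t / scale) ** shape))) ** power


def ew_pdf(t, shape, scale, power):
    """Density obtained by differentiating ew_cdf with respect to t."""
    z = (t / scale) ** shape
    base = 1.0 - math.exp(-z)
    return (power * shape / scale * (t / scale) ** (shape - 1.0)
            * math.exp(-z) * base ** (power - 1.0))


def ew_hazard(t, shape, scale, power):
    """Hazard rate f(t) / (1 - F(t))."""
    return ew_pdf(t, shape, scale, power) / (1.0 - ew_cdf(t, shape, scale, power))
```

With power = 1 the hazard reduces to the ordinary Weibull hazard. Multiplying the time variable by a constant c changes only the scale parameter (scale becomes c * scale), which is the sense in which the family is closed under accelerated failure time rescaling.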