Similar literature
Found 20 similar documents (search time: 289 ms)
1.
Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector with a few nonzero components, those covariates that are most important. This article extends the “global‐local” shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. Here, we have developed a variable selection method for a K‐outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for all regression coefficients is a mean zero normal with coefficient‐specific variance term that consists of a predictor‐specific factor (shared local shrinkage parameter) and a model‐specific factor (global shrinkage term) that differs in each model. The performance of our modeling approach is evaluated through simulation studies and a data example.
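
The prior structure described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the distributions of the shrinkage parameters, so half-Cauchy scales (as in horseshoe-type priors) are assumed here, the two factors are assumed to combine as a product, and the function name `sample_global_local_prior` is hypothetical.

```python
import numpy as np

def sample_global_local_prior(n_predictors, n_outcomes, seed=0):
    """Draw one coefficient matrix from a global-local normal prior (sketch)."""
    rng = np.random.default_rng(seed)
    # Local scale: one per predictor, shared across all K outcome models.
    # ASSUMPTION: half-Cauchy; the abstract does not specify this distribution.
    local_scale = np.abs(rng.standard_cauchy(n_predictors))
    # Global scale: one per outcome model, so it differs between models.
    # ASSUMPTION: half-Cauchy, as above.
    global_scale = np.abs(rng.standard_cauchy(n_outcomes))
    # Coefficient-specific standard deviation, assumed to be the product
    # (predictor-specific factor) x (model-specific factor).
    sd = np.outer(local_scale, global_scale)
    # Mean-zero normal prior for every coefficient beta[j, k].
    beta = rng.normal(loc=0.0, scale=sd)
    return beta, local_scale, global_scale

beta, local_scale, global_scale = sample_global_local_prior(5, 3)
print(beta.shape)  # (5, 3): 5 predictors, 3 outcomes
```

Because a small local scale shrinks an entire row of the coefficient matrix, a predictor tends to be either kept or shrunk across all outcomes at once, which is the behaviour the abstract describes.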

2.
Abstract. We study a binary regression model using the complementary log–log link, where the response variable Δ is the indicator of an event of interest (for example, the incidence of cancer, or the detection of a tumour) and the set of covariates can be partitioned as (X, Z), where Z (real valued) is the primary covariate and X (vector valued) denotes a set of control variables. The conditional probability of the event of interest is assumed to be monotonic in Z, for every fixed X. A finite-dimensional (regression) parameter β describes the effect of X. We show that the baseline conditional probability function (corresponding to X = 0) can be estimated by isotonic regression procedures and develop an asymptotically pivotal likelihood-ratio-based method for constructing (asymptotic) confidence sets for the regression function. We also show how likelihood-ratio-based confidence intervals for the regression parameter can be constructed using the chi-square distribution. An interesting connection to the Cox proportional hazards model under current status censoring emerges. We present simulation results to illustrate the theory and apply our results to a data set involving lung tumour incidence in mice.
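
For readers unfamiliar with the complementary log–log link, the sketch below shows the link and its inverse. The additive form of the linear predictor (a monotone function of Z plus β'X) is the standard form for this link and is an assumption here; the function names and the argument `phi_z` are hypothetical and not taken from the paper.

```python
import math

def cloglog(p):
    """Complementary log-log link: maps a probability in (0, 1) to the real line."""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 - math.exp(-math.exp(eta))

def event_probability(phi_z, x, beta):
    """P(event | X = x, Z = z), assuming cloglog(P) = phi(z) + beta'x.

    phi_z is the value of the (monotone) baseline function at z;
    this additive structure is an assumption of the sketch.
    """
    eta = phi_z + sum(b * xi for b, xi in zip(beta, x))
    return inv_cloglog(eta)

# With x = 0 the probability reduces to the baseline function of Z alone.
print(event_probability(0.0, [0.0, 0.0], [0.5, -1.0]))
```

Under this form, exp(phi(z)) plays the role of a cumulative baseline hazard, which is one way to see the connection to the Cox model mentioned in the abstract.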

3.
Summary. The paper focuses on a Bayesian treatment of measurement error problems and on the question of the specification of the prior distribution of the unknown covariates. It presents a flexible semiparametric model for this distribution based on a mixture of normal distributions with an unknown number of components. Implementation of this prior model as part of a full Bayesian analysis of measurement error problems is described in classical set-ups that are encountered in epidemiological studies: logistic regression between unknown covariates and outcome, with a normal or log-normal error model and a validation group. The feasibility of this combined model is tested and its performance is demonstrated in a simulation study that includes an assessment of the influence of misspecification of the prior distribution of the unknown covariates and a comparison with the semiparametric maximum likelihood method of Roeder, Carroll and Lindsay. Finally, the methodology is illustrated on a data set on coronary heart disease and cholesterol levels in blood.

4.
Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships, and modelling covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follows a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model is also used for curve clustering, with a focus on clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

5.
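Beta-Binomial regression model and its applications (total citations: 1; self-citations: 0; citations by others: 1)
In success/failure trials and in surveys of satisfaction or approval ratings, the Beta-Binomial distribution is often used to describe count-based proportion data with overdispersion. On this basis, a Beta-Binomial regression model is proposed. Maximum likelihood estimation of the parameters is studied, and an iterative estimation method based on the Newton-Raphson algorithm is given. The paper focuses on testing for the presence of the regression parameters and of the correlation parameter in the model: a score test is proposed, and the power of the score test statistic is studied through numerical simulation. A real-data analysis demonstrates the usefulness of the Beta-Binomial regression model.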
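
As a rough illustration of the likelihood that such a model maximizes, the sketch below evaluates a Beta-Binomial log-likelihood. The mean/correlation parameterization (`mu`, `rho`), the logit link for the mean, and all function names are assumptions made for this example; the abstract does not state the paper's parameterization, and the Newton-Raphson iteration itself is not shown.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k, n, mu, rho):
    """log P(Y = k) for Y ~ Beta-Binomial with n trials.

    ASSUMED parameterization: mean proportion mu in (0, 1) and
    intra-class correlation rho in (0, 1), so that
    a = mu (1 - rho) / rho and b = (1 - mu) (1 - rho) / rho.
    """
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

def log_likelihood(beta, rho, X, successes, trials):
    """Log-likelihood with an assumed logit link: mu_i = logistic(x_i' beta)."""
    total = 0.0
    for x_row, k, n in zip(X, successes, trials):
        eta = sum(b * xi for b, xi in zip(beta, x_row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        total += beta_binomial_logpmf(k, n, mu, rho)
    return total

# Tiny made-up data set: intercept plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
print(log_likelihood([0.1, 0.3], 0.2, X, successes=[3, 5, 8], trials=[10, 10, 10]))
```

A numerical optimizer, or a hand-written Newton-Raphson step using the gradient and Hessian of this function, would then be applied to `beta` and `rho`.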

6.
We propose methods for Bayesian inference for missing covariate data with a novel class of semi-parametric survival models with a cure fraction. We allow the missing covariates to be either categorical or continuous and specify a parametric distribution for the covariates that is written as a sequence of one dimensional conditional distributions. We assume that the missing covariates are missing at random (MAR) throughout. We propose an informative class of joint prior distributions for the regression coefficients and the parameters arising from the covariate distributions. The proposed class of priors is shown to be useful in recovering information on the missing covariates, especially in situations where the missing data fraction is large. Properties of the proposed prior and resulting posterior distributions are examined. Also, model checking techniques are proposed for sensitivity analyses and for checking the goodness of fit of a particular model. Specifically, we extend the Conditional Predictive Ordinate (CPO) statistic to assess goodness of fit in the presence of missing covariate data. Computational techniques using the Gibbs sampler are implemented. A real data set involving a melanoma cancer clinical trial is examined to demonstrate the methodology.

7.
Clustered survival data arise often in clinical trial design, where the correlated subunits from the same cluster are randomized to different treatment groups. Under such a design, we consider the problem of constructing a confidence interval for the difference of two median survival times given the covariates. We use the Cox gamma frailty model to account for the within-cluster correlation. Based on the conditional confidence intervals, we can identify the possible range of covariates over which the two groups would provide different median survival times. The associated coverage probability and the expected length of the proposed interval are investigated via a simulation study. The implementation of the confidence intervals is illustrated using a real data set.

8.
Among the diverse frameworks that have been proposed for regression analysis of angular data, the projected multivariate linear model provides a particularly appealing and tractable methodology. In this model, the observed directional responses are assumed to correspond to the angles formed by latent bivariate normal random vectors that are assumed to depend upon covariates through a linear model. This implies an angular normal distribution for the observed angles, and incorporates a regression structure through a familiar and convenient relationship. In this paper we extend this methodology to accommodate clustered data (e.g., longitudinal or repeated measures data) by formulating a marginal version of the model and basing estimation on an EM‐like algorithm in which correlation among within‐cluster responses is taken into account by incorporating a working correlation matrix into the M step. A sandwich estimator is used for the parameter estimates’ covariance matrix. The methodology is motivated and illustrated using an example involving clustered measurements of microfibril angle on loblolly pine (Pinus taeda L.). Simulation studies are presented that evaluate the finite sample properties of the proposed fitting method. In addition, the relationship between within‐cluster correlation on the latent Euclidean vectors and the corresponding correlation structure for the observed angles is explored.

9.
The authors define a class of “partially linear single‐index” survival models that are more flexible than the classical proportional hazards regression models in their treatment of covariates. The latter enter the proposed model either via a parametric linear form or a nonparametric single‐index form. It is then possible to model both linear and functional effects of covariates on the logarithm of the hazard function and if necessary, to reduce the dimensionality of multiple covariates via the single‐index component. The partially linear hazards model and the single‐index hazards model are special cases of the proposed model. The authors develop a likelihood‐based inference to estimate the model components via an iterative algorithm. They establish an asymptotic distribution theory for the proposed estimators, examine their finite‐sample behaviour through simulation, and use a set of real data to illustrate their approach.

10.
The distribution(s) of future response(s) given a set of data from an informative experiment is known as the prediction distribution. The paper derives the prediction distribution(s) from a linear regression model with a multivariate Student-t error distribution using the structural relations of the model. We observe that the prediction distribution(s) are multivariate t-variate(s) with degrees of freedom which do not depend on the degrees of freedom of the error distribution.

11.
A five-parameter extended fatigue life model called the McDonald–Birnbaum–Saunders (McBS) distribution is proposed. It extends the Birnbaum–Saunders and beta Birnbaum–Saunders [G.M. Cordeiro and A.J. Lemonte, The β-Birnbaum–Saunders distribution: An improved distribution for fatigue life modeling. Comput. Statist. Data Anal. 55 (2011), pp. 1445–1461] distributions and also the new Kumaraswamy–Birnbaum–Saunders distribution. We obtain the ordinary moments, generating function, mean deviations and quantile function. The method of maximum likelihood is used to estimate the model parameters and its potential is illustrated with an application to a real fatigue data set. Further, we propose a new extended regression model based on the logarithm of the McBS distribution. This model can be very useful for the analysis of real data and could give more realistic fits than other special regression models.

12.
Expectile regression [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819–847] is a nice tool for estimating the conditional expectiles of a response variable given a set of covariates. Expectile regression at the 50% level is the classical conditional mean regression. In many real applications, having multiple expectiles at different levels provides a more complete picture of the conditional distribution of the response variable. The multiple linear expectile regression model has been well studied [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819–847; Efron B. Regression percentiles using asymmetric squared error loss, Stat Sin. 1991;1:93–125], but it can be too restrictive for many real applications. In this paper, we derive a regression tree-based gradient boosting estimator for nonparametric multiple expectile regression. The new estimator, referred to as ER-Boost, is implemented in an R package erboost publicly available at http://cran.r-project.org/web/packages/erboost/index.html. We use two homoscedastic/heteroscedastic random-function-generator models in simulation to show the high predictive accuracy of ER-Boost. As an application, we apply ER-Boost to analyse North Carolina County crime data. From the nonparametric expectile regression analysis of this dataset, we draw several interesting conclusions that are consistent with the previous study using the economic model of crime. This real data example also provides a good demonstration of some nice features of ER-Boost, such as its ability to handle different types of covariates and its model interpretation tools.
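
To make the notion of an expectile concrete, the sketch below computes a sample expectile by asymmetric least squares, with no covariates. It is not ER-Boost and does not use the erboost package; the function name `sample_expectile` and the fixed-point iteration are choices made for this illustration only.

```python
def sample_expectile(y, tau, tol=1e-12, max_iter=1000):
    """Return the tau-expectile of the sample y.

    The tau-expectile m minimizes sum_i w_i (y_i - m)^2, where
    w_i = tau if y_i > m and w_i = 1 - tau otherwise. For fixed weights
    the minimizer is a weighted mean, so we iterate that update.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(y) / len(y)  # start from the ordinary mean
    for _ in range(max_iter):
        weights = [tau if yi > m else 1.0 - tau for yi in y]
        new_m = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m

data = [1.0, 2.0, 3.0, 4.0, 10.0]
print(sample_expectile(data, 0.5))  # equals the sample mean
print(sample_expectile(data, 0.9))  # pulled towards the upper tail
```

At tau = 0.5 all weights are equal, so the expectile coincides with the mean, which is the statement in the abstract that expectile regression at the 50% level is conditional mean regression.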

13.
ABSTRACT

This paper analyses the behaviour of the goodness-of-fit tests for regression models. To this end, it uses statistics based on an estimation of the integrated regression function with missing observations either in the response variable or in some of the covariates. It proposes several versions of one empirical process, constructed from a previous estimation, that uses only the complete observations or replaces the missing observations with imputed values. In the case of missing covariates, a link model is used to fill the missing observations with other complete covariates. In all the situations, bootstrap methodology is used to calibrate the distribution of the test statistics. A broad simulation study compares the different procedures based on empirical regression methodology, with smoothed tests previously studied in the literature. The comparison reflects the effect of the correlation between the covariates in the tests based on the imputed sample for missing covariates. In addition, the paper proposes a computational binning strategy to evaluate the tests based on an empirical process for large data sets. Finally, two applications to real data illustrate the performance of the tests.

14.
Recurrent event data often arise in biomedical studies, with examples including hospitalizations, infections, and treatment failures. In observational studies, it is often of interest to estimate the effects of covariates on the marginal recurrent event rate. The majority of existing rate regression methods assume multiplicative covariate effects. We propose a semiparametric model for the marginal recurrent event rate, wherein the covariates are assumed to add to the unspecified baseline rate. Covariate effects are summarized by rate differences, meaning that the absolute effect on the rate function can be determined from the regression coefficient alone. We describe modifications of the proposed method to accommodate a terminating event (e.g., death). Proposed estimators of the regression parameters and baseline rate are shown to be consistent and asymptotically Gaussian. Simulation studies demonstrate that the asymptotic approximations are accurate in finite samples. The proposed methods are applied to a state-wide kidney transplant data set.

15.
In recent years, a variety of regression models, including zero-inflated and hurdle versions, have been proposed to explain the case of a dependent variable with respect to exogenous covariates. Apart from the classical Poisson, negative binomial and generalised Poisson distributions, many proposals have appeared in the statistical literature, perhaps in response to the new possibilities offered by advanced software that now enables researchers to implement numerous special functions in a relatively simple way. However, we believe that a significant research gap remains, since very little attention has been paid to the quasi-binomial distribution, which was first proposed over fifty years ago. We believe this distribution might constitute a valid alternative to existing regression models, in situations in which the variable has bounded support. Therefore, in this paper we present a zero-inflated regression model based on the quasi-binomial distribution, taking into account the moments and maximum likelihood estimators, and perform a score test to compare the zero-inflated quasi-binomial distribution with the zero-inflated binomial distribution, and the zero-inflated model with the homogeneous model (the model in which covariates are not considered). This analysis is illustrated with two data sets that are well known in the statistical literature and which contain a large number of zeros.
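
The zero-inflation mechanism can be illustrated with the zero-inflated binomial distribution, which is the comparison model named in the abstract. The sketch below is not the quasi-binomial model proposed in the paper; the function name and the parameter name `pi_zero` (the probability of a structural zero) are hypothetical.

```python
import math

def zero_inflated_binomial_pmf(k, n, p, pi_zero):
    """P(Y = k) when Y is 0 with probability pi_zero (a structural zero)
    and otherwise follows a Binomial(n, p) distribution."""
    binomial = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
    pmf = (1.0 - pi_zero) * binomial
    if k == 0:
        # Zeros come from two sources: structural zeros and binomial zeros.
        pmf += pi_zero
    return pmf

n, p, pi_zero = 8, 0.4, 0.25
probs = [zero_inflated_binomial_pmf(k, n, p, pi_zero) for k in range(n + 1)]
print(sum(probs))  # the probabilities sum to 1 (up to rounding)
```

In a regression setting, `p` (and possibly `pi_zero`) would be linked to covariates; setting `pi_zero` to 0 recovers the ordinary binomial model.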

16.
Abstract

In this article we propose a new mixed-effects regression model for fractional bounded response variables. Our model allows us to incorporate covariates directly into the expected value, so we can quantify exactly the influence of these covariates on the mean of the variable of interest rather than on the conditional mean. Estimation is carried out from a Bayesian perspective. Due to the complexity of the augmented posterior distribution, we use a Hamiltonian Monte Carlo algorithm, the No-U-Turn sampler, implemented using the Stan software. A simulation study was performed showing that our model has a better performance than other traditional longitudinal models for bounded variables. Finally, we applied our beta-inflated mean mixed-effects regression model to real data which consists of utilization of credit lines in the Peruvian financial system.

17.
The proportional reversed hazards model explains the multiplicative effect of covariates on the baseline reversed hazard rate function of lifetimes. In the present study, we introduce a proportional cause-specific reversed hazards model. The proposed regression model facilitates the analysis of failure time data with multiple causes of failure under left censoring. We estimate the regression parameters using a partial likelihood approach. We provide Breslow's type estimators for the cumulative cause-specific reversed hazard rate functions. Asymptotic properties of the estimators are discussed. Simulation studies are conducted to assess their performance. We illustrate the applicability of the proposed model using a real data set.

18.
In 2009 a survey was performed in Veneto, a region in the north-east of Italy, to study the demand for wine and specifically for Passito, a typical Italian wine. The main goal of the study was to analyze how preferences for and consumption habits of Passito vary depending on consumers’ characteristics. Specifically, two kinds of statistical methods were applied: the Covariate Uniform Binomial (CUB) model, a statistical approach for ordinal data, to study the feeling toward Passito and the uncertainty of the respondents; and classical logistic regression analysis, to describe how the attitude toward Passito can be modeled as a function of consumers’ covariates. Gender and residence were the most important covariates, useful in defining segments of consumers with significant differences in terms of Passito preferences and consumption behavior. The logistic regression analysis made it possible to complete the statistical analysis based on CUB models, validating the results of the CUB model and estimating a model useful to predict the attitude toward the considered product for specific sub-groups of consumers.

19.
We are concerned with cumulative regression models for an ordered categorical response variable Y. We propose two methods to build partial residuals from regression on a subset Z1 of covariates Z, which take into account the ordinal character of the response. The first method makes use of a multivariate GLM-representation of the model and produces residual measures for diagnostic purposes. The second uses a latent continuous variable model and yields new (adjusted) ordinal data Y*. Both methods are illustrated by a data set from forestry.

20.
When confronted with multiple covariates and a response variable, analysts sometimes apply a variable‐selection algorithm to the covariate‐response data to identify a subset of covariates potentially associated with the response, and then wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, the parameters of interest could be estimated by using standard inference methods to fit the postulated marginal model to the independent data set. However, when applied to the same data set used by the variable selector, standard (“naive”) methods can lead to distorted inferences. The authors develop testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and response variable, based on the same data set used for variable selection. They provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess and compare their performance to a sample‐splitting approach. The methods are illustrated with data from a recent AIDS study. The Canadian Journal of Statistics 37: 625–644; 2009 © 2009 Statistical Society of Canada
