Similar Documents
1.
In the causal analysis of survival data, a time-based response is related to a set of explanatory variables. Defining the relation between the time and the covariates can be difficult, particularly at a preliminary stage when information is limited. Through a nonparametric approach, we propose to estimate the survival function in a way that allows the relative importance of each potential explanatory variable to be evaluated in a simple and interpretable fashion. To achieve this aim, each explanatory variable is used to partition the observed survival times, and the observations are assumed to be partially exchangeable according to that partition. Conditionally on each partition, we then consider a hierarchical nonparametric Bayesian model on the hazard functions. We define and compare different prior distributions for the hazard functions.

2.
Adjustment for covariates is a time-honored tool in statistical analysis and is often implemented by including the covariates one intends to adjust for as additional predictors in a model. This adjustment often does not work well when the underlying model is misspecified. We consider here the situation where we compare a response between two groups. This response may depend on a covariate whose distribution differs between the two groups one intends to compare. This creates the potential that observed differences are due to differences in covariate levels rather than "genuine" population differences that cannot be explained by covariate differences. We propose a bootstrap-based adjustment method. Bootstrap weights are constructed with the aim of aligning the bootstrap-weighted empirical distributions of the covariate between the two groups. More generally, the proposed weighted-bootstrap algorithm can be used to align or match the values of an explanatory variable as closely as desired to those of a given target distribution. We illustrate the proposed bootstrap adjustment method in simulations and in the analysis of data on the fecundity of historical cohorts of French-Canadian women.
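
A minimal sketch of the weighted-bootstrap idea, not the authors' exact algorithm: observations in one group are reweighted by an estimated density ratio (obtained here from a logistic regression, a common stand-in) so that bootstrap resamples have a covariate distribution close to that of the target group. All variable names and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def aligned_bootstrap(x_source, y_source, x_target, rng=None):
    """Draw one bootstrap resample of (x_source, y_source) whose covariate
    distribution is reweighted toward that of x_target.

    Weights are proportional to an estimated density ratio
    g_target(x) / g_source(x), obtained from a logistic regression that
    discriminates the two samples (a standard density-ratio trick; the
    original paper may construct its weights differently)."""
    rng = np.random.default_rng(rng)
    x = np.concatenate([x_source, x_target]).reshape(-1, 1)
    label = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])
    clf = LogisticRegression().fit(x, label)
    p = clf.predict_proba(x_source.reshape(-1, 1))[:, 1]  # P(target | x)
    w = p / (1.0 - p)                                      # density ratio up to a constant
    w /= w.sum()
    idx = rng.choice(len(x_source), size=len(x_source), replace=True, p=w)
    return x_source[idx], y_source[idx]

# Example: compare group means after aligning the covariate distributions.
rng = np.random.default_rng(0)
x_a, x_b = rng.normal(0.0, 1.0, 300), rng.normal(0.5, 1.0, 300)
y_a, y_b = 2.0 + x_a + rng.normal(size=300), 2.0 + x_b + rng.normal(size=300)
diffs = [aligned_bootstrap(x_a, y_a, x_b, rng=s)[1].mean() - y_b.mean() for s in range(200)]
print(np.percentile(diffs, [2.5, 50, 97.5]))
```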

3.
Covariate-informed product partition models incorporate the intuitively appealing notion that individuals or units with similar covariate values a priori have a higher probability of co-clustering than those with dissimilar covariate values. These methods have been shown to perform well when the number of covariates is relatively small. However, as the number of covariates increases, their influence on partition probabilities overwhelms any information the response may provide about clustering and often encourages partitions with either a large number of singleton clusters or one large cluster, resulting in poor model fit and poor out-of-sample prediction. The same phenomenon is observed in Bayesian nonparametric regression methods that induce a conditional distribution for the response given covariates through a joint model. In light of this, we propose two methods that calibrate the covariate-dependent partition model by capping the influence that covariates have on partition probabilities. We demonstrate the new methods’ utility using simulation and two publicly available datasets.

4.
The accelerated hazard model in survival analysis assumes that the covariate effect acts on the time scale of the baseline hazard rate. In this paper, we study the stochastic properties of the mixed accelerated hazard model, in which the covariate is treated as essentially unobservable. We build a dependence structure between the population variable and the covariate, and present some preservation properties. Using well-known stochastic orders, we compare two mixed accelerated hazard models arising from different choices of distribution for the unobservable covariate or different baseline hazard rate functions.

5.
The varying-coefficient model is an important nonparametric statistical model, since it allows appreciable flexibility in the structure of the fitted model. For ultra-high-dimensional heterogeneous data it is necessary to examine how the effects of covariates vary with exposure variables at different quantile levels of interest. In this paper, we extend marginal screening methods to examine and select variables by ranking a measure of the nonparametric marginal contribution of each covariate given the exposure variable. Spline approximations are employed to model the marginal effects and select the set of active variables in a quantile-adaptive framework. This ensures the sure screening property in the quantile-adaptive varying-coefficient model. Numerical studies demonstrate that the proposed procedure works well for heteroscedastic data.
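
As a rough illustration of quantile-adaptive marginal screening, the sketch below ranks covariates by the reduction in check loss achieved by a marginal quantile fit on a basis expansion of each covariate. A polynomial basis stands in for the spline approximation, and the conditioning on an exposure variable described in the abstract is omitted; all function and variable names are assumptions.

```python
import numpy as np
import statsmodels.api as sm

def marginal_quantile_score(y, xj, tau=0.5, degree=3):
    """Marginal contribution of covariate xj at quantile tau: drop in check
    loss relative to an intercept-only quantile fit. A polynomial basis
    stands in for the spline approximation used in the paper."""
    def check_loss(res):
        return np.mean(res * (tau - (res < 0)))
    basis = sm.add_constant(np.column_stack([xj ** d for d in range(1, degree + 1)]))
    fit_full = sm.QuantReg(y, basis).fit(q=tau)
    fit_null = sm.QuantReg(y, np.ones_like(y)).fit(q=tau)
    return check_loss(fit_null.resid) - check_loss(fit_full.resid)

# Rank covariates by their marginal contribution and keep the top few.
rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.standard_t(df=3, size=n)  # heavy-tailed noise
scores = np.array([marginal_quantile_score(y, X[:, j], tau=0.75) for j in range(p)])
print(np.argsort(scores)[::-1][:5])  # indices of the top-ranked covariates
```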

6.
With competing risks data, one often needs to assess the treatment and covariate effects on the cumulative incidence function. Fine and Gray proposed a proportional hazards regression model for the subdistribution of a competing risk with the assumption that the censoring distribution and the covariates are independent. Covariate-dependent censoring sometimes occurs in medical studies. In this paper, we study the proportional hazards regression model for the subdistribution of a competing risk with proper adjustments for covariate-dependent censoring. We consider a covariate-adjusted weight function by fitting the Cox model for the censoring distribution and using the predictive probability for each individual. Our simulation study shows that the covariate-adjusted weight estimator is basically unbiased when the censoring time depends on the covariates, and the covariate-adjusted weight approach works well for the variance estimator as well. We illustrate our methods with bone marrow transplant data from the Center for International Blood and Marrow Transplant Research. Here, cancer relapse and death in complete remission are two competing risks.
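
A hedged sketch of the covariate-adjusted censoring weights: a Cox model (via the lifelines package) is fit with censoring treated as the event, and each subject's predicted censoring-survival probability at their own observed time replaces the usual Kaplan-Meier quantity in the inverse-probability-of-censoring weights. The full weighted Fine-Gray estimating equations are not reproduced, and the data-frame layout and names are assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical data: one row per subject with observed time, event indicator
# (1 = event of interest or competing event, 0 = censored) and covariates.
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "time": rng.exponential(5.0, n),
    "event": rng.integers(0, 2, n),
    "x1": rng.normal(size=n),
    "x2": rng.integers(0, 2, n),
})

# Step 1: model the censoring distribution with a Cox model, treating
# "being censored" as the event of interest.
cens = df.assign(censored=1 - df["event"])
cox_c = CoxPHFitter()
cox_c.fit(cens[["time", "censored", "x1", "x2"]], duration_col="time", event_col="censored")

# Step 2: per-subject predicted censoring-survival curves G_hat(t | x_i);
# evaluating them at each subject's own time gives the covariate-adjusted
# quantity that would feed into the IPCW weights.
surv_c = cox_c.predict_survival_function(df[["x1", "x2"]])  # rows: times, cols: subjects
timeline = surv_c.index.values
G_at_own_time = np.array([
    surv_c.iloc[np.searchsorted(timeline, t, side="right") - 1, i]
    for i, t in enumerate(df["time"])
])
ipcw = 1.0 / np.clip(G_at_own_time, 1e-6, None)
print(ipcw[:5])
```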

7.
Varying-coefficient partially linear models provide a useful tool for modeling covariate effects on the response variable in regression. One key question in varying-coefficient partially linear models is the choice of model structure, that is, how to decide which covariates have linear effects and which have nonlinear effects. In this article, we propose a profile method for identifying the covariates with linear or nonlinear effects. Our proposed method is a penalized regression approach based on the group minimax concave penalty. Under suitable conditions, we show that the proposed method can correctly determine which covariates have a linear effect and which do not with high probability. The convergence rate of the linear estimator is established, as well as its asymptotic normality. The performance of the proposed method is evaluated through a simulation study that supports our theoretical results.

8.
Multinomial logit (also termed multi-logit) models permit the analysis of the statistical relation between a categorical response variable and a set of explanatory variables (called covariates or regressors). Although multinomial logit is widely used in both the social and economic sciences, the interpretation of regression coefficients may be tricky, as the effect of covariates on the probability distribution of the response variable is nonconstant and difficult to quantify. The ternary plots illustrated in this article aim at facilitating the interpretation of regression coefficients and permit the effect of covariates (considered either singly or jointly) on the probability distribution of the dependent variable to be quantified. Ternary plots can be drawn for both ordered and unordered categorical dependent variables when the number of possible outcomes equals three (trinomial response variable); these plots make it possible not only to represent the covariate effects over the whole parameter space of the dependent variable but also to compare the covariate effects for any given individual profile. The method is illustrated and discussed through analysis of a dataset concerning the transition of master’s graduates of the University of Trento (Italy) from university to employment.
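
One possible way to reproduce the flavor of such a ternary plot, not taken from the article itself: fit a multinomial logit, map the predicted trinomial probability vectors onto the 2-simplex with the standard barycentric-to-Cartesian transformation, and trace how they move as a covariate varies. The data, names, and plotting choices are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Toy trinomial response driven by one covariate.
rng = np.random.default_rng(3)
n = 600
x = rng.normal(size=(n, 1))
logits = np.column_stack([0.0 * x, 1.5 * x, -1.0 * x])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

# Multinomial logit fit; predicted probability vectors live on the 2-simplex.
clf = LogisticRegression(max_iter=1000).fit(x, y)
grid = np.linspace(x.min(), x.max(), 200).reshape(-1, 1)
p_hat = clf.predict_proba(grid)  # shape (200, 3), rows sum to 1

# Barycentric -> Cartesian coordinates for a ternary plot.
tern_x = p_hat[:, 1] + 0.5 * p_hat[:, 2]
tern_y = (np.sqrt(3) / 2.0) * p_hat[:, 2]

plt.plot([0, 1, 0.5, 0], [0, 0, np.sqrt(3) / 2, 0], color="grey")  # simplex boundary
plt.scatter(tern_x, tern_y, c=grid.ravel(), s=8)
plt.colorbar(label="covariate value")
plt.axis("equal"); plt.axis("off")
plt.title("Covariate effect traced on the probability simplex")
plt.show()
```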

9.
A pivotal characteristic of credit defaults that is ignored by most credit scoring models is the rarity of the event. The most widely used model to estimate the probability of default is the logistic regression model. Since the dependent variable represents a rare event, the logistic regression model shows relevant drawbacks, for example, underestimation of the default probability, which could be very risky for banks. In order to overcome these drawbacks, we propose the generalized extreme value (GEV) regression model. In particular, in a generalized linear model (GLM) with a binary dependent variable, we suggest the quantile function of the GEV distribution as the link function, so that attention is focused on the tail of the response curve for values close to one. The estimation procedure used is the maximum likelihood method. This model accommodates skewness and generalizes GLMs with the complementary log–log link function. We analyse its performance through simulation studies. Finally, we apply the proposed model to empirical data on Italian small and medium enterprises.
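
A minimal sketch, under stated assumptions, of a binary GLM whose inverse link is the GEV distribution function (so the link is the GEV quantile function), fit by direct maximum likelihood with scipy. The shape parameterization, the synthetic rare-event data, and all names are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

def gev_glm_negloglik(params, X, y):
    """Binary GLM where P(y = 1 | x) = F_GEV(x'beta; xi), i.e. the GEV
    quantile function is the link, so its CDF is the inverse link.
    The last entry of params is the shape parameter xi (scipy's c = -xi)."""
    beta, xi = params[:-1], params[-1]
    eta = X @ beta
    p = np.clip(genextreme.cdf(eta, c=-xi), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy rare-event data: only a few percent of responses are 1.
rng = np.random.default_rng(4)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p_event = np.clip(0.03 * np.exp(0.8 * X[:, 1]), 0, 1)
y = (rng.random(n) < p_event).astype(float)

start = np.zeros(X.shape[1] + 1)
start[0], start[-1] = -1.0, 0.1
fit = minimize(gev_glm_negloglik, start, args=(X, y), method="Nelder-Mead")
print("coefficients:", fit.x[:-1], "shape:", fit.x[-1])
```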

10.
Both continuous and categorical covariates are common in traditional Chinese medicine (TCM) research, especially in clinical syndrome identification and in risk prediction research. For groups of dummy variables generated from the same categorical covariate, it is important to penalize them group-wise rather than individually. In this paper, we discuss the group lasso method for a risk prediction analysis in TCM osteoporosis research. This is the first application of such a group-wise variable selection method in this field, and it may offer new insight into using grouped penalization to select appropriate covariates in TCM research. The introduced methodology can select categorical and continuous variables and estimate their parameters simultaneously. In our application to the osteoporosis data, four covariates (both categorical and continuous) are selected out of 52, and the accuracy of the prediction model is excellent. Compared with prediction models using different covariates, the group lasso risk prediction model can significantly decrease the error rate and help TCM doctors identify patients at high risk of osteoporosis in clinical practice. Simulation results show that the application of the group lasso method is reasonable for categorical covariate selection in this TCM osteoporosis study.
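
To illustrate the group-wise penalization idea, the sketch below treats all dummy columns generated from one categorical covariate as a single group and runs a proximal-gradient group lasso for squared loss. The paper's actual risk-prediction model and the osteoporosis covariates are not reproduced; all names, data, and tuning values are assumptions.

```python
import numpy as np
import pandas as pd

def group_lasso_ls(X, y, groups, lam, n_iter=500, step=None):
    """Group lasso for squared loss via proximal gradient descent.
    `groups` maps each column index to a group id; all dummy columns that
    come from the same categorical covariate share one id, so they are
    kept or dropped together."""
    n, p = X.shape
    step = step or 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g in np.unique(groups):
            idx = groups == g
            norm_g = np.linalg.norm(z[idx])
            shrink = max(0.0, 1.0 - step * lam * np.sqrt(idx.sum()) / (norm_g + 1e-12))
            beta[idx] = shrink * z[idx]              # block soft-thresholding
    return beta

# Mixed covariates: one continuous, one 4-level categorical expanded to dummies.
rng = np.random.default_rng(5)
n = 300
age = rng.normal(60, 10, n)
category = rng.integers(0, 4, n)
dummies = pd.get_dummies(pd.Categorical(category), drop_first=True).to_numpy(float)
X = np.column_stack([(age - age.mean()) / age.std(), dummies])
groups = np.array([0, 1, 1, 1])            # continuous covariate = group 0, dummy block = group 1
y = 0.8 * X[:, 0] + dummies @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)

beta_hat = group_lasso_ls(X, y, groups, lam=0.05)
print(beta_hat)  # group-wise shrinkage: the dummy block enters or leaves together
```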

11.
In this paper, we propose a quantile approach to the multi-index semiparametric model for an ordinal response variable. Permitting nonparametric transformation of the response, the proposed method achieves a root-n rate of convergence and has attractive robustness properties. Further, the proposed model allows additional indices to model the remaining correlations between the covariates and the residuals from the single index, considerably reducing the error variance and thus leading to more efficient prediction intervals (PIs). The utility of the model is demonstrated by estimating PIs for the functional status of the elderly based on data from the Second Longitudinal Study of Aging. It is shown that the proposed multi-index model provides significantly narrower PIs than competing models. Our approach can be applied to other areas in which the distribution of future observations must be predicted from ordinal response data.

12.
In this paper, we study the identification of Bayesian regression models, when an ordinal covariate is subject to unidirectional misclassification. Xia and Gustafson [Bayesian regression models adjusting for unidirectional covariate misclassification. Can J Stat. 2016;44(2):198–218] obtained model identifiability for non-binary regression models, when there is a binary covariate subject to unidirectional misclassification. In the current paper, we establish the moment identifiability of regression models for misclassified ordinal covariates with more than two categories, based on forms of observable moments. Computational studies are conducted that confirm the theoretical results. We apply the method to two datasets, one from the Medical Expenditure Panel Survey (MEPS), and the other from Translational Research Investigating Underlying Disparities in Acute Myocardial infarction Patients Health Status (TRIUMPH).

13.
张晶 (Zhang Jing) et al. 《统计研究》(Statistical Research), 2020, 37(11): 57–67.
In recent years, consumer finance in China has developed rapidly, but it also faces increasingly complex fraud and credit risks. To better monitor the credit risk of borrowers in consumer finance, this paper proposes a risk-control method based on a sparsely structured continuation-ratio model. Compared with traditional binary classification models, this model can handle ordered data in which borrowers are divided into three or more categories; while estimating the coefficients it automatically screens the important variables out of a large volume of complex data, and the variable selection takes into account the structural relations among the coefficients of the different sub-models. Monte Carlo simulations show that the proposed sparsely structured continuation-ratio model performs well in terms of both classification generalization error and variable selection. Finally, the model is applied to a real credit risk analysis in consumer finance: for borrowers with insufficient traditional credit-reporting information, by introducing high-frequency e-commerce consumption behaviour data, the proposed high-dimensional ordinal multi-category model can effectively identify borrowers' credit risk and thus make up for the shortcomings of traditional credit-scoring methods.
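
A hedged sketch of the basic continuation-ratio decomposition underlying such a model: the ordinal response is split into a sequence of conditional binary logits P(Y = j | Y >= j, x), each fit here with an L1-penalized logistic regression. The paper's structured penalty that ties the sub-model coefficients together is not reproduced, and the data and names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_continuation_ratio(X, y, n_levels, C=1.0):
    """Continuation-ratio model for an ordinal response y in {0,...,K-1}:
    for each level j, model P(y = j | y >= j, x) with a binary logit fitted
    on the subset still 'at risk' (y >= j). L1 penalties give per-submodel
    sparsity; the structured penalty linking submodels in the paper is not
    reproduced here."""
    fits = []
    for j in range(n_levels - 1):
        at_risk = y >= j
        fits.append(
            LogisticRegression(penalty="l1", solver="liblinear", C=C)
            .fit(X[at_risk], (y[at_risk] == j).astype(int))
        )
    return fits

def predict_probs(fits, X):
    """Recover the full ordinal distribution from the conditional logits."""
    n, K = X.shape[0], len(fits) + 1
    probs = np.zeros((n, K))
    survive = np.ones(n)                       # P(y >= j) so far
    for j, f in enumerate(fits):
        cond = f.predict_proba(X)[:, 1]        # P(y = j | y >= j)
        probs[:, j] = survive * cond
        survive *= 1.0 - cond
    probs[:, -1] = survive
    return probs

# Hypothetical three-level credit grades driven by a few of many covariates.
rng = np.random.default_rng(6)
n, p = 1000, 30
X = rng.normal(size=(n, p))
score = X[:, 0] - 0.8 * X[:, 1] + rng.logistic(size=n)
y = np.digitize(score, [-1.0, 1.0])            # 0 = good, 1 = watch, 2 = default
fits = fit_continuation_ratio(X, y, n_levels=3, C=0.5)
print(predict_probs(fits, X[:5]).round(3))
```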

14.
Statistical modeling of credit risk for retail clients is considered. Due to the lack of detailed updated information about the counterparty, traditional approaches such as Merton’s firm-value model, are not applicable. Moreover, the credit default data for retail clients typically exhibit a very small percentage of default rates. This motivates a statistical model based on survival analysis under extreme censoring for the time-to-default variable. The model incorporates the stochastic nature of default and is based on incomplete information. Consistency and asymptotic normality of maximum likelihood estimates of the parameters characterizing the time-to-default distribution are derived. A criterion for constructing confidence ellipsoids for the parameters is obtained from the asymptotic results. An extended model with explanatory variables is also discussed. The results are illustrated by a data example with 670 mortgages.

15.
In this study, we consider the problem of selecting explanatory variables of fixed effects in linear mixed models under covariate shift, which is when the values of covariates in the model for prediction differ from those in the model for observed data. We construct a variable selection criterion based on the conditional Akaike information introduced by Vaida & Blanchard (2005). We focus especially on covariate shift in small area estimation and demonstrate the usefulness of the proposed criterion. In addition, numerical performance is investigated through simulations, one of which is a design-based simulation using a real dataset of land prices. The Canadian Journal of Statistics 46: 316–335; 2018 © 2018 Statistical Society of Canada

16.
We present an objective Bayes method for covariance selection in Gaussian multivariate regression models having a sparse regression and covariance structure, the latter being Markov with respect to a directed acyclic graph (DAG). Our procedure can be easily complemented with a variable selection step, so that variable and graphical model selection can be performed jointly. In this way, we offer a solution to a problem of growing importance especially in the area of genetical genomics (eQTL analysis). The input of our method is a single default prior, essentially involving no subjective elicitation, while its output is a closed form marginal likelihood for every covariate-adjusted DAG model, which is constant over each class of Markov equivalent DAGs; our procedure thus naturally encompasses covariate-adjusted decomposable graphical models. In realistic experimental studies, our method is highly competitive, especially when the number of responses is large relative to the sample size.

17.
Building on the Bayesian approach, this paper constructs credit rating and default probability models and shows how financial institutions can use existing rating information to improve the accuracy of their assessment of debtors' credit risk. Based on methods for measuring an individual debtor's default probability and on Merton's theory, and accounting for the heterogeneous impact of macroeconomic shocks on different debtors, it then measures the default risk of an asset portfolio. An application of the Bayesian model using relevant data is given as an illustration; the results show that the Bayesian approach offers a more flexible framework and better predictive ability.

18.
In this paper, we suggest a technique to quantify model risk, particularly model misspecification, for binary response regression problems found in financial risk management, such as credit risk modelling. We choose the probability of default model as one instance of the many credit risk models that may be misspecified in a financial institution. To illustrate model misspecification for the probability of default, we quantify two specific statistical predictive response techniques, namely binary logistic regression and the complementary log–log model. The maximum likelihood estimation technique is employed for parameter estimation, and the statistical inference, specifically goodness of fit and model performance measures, is assessed. Using a simulated dataset and the Taiwan credit card default dataset, our findings reveal that with the same sample size and very few simulation iterations, the two techniques produce similar goodness-of-fit results but completely different performance measures. However, when the number of iterations increases, the binary logistic regression technique for the balanced dataset shows better goodness of fit and performance measures than the complementary log–log technique for both the simulated and real datasets.
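
A small sketch of the kind of comparison described here, using statsmodels to fit the same binary GLM under the logit and complementary log-log links and report the deviance and a Brier score. The data are synthetic rather than the Taiwan credit card dataset, and the link class names follow recent statsmodels versions (older releases expose lowercase aliases).

```python
import numpy as np
import statsmodels.api as sm

# Synthetic default data (not the Taiwan credit card dataset).
rng = np.random.default_rng(7)
n = 5000
X = sm.add_constant(rng.normal(size=(n, 3)))
eta = X @ np.array([-2.5, 0.8, -0.4, 0.2])
p_true = 1.0 - np.exp(-np.exp(eta))               # data generated under a cloglog link
y = (rng.random(n) < p_true).astype(float)

links = {
    "logit": sm.families.links.Logit(),
    "cloglog": sm.families.links.CLogLog(),       # lowercase `cloglog` in older statsmodels
}
for name, link in links.items():
    fit = sm.GLM(y, X, family=sm.families.Binomial(link=link)).fit()
    phat = fit.predict(X)
    brier = np.mean((phat - y) ** 2)               # one simple performance measure
    print(f"{name:8s}  deviance={fit.deviance:8.2f}  Brier={brier:.4f}")
```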

19.
Suppose that data are generated according to the model f(y | x; θ)g(x), where y is a response and x are covariates. We derive and compare semiparametric likelihood and pseudolikelihood methods for estimating θ in situations in which units generated are not fully observed and in which it is impossible or undesirable to model the covariate distribution. The probability that a unit is fully observed may depend on y, and there may be a subset of covariates which is observed only for a subsample of individuals. Our key assumptions are that the probability that a unit has missing data depends only on which of a finite number of strata (y, x) belongs to, and that the stratum membership is observed for every unit. Applications include case–control studies in epidemiology, field reliability studies and broad classes of missing data and measurement error problems. Our results make fully efficient estimation of θ feasible, and they generalize and provide insight into a variety of methods that have been proposed for specific problems.

20.
We introduce extensions of stability selection, a method for stabilising variable selection introduced by Meinshausen and Bühlmann (J R Stat Soc 72:417–473, 2010). We propose to apply a base selection method repeatedly to random subsamples of observations and subsets of the covariates under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalizes the theoretical results of Meinshausen and Bühlmann (2010) from the case of half-samples to subsamples of arbitrary size. We study, in a theoretical manner, the effect of taking random covariate subsets using a simplified score model. Finally, we validate these extensions in numerical experiments on both synthetic and real datasets and compare the results in detail to the original stability selection method.
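
A compact sketch of the extended procedure, under assumed names and tuning values: a lasso serves as the base selector, each replicate draws a random subsample of observations and a random subset of covariates, and covariates are ranked by how often they are selected when they are in play.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.05, n_rep=200, obs_frac=0.5, cov_frac=0.5, rng=None):
    """Extended stability selection: subsample both rows (observations) and
    columns (covariates), run a base selector (here a lasso) on each draw,
    and record how often each covariate is selected when it is in play."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    selected = np.zeros(p)
    in_play = np.zeros(p)
    for _ in range(n_rep):
        rows = rng.choice(n, size=int(obs_frac * n), replace=False)
        cols = rng.choice(p, size=int(cov_frac * p), replace=False)
        in_play[cols] += 1
        beta = Lasso(alpha=alpha).fit(X[np.ix_(rows, cols)], y[rows]).coef_
        selected[cols[np.abs(beta) > 1e-8]] += 1
    return selected / np.maximum(in_play, 1)   # selection frequency per covariate

# Toy example: 5 informative covariates out of 100.
rng = np.random.default_rng(8)
n, p = 200, 100
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, -1.0, 0.8]) + rng.normal(size=n)
freq = stability_selection(X, y, alpha=0.1, rng=9)
print(np.argsort(freq)[::-1][:10])   # covariates most frequently selected
```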
