Similar Articles
20 similar articles found (search time: 46 ms)
1.
The conventional criteria for predictive model selection do not indicate the absolute goodness of models. For example, the value of the Akaike Information Criterion (AIC) is meaningful only when we compare the AICs of different models for a given amount of data. Thus, the existing criteria do not tell us whether the quantity and quality of the data are satisfactory, and hence we cannot judge whether we should collect more data to further improve the model. To solve this practical problem, we propose a criterion RD that lies between 0 and 1. RD is an asymptotic estimate of the proportion of improvement in predictive ability under a given error structure, where predictive ability is defined as the expected logarithmic probability of the next dataset (2nd dataset) evaluated under a model constructed from the current dataset (1st dataset). Appropriate choice of error structure is important in the calculation of RD. We illustrate the calculation of RD using a small dataset on moth abundance.
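The point that AIC is only comparative can be made concrete with a small sketch. This toy example (Gaussian data, not the paper's moth dataset or its RD criterion) fits one- and two-mean models and compares their AICs; only the difference between the two values carries meaning.

```python
import math

# Toy bimodal sample: two groups of measurements.
data = [2.1, 1.9, 2.4, 2.0, 2.2, 5.8, 6.1, 5.9]

def gaussian_loglik(xs, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in xs)

def aic(loglik, k):
    return 2 * k - 2 * loglik  # k = number of fitted parameters

n = len(data)

# Model 1: single mean (k = 2: mu, sigma)
mu = sum(data) / n
sigma = math.sqrt(sum((x - mu)**2 for x in data) / n)
aic1 = aic(gaussian_loglik(data, mu, sigma), 2)

# Model 2: separate means for the two groups, common sigma (k = 3)
half1, half2 = data[:5], data[5:]
m1, m2 = sum(half1) / len(half1), sum(half2) / len(half2)
resid = [x - m1 for x in half1] + [x - m2 for x in half2]
sigma2 = math.sqrt(sum(r**2 for r in resid) / n)
ll2 = gaussian_loglik(half1, m1, sigma2) + gaussian_loglik(half2, m2, sigma2)
aic2 = aic(ll2, 3)

# The lower AIC is preferred, but neither number alone says a model is "good".
print(aic1, aic2)
```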

2.
Comparing models by predictive accuracy is a traditional approach. A natural method for approximating out-of-sample predictive accuracy is leave-one-out cross-validation (LOOCV): we alternately hold out each case from a full dataset, train a Bayesian model using Markov chain Monte Carlo without the held-out case, and finally evaluate the posterior predictive distribution of each held-out case at its actual observation. However, actual LOOCV is time-consuming. This paper introduces two methods, iIS and iWAIC, for approximating LOOCV using only Markov chain samples simulated from a posterior based on the full dataset. iIS and iWAIC aim to improve the approximations given by importance sampling (IS) and WAIC in Bayesian models with possibly correlated latent variables. In iIS and iWAIC, we first integrate the predictive density over the distribution of the latent variables associated with the held-out case, without reference to its observation, and then apply the IS and WAIC approximations to the integrated predictive density. We compare iIS and iWAIC with other approximation methods in three kinds of models: finite mixture models, models with correlated spatial effects, and a random-effect logistic regression model. Our empirical results show that iIS and iWAIC give substantially better approximations than non-integrated IS and WAIC and other methods.
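A minimal sketch of the plain importance-sampling LOOCV approximation that iIS refines: with posterior draws from the full-data posterior, the LOO predictive density for case i is approximated by the harmonic mean of the pointwise likelihoods. The conjugate normal model, prior, and seed below are illustrative assumptions, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: y_i ~ N(theta, 1), theta ~ N(0, 10^2),
# so the full-data posterior of theta is available in closed form.
y = rng.normal(1.0, 1.0, size=20)
n = len(y)
post_var = 1.0 / (n / 1.0 + 1.0 / 100.0)
post_mean = post_var * y.sum()
theta_draws = rng.normal(post_mean, np.sqrt(post_var), size=4000)

def log_lik(y_i, theta):
    # Pointwise Gaussian log-likelihood with unit variance.
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y_i - theta) ** 2

# IS-LOO: log p(y_i | y_{-i}) ≈ -log mean_s exp(-log p(y_i | theta_s)),
# i.e. the harmonic-mean importance-sampling estimate.
elpd_loo = 0.0
for y_i in y:
    ll = log_lik(y_i, theta_draws)
    elpd_loo += -np.log(np.mean(np.exp(-ll)))
print(elpd_loo)
```

iIS would additionally integrate out any latent variable tied to the held-out case before taking these weights; with no latent structure, as here, the two coincide.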

3.
Given prior knowledge about the unknown parameter, the Bayesian predictive density coincides with the Bayes estimator of the true density in the sense of the Kullback-Leibler divergence, but this is no longer true if we consider another loss function. In this paper we present a generalized Bayes rule for obtaining Bayes density estimators with respect to any α-divergence, including the Kullback-Leibler divergence and the Hellinger distance. For curved exponential models, we study the asymptotic behaviour of these predictive densities. We show that, whatever prior we use, the generalized Bayes rule improves (in a non-Bayesian sense) upon the estimative density corresponding to a bias modification of the maximum likelihood estimator. This gives rise to a correspondence between choosing a prior density for the generalized Bayes rule and fixing a bias for the maximum likelihood estimator in the classical setting. A criterion for comparing and selecting prior densities is also given.

4.
Regression models are often used to make predictions. All the information needed is contained in the predictive distribution. However, this cannot be evaluated explicitly for most generalized linear models. We construct two approximations to this distribution and demonstrate their use on two sets of survival data, corresponding to the outcome of patients admitted to intensive care units and the survival times of leukaemia patients.

5.
In this article, we propose a method based on the Lagrangian probability distributions for developing new dependence models. Specifically, a generalized Poisson–gamma dependence model is derived. The maximum likelihood estimation (MLE) technique is proposed for estimating the dependence model parameters. Application of the generalized Poisson–gamma dependence model is illustrated using an operational risk dataset.

6.
In this paper, we suggest a technique to quantify model risk, particularly model misspecification, for binary response regression problems found in financial risk management, such as credit risk modelling. We choose the probability-of-default model as one instance of the many credit risk models that may be misspecified in a financial institution. To illustrate model misspecification for probability of default, we quantify two specific predictive techniques for binary responses, namely binary logistic regression and the complementary log–log model. The maximum likelihood estimation technique is employed for parameter estimation. Statistical inference, specifically goodness of fit and model performance measures, is assessed. Using a simulated dataset and the Taiwan credit card default dataset, we find that with the same sample size and very few simulation iterations, the two techniques produce similar goodness-of-fit results but completely different performance measures. However, as the number of iterations increases, binary logistic regression on a balanced dataset yields better goodness-of-fit and performance measures than the complementary log–log technique, for both the simulated and real datasets.
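The two link functions being compared can be written down directly. The sketch below simply evaluates the inverse logit and inverse complementary log–log links on a common linear predictor, showing that they are close near the centre but asymmetric in the tails (toy values, not the paper's data):

```python
import math

def inv_logit(eta):
    # Logistic link: p = 1 / (1 + e^{-eta}), symmetric about p = 0.5.
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    # Complementary log-log link: p = 1 - exp(-exp(eta)), asymmetric.
    return 1.0 - math.exp(-math.exp(eta))

# Compare the implied default probabilities on the same linear predictor.
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(inv_logit(eta), 4), round(inv_cloglog(eta), 4))
```

At eta = 0 the logistic link gives p = 0.5 while the cloglog link gives p = 1 - e^{-1} ≈ 0.632, which is one reason the two fitted models can score similarly on goodness of fit yet rank cases differently.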

7.
Quantile regression models with measurement errors in predictors are becoming increasingly popular. High-leverage points in predictors can have substantial impacts on these models. Here, we propose a predictive leverage statistic for these models, assuming that the measurement errors follow a multivariate normal distribution, and derive its exact distribution. We compare its performance with known predictive leverage statistics using simulation and a real dataset. The proposed statistic is shown to have desirable features; it is also the first predictive leverage statistic whose distribution has been derived in closed form.

8.
In this paper, a robust estimator is proposed for partially linear regression models. We first estimate the nonparametric component using a penalized regression spline, and then construct an estimator of the parametric component using a robust S-estimator. We propose an iterative algorithm to solve the resulting optimization problem, and introduce a robust generalized cross-validation criterion to select the penalty parameter. Simulation studies and a real data analysis illustrate that the proposed method is robust against outliers in the dataset and against errors with heavy tails.

9.
Count response data often exhibit departures from the assumptions of standard Poisson generalized linear models. In particular, cluster-level correlation and truncation at zero are two common characteristics of such data. This paper describes a random-components truncated Poisson model that can be applied to clustered and zero-truncated count data. Residual maximum likelihood estimators for the parameters of this model are developed, and their use is illustrated on a dataset of non-zero counts of sheets with edge-strain defects in iron sheets produced by the Mobarekeh Steel Complex, Iran. The paper also reports on a small-scale simulation study that supports the estimation procedure.
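A minimal sketch of the zero-truncated Poisson building block (the standard pmf, not the paper's random-components model): conditioning a Poisson(lam) variable on being positive rescales the pmf by 1/(1 - e^{-lam}) and shifts the mean up to lam/(1 - e^{-lam}).

```python
import math

def ztp_pmf(y, lam):
    # P(Y = y | Y > 0) = e^{-lam} lam^y / (y! (1 - e^{-lam})), y >= 1.
    if y < 1:
        return 0.0
    return (math.exp(-lam) * lam**y
            / (math.factorial(y) * (1.0 - math.exp(-lam))))

lam = 1.5
probs = [ztp_pmf(y, lam) for y in range(1, 50)]
print(sum(probs))  # ≈ 1: the support excludes zero

mean = sum(y * p for y, p in enumerate(probs, start=1))
# Truncated mean exceeds lam and equals lam / (1 - e^{-lam}).
print(mean, lam / (1.0 - math.exp(-lam)))
```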

10.
In this paper, a novel Bayesian framework is used to derive the posterior density function, predictive density for a single future response, a bivariate future response, and several future responses from the exponentiated Weibull model (EWM). We study three related types of models, the exponentiated exponential, exponentiated Weibull, and beta generalized exponential, which are all utilized to determine the goodness of fit of two real data sets. The statistical analysis indicates that the EWM best fits both data sets. We determine the predictive means, standard deviations, highest predictive density intervals, and the shape characteristics for a single future response. We also consider a new parameterization method to determine the posterior kernel densities for the parameters. The summary results of the parameters are calculated by using the Markov chain Monte Carlo method.
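For reference, a sketch of the exponentiated Weibull CDF underlying the EWM, in one common parameterization (the parameter names alpha, beta, sigma are illustrative, not necessarily the paper's notation):

```python
import math

def ew_cdf(x, alpha, beta, sigma=1.0):
    # Exponentiated Weibull: F(x) = [1 - exp(-(x/sigma)^beta)]^alpha.
    # alpha = 1 recovers the ordinary Weibull; beta = 1 gives the
    # exponentiated exponential, one of the paper's comparison models.
    return (1.0 - math.exp(-((x / sigma) ** beta))) ** alpha

# Sanity check on the Weibull special case (sigma = 1).
weibull_cdf = lambda x, beta: 1.0 - math.exp(-(x ** beta))
print(ew_cdf(0.7, 1.0, 2.0), weibull_cdf(0.7, 2.0))  # equal when alpha = 1
```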

11.
Although prediction in mixed effects models usually concerns the random effects, in this paper we deal with the problem of prediction of a future, or yet unobserved, response random variable, belonging to a given cluster. In particular, the aim is to define computationally tractable prediction intervals, with conditional and unconditional coverage probability close to the target nominal value. This solution involves the conditional density of the future response random variable given the observed data, or a suitable high-order approximation based on the Laplace method. We prove that, unless the amount of data is very limited, the estimative or naive predictive procedure gives a relatively simple, feasible solution for response prediction. An application to generalized linear mixed models is presented.

12.
We develop a new robust stopping criterion for partial least squares regression (PLSR) component construction, characterized by a high level of stability. This new criterion is universal, since it is suitable both for PLSR and for extensions to generalized linear regression (PLSGLR). The criterion is based on a non-parametric bootstrap technique and must be computed algorithmically. It allows the testing of each successive component at a preset significance level \(\alpha \). In order to assess its performance and robustness with respect to various noise levels, we perform dataset simulations in which there is a preset and known number of components. These simulations are carried out for datasets characterized both by \(n>p\), with n the number of subjects and p the number of covariates, and by \(n<p\). We then use t-tests to compare the predictive performance of our approach with that of other common criteria. The stability property is tested in particular through re-sampling processes on a real allelotyping dataset. An important additional conclusion is that this new criterion gives globally better predictive performance than existing criteria in both the PLSR and PLSGLR (logistic and Poisson) frameworks.

13.
In this paper, a test is derived to assess the validity of heteroscedastic nonlinear regression models by a non‐parametric cosine regression method. For order selection, the paper proposes a data‐driven method that uses the optimal order of the parametric null model. This method yields a test that is asymptotically normally distributed under the null hypothesis and is consistent against any fixed alternative. Simulation studies that test the lack of fit of a generalized linear model are conducted to compare the performance of the proposed test with that of an existing non‐parametric kernel test. A dataset of esterase levels is used to demonstrate the proposed method in practice.

14.
Frailty models are used in survival analysis to account for unobserved heterogeneity in individual risks of disease and death. To analyze bivariate data on related survival times (e.g., matched-pairs experiments, twin or family data), shared frailty models have been suggested. Shared frailty models are widely used despite their limitations; to overcome these disadvantages, correlated frailty models may be used instead. In this article, we introduce gamma correlated frailty models with two different baseline distributions, namely the generalized log-logistic and the generalized Weibull. We introduce a Bayesian estimation procedure using the Markov chain Monte Carlo (MCMC) technique to estimate the parameters involved in these models. We present a simulation study comparing the true values of the parameters with the estimated values. We also apply these models to a real-life bivariate survival dataset on kidney infection, and a better model is suggested for the data.

15.
Polytomous Item Response Theory (IRT) models are used by specialists to score assessments and questionnaires that have items with multiple response categories. In this article, we study the performance of five model comparison criteria for comparing the fit of the graded response and generalized partial credit models on the same dataset when the choice between the two is unclear. A simulation study is conducted to analyze the sensitivity of priors and compare the performance of the criteria using the No-U-Turn Sampler algorithm, under a Bayesian approach. The results were used to select a model for an application to mental health data.

16.
We propose a class of Bayesian semiparametric mixed-effects models; its distinctive feature is the randomness of the grouping of observations, which can be inferred from the data. The model can be viewed under a more natural perspective, as a Bayesian semiparametric regression model on the log-scale; hence, in the original scale, the error is a mixture of Weibull densities mixed on both parameters by a normalized generalized gamma random measure, encompassing the Dirichlet process. As an estimate of the posterior distribution of the clustering of the random-effects parameters, we consider the partition minimizing the posterior expectation of a suitable class of loss functions. As a merely illustrative application of our model we consider a Kevlar fibre lifetime dataset (with censoring). We implement an MCMC scheme, obtaining posterior credibility intervals for the predictive distributions and for the quantiles of the failure times under different stress levels. Compared to a previous parametric Bayesian analysis, we obtain narrower credibility intervals and a better fit to the data. We found that there are three main clusters among the random-effects parameters, in accordance with previous frequentist analysis.

17.
Neural network models and the prediction of automobile insurance claim frequency
MENG Shengwang. 《统计研究》 (Statistical Research), 2012, 29(3): 22–26
Automobile insurance attracts wide public attention and occupies a pivotal position in property insurance companies, so claim-frequency prediction models for automobile insurance have long been a key topic in non-life actuarial theory and applied research. The most popular claim-frequency models at present are generalized linear models, including Poisson regression, negative binomial regression, and Poisson–inverse Gaussian regression. Based on a set of real automobile insurance loss data, this paper compares various generalized linear models for claim frequency with neural network and regression tree models, and reaches some new conclusions: the neural network model fits better than the generalized linear models, and among the generalized linear models, Poisson regression fits better than negative binomial and Poisson–inverse Gaussian regression. The linear regression model fits worst, and the regression tree model fits only slightly better than linear regression.
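A minimal sketch of the Poisson regression baseline the paper compares against, fitted by Newton–Raphson (Fisher scoring) with a log link. The data here are synthetic, not the insurance dataset used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic design: intercept plus one rating factor.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.5])
y = rng.poisson(np.exp(X @ beta_true))  # simulated claim counts

# Newton-Raphson for the Poisson log-likelihood with log link.
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)            # fitted claim frequencies
    grad = X.T @ (y - mu)            # score vector
    hess = X.T @ (X * mu[:, None])   # Fisher information
    beta = beta + np.linalg.solve(hess, grad)

print(beta)  # should be close to beta_true
```

Negative binomial and Poisson–inverse Gaussian regression generalize this by adding an overdispersion parameter; the neural network replaces the linear predictor X @ beta with a learned nonlinear function.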

18.
Bayesian methods have been used extensively in small area estimation. A linear model incorporating autocorrelated random effects and sampling errors was previously proposed for small area estimation using both cross-sectional and time-series data in the Bayesian paradigm. There are, however, many situations in which we have time-related counts or proportions in small area estimation; for example, monthly datasets on incidence counts in small areas. This article considers hierarchical Bayes generalized linear models for a unified analysis of both discrete and continuous data, incorporating cross-sectional and time-series data. The performance of the proposed approach is evaluated through several simulation studies and a real dataset.

19.
Typically, regression analysis for multistate models has been based on regression models for the transition intensities. These models lead to highly nonlinear and very complex models for the effects of covariates on state occupation probabilities. We present a technique that models the state occupation or transition probabilities in a multistate model directly. The method is based on the pseudo-values of a jackknife statistic constructed from non-parametric estimators of the probability in question. These pseudo-values are used as outcome variables in a generalized estimating equation to obtain estimates of model parameters. We examine this approach and its properties in detail for two special multistate model probabilities: the cumulative incidence function in competing risks and the current leukaemia-free survival used in bone marrow transplants. The latter is the probability that a patient is alive and in either a first or second post-transplant remission. The techniques are illustrated on a dataset of leukaemia patients given a marrow transplant. We also discuss extensions of the model that are of current research interest.
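The pseudo-value construction can be sketched in a few lines: for an estimator theta_hat computed on n cases, the i-th pseudo-value is n * theta_hat - (n - 1) * theta_hat_{-i}. Purely for illustration, the estimator below is the simple empirical probability P(T > t) rather than a non-parametric multistate estimator; with no censoring the pseudo-values reduce to the indicators 1{T_i > t}.

```python
import numpy as np

t = 5.0
times = np.array([2.0, 6.0, 3.0, 9.0, 7.0, 4.0, 8.0])  # toy event times
n = len(times)

# Full-sample estimate of P(T > t).
theta_full = np.mean(times > t)

# Jackknife pseudo-values: n * theta_hat - (n - 1) * theta_hat_{-i}.
pseudo = np.array([
    n * theta_full - (n - 1) * np.mean(np.delete(times, i) > t)
    for i in range(n)
])

print(pseudo)                      # numerically ≈ 0 or 1 for each case here
print(pseudo.mean(), theta_full)   # pseudo-values average back to theta_hat
```

These per-case values can then serve as outcomes in a generalized estimating equation, which is the regression step the abstract describes.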

20.
Hierarchical models enable the encoding of a variety of parametric structures. However, when presented with a large number of covariates upon which some component of a model hierarchy depends, the modeller may be unwilling or unable to specify a form for that dependence. Data-mining methods are designed to automatically discover relationships between many covariates and a response surface, easily accommodating non-linearities and higher-order interactions. We present a method of wrapping hierarchical models around data-mining methods, preserving the best qualities of the two paradigms. We fit the resulting semi-parametric models using an approximate Gibbs sampler called HEBBRU. Using a simulated dataset, we show that HEBBRU is useful for exploratory analysis and displays excellent predictive accuracy. Finally, we apply HEBBRU to an ornithological dataset drawn from the eBird database.

