Similar Literature
20 similar documents found (search time: 31 ms)
1.
Multivariate extreme events are typically modelled using multivariate extreme value distributions. Unfortunately, there exists no finite parametrization for the class of multivariate extreme value distributions. One common approach is to model extreme events using some flexible parametric subclass. This approach has been limited to only two or three dimensions, primarily because suitably flexible high-dimensional parametric models have prohibitively complex density functions. We present an approach that allows a number of popular flexible models to be used in arbitrarily high dimensions. The approach easily handles missing and censored data, and can be employed when modelling componentwise maxima and multivariate threshold exceedances. The approach is based on a representation using conditionally independent marginal components, conditioning on positive stable random variables. We use Bayesian inference, where the conditioning variables are treated as auxiliary variables within Markov chain Monte Carlo simulations. We demonstrate these methods with an application to sea-levels, using data collected at 10 sites on the east coast of England.
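The conditional-independence construction described above can be sketched in a few lines. The following is a hedged illustration, not the authors' implementation: it simulates the symmetric logistic max-stable distribution, one popular member of this family, by conditioning on a positive stable variable drawn with the Kanter/Chambers-Mallows-Stuck formula. All function names are mine.

```python
import numpy as np

def positive_stable(alpha, size, rng):
    # Kanter / Chambers-Mallows-Stuck sampler for a positive alpha-stable
    # variable S with Laplace transform E[exp(-t*S)] = exp(-t**alpha).
    u = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.sin(u) ** (1.0 / alpha)
            * (np.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha))

def rlogistic_mev(n, d, alpha, rng):
    # Symmetric logistic max-stable vectors with unit Frechet margins:
    # conditionally on S the components are independent with
    # P(X_j <= x | S) = exp(-S * x**(-1/alpha)), so unconditionally
    # P(X <= x) = exp(-(sum_j x_j**(-1/alpha))**alpha).
    s = positive_stable(alpha, (n, 1), rng)
    u = rng.uniform(size=(n, d))
    return (s / -np.log(u)) ** alpha

rng = np.random.default_rng(0)
x = rlogistic_mev(50_000, 2, 0.5, rng)
print(np.mean(x[:, 0] <= 1.0))        # close to exp(-1), unit Frechet margin
print(np.mean(x.max(axis=1) <= 1.0))  # close to exp(-2**0.5), joint CDF at (1, 1)
```

The dependence parameter alpha lies in (0, 1]; alpha near 1 gives independence, small alpha strong dependence. Treating the latent S as an auxiliary variable, as the abstract describes, makes the conditional likelihood a product of simple terms.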

2.
Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their extension to multivariate responses, however, is not straightforward, starting with the very definition of multivariate quantiles. We propose here a flexible Bayesian quantile regression model for a multivariate response variable, in which we are able to define a structured additive framework for all predictor variables. We build on previous ideas that consider a directional approach to define the quantiles of a response variable with multiple outputs, and we define noncrossing quantiles in every directional quantile model. We define a Markov chain Monte Carlo (MCMC) procedure for model estimation, where the noncrossing property is obtained through a Gaussian process design that models the correlation between the several quantile regression models. We illustrate the results of these models using two datasets: one on dimensions of inequality in the population, such as income and health; the second on scores of students in the Brazilian High School National Exam, considering three dimensions for the response variable.
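The directional idea can be sketched simply: project the multivariate response onto a unit direction and run quantile regression on the projection. The code below is a minimal frequentist sketch (pinball-loss minimisation), not the Bayesian MCMC procedure with the noncrossing Gaussian process design; all names are mine.

```python
import numpy as np
from scipy.optimize import minimize

def pinball(res, tau):
    # Quantile ("pinball") loss averaged over residuals.
    return np.mean(np.where(res >= 0, tau * res, (tau - 1) * res))

def directional_quantile_reg(Y, X, u, tau):
    # tau-quantile regression of the projection u'Y on covariates X
    # (intercept added); u is a unit direction in response space.
    z = Y @ u
    Xd = np.column_stack([np.ones(len(z)), X])
    b0 = np.linalg.lstsq(Xd, z, rcond=None)[0]          # OLS warm start
    fit = minimize(lambda b: pinball(z - Xd @ b, tau), b0,
                   method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
    return fit.x

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 1))
Y = np.column_stack([1 + 2 * X[:, 0] + rng.normal(size=n),
                     -1 + X[:, 0] + rng.normal(size=n)])
u = np.array([1.0, 0.0])           # direction: first response coordinate
beta = directional_quantile_reg(Y, X, u, 0.5)
print(beta)                         # intercept and slope near (1, 2)
```

Sweeping u over the unit sphere and tau over (0, 1) traces out the directional quantile surfaces the abstract refers to.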

3.
We consider varying coefficient models, which extend the classical linear regression model in that the regression coefficients are replaced by functions of certain variables (for example, time); the covariates are also allowed to depend on other variables. Varying coefficient models are popular in longitudinal data and panel data studies, and have been applied in fields such as finance and health sciences. We consider longitudinal data and estimate the coefficient functions by the flexible B-spline technique. An important question in a varying coefficient model is whether an estimated coefficient function is statistically different from a constant (or zero). We develop testing procedures based on the estimated B-spline coefficients by making use of nice properties of a B-spline basis. Our method allows longitudinal data where repeated measurements for an individual can be correlated. We obtain the asymptotic null distribution of the test statistic. The power of the proposed testing procedures is illustrated on simulated data, where we highlight the importance of including the correlation structure of the response variable, and on real data.
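The B-spline estimation step can be sketched as follows. This is a hedged illustration under independent errors (it ignores the within-subject correlation the paper accounts for): the coefficient function is expanded in a cubic B-spline basis and fitted by least squares on the basis-times-covariate design.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, knots, degree=3):
    # Evaluate every B-spline basis function at the points x.
    # knots must include boundary repeats (len = nbasis + degree + 1).
    nbasis = len(knots) - degree - 1
    B = np.empty((len(x), nbasis))
    for j in range(nbasis):
        c = np.zeros(nbasis)
        c[j] = 1.0
        B[:, j] = BSpline(knots, c, degree, extrapolate=False)(x)
    return np.nan_to_num(B)

rng = np.random.default_rng(2)
n = 1000
t = np.sort(rng.uniform(0, 1, n))        # observation times
x = rng.normal(size=n)                   # covariate
beta_true = np.sin(2 * np.pi * t)        # time-varying coefficient
y = beta_true * x + 0.1 * rng.normal(size=n)

knots = np.r_[[0.0] * 3, np.linspace(0, 1, 8), [1.0] * 3]   # clamped cubic
B = bspline_basis(t, knots)
# Varying-coefficient design: each basis column is multiplied by x.
coef, *_ = np.linalg.lstsq(B * x[:, None], y, rcond=None)
beta_hat = B @ coef
print(np.max(np.abs(beta_hat - beta_true)))   # small approximation error
```

A constancy test of the kind the abstract describes would then ask whether the fitted spline coefficients are compatible with a constant function, exploiting the fact that a constant lies in the span of a B-spline basis.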

4.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.
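The robustness mechanism at the core of t-based regression can be shown in a stripped-down form. The sketch below is a single-component, univariate t regression fitted by EM, not the authors' multivariate mixture with missing data: observations with large residuals receive small weights, which is what limits the influence of outliers.

```python
import numpy as np

def t_regression_em(X, y, nu=4.0, n_iter=100):
    # Linear regression with Student-t errors fitted by EM.
    # E-step: weights w_i = (nu + 1) / (nu + r_i^2 / sigma^2) downweight
    # large residuals; M-step: weighted least squares and scale update.
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    sigma2 = np.var(y - Xd @ beta)
    for _ in range(n_iter):
        r = y - Xd @ beta
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        beta = np.linalg.solve(Xd.T @ (w[:, None] * Xd), Xd.T @ (w * y))
        r = y - Xd @ beta
        sigma2 = np.sum(w * r ** 2) / len(y)
    return beta

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n)
y[:25] += 15.0                     # gross outliers
beta = t_regression_em(x, y)
print(beta)                         # close to (1, 2) despite the outliers
```

The mixture version runs the same weighting inside each latent class, with class responsibilities updated alongside; the MM-estimation variant in the paper additionally guards against high-leverage points, which this sketch does not.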

5.
Zero-inflated data are more frequent when the data represent counts. However, there are practical situations in which continuous data contain an excess of zeros. In these cases, the zero-inflated Poisson, binomial or negative binomial models are not suitable. To fill this gap, we propose the zero-spiked gamma-Weibull (ZSGW) model, obtained by mixing a distribution which is degenerate at zero with the gamma-Weibull distribution, which has positive support. The model estimates simultaneously the effects of explanatory variables on the response variable and on the zero spike. We consider a frequentist analysis and a non-parametric bootstrap for estimating the parameters of the ZSGW regression model. We derive the appropriate matrices for assessing local influence on the model parameters. We illustrate the performance of the proposed regression model by means of a real data set (copaiba oil resin production) from a study carried out at the Department of Forest Science of the Luiz de Queiroz School of Agriculture, University of São Paulo. Based on the ZSGW regression model, we determine the explanatory variables that can influence the excess of zeros of the resin oil production and identify influential observations. We also show empirically that the proposed regression model can be superior to the zero-adjusted inverse Gaussian regression model for fitting zero-inflated positive continuous data.
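The general structure, a point mass at zero mixed with a positive continuous distribution, each with its own regression, can be sketched with a plain gamma in place of the gamma-Weibull. This is a hedged simplification, not the ZSGW model itself: a logistic model for the zero spike plus a log-link gamma model for the positive part, fitted jointly by maximum likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

def nll(params, x, y):
    # Two-part likelihood: logistic model for P(Y = 0) plus a gamma
    # model (log link for the mean, common shape k) for the positive part.
    a0, a1, b0, b1, log_k = params
    p0 = expit(a0 + a1 * x)                  # zero-spike probability
    k = np.exp(log_k)
    mu = np.exp(b0 + b1 * x)                 # gamma mean for y > 0
    zero = y == 0
    ll = np.sum(np.log(p0[zero] + 1e-12)) + np.sum(np.log1p(-p0[~zero]))
    yp, mup = y[~zero], mu[~zero]
    ll += np.sum(k * np.log(k / mup) + (k - 1) * np.log(yp)
                 - k * yp / mup - gammaln(k))
    return -ll

rng = np.random.default_rng(4)
n = 3000
x = rng.normal(size=n)
p0 = expit(-1.0 + 1.5 * x)                   # true zero-spike model
mu = np.exp(0.5 + 0.8 * x)
y = np.where(rng.uniform(size=n) < p0, 0.0,
             rng.gamma(2.0, mu / 2.0))       # shape 2, mean mu
fit = minimize(nll, np.zeros(5), args=(x, y), method="BFGS")
print(fit.x[:4])                             # near (-1.0, 1.5, 0.5, 0.8)
```

Because the two parts share no parameters here, the likelihood factorises; the ZSGW model follows the same logic with a richer positive-part distribution.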

6.
We propose some statistical tools for diagnosing the class of generalized Weibull linear regression models [A.A. Prudente and G.M. Cordeiro, Generalized Weibull linear models, Comm. Statist. Theory Methods 39 (2010), pp. 3739–3755]. This class of models is an alternative means of analysing positive, continuous and skewed data and, due to its statistical properties, is very competitive with gamma regression models. First, we show that the Weibull model yields maximum likelihood estimators that are asymptotically more efficient than those of the gamma model. Standardized residuals are defined, and their statistical properties are examined empirically. Some measures are derived based on the case-deletion model, including the generalized Cook's distance and measures for identifying observations that are influential on partial F-tests. The results of a simulation study conducted to assess the behaviour of the global influence approach are also presented. Further, we perform a local influence analysis under the case-weights, response and explanatory variable perturbation schemes. The Weibull, gamma and other Weibull-type regression models are fitted to three data sets to illustrate the proposed diagnostic tools. Statistical analyses indicate that the Weibull model yields better fits to these data than other common alternative models.
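The case-deletion idea behind the generalized Cook's distance is easiest to see in the ordinary linear model, where the deletion effect has a closed form in terms of residuals and leverages. This is a sketch of that classical analogue, not the Weibull-model diagnostics derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(15)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 4.0, 20.0                       # plant a high-leverage outlier
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)  # leverages
p = X.shape[1]
s2 = r @ r / (n - p)
# Cook's distance: change in the fit when case i is deleted.
cook = r ** 2 * h / (p * s2 * (1 - h) ** 2)
print(int(np.argmax(cook)))                  # flags the planted outlier
```

The generalized Cook's distance in the paper replaces the least-squares quantities with the score and observed information of the Weibull likelihood, but the interpretation, "how much does the fit move when this case is removed", is the same.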

7.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.

8.
Models for Dependent Extremes Using Stable Mixtures
Abstract.  This paper unifies and extends results on a class of multivariate extreme value (EV) models studied by Hougaard, Crowder and Tawn. In these models, both unconditional and conditional distributions are themselves EV distributions, and all lower-dimensional marginals and maxima belong to the class. One interpretation of the models is as size mixtures of EV distributions, where the mixing is by positive stable distributions. A second interpretation is as exponential-stable location mixtures (for Gumbel) or as power-stable scale mixtures (for non-Gumbel EV distributions). A third interpretation is through a peaks over thresholds model with a positive stable intensity. The mixing variables are used as a modelling tool and for better understanding and model checking. We study EV analogues of components of variance models, and new time series, spatial and continuous parameter models for extreme values. The results are applied to data from a pitting corrosion investigation.

9.
The Poisson distribution is a simple and popular model for count-data random variables, but it suffers from the equidispersion requirement, which is often not met in practice. While models for overdispersed counts have been discussed intensively in the literature, the opposite phenomenon, underdispersion, has received little attention, especially in a time series context. We start with a detailed survey of distribution models allowing for underdispersion, discuss their properties and highlight possible disadvantages. After having identified two model families with attractive properties and only two model parameters each, we combine these models with the INAR(1) model (integer-valued autoregressive), which is particularly well suited to obtain autocorrelated counts with underdispersion. Properties of the resulting stationary INAR(1) models and approaches for parameter estimation are considered, as well as possible extensions to higher-order autoregressions. Three real-data examples illustrate the application of the models in practice.
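The INAR(1) recursion is short enough to simulate directly. The sketch below (my own choice of innovation law, not one of the two families surveyed in the paper) uses binomial thinning with underdispersed binomial(2, 0.5) innovations, which yields a stationary count series whose dispersion index is below one.

```python
import numpy as np

def simulate_inar1(n, alpha, innov_sampler, rng, burn=500):
    # INAR(1): X_t = alpha o X_{t-1} + eps_t, where "o" is binomial
    # thinning and eps_t are i.i.d. counts drawn by innov_sampler.
    x = np.empty(n + burn, dtype=int)
    x[0] = innov_sampler(rng)
    for t in range(1, n + burn):
        x[t] = rng.binomial(x[t - 1], alpha) + innov_sampler(rng)
    return x[burn:]

rng = np.random.default_rng(6)
# Binomial(2, 0.5) innovations: mean 1, variance 0.5 (underdispersed).
x = simulate_inar1(20_000, 0.5, lambda r: r.binomial(2, 0.5), rng)
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
disp = np.var(x) / np.mean(x)
print(acf1, disp)   # lag-1 autocorrelation near alpha; dispersion index < 1
```

The stationary autocorrelation function of an INAR(1) process is alpha^h, exactly as for a Gaussian AR(1), while the marginal dispersion is inherited from the innovation distribution, which is why underdispersed innovations give underdispersed counts.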

10.
Generalized additive models for location, scale and shape
Summary.  A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random-effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton–Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.
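The core idea, regressing every distributional parameter on covariates rather than only the mean, fits in a few lines for the simplest case. The sketch below is a minimal GAMLSS-style example (normal response, linear predictors for both location and log-scale, plain joint MLE), not the penalized backfitting machinery of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def nll(params, x, y):
    # Normal model with both location and scale depending on x:
    # mu = b0 + b1*x,  log(sigma) = g0 + g1*x.
    b0, b1, g0, g1 = params
    mu = b0 + b1 * x
    log_sigma = g0 + g1 * x
    return np.sum(log_sigma + 0.5 * ((y - mu) / np.exp(log_sigma)) ** 2)

rng = np.random.default_rng(7)
n = 5000
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + np.exp(-0.5 + 1.0 * x) * rng.normal(size=n)
fit = minimize(nll, np.zeros(4), args=(x, y), method="BFGS")
print(fit.x)   # near (1.0, 2.0, -0.5, 1.0)
```

GAMLSS generalises this in every direction: many response families (including skew and kurtotic ones, so shape parameters get their own predictors too), smooth additive terms instead of linear ones, random effects, and penalized likelihood fitted by Fisher scoring with backfitting.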

11.
Bayesian selection of variables is often difficult to carry out because of the challenge in specifying prior distributions for the regression parameters for all possible models, specifying a prior distribution on the model space and computations. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection. Several theoretical and computational properties of the prior are derived and illustrated with several examples. For the second, we propose a method for specifying an informative prior on the model space, and for the third we propose novel methods for computing the marginal distribution of the data. The new computational algorithms only require Gibbs samples from the full model to facilitate the computation of the prior and posterior model probabilities for all possible models. Several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables in that the elicitation is based on a prior prediction y0 for the response vector and a quantity a0 quantifying the uncertainty in y0. Then, y0 and a0 are used to specify a prior for the regression coefficients semi-automatically. Examples using real data are given to demonstrate the methodology.
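For small model spaces the posterior model probabilities can be approximated without any of the paper's Gibbs-based machinery. The sketch below uses the BIC (a crude Laplace approximation to the marginal likelihood, under an implicit unit-information prior) to score every subset of a logistic regression; it is a stand-in for, not an implementation of, the authors' algorithms.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize
from scipy.special import expit

def logistic_nll(beta, X, y):
    # Negative log-likelihood of logistic regression (y in {0, 1}).
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

rng = np.random.default_rng(12)
n, p = 400, 4
X = rng.normal(size=(n, p))
y = (rng.uniform(size=n) < expit(1.5 * X[:, 0] - 1.0 * X[:, 2])).astype(float)

# BIC-based approximation to posterior model probabilities over all 2^p subsets.
models, bics = [], []
for k in range(p + 1):
    for cols in combinations(range(p), k):
        Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        fit = minimize(logistic_nll, np.zeros(Xm.shape[1]), args=(Xm, y),
                       method="BFGS")
        models.append(cols)
        bics.append(2 * fit.fun + Xm.shape[1] * np.log(n))
bics = np.array(bics)
probs = np.exp(-(bics - bics.min()) / 2)
probs /= probs.sum()
print(models[int(np.argmax(probs))])   # expected to include variables 0 and 2
```

The point of the paper's approach is precisely to avoid this brute force: one set of Gibbs samples from the full model suffices to recover prior and posterior probabilities for every submodel, and the informative (y0, a0) prior replaces the implicit flat prior used here.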

12.
We propose a class of multidimensional Item Response Theory models for polytomously-scored items with ordinal response categories. This class extends an existing class of multidimensional models for dichotomously-scored items in which the latent abilities are represented by a random vector assumed to have a discrete distribution, with support points corresponding to different latent classes in the population. In the proposed approach, we allow for different parameterizations for the conditional distribution of the response variables given the latent traits, which depend on the type of link function and the constraints imposed on the item parameters. Moreover, we suggest a strategy for model selection based on a series of steps, each consisting of selecting a specific feature: the dimension of the model (number of latent traits), the number of latent classes, and the specific parameterization. To illustrate the proposed approach, we analyze a dataset from a study on anxiety and depression in a sample of oncological patients.

13.
We develop Bayesian models for density regression with emphasis on discrete outcomes. The problem of density regression is approached by considering methods for multivariate density estimation of mixed scale variables, and obtaining conditional densities from the multivariate ones. The approach to multivariate mixed scale outcome density estimation that we describe represents discrete variables, either responses or covariates, as discretised versions of continuous latent variables. We present and compare several models for obtaining these thresholds in the challenging context of count data analysis where the response may be over- and/or under-dispersed in some of the regions of the covariate space. We utilise a nonparametric mixture of multivariate Gaussians to model the directly observed and the latent continuous variables. The paper presents a Markov chain Monte Carlo algorithm for posterior sampling, sufficient conditions for weak consistency, and illustrations on density, mean and quantile regression utilising simulated and real datasets.
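Why represent counts as discretised latent continuous variables? Because thresholding frees the count distribution from Poisson-type mean-variance constraints. A three-line sketch (mine, not the paper's nonparametric mixture) shows that rounding a latent Gaussian already produces underdispersed counts, something no Poisson model can do:

```python
import numpy as np

rng = np.random.default_rng(13)
z = rng.normal(2.0, 0.4, size=50_000)        # latent continuous variable
y = np.maximum(np.round(z), 0).astype(int)   # counts via rounding thresholds
print(np.mean(y), np.var(y) / np.mean(y))    # dispersion index well below 1
```

Shrinking or widening the latent scale moves the dispersion index below or above one, and letting the latent distribution be a covariate-dependent mixture, as the paper does, lets the amount of over- or under-dispersion vary across the covariate space.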

14.
In practice, it is not uncommon to encounter the situation that a discrete response is related to both a functional random variable and multiple real-valued random variables whose impact on the response is nonlinear. In this paper, we consider the generalized partial functional linear additive models (GPFLAM) and present the estimation procedure. In GPFLAM, the nonparametric functions are approximated by polynomial splines and the infinite-dimensional slope function is estimated based on the principal component basis function approximations. We obtain the estimator by maximizing the quasi-likelihood function. We investigate the finite sample properties of the estimation procedure via Monte Carlo simulation studies and illustrate our proposed model by a real data analysis.

15.
We consider exact and approximate Bayesian computation in the presence of latent variables or missing data. Specifically, we explore the application of a posterior predictive distribution formula derived in Sweeting and Kharroubi (2003), which is a particular form of Laplace approximation, both as an importance function and as a proposal distribution. We show that this formula provides a stable importance function for use within poor man’s data augmentation schemes and that it can also be used as a proposal distribution within a Metropolis-Hastings algorithm for models that are not analytically tractable. We illustrate both uses in the case of a censored regression model and a normal hierarchical model, with both normal and Student t distributed random effects. Although the predictive distribution formula is motivated by regular asymptotic theory, it is not necessary that the likelihood have a closed form or possess a local maximum.
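The use of a Laplace-style approximation as an importance function can be sketched on a conjugate example where the exact answer is known. The sketch below (mine, not the Sweeting-Kharroubi formula itself) centres a heavy-tailed Student-t proposal at the posterior mode and self-normalises the importance weights:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
y = rng.normal(2.0, 1.0, size=50)          # data, sigma = 1 known
# Prior theta ~ N(0, 10^2); the posterior is available in closed form
# here, which lets us check the importance-sampling answer.
post_var = 1.0 / (len(y) + 1.0 / 100.0)
post_mean = post_var * np.sum(y)

def log_post(theta):
    # Unnormalised log posterior, vectorised over theta.
    return (-0.5 * np.sum((y[:, None] - theta) ** 2, axis=0)
            - theta ** 2 / 200.0)

# Heavy-tailed proposal centred at the posterior mode (Laplace-style).
prop = stats.t(df=5, loc=post_mean, scale=np.sqrt(post_var))
draws = prop.rvs(size=20_000, random_state=rng)
logw = log_post(draws) - prop.logpdf(draws)
w = np.exp(logw - logw.max())
w /= w.sum()                               # self-normalised weights
print(w @ draws, post_mean)                # the two estimates agree
```

The heavier-than-Gaussian tails of the proposal are what keep the weights stable, the property the paper emphasises for poor man's data augmentation; the same density can serve as a Metropolis-Hastings proposal when the model is not tractable.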

16.
We propose and study properties of maximum likelihood estimators in the class of conditional transformation models. Based on a suitable explicit parameterization of the unconditional or conditional transformation function, we establish a cascade of increasingly complex transformation models that can be estimated, compared and analysed in the maximum likelihood framework. Models for the unconditional or conditional distribution function of any univariate response variable can be set up and estimated in the same theoretical and computational framework simply by choosing an appropriate transformation function and parameterization thereof. The ability to evaluate the distribution function directly allows us to estimate models based on the exact likelihood, especially in the presence of random censoring or truncation. For discrete and continuous responses, we establish the asymptotic normality of the proposed estimators. A reference software implementation of maximum likelihood-based estimation for conditional transformation models that allows the same flexibility as the theory developed here was employed to illustrate the wide range of possible applications.

17.
Abstract

Augmented mixed beta regression models are suitable choices for modeling continuous response variables on the closed interval [0, 1]. The random effects in these models are typically assumed to be normally distributed, but this assumption is frequently violated in applied studies. In this paper, an augmented mixed beta regression model whose random effects follow a skew-normal independent distribution is used. We adopt a Bayesian approach for parameter estimation using an MCMC algorithm. The methods are evaluated in intensive simulation studies. Finally, the proposed model is applied to a dataset from an Iranian Labor Force Survey.
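The fixed-effects core of any such model is beta regression with a logit link for the mean and a precision parameter. The sketch below fits that core by maximum likelihood (no zero/one augmentation, no random effects, no Bayesian machinery), using the mean-precision parameterisation of Ferrari and Cribari-Neto:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def nll(params, x, y):
    # Beta regression: logit(mu) = b0 + b1*x, precision phi > 0,
    # density Beta(mu*phi, (1-mu)*phi) for y strictly inside (0, 1).
    b0, b1, log_phi = params
    mu = expit(b0 + b1 * x)
    phi = np.exp(log_phi)
    a, b = mu * phi, (1.0 - mu) * phi
    return -np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                   + (a - 1) * np.log(y) + (b - 1) * np.log1p(-y))

rng = np.random.default_rng(9)
n = 2000
x = rng.normal(size=n)
mu = expit(0.5 + 1.0 * x)
phi = 20.0
y = rng.beta(mu * phi, (1 - mu) * phi)
fit = minimize(nll, np.array([0.0, 0.0, 1.0]), args=(x, y), method="BFGS")
print(fit.x)   # near (0.5, 1.0, log 20)
```

The "augmented" part of the paper's model adds point masses at 0 and 1 (since the beta density excludes the endpoints), and the mixed part adds subject-level random effects, for which the skew-normal independent family relaxes the usual normality assumption.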

18.
A pivotal characteristic of credit defaults that is ignored by most credit scoring models is the rarity of the event. The most widely used model to estimate the probability of default is the logistic regression model. Since the dependent variable represents a rare event, the logistic regression model shows relevant drawbacks, for example, underestimation of the default probability, which could be very risky for banks. In order to overcome these drawbacks, we propose the generalized extreme value (GEV) regression model. In particular, in a generalized linear model (GLM) with a binary dependent variable, we suggest the quantile function of the GEV distribution as the link function, so our attention is focused on the tail of the response curve for values close to one. The estimation procedure used is the maximum likelihood method. This model accommodates skewness and generalises GLMs with the complementary log–log link function. We analyse its performance by simulation studies. Finally, we apply the proposed model to empirical data on Italian small and medium enterprises.
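The GEV link replaces the symmetric logistic response curve with an asymmetric one. A hedged sketch of the fit (fixed shape parameter, Nelder-Mead MLE; the paper estimates the shape too):

```python
import numpy as np
from scipy.optimize import minimize

XI = -0.25   # GEV shape, held fixed in this sketch

def nll(beta, X, y, xi=XI):
    # Binary regression with a GEV quantile link:
    # pi = exp(-(1 + xi*eta)**(-1/xi)); the xi -> 0 limit is the
    # Gumbel (log-log) response curve, hence the cloglog connection.
    eta = X @ beta
    t = np.maximum(1.0 + xi * eta, 1e-10)
    pi = np.clip(np.exp(-t ** (-1.0 / xi)), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(pi) + (1 - y) * np.log1p(-pi))

rng = np.random.default_rng(10)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-1.0, 0.8])
eta = X @ beta_true
pi = np.exp(-np.maximum(1 + XI * eta, 1e-10) ** (-1 / XI))   # rare-ish events
y = (rng.uniform(size=n) < pi).astype(float)
fit = minimize(nll, np.zeros(2), args=(X, y), method="Nelder-Mead",
               options={"maxiter": 2000})
print(fit.x)   # near (-1.0, 0.8)
```

Because the GEV response curve approaches one slowly and zero quickly (for negative shape), the fitted probabilities in the upper tail are larger than logistic ones at the same linear predictor, which is exactly the correction the abstract motivates for rare defaults.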

19.
The purpose of this paper is to discuss response surface designs for multivariate generalized linear models (GLMs). Such models are considered whenever several response variables can be measured for each setting of a group of control variables, and the response variables are adequately represented by GLMs. The mean-squared error of prediction (MSEP) matrix is used to assess the quality of prediction associated with a given design. The MSEP incorporates both the prediction variance and the prediction bias, which results from using maximum likelihood estimates of the parameters of the fitted linear predictor. For a given design, quantiles of a scalar-valued function of the MSEP are obtained within a certain region of interest. The quantiles depend on the unknown parameters of the linear predictor. The dispersion of these quantiles over the space of the unknown parameters is determined and then depicted by the so-called quantile dispersion graphs. An application of the proposed methodology is presented using the special case of the bivariate binary distribution.

20.
Abstract.  Typically, regression analysis for multistate models has been based on regression models for the transition intensities. These models lead to highly nonlinear and very complex models for the effects of covariates on state occupation probabilities. We present a technique that models the state occupation or transition probabilities in a multistate model directly. The method is based on the pseudo-values from a jackknife statistic constructed from non-parametric estimators for the probability in question. These pseudo-values are used as outcome variables in a generalized estimating equation to obtain estimates of model parameters. We examine this approach and its properties in detail for two special multistate model probabilities: the cumulative incidence function in competing risks and the current leukaemia-free survival used in bone marrow transplants. The latter is the probability that a patient is alive and in either a first or second post-transplant remission. The techniques are illustrated on a dataset of leukaemia patients given a marrow transplant. We also discuss extensions of the model that are of current research interest.
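The jackknife pseudo-value construction is mechanical: for each subject, combine the estimator on the full sample with the estimator on the sample minus that subject. A sketch for a survival probability, with ordinary least squares standing in for the full GEE step (and no censoring, in which case the Kaplan-Meier estimator reduces to the empirical survival function):

```python
import numpy as np

def pseudo_values(estimator, data):
    # Jackknife pseudo-values: n*theta_hat - (n-1)*theta_hat(-i).
    n = len(data)
    full = estimator(data)
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    return n * full - (n - 1) * loo

rng = np.random.default_rng(11)
n = 300
x = rng.normal(size=n)                       # covariate
T = rng.exponential(np.exp(0.5 * x))         # uncensored survival times
t0 = 1.0
surv = lambda d: np.mean(d > t0)             # empirical S(t0)
pv = pseudo_values(surv, T)
# Sanity check: without censoring the pseudo-values reduce exactly
# to the individual indicators I(T_i > t0).
print(np.allclose(pv, (T > t0).astype(float)))   # True
# GEE step (here simply least squares on the pseudo-values).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, pv, rcond=None)
print(beta)
```

With censoring, the pseudo-values are no longer indicators but still have (approximately) the right conditional expectation, which is what justifies regressing them on covariates with a generalized estimating equation and a suitable link.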


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号