Similar articles
Found 20 similar articles (search time: 140 ms)
1.
The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients.
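
The Gaussian-copula idea in this abstract can be illustrated with a small simulation. The sketch below is not the paper's robit-normal or probit-normal-exponential model; it only assumes one normal continuous margin and one Bernoulli margin tied together by a pair of correlated latent normals. The function names and the parameters `rho`, `p`, `mu` and `sigma` are hypothetical and chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_mixed_pair(rho, p, mu, sigma, rng):
    """Draw one (binary, continuous) pair linked by a Gaussian copula."""
    # Two standard normal draws with correlation rho (assumed |rho| < 1).
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Continuous margin: Normal(mu, sigma), built directly from z1.
    y = mu + sigma * z1
    # Binary margin: Bernoulli(p), built by thresholding the latent z2
    # so that P(z2 > cutoff) = p.
    cutoff = NormalDist().inv_cdf(1.0 - p)
    b = 1 if z2 > cutoff else 0
    return b, y


def sample_mixed(n, rho=0.6, p=0.3, mu=0.0, sigma=1.0, seed=0):
    """Draw n pairs with a fixed seed so that runs are reproducible."""
    rng = random.Random(seed)
    return [sample_mixed_pair(rho, p, mu, sigma, rng) for _ in range(n)]
```

With a positive `rho`, the continuous variable tends to be larger when the binary variable equals 1, while each margin keeps its own distribution. That separation of margins from dependence is the point of the copula construction.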

2.
In many longitudinal studies, multiple characteristics of each individual, along with the time to occurrence of an event of interest, are often collected. In such data sets, some of the correlated characteristics may be discrete and some of them may be continuous. In this paper, a joint model for analysing multivariate longitudinal data comprising mixed continuous and ordinal responses and a time to event variable is proposed. We model the association structure between longitudinal mixed data and time to event data using a multivariate zero-mean Gaussian process. For modeling discrete ordinal data we assume that a continuous latent variable follows the logistic distribution, and for continuous data a Gaussian mixed effects model is used. For the event time variable, an accelerated failure time model is considered under different distributional assumptions. For parameter estimation, a Bayesian approach using Markov Chain Monte Carlo is adopted. The performance of the proposed methods is illustrated using some simulation studies. A real data set is also analyzed, where different model structures are used. Model comparison is performed using a variety of statistical criteria.

3.
Latent variable models are widely used for joint modeling of mixed data including nominal, ordinal, count and continuous data. In this paper, we consider a latent variable model for jointly modeling relationships between mixed binary, count and continuous variables with some observed covariates. We assume that, given a latent variable, the mixed variables of interest are independent and that the count and continuous variables follow Poisson and normal distributions, respectively. As such data may be extracted from different subpopulations, unobserved heterogeneity has to be taken into account. A mixture distribution, which accounts for the heterogeneity, is considered for the distribution of the latent variable. The generalized EM algorithm, which uses the Newton–Raphson algorithm inside the EM algorithm, is used to compute the maximum likelihood estimates of the parameters. The standard errors of the maximum likelihood estimates are computed by using the supplemented EM algorithm. Analysis of the primary biliary cirrhosis data is presented as an application of the proposed model.

4.
Data sets originating from a wide range of research studies are composed of multiple variables that are correlated and of dissimilar types, primarily of count, binary/ordinal and continuous attributes. The present paper builds on previous work on multivariate data generation and develops a framework for generating multivariate mixed data with a pre-specified correlation matrix. The generated data consist of components that are marginally count, binary, ordinal and continuous, where the count and continuous variables follow the generalized Poisson and normal distributions, respectively. The use of the generalized Poisson distribution provides a flexible mechanism which allows for the under- and over-dispersed count variables generally encountered in practice. A step-by-step algorithm is provided and its performance is evaluated using simulated and real-data scenarios.

5.
We study the correlation structure for a mixture of ordinal and continuous repeated measures using a Bayesian approach. We assume a multivariate probit model for the ordinal variables and a normal linear regression for the continuous variables, where latent normal variables underlying the ordinal data are correlated with continuous variables in the model. Due to the probit model assumption, we are required to sample a covariance matrix with some of the diagonal elements equal to one. The key computational idea is to use parameter-extended data augmentation, which involves applying the Metropolis-Hastings algorithm to get a sample from the posterior distribution of the covariance matrix incorporating the relevant restrictions. The methodology is illustrated through a simulated example and through an application to data from the UCLA Brain Injury Research Center.

6.
This article extends the literature on copulas with discrete or continuous marginals to the case where some of the marginals are a mixture of discrete and continuous components. We do so by carefully defining the likelihood as the density of the observations with respect to a mixed measure. The treatment is quite general, although we focus on mixtures of Gaussian and Archimedean copulas. The inference is Bayesian with the estimation carried out by Markov chain Monte Carlo. We illustrate the methodology and algorithms by applying them to estimate a multivariate income dynamics model. Supplementary materials for this article are available online.

7.
Using a multivariate latent variable approach, this article proposes some new general models to analyze correlated bounded continuous and categorical (nominal and/or ordinal) responses with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses, motivated by data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that some continuous responses are replaced by bounded continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. Sensitivity analysis is also performed to study the influence of small perturbations of the parameters of the missing-data mechanisms of the model on the maximal normal curvature. The proposed models are applied to two data sets: BMI, Steatosis and Osteoporosis data and Tehran household expenditure budgets.

8.
A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to the observed outcomes. The model accommodates multilevel data structures, general covariate effects and distinct link functions and error distributions for each underlying variable. Within the framework proposed, novel models are developed for clustered multiple binary, unordered categorical and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated by using data from a reproductive toxicity study.

9.
Multiple imputation has emerged as a widely used model-based approach for dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings which include a mix of continuous and discrete variables, correct specification of the imputation model could be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with the accessibility of software packages that are capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes, and subsequently round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets.
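
The rounding step described in this abstract (imputed indicator values mapped to the ordinal level with the nearest indicator pattern) can be sketched as follows. The abstract does not spell out the indicator coding, so this sketch assumes dummy coding with `num_levels - 1` indicators and level 0 as the all-zeros reference level. The function names are hypothetical.

```python
def indicator_patterns(num_levels):
    """Dummy-code each ordinal level with num_levels - 1 indicators.

    Level 0 is the reference level (all zeros). This coding is an
    assumption made for illustration, not taken from the article.
    """
    patterns = []
    for level in range(num_levels):
        row = [0.0] * (num_levels - 1)
        if level > 0:
            row[level - 1] = 1.0
        patterns.append(row)
    return patterns


def round_by_distance(imputed, num_levels):
    """Return the level whose indicator pattern is closest to `imputed`.

    `imputed` holds the continuous imputed values of the indicators.
    Squared Euclidean distance is used; it has the same minimizer as
    the Euclidean distance itself.
    """
    if len(imputed) != num_levels - 1:
        raise ValueError("expected num_levels - 1 imputed indicator values")
    best_level, best_dist = 0, float("inf")
    for level, pattern in enumerate(indicator_patterns(num_levels)):
        dist = sum((x - t) ** 2 for x, t in zip(imputed, pattern))
        if dist < best_dist:
            best_level, best_dist = level, dist
    return best_level
```

For a three-level variable, an imputed indicator vector such as `[0.1, 0.8]` lies closest to the pattern `[0, 1]` and is therefore assigned to level 2 under this coding.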

10.
The little-known continuous reciprocal inverse Gaussian distribution is used in this paper to introduce the Poisson-reciprocal inverse Gaussian discrete distribution. Several of its most relevant statistical properties are examined, some of them directly inherited from the reciprocal of the inverse Gaussian distribution. Furthermore, a mixed Poisson regression model that uses the reciprocal inverse Gaussian as mixing distribution is presented. Parameter estimation in this regression model is performed via an EM-type algorithm. In light of the numerical results displayed in the paper, the distributions introduced in this work are competitive with the classical negative binomial and Poisson-inverse Gaussian distributions.

11.
The article considers Bayesian analysis of hierarchical models for count, binomial and multinomial data using efficient MCMC sampling procedures. To this end, an improved method of auxiliary mixture sampling is proposed. In contrast to previously proposed samplers the method uses a bounded number of latent variables per observation, independent of the intensity of the underlying Poisson process in the case of count data, or of the number of experiments in the case of binomial and multinomial data. The bounded number of latent variables results in a more general error distribution, which is a negative log-Gamma distribution with arbitrary integer shape parameter. The required approximations of these distributions by Gaussian mixtures have been computed. Overall, the improvement leads to a substantial increase in efficiency of auxiliary mixture sampling for highly structured models. The method is illustrated for finite mixtures of generalized linear models and an epidemiological case study.

12.
In this paper, a joint model for analyzing multivariate mixed ordinal and continuous responses, where continuous outcomes may be skewed, is presented. For modeling the discrete ordinal responses, a continuous latent variable approach is considered, and for describing continuous responses, a skew-normal mixed effects model is used. A Bayesian approach using Markov Chain Monte Carlo (MCMC) is adopted for parameter estimation. Some simulation studies are performed to illustrate the proposed approach. The results of the simulation studies show that the use of separate models, or of the normal distributional assumption for shared random effects and within-subject errors of continuous and ordinal variables, instead of joint modeling under a skew-normal distribution, leads to biased parameter estimates. The approach is used for analyzing a part of the British Household Panel Survey (BHPS) data set. Annual income and life satisfaction are considered as the continuous and the ordinal longitudinal responses, respectively. The annual income variable is severely skewed; therefore, the use of the normality assumption for the continuous response does not yield acceptable results. The results of the data analysis show that gender, marital status, educational level and the amount of money spent on leisure have a significant effect on annual income, while marital status has the highest impact on life satisfaction.

13.
We study the problem of classifying an individual into one of several populations based on mixed nominal, continuous, and ordinal data. Specifically, we obtain a classification procedure as an extension to the so-called location linear discriminant function, by specifying a general mixed-data model for the joint distribution of the mixed discrete and continuous variables. We outline methods for estimating misclassification error rates. Results of simulations of the performance of proposed classification rules in various settings vis-à-vis a robust mixed-data discrimination method are reported as well. We give an example utilizing data on croup in children.

14.
A copula can fully characterize the dependence of multiple variables. The purpose of this paper is to provide a Bayesian nonparametric approach to the estimation of a copula, and we do this by mixing over a class of parametric copulas. In particular, we show that any bivariate copula density can be arbitrarily accurately approximated by an infinite mixture of Gaussian copula density functions. The model can be estimated by Markov Chain Monte Carlo methods and the model is demonstrated on both simulated and real data sets.

15.
Model-based clustering using copulas with applications (cited 1 time: 0 self-citations, 1 by others)
The majority of model-based clustering techniques are based on multivariate normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: (i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and (ii) the explicit choice of marginal distributions for the clusters allows the modelling of multivariate data of various modes (either discrete or continuous) in a natural way. This paper introduces and studies the framework of copula-based finite mixture models for clustering applications. Estimation in the general case can be performed using standard EM, and, depending on the mode of the data, more efficient procedures are provided that can fully exploit the copula structure. The closure properties of the mixture models under marginalization are discussed, and for continuous, real-valued data parametric rotations in the sample space are introduced, with a parallel discussion on parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology is accompanied and motivated by the analysis of real and artificial data.

16.
Not only are copula functions joint distribution functions in their own right, they also provide a link between multivariate distributions and their lower‐dimensional marginal distributions. Copulas have a structure that allows us to characterize all possible multivariate distributions, and therefore they have the potential to be a very useful statistical tool. Although copulas can be traced back to 1959, there is still much scope for new results, as most of the early work was theoretical rather than practical. We focus on simple practical tools based on conditional expectation, because such tools are not widely available. When dealing with data sets in which the dependence throughout the sample is variable, we suggest that copula‐based regression curves may be more accurate predictors of specific outcomes than linear models. We derive simple conditional expectation formulae in terms of copulas and apply them to a combination of simulated and real data.

17.
For clustering mixed categorical and continuous data, Lawrence and Krzanowski (1996) proposed a finite mixture model in which component densities conform to the location model. In the graphical models literature the location model is known as the homogeneous Conditional Gaussian model. In this paper it is shown that their model is not identifiable without imposing additional restrictions. Specifically, for g groups and m locations, (g!)^(m−1) distinct sets of parameter values (not including permutations of the group mixing parameters) produce the same likelihood function. Excessive shrinkage of parameter estimates in a simulation experiment reported by Lawrence and Krzanowski (1996) is shown to be an artifact of the model's non-identifiability. Identifiable finite mixture models can be obtained by imposing restrictions on the conditional means of the continuous variables. These new identified models are assessed in simulation experiments. The conditional mean structure of the continuous variables in the restricted location mixture models is similar to that in the underlying variable mixture models proposed by Everitt (1988), but the restricted location mixture models are more computationally tractable.

18.
Every bivariate distribution function with continuous marginals can be represented in terms of a unique copula, that is, in terms of a distribution function on the unit square with uniform marginals. This paper is concerned with a special class of copulas called Archimedean, which includes the uniform representation of many standard bivariate distributions. Conditions are given under which these copulas are stochastically ordered, and pointwise limits of sequences of Archimedean copulas are examined. We also provide two new one-parameter families of bivariate distributions which include as limiting cases the Fréchet bounds and the independence distribution.
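
As a concrete instance of the Archimedean class discussed in this abstract, the sketch below builds a bivariate copula from a generator φ via C(u, v) = φ⁻¹(φ(u) + φ(v)). It uses the standard Clayton generator φ(t) = (t^(−θ) − 1)/θ purely as an illustration; this is not one of the two new families the paper introduces, and the function names are hypothetical.

```python
def archimedean(u, v, generator, inverse):
    """Bivariate Archimedean copula C(u, v) = inverse(generator(u) + generator(v))."""
    return inverse(generator(u) + generator(v))


def clayton(u, v, theta):
    """Clayton copula for u, v in (0, 1] and theta > 0 (illustrative choice)."""

    def generator(t):
        return (t ** -theta - 1.0) / theta

    def inverse(s):
        return (1.0 + theta * s) ** (-1.0 / theta)

    return archimedean(u, v, generator, inverse)
```

The Clayton family shows the kind of limiting behaviour the abstract refers to: as `theta` tends to 0 the copula approaches the independence copula `u * v`, and as `theta` grows it approaches the upper Fréchet bound `min(u, v)`.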

19.
This paper describes the modelling and fitting of Gaussian Markov random field spatial components within a Generalized Additive Model for Location, Scale and Shape (GAMLSS). This allows modelling of any or all the parameters of the distribution for the response variable using explanatory variables and spatial effects. The response variable distribution is allowed to be a non-exponential family distribution. A new package developed in R to achieve this is presented. We use Gaussian Markov random fields to model the spatial effect in Munich rent data and explore some features and characteristics of the data. The potential of using spatial analysis within GAMLSS is discussed. We argue that the flexibility of parametric distributions, the ability to model all the parameters of the distribution and the diagnostic tools of GAMLSS provide an ideal environment for modelling spatial features of data.

20.
In this article, we have developed a Poisson-mixed inverse Gaussian (PMIG) distribution. The mixed inverse Gaussian distribution is a mixture of the inverse Gaussian distribution and its length-biased counterpart. A PMIG regression model is developed and the maximum likelihood estimation of the parameters is studied. A dataset dealing with the number of hospital stays among the elderly population is analyzed by using the PMIG and the PIG (Poisson-inverse Gaussian) regression models and it has been shown that the PMIG model fits the data better than the PIG model.
