首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 979 毫秒
1.
The continuous extension of a discrete random variable is amongst the computational methods used for estimation of multivariate normal copula-based models with discrete margins. Its advantage is that the likelihood can be derived conveniently under the theory for copula models with continuous margins, but there has not been a clear analysis of the adequacy of this method. We investigate the asymptotic and small-sample efficiency of two variants of the method for estimating the multivariate normal copula with univariate binary, Poisson, and negative binomial regressions, and show that they lead to biased estimates for the latent correlations, and the univariate marginal parameters that are not regression coefficients. We implement a maximum simulated likelihood method, which is based on evaluating the multidimensional integrals of the likelihood with randomized quasi-Monte Carlo methods. Asymptotic and small-sample efficiency calculations show that our method is nearly as efficient as maximum likelihood for fully specified multivariate normal copula-based models. An illustrative example is given to show the use of our simulated likelihood method.  相似文献   

2.
Cluster analysis is one of the most widely used method in statistical analyses, in which homogeneous subgroups are identified in a heterogeneous population. Due to the existence of the continuous and discrete mixed data in many applications, so far, some ordinary clustering methods such as, hierarchical methods, k-means and model-based methods have been extended for analysis of mixed data. However, in the available model-based clustering methods, by increasing the number of continuous variables, the number of parameters increases and identifying as well as fitting an appropriate model may be difficult. In this paper, to reduce the number of the parameters, for the model-based clustering mixed data of continuous (normal) and nominal data, a set of parsimonious models is introduced. Models in this set are extended, using the general location model approach, for modeling distribution of mixed variables and applying factor analyzer structure for covariance matrices. The ECM algorithm is used for estimating the parameters of these models. In order to show the performance of the proposed models for clustering, results from some simulation studies and analyzing two real data sets are presented.  相似文献   

3.
The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients.  相似文献   

4.
In many longitudinal studies multiple characteristics of each individual, along with time to occurrence of an event of interest, are often collected. In such data set, some of the correlated characteristics may be discrete and some of them may be continuous. In this paper, a joint model for analysing multivariate longitudinal data comprising mixed continuous and ordinal responses and a time to event variable is proposed. We model the association structure between longitudinal mixed data and time to event data using a multivariate zero-mean Gaussian process. For modeling discrete ordinal data we assume a continuous latent variable follows the logistic distribution and for continuous data a Gaussian mixed effects model is used. For the event time variable, an accelerated failure time model is considered under different distributional assumptions. For parameter estimation, a Bayesian approach using Markov Chain Monte Carlo is adopted. The performance of the proposed methods is illustrated using some simulation studies. A real data set is also analyzed, where different model structures are used. Model comparison is performed using a variety of statistical criteria.  相似文献   

5.
Estimating the parameters of multivariate mixed Poisson models is an important problem in image processing applications, especially for active imaging or astronomy. The classical maximum likelihood approach cannot be used for these models since the corresponding masses cannot be expressed in a simple closed form. This paper studies a maximum pairwise likelihood approach to estimate the parameters of multivariate mixed Poisson models when the mixing distribution is a multivariate Gamma distribution. The consistency and asymptotic normality of this estimator are derived. Simulations conducted on synthetic data illustrate these results and show that the proposed estimator outperforms classical estimators based on the method of moments. An application to change detection in low-flux images is also investigated.  相似文献   

6.
Restricted maximum likelihood (REML) methods are traditionally used for analyzing mixed models. Based on a multivariate normal likelihood, these analyses are sensitive to outliers. Recently developed robust rank-based procedures offer a complete analysis of mixed model: estimation of fixed effects, standard errors, and estimation of variance components. The results of a large Monte Carlo study are presented, comparing these two analyses for many situations over multivariate normal and contaminated normal distributions. The rank-based analyses are much more powerful and efficient than the REML analyses over all non-normal situations, while losing little power for normal errors.  相似文献   

7.
Model-based clustering using copulas with applications   总被引:1,自引:0,他引:1  
The majority of model-based clustering techniques is based on multivariate normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: (i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and (ii) the explicit choice of marginal distributions for the clusters allows the modelling of multivariate data of various modes (either discrete or continuous) in a natural way. This paper introduces and studies the framework of copula-based finite mixture models for clustering applications. Estimation in the general case can be performed using standard EM, and, depending on the mode of the data, more efficient procedures are provided that can fully exploit the copula structure. The closure properties of the mixture models under marginalization are discussed, and for continuous, real-valued data parametric rotations in the sample space are introduced, with a parallel discussion on parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology is accompanied and motivated by the analysis of real and artificial data.  相似文献   

8.
For multivariate probit models, Spiess and Tutz suggest three alternative performance measures, which are all based on the decomposition of the variation. The multivariate probit model can be seen as a special case of the discrete copula model. This paper proposes some new measures based on the value of the likelihood function and the prediction-realization table. In addition, it generalizes the measures from Spiess and Tutz for the discrete copula model. Results of a simulation study designed to compare the different measures in various situations are presented.  相似文献   

9.
Several variable selection procedures are available for continuous time-to-event data. However, if time is measured in a discrete way and therefore many ties occur models for continuous time are inadequate. We propose penalized likelihood methods that perform efficient variable selection in discrete survival modeling with explicit modeling of the heterogeneity in the population. The method is based on a combination of ridge and lasso type penalties that are tailored to the case of discrete survival. The performance is studied in simulation studies and an application to the birth of the first child.  相似文献   

10.
An algorithm is presented for calculating the power for the logistic and proportional hazards models in which some of the covariates are discrete and the remainders are multivariate normal. The mean and covariance matrix of the multivariate normal covariates may depend on the discrete covariates.

The algorithm, which finds the power of the Wald test, uses the result that the information matrix can be calculated using univariate numerical integration even when there are several continuous covariates. The algorithm is checked using simulation and in certain situations gives more accurate results than current methods which are based on simple formulae. The algorithm is used to explore properties of these models, in particular, the power gain from a prognostic covariate in the analysis of a clinical trial or observational study. The methods can be extended to determine power for other generalized linear models.  相似文献   

11.
We develop Bayesian models for density regression with emphasis on discrete outcomes. The problem of density regression is approached by considering methods for multivariate density estimation of mixed scale variables, and obtaining conditional densities from the multivariate ones. The approach to multivariate mixed scale outcome density estimation that we describe represents discrete variables, either responses or covariates, as discretised versions of continuous latent variables. We present and compare several models for obtaining these thresholds in the challenging context of count data analysis where the response may be over‐ and/or under‐dispersed in some of the regions of the covariate space. We utilise a nonparametric mixture of multivariate Gaussians to model the directly observed and the latent continuous variables. The paper presents a Markov chain Monte Carlo algorithm for posterior sampling, sufficient conditions for weak consistency, and illustrations on density, mean and quantile regression utilising simulated and real datasets.  相似文献   

12.
A maximum likelihood estimation procedure is presented for the frailty model. The procedure is based on a stochastic Expectation Maximization algorithm which converges quickly to the maximum likelihood estimate. The usual expectation step is replaced by a stochastic approximation of the complete log-likelihood using simulated values of unobserved frailties whereas the maximization step follows the same lines as those of the Expectation Maximization algorithm. The procedure allows to obtain at the same time estimations of the marginal likelihood and of the observed Fisher information matrix. Moreover, this stochastic Expectation Maximization algorithm requires less computation time. A wide variety of multivariate frailty models without any assumption on the covariance structure can be studied. To illustrate this procedure, a Gaussian frailty model with two frailty terms is introduced. The numerical results based on simulated data and on real bladder cancer data are more accurate than those obtained by using the Expectation Maximization Laplace algorithm and the Monte-Carlo Expectation Maximization one. Finally, since frailty models are used in many fields such as ecology, biology, economy, …, the proposed algorithm has a wide spectrum of applications.  相似文献   

13.
We propose and study properties of maximum likelihood estimators in the class of conditional transformation models. Based on a suitable explicit parameterization of the unconditional or conditional transformation function, we establish a cascade of increasingly complex transformation models that can be estimated, compared and analysed in the maximum likelihood framework. Models for the unconditional or conditional distribution function of any univariate response variable can be set up and estimated in the same theoretical and computational framework simply by choosing an appropriate transformation function and parameterization thereof. The ability to evaluate the distribution function directly allows us to estimate models based on the exact likelihood, especially in the presence of random censoring or truncation. For discrete and continuous responses, we establish the asymptotic normality of the proposed estimators. A reference software implementation of maximum likelihood‐based estimation for conditional transformation models that allows the same flexibility as the theory developed here was employed to illustrate the wide range of possible applications.  相似文献   

14.
15.
Frailty models are often used to model heterogeneity in survival analysis. The distribution of the frailty is generally assumed to be continuous. In some circumstances, it is appropriate to consider discrete frailty distributions. Having zero frailty can be interpreted as being immune, and population heterogeneity may be analysed using discrete frailty models. In this paper, survival functions are derived for the frailty models based on the discrete compound Poisson process. Maximum likelihood estimation procedures for the parameters are studied. We examine the fit of the models to earthquake and the traffic accidents’ data sets from Turkey.  相似文献   

16.
Multivariate Dispersion Models Generated From Gaussian Copula   总被引:5,自引:0,他引:5  
In this paper a class of multivariate dispersion models generated from the multivariate Gaussian copula is presented. Being a multivariate extension of Jørgensen's (1987a) dispersion models, this class of multivariate models is parametrized by marginal position, dispersion and dependence parameters, producing a large variety of multivariate discrete and continuous models including the multivariate normal as a special case. Properties of the multivariate distributions are investigated, some of which are similar to those of the multivariate normal distribution, which makes these models potentially useful for the analysis of correlated non-normal data in a way analogous to that of multivariate normal data. As an example, we illustrate an application of the models to the regression analysis of longitudinal data, and establish an asymptotic relationship between the likelihood equation and the generalized estimating equation of Liang & Zeger (1986).  相似文献   

17.
An objective of randomized placebo-controlled preventive HIV vaccine efficacy (VE) trials is to assess the relationship between vaccine effects to prevent HIV acquisition and continuous genetic distances of the exposing HIVs to multiple HIV strains represented in the vaccine. The set of genetic distances, only observed in failures, is collectively termed the ‘mark.’ The objective has motivated a recent study of a multivariate mark-specific hazard ratio model in the competing risks failure time analysis framework. Marks of interest, however, are commonly subject to substantial missingness, largely due to rapid post-acquisition viral evolution. In this article, we investigate the mark-specific hazard ratio model with missing multivariate marks and develop two inferential procedures based on (i) inverse probability weighting (IPW) of the complete cases, and (ii) augmentation of the IPW estimating functions by leveraging auxiliary data predictive of the mark. Asymptotic properties and finite-sample performance of the inferential procedures are presented. This research also provides general inferential methods for semiparametric density ratio/biased sampling models with missing data. We apply the developed procedures to data from the HVTN 502 ‘Step’ HIV VE trial.  相似文献   

18.
While the literature on multivariate models for continuous data flourishes, there is a lack of models for multivariate counts. We aim to contribute to this framework by extending the well known class of univariate hidden Markov models to the multidimensional case, by introducing multivariate Poisson hidden Markov models. Each state of the extended model is associated with a different multivariate discrete distribution. We consider different distributions with Poisson marginals, starting from the multivariate Poisson distribution and then extending to copula based distributions to allow flexible dependence structures. An EM type algorithm is developed for maximum likelihood estimation. A real data application is presented to illustrate the usefulness of the proposed models. In particular, we apply the models to the occurrence of strong earthquakes (surface wave magnitude ≥5), in three seismogenic subregions in the broad region of the North Aegean Sea for the time period from 1 January 1981 to 31 December 2008. Earthquakes occurring in one subregion may trigger events in adjacent ones and hence the observed time series of events are cross‐correlated. It is evident from the results that the three subregions interact with each other at times differing by up to a few months. This migration of seismic activity is captured by the model as a transition to a state of higher seismicity.  相似文献   

19.
A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to be observed outcomes. The model accommodates multilevel data structures, general covariate effects and distinct link functions and error distributions for each underlying variable. Within the framework proposed, novel models are developed for clustered multiple binary, unordered categorical and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated by using data from a reproductive toxicity study.  相似文献   

20.
Multivariate model validation is a complex decision-making problem involving comparison of multiple correlated quantities, based upon the available information and prior knowledge. This paper presents a Bayesian risk-based decision method for validation assessment of multivariate predictive models under uncertainty. A generalized likelihood ratio is derived as a quantitative validation metric based on Bayes’ theorem and Gaussian distribution assumption of errors between validation data and model prediction. The multivariate model is then assessed based on the comparison of the likelihood ratio with a Bayesian decision threshold, a function of the decision costs and prior of each hypothesis. The probability density function of the likelihood ratio is constructed using the statistics of multiple response quantities and Monte Carlo simulation. The proposed methodology is implemented in the validation of a transient heat conduction model, using a multivariate data set from experiments. The Bayesian methodology provides a quantitative approach to facilitate rational decisions in multivariate model assessment under uncertainty.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号