In many cases of modeling bivariate count data, interest lies in studying the association rather than the marginal properties. We form a flexible copula-based regression model where covariates are used not only for the marginal but also for the copula parameters. Since the copula measures the association, the use of covariates in its parameters allows for direct modeling of association. A real-data application related to transaction market basket data is used. Our goal is to refine and understand whether the association between the number of purchases of certain product categories depends on particular demographic characteristics of customers. Such information is important for decision making for marketing purposes.

We propose a bivariate integer-valued fractional integrated (BINFIMA) model to account for the long-memory property and apply the model to high-frequency stock transaction data. The BINFIMA model allows for both positive and negative correlations between the counts. The unconditional and conditional first- and second-order moments are given. The model is capable of capturing the covariance between and within intra-day time series of high-frequency transaction data due to macroeconomic news and news related to a specific stock. Empirically, it is found that Ericsson B has a mean recursive process while AstraZeneca has a long-memory property.

We present a bivariate regression model for count data that allows for positive as well as negative correlation of the response variables. The covariance structure is based on the Sarmanov distribution and consists of a product of generalised Poisson marginals and a factor that depends on particular functions of the response variables. The closed form of the probability function is derived by means of the moment-generating function. The model is applied to a large real dataset on health care demand. Its performance is compared with alternative models presented in the literature. We find that our model is significantly better than or at least equivalent to the benchmark models. It gives insights into influences on the variance of the response variables.

We propose a flexible semiparametric stochastic mixed effects model for bivariate cyclic longitudinal data. The model can handle either single cycle or, more generally, multiple consecutive cycle data. The approach models the mean of responses by parametric fixed effects and a smooth nonparametric function for the underlying time effects, and the relationship across the bivariate responses by a bivariate Gaussian random field and a joint distribution of random effects. The proposed model not only can model complicated individual profiles, but also allows for more flexible within-subject and between-response correlations. The fixed effects regression coefficients and the nonparametric time functions are estimated using maximum penalized likelihood, where the resulting estimator for the nonparametric time function is a cubic smoothing spline. The smoothing parameters and variance components are estimated simultaneously using restricted maximum likelihood. Simulation results show that the parameter estimates are close to the true values. The fit of the proposed model on a real bivariate longitudinal dataset of pre-menopausal women also performs well, both for a single cycle analysis and for a multiple consecutive cycle analysis. The Canadian Journal of Statistics 48: 471–498; 2020 © 2020 Statistical Society of Canada

Asthma is an important chronic disease of childhood. An intervention programme for managing asthma was designed on principles of self-regulation and was evaluated by a randomized longitudinal study. The study focused on several outcomes, and, typically, missing data remained a pervasive problem. We develop a pattern-mixture model to evaluate the outcome of intervention on the number of hospitalizations with non-ignorable dropouts. Pattern-mixture models are not generally identifiable as no data may be available to estimate a number of model parameters. Sensitivity analyses are performed by imposing structures on the unidentified parameters. We propose a parameterization which permits sensitivity analyses on clustered longitudinal count data that have missing values due to non-ignorable missing data mechanisms. This parameterization is expressed as ratios between event rates across missing data patterns and the observed data pattern and thus measures departures from an ignorable missing data mechanism. Sensitivity analyses are performed within a Bayesian framework by averaging over different prior distributions on the event ratios. This model has the advantage of providing an intuitive and flexible framework for incorporating the uncertainty of the missing data mechanism in the final analysis.

The logrank test procedure for testing bivariate symmetry against asymmetry in matched-pair data is proposed. The presented test statistic is based on Mantel-Haenszel type statistics evaluated at diagonal grid points on the plane obtained from distinct uncensored failure times. The asymptotic results of the proposed test are derived and an example is shown to illustrate the methodology.

In this paper, we introduce shared gamma frailty models with two different baseline distributions, namely the generalized log-logistic and the generalized Weibull. We introduce a Bayesian estimation procedure to estimate the parameters involved in these models. We present a simulation study to compare the true values of the parameters with the estimated values. We apply these models to a real-life bivariate survival data set of McGilchrist and Aisbett related to kidney infection, and a better model is suggested for the data.
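
As a rough illustration of the shared gamma frailty idea, the sketch below draws pairs of survival times that share one gamma frailty (mean 1, variance theta) and compares the simulated survival share with the closed-form marginal survival (1 + theta * H0(t)) ** (-1 / theta). The plain Weibull baseline, the parameter values, and all function names are assumptions made for the example; the paper itself uses generalized log-logistic and generalized Weibull baselines and Bayesian estimation, neither of which is reproduced here.

```python
import math
import random


def weibull_cum_hazard(t, shape=1.5, scale=2.0):
    """Cumulative baseline hazard H0(t) = (t / scale) ** shape (illustrative choice)."""
    return (t / scale) ** shape


def marginal_survival(t, theta, cum_hazard=weibull_cum_hazard):
    """Population survival under a gamma frailty with mean 1 and variance theta."""
    return (1.0 + theta * cum_hazard(t)) ** (-1.0 / theta)


def simulate_pair(theta, rng, shape=1.5, scale=2.0):
    """Draw two survival times that share a single gamma frailty z."""
    # Shape 1/theta and scale theta give mean 1 and variance theta.
    z = rng.gammavariate(1.0 / theta, theta)
    times = []
    for _ in range(2):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        # Invert the conditional survival S(t | z) = exp(-z * H0(t)) = u.
        times.append(scale * (-math.log(u) / z) ** (1.0 / shape))
    return tuple(times)


def empirical_survival(t, theta, n_pairs=20000, seed=1):
    """Share of simulated first-member times exceeding t."""
    rng = random.Random(seed)
    first_times = [simulate_pair(theta, rng)[0] for _ in range(n_pairs)]
    return sum(x > t for x in first_times) / n_pairs
```

Calling `empirical_survival(2.0, 0.5)` and `marginal_survival(2.0, 0.5)` should give values that agree up to simulation noise, since H0(2) = 1 under the assumed baseline.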


Shared frailty models are often used to model heterogeneity in survival analysis. The most common shared frailty model is one in which the hazard function is a product of a random factor (frailty) and the baseline hazard function, which is common to all individuals. There are certain assumptions about the baseline distribution and the distribution of frailty. In this paper, we consider the inverse Gaussian distribution as frailty distribution and three different baseline distributions, namely the generalized Rayleigh, the weighted exponential, and the extended Weibull distributions. With these three baseline distributions, we propose three different inverse Gaussian shared frailty models. We also compare these models with the models where the above-mentioned distributions are considered without frailty. We develop a Bayesian estimation procedure using the Markov Chain Monte Carlo (MCMC) technique to estimate the parameters involved in these models. We present a simulation study to compare the true values of the parameters with the estimated values. A search of the literature suggests that no work has been done so far for these three baseline distributions with a shared inverse Gaussian frailty. We also apply these three models to a real-life bivariate survival data set of McGilchrist and Aisbett (McGilchrist, C.A., Aisbett, C.W. (1991). Regression with frailty in survival analysis. Biometrics 47:461–466) related to kidney infection, and a better model is suggested for the data using Bayesian model selection criteria.

For count responses, there are situations in biomedical and sociological applications in which extra zeroes occur. Modeling correlated (e.g. repeated measures and clustered) zero-inflated count data poses special challenges because the correlation between measurements for a subject or a cluster needs to be taken into account. Moreover, zero-inflated count data often exhibit over- or under-dispersion. In this paper, we propose a random effect model for repeated measurements or clustered data with an over- or under-dispersed response, called the random effect zero-inflated exponentiated-exponential geometric regression model. The proposed method is illustrated through real examples. The performance of the model and the asymptotic properties of the estimates are investigated using simulation studies. KEYWORDS: Count model, under- and over-dispersion, zero-inflation, mixture model, zero-inflated Poisson model

Frailty models are used in survival analysis to account for the unobserved heterogeneity in individual risks of disease and death. To analyze bivariate data on related survival times (e.g., matched pairs experiments, twin or family data), shared frailty models were suggested. Shared frailty models are used despite their limitations. To overcome their disadvantages, correlated frailty models may be used. In this article, we introduce gamma correlated frailty models with two different baseline distributions, namely the generalized log-logistic and the generalized Weibull. We introduce a Bayesian estimation procedure using the Markov chain Monte Carlo (MCMC) technique to estimate the parameters involved in these models. We present a simulation study to compare the true values of the parameters with the estimated values. We also apply these models to a real-life bivariate survival dataset related to kidney infection, and a better model is suggested for the data.

This paper explores the utility of different approaches for modeling longitudinal count data with dropouts arising from a clinical study for the treatment of actinic keratosis lesions on the face and balding scalp. A feature of these data is that as the disease for subjects on the active arm improves their data show larger dispersion compared with those on the vehicle, exhibiting an over‐dispersion relative to the Poisson distribution. After fitting the marginal (or population averaged) model using the generalized estimating equation (GEE), we note that inferences from such a model might be biased as dropouts are treatment related. Then, we consider using a weighted GEE (WGEE) where each subject's contribution to the analysis is weighted inversely by the subject's probability of dropout. Based on the model findings, we argue that the WGEE might not address the concerns about the impact of dropouts on the efficacy findings when dropouts are treatment related. As an alternative, we consider likelihood‐based inference where random effects are added to the model to allow for heterogeneity across subjects. Finally, we consider a transition model where, unlike the previous approaches that model the log‐link function of the mean response, we model the subject's actual lesion counts. This model is an extension of the Poisson autoregressive model of order 1, where the autoregressive parameter is taken to be a function of treatment as well as other covariates to induce different dispersions and correlations for the two treatment arms. We conclude with a discussion about model selection. Published in 2009 by John Wiley & Sons, Ltd.

Time-dependent association measures between variables are of interest in bivariate survival data. Several such measures have been proposed in the literature for the modelling and analysis of survival data. In this paper, we introduce a new measure of association for bivariate survival data using the product moment residual life function and the mean residual life function. Various properties of the proposed measure and its relationship with existing measures are discussed. We also develop a non-parametric estimator of the measure and study its asymptotic properties. The application of the result is illustrated using a real-life data set. Finally, a simulation study is carried out to assess the performance of the estimator.

A copula model for bivariate survival data with hybrid censoring is proposed to study the association between survival time of individuals infected with HIV and persistence time of infection with an additional virus. Survival with HIV is right censored and the persistence time of the additional virus is subject to interval censoring case 1. A pseudo-likelihood method is developed to study the association between the two event times under such hybrid censoring. Asymptotic consistency and normality of the pseudo-likelihood estimator are established based on empirical process theory. Simulation studies indicate good performance of the estimator with moderate sample size. The method is applied to a motivating HIV study which investigates the effect of GB virus type C (GBV-C) co-infection on survival time of HIV infected individuals.

The use of bivariate distributions plays a fundamental role in survival and reliability studies. In this paper, we consider a location scale model for bivariate survival times based on the proposal of a copula to model the dependence of bivariate survival data. For the proposed model, we consider inferential procedures based on maximum likelihood. Gains in efficiency from bivariate models are also examined in the censored data setting. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the bivariate regression model for matched paired survival data. Sensitivity analysis methods such as local and total influence are presented and derived under three perturbation schemes. The martingale marginal and the deviance marginal residual measures are used to check the adequacy of the model. Furthermore, we propose a new measure which we call modified deviance component residual. The methodology in the paper is illustrated on a lifetime data set for kidney patients.

We propose a bivariate Weibull regression model with heterogeneity (frailty or random effect) generated by a Weibull distribution. We assume that the bivariate survival data follow the bivariate Weibull distribution of Hanagal (Econ Qual Control 19:83–90, 2004). There are some interesting situations, such as survival times in genetic epidemiology, dental implants of patients and twin births (both monozygotic and dizygotic), where the genetic behavior of patients (which is unknown and random) follows a known frailty distribution. These situations motivate the study of this particular model. We propose two-stage maximum likelihood estimation for the hierarchical likelihood in the proposed model. We present a small simulation study to compare these estimates with the true values of the parameters, and we observe that the estimates are very close to the true values.

Clustered (longitudinal) count data arise in many bio-statistical practices in which a number of repeated count responses are observed on a number of individuals. The repeated observations may also represent counts over time from a number of individuals. One important problem that arises in practice is to test homogeneity within clusters (individuals) and between clusters (individuals). As data within clusters are observations of repeated responses, the count data may be correlated and/or over-dispersed. For over-dispersed count data with unknown over-dispersion parameter we derive two score tests by assuming a random intercept model within the framework of (i) the negative binomial mixed effects model and (ii) the double extended quasi-likelihood mixed effects model (Lee and Nelder, 2001). These two statistics are much simpler than a statistic derived by Jacqmin-Gadda and Commenges (1995) under the framework of the over-dispersed generalized linear model. The first statistic takes the over-dispersion more directly into the model and therefore is expected to do well when the model assumptions are satisfied and the other statistic is expected to be robust. Simulations show superior level property of the statistics derived under the negative binomial and double extended quasi-likelihood model assumptions. A data set is analyzed and a discussion is given.

Random effect models have often been used in longitudinal data analysis since they allow for association among repeated measurements due to unobserved heterogeneity. Various approaches have been proposed to extend mixed models for repeated count data to include dependence on baseline counts. Dependence between baseline counts and individual-specific random effects results in a complex form of the (conditional) likelihood. An approximate solution can be achieved by ignoring this dependence, but this approach could result in biased parameter estimates and in wrong inferences. We propose a computationally feasible approach to overcome this problem, leaving the random effect distribution unspecified. In this context, we show how the EM algorithm for nonparametric maximum likelihood (NPML) can be extended to deal with dependence of repeated measures on baseline counts.

Bayesian analysis of a bivariate survival model based on exponential distributions is discussed using both vague and conjugate prior distributions. Parameter and reliability estimators are given for the maximum likelihood technique and the Bayesian approach using both types of priors. A Monte Carlo study indicates the vague prior Bayes estimator of reliability performs better than its maximum likelihood counterpart.

A new distribution for non-negative integers, or counts, is developed. It is based on the assumption that the waiting times separating consecutive events are independently and identically gamma distributed. Thus, the structural process generating the counts may exhibit duration dependence. In this framework, the frequently observed phenomenon of overdispersion, that is, a variance that exceeds the mean, is caused by a decreasing hazard function of the gamma distributed waiting times, while an increasing hazard leads to underdispersion at the level of the counts. A Monte Carlo simulation and an application to fertility data illustrate the performance of the new distribution.
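
The link between the hazard of the waiting times and the dispersion of the counts can be checked with a small simulation. The sketch below is not the authors' Monte Carlo design: it simply counts events in a fixed window when waiting times are i.i.d. gamma with shape `alpha` and mean 1, and reports the variance-to-mean ratio of the counts. The function names, the window length, and the number of draws are assumptions chosen for illustration. A shape below 1 gives a decreasing hazard, a shape above 1 an increasing hazard, and a shape of 1 reduces to the Poisson case.

```python
import random
from statistics import mean, pvariance


def count_events(alpha, horizon, rng):
    """Count events in [0, horizon] with gamma waiting times (shape alpha, mean 1)."""
    elapsed, count = 0.0, 0
    while True:
        # gammavariate takes (shape, scale); scale 1/alpha fixes the mean at 1.
        elapsed += rng.gammavariate(alpha, 1.0 / alpha)
        if elapsed > horizon:
            return count
        count += 1


def dispersion_ratio(alpha, horizon=10.0, n_draws=5000, seed=0):
    """Return variance / mean of the simulated counts."""
    rng = random.Random(seed)
    counts = [count_events(alpha, horizon, rng) for _ in range(n_draws)]
    return pvariance(counts) / mean(counts)
```

Under these assumptions, `dispersion_ratio(0.5)` should come out above 1 (overdispersion), `dispersion_ratio(2.0)` below 1 (underdispersion), and `dispersion_ratio(1.0)` close to 1.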

Excess zeros are encountered in many empirical count data applications. We provide a new explanation of extra zeros, related to the underlying stochastic process that generates events. The process has two rates: a lower rate until the first event and a higher one thereafter. We derive the corresponding distribution of the number of events during a fixed period and extend it to account for observed and unobserved heterogeneity. An application to the socioeconomic determinants of the individual number of doctor visits in Germany illustrates the usefulness of the new approach.
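
A two-rate process of this kind is easy to simulate, which makes the source of the extra zeros visible. The sketch below assumes exponential waiting times (a constant rate within each phase), which the abstract does not state explicitly; the rates, the period length, and the function names are likewise illustrative. Under that assumption the probability of a zero count over a period of length T is exp(-rate_first * T), which exceeds the zero probability of a Poisson distribution with the same mean whenever the later rate is higher.

```python
import math
import random


def simulate_count(rate_first, rate_later, horizon, rng):
    """Events in [0, horizon]: rate_first until the first event, rate_later afterwards."""
    t = rng.expovariate(rate_first)  # waiting time to the first event
    count = 0
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate_later)  # later events arrive at the higher rate
    return count


def zero_share(rate_first, rate_later, horizon=1.0, n_draws=20000, seed=0):
    """Return (share of zero counts, mean count) over simulated periods."""
    rng = random.Random(seed)
    counts = [simulate_count(rate_first, rate_later, horizon, rng) for _ in range(n_draws)]
    return counts.count(0) / n_draws, sum(counts) / n_draws
```

For example, `p0, m = zero_share(0.5, 3.0)` lets one compare `p0` with both `math.exp(-0.5)` (the zero probability under the assumed process) and `math.exp(-m)` (the zero probability of a Poisson count with the same mean).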

