Similar Documents (20 results)
1.
Latent variable models are widely used for the joint modeling of mixed data, including nominal, ordinal, count and continuous variables. In this paper, we consider a latent variable model for jointly modeling the relationships between mixed binary, count and continuous variables with some observed covariates. We assume that, given a latent variable, the mixed variables of interest are independent, and that the count and continuous variables have Poisson and normal distributions, respectively. As such data may be extracted from different subpopulations, unobserved heterogeneity has to be taken into account; a mixture distribution is therefore used for the distribution of the latent variable to account for this heterogeneity. A generalized EM algorithm, which uses Newton–Raphson steps inside the EM iterations, is used to compute the maximum likelihood estimates of the parameters, and their standard errors are computed using the supplemented EM algorithm. An analysis of the primary biliary cirrhosis data is presented as an application of the proposed model.
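As a hedged illustration (my notation, not necessarily the authors' exact specification), such a model typically conditions each outcome on a scalar latent variable z: a binary outcome follows Bernoulli(logit⁻¹(x′β₁ + λ₁z)), a count outcome follows Poisson(exp(x′β₂ + λ₂z)), and a continuous outcome follows N(x′β₃ + λ₃z, σ²), with the outcomes independent given z and z drawn from a finite mixture such as Σ_k π_k N(μ_k, 1) to capture the unobserved heterogeneity.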

2.
This paper proposes a simple and flexible count data regression model which is able to incorporate overdispersion (the variance is greater than the mean) and which can be considered a competitor to the Poisson model. As is well known, the classical Poisson model imposes the restriction that the conditional mean of each count variable must equal its conditional variance. For the common case of overdispersed counts, Poisson regression may therefore not be appropriate, while the count regression model proposed here is potentially useful. We consider an application in which we model counts of medical care utilization by the elderly in the USA, using a well-known data set from the National Medical Expenditure Survey (1987), where the dependent variable is the number of stays after hospital admission and 10 explanatory variables are analysed.
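For reference, Poisson regression with a log link imposes the equidispersion restriction E[Y_i | x_i] = Var(Y_i | x_i) = exp(x_i′β). Overdispersed alternatives relax this, for example by allowing Var(Y_i | x_i) = exp(x_i′β)(1 + α exp(x_i′β)) as in the negative binomial model; the particular variance function used by the model proposed in this paper is not stated in the abstract.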

3.
Stratified randomization based on the baseline value of the primary analysis variable is common in clinical trial design. We illustrate, from a theoretical viewpoint, the advantage of such stratified randomization in achieving balance on the baseline covariate. We also conclude that the estimator of the treatment effect is consistent when both the continuous baseline covariate and the stratification factor derived from it are included in the model. In addition, the analysis of covariance model including both the continuous covariate and the stratification factor is asymptotically no less efficient than a model including only the continuous baseline value or only the stratification factor. We recommend that the continuous baseline covariate generally be included in the analysis model. The corresponding stratification factor may also be included if one is not confident that the relationship between the baseline covariate and the response variable is linear. Notwithstanding this recommendation, one should always carefully examine relevant historical data in order to pre-specify the most appropriate analysis model for a prospective study.
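A minimal sketch of the analysis models being compared (notation mine): with Y the response, T the treatment indicator, X the continuous baseline value and S the stratification factor derived from X, the full analysis of covariance model is Y = β₀ + β₁T + β₂X + Σ_k γ_k I(S = k) + ε, and the competing submodels drop either the β₂X term or the stratum effects γ_k.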

4.
Covariate measurement error problems have been extensively studied in the context of right-censored data but less so for interval-censored data. Motivated by the AIDS Clinical Trial Group 175 study, where the occurrence time of AIDS was examined only at intermittent clinic visits and the baseline covariate CD4 count was measured with error, we describe a semiparametric maximum likelihood method for analyzing mixed case interval-censored data with mismeasured covariates under the proportional hazards model. We show that the estimator of the regression coefficient is asymptotically normal and efficient and provide a very stable and efficient algorithm for computing the estimators. We evaluate the method through simulation studies and illustrate it with AIDS data.
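A hedged sketch of the setting (notation mine): the event time T follows a proportional hazards model λ(t | X) = λ₀(t)exp(β′X); the true covariate X (here the baseline CD4 count) is observed only through an error-prone surrogate W = X + U; and T itself is never observed exactly, being known only to lie between two of the subject's clinic-visit times.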

5.
COM-Poisson regression is an increasingly popular model for count data. Its main advantage is that it allows the mean and the variance of the counts to be modelled separately, so that the same covariate can affect the average level and the variability of the response variable in different ways. A key limiting factor in the use of the COM-Poisson distribution is the calculation of the normalisation constant: its accurate evaluation can be time-consuming and is not always feasible. We circumvent this problem, in the context of estimating a Bayesian COM-Poisson regression, by resorting to the exchange algorithm, an MCMC method applicable to situations where the sampling model (likelihood) can only be computed up to a normalisation constant. The algorithm requires draws from the sampling model, which in the case of the COM-Poisson distribution can be generated efficiently using rejection sampling. We illustrate the method, and the benefits of using a Bayesian COM-Poisson regression model, through a simulation and two real-world data sets with different levels of dispersion.
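For context, the COM-Poisson probability mass function is P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)) for y = 0, 1, 2, …, where Z(λ, ν) = Σ_{j=0}^∞ λ^j/(j!)^ν is the normalisation constant referred to above; ν = 1 recovers the Poisson distribution, ν > 1 yields underdispersion, and ν < 1 yields overdispersion.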

6.
In this research, we describe a nonparametric time-varying coefficient model for the analysis of panel count data. We extend the traditional panel count data models by incorporating B-spline estimates of the time-varying coefficients. We show that the proposed model can be implemented using a nonparametric maximum pseudo-likelihood method. We further examine the theoretical properties of the estimators of the model parameters. The operational characteristics of the proposed method are evaluated through a simulation study. For illustration, we analyse data from a study of childhood wheezing, and describe the time-varying effect of an inflammatory marker on the risk of wheezing.

7.
In some clinical, environmental, or economic studies, researchers are interested in a semi-continuous outcome variable that takes the value zero with a discrete probability and has a continuous distribution for the non-zero values. Due to the measuring mechanism, it is not always possible to observe some outcomes fully, and only an upper bound is recorded. We call this left-censored data: only the maximum of the outcome and an independent censoring variable is observed, together with a censoring indicator. In this article, we introduce a mixture semi-parametric regression model. We use a parametric model to investigate the influence of covariates on the discrete probability of the value zero, while for the non-zero part of the outcome a semi-parametric Cox regression model is used to study the conditional hazard function. The parameters in this mixture model are estimated using a likelihood method, in which the infinite-dimensional baseline hazard function is estimated by a step function. We establish the identifiability of the model and the consistency of the estimators of its parameters. We study the finite-sample behaviour of the estimators through a simulation study and illustrate the model on a practical data example.
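A hedged sketch of such a two-part model (notation mine): the zero part is modelled parametrically, e.g. P(Y = 0 | x) = exp(x′γ)/(1 + exp(x′γ)), while the non-zero part follows a Cox model with conditional hazard λ(y | x, Y ≠ 0) = λ₀(y)exp(x′β); under left censoring one observes max(Y, C) together with the indicator I(C ≤ Y), where C is an independent censoring variable.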

8.
Count data with structural zeros are common in public health applications. There is considerable research on zero-inflated models, such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, for such zero-inflated count data when it is used as the response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored, which may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor, if it is observed. In practice, however, structural zeros are often not observed, in which case no statistical method has been available to address the bias. This paper aims to fill this methodological gap by developing parametric maximum likelihood methods for modeling zero-inflated count data when used as predictors. The response variable can be of any type, including continuous, binary, count or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of the new approach when the sample size is small to moderate. A real data example is also used to demonstrate the application of the method.
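For reference, the ZIP distribution mentioned above mixes a point mass at zero with a Poisson component: P(Y = 0) = π + (1 − π)e^{−μ} and P(Y = y) = (1 − π)e^{−μ}μ^y/y! for y ≥ 1, where π is the probability of a structural zero. A zero observation can therefore be either structural or random, which is exactly the ambiguity that matters when such a count enters the model as a predictor.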

9.
In modeling count data with multivariate predictors, we often encounter problems with clustering of observations and interdependency of predictors. We propose using principal components of the predictors to mitigate the multicollinearity problem; to abate the information loss due to dimension reduction, a semiparametric link between the count dependent variable and the principal components is postulated. Clustering of observations is accounted for in the model as a random component, and the model is estimated via the backfitting algorithm. A simulation study illustrates the advantages of the proposed model over standard Poisson regression in a wide range of scenarios.
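The sketch below illustrates only the dimension-reduction idea (principal components of the predictors feeding a Poisson regression) using standard Python libraries; the paper's actual model is semiparametric, includes a random cluster component, and is fitted by backfitting, none of which is reproduced here.

```python
# Minimal sketch: Poisson regression on principal components of the predictors.
# This shows only the dimension-reduction step; the paper's model is
# semiparametric with a random cluster effect estimated by backfitting.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)   # induce multicollinearity
eta = 0.3 * X[:, 0] - 0.2 * X[:, 2]
y = rng.poisson(np.exp(eta))                   # simulated count response

# Reduce the correlated predictors to a few principal-component scores.
Z = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))

# Standard Poisson GLM on the component scores.
fit = sm.GLM(y, sm.add_constant(Z), family=sm.families.Poisson()).fit()
print(fit.summary())
```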

10.
Inflated data and over-dispersion are two common problems when modeling count data with traditional Poisson regression models. In this study, we propose a latent class inflated Poisson (LCIP) regression model to address the unobserved heterogeneity that leads to inflation and over-dispersion. The performance of the model estimation is evaluated through simulation studies. We illustrate the usefulness of introducing a latent class variable by analyzing data from the Behavioral Risk Factor Surveillance System (BRFSS), which contain several excessively frequent values and are characterized by over-dispersion. As a result, the proposed model displays a better fit than the standard Poisson regression and zero-inflated Poisson regression models for the inflated counts.

Keywords: inflated data, latent class, heterogeneity, Poisson regression, over-dispersion

11.
The family of generalized Poisson distributions has been found useful for describing over-dispersed and under-dispersed count data. We propose the use of a restricted generalized Poisson regression model to predict a response variable affected by one or more explanatory variables. Approximate tests for the adequacy of the model and the estimation of the parameters are considered. The restricted generalized Poisson regression model is applied to an observed data set.
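For orientation, one common parameterization of the generalized Poisson distribution is P(Y = y) = θ(θ + λy)^{y−1} e^{−θ−λy}/y! for y = 0, 1, 2, …, with mean θ/(1 − λ) and variance θ/(1 − λ)³; in the restricted version the dispersion parameter is tied to the mean parameter (e.g. λ_i = αθ_i), though the exact restriction used in this paper is not spelled out in the abstract.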

12.
Longitudinal count data with excessive zeros frequently occur in social, biological, medical, and health research. To model such data, zero-inflated Poisson (ZIP) models are commonly used after separating the zero and positive responses. As longitudinal count responses are likely to be serially correlated, such separation may destroy the underlying serial correlation structure. To overcome this problem, observation-driven and parameter-driven modelling approaches have recently been proposed. In the observation-driven model, the response at a specific time point is modelled through the responses at previous time points, thereby incorporating serial correlation. One limitation of the observation-driven model is that it fails to accommodate any possible over-dispersion, which frequently occurs in count responses. This limitation is overcome in the parameter-driven model, where the serial correlation is captured through a latent process using random effects. We compare the results obtained by the two models. A quasi-likelihood approach is developed to estimate the model parameters. The methodology is illustrated with the analysis of two real-life datasets, and to examine model performance the models are also compared through a simulation study.

13.
Baseline adjustment is an important consideration in thorough QT studies for non-antiarrhythmic drugs. For crossover studies with period-specific pre-dose baselines, we propose a by-time-point analysis of covariance model with change from pre-dose baseline as response, treatment as a fixed effect, pre-dose baseline for current treatment and pre-dose baseline averaged across treatments as covariates, and subject as a random effect. Additional factors such as period and sex should be included in the model as appropriate. Multiple pre-dose measurements can be averaged to obtain a pre-dose-averaged baseline and used in the model. We provide conditions under which the proposed model is more efficient than other models. We demonstrate the efficiency and robustness of the proposed model both analytically and through simulation studies. The advantage of the proposed model is also illustrated using the data from a real clinical trial.
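A hedged sketch of the proposed by-time-point model (notation mine): for subject i under treatment j at a given post-dose time point, ΔY_ij = μ + τ_j + β₁B_ij + β₂B̄_i + (period and sex effects) + b_i + ε_ij, where ΔY_ij is the change from pre-dose baseline, B_ij is the pre-dose baseline under the current treatment, B̄_i is the subject's pre-dose baseline averaged across treatments, and b_i is a random subject effect.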

14.
The Bernoulli and Poisson processes are two popular discrete count processes; however, both rely on strict assumptions. We instead propose a generalized homogeneous count process (which we name the Conway–Maxwell–Poisson or COM-Poisson process) that not only includes the Bernoulli and Poisson processes as special cases, but also serves as a flexible mechanism for describing count processes that approximate data with over- or under-dispersion. We introduce the process and an associated generalized waiting-time distribution, with several real-data applications to illustrate its flexibility for a variety of data structures. We consider model estimation under different scenarios of data availability, and assess performance through simulated and real datasets. This new generalized process will enable analysts to model count processes in which data dispersion exists in a more accommodating and flexible manner.

15.
In biomedical studies, the event of interest is often recurrent and within-subject events cannot usually be assumed independent. In addition, individuals within a cluster might not be independent; for example, in multi-center or familial studies, subjects from the same center or family might be correlated. We propose methods of estimating parameters in two semi-parametric proportional rates/means models for clustered recurrent event data. The first model contains a baseline rate function which is common across clusters, while the second model features cluster-specific baseline rates. Dependence structures for patients-within-cluster and events-within-patient are both unspecified. Estimating equations are derived for the regression parameters. For the common baseline model, an estimator of the baseline mean function is proposed. The asymptotic distributions of the model parameters are derived, while finite-sample properties are assessed through a simulation study. Using data from a national organ failure registry, the proposed methods are applied to the analysis of technique failures among Canadian dialysis patients.
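For orientation, the common-baseline proportional rates model referred to above can be written (in my notation) as E[dN_ij(t) | Z_ij] = exp(β′Z_ij(t)) dμ₀(t) for the recurrent-event counting process N_ij of subject j in cluster i, with μ₀ the shared baseline mean function; the second model replaces μ₀ with a cluster-specific baseline μ₀ᵢ.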

16.
In count data models, overdispersion of the dependent variable can be incorporated by adding a heterogeneity term to the mean parameter of the Poisson distribution. We use a nonparametric estimator of the heterogeneity density based on a squared Kth-order polynomial expansion, which we generalize to panel data. A numerical illustration using an insurance dataset is discussed. Even though some statistical analyses showed no clear differences between these new models and the standard Poisson model with gamma random effects, we show that the choice of the random-effects distribution has a significant influence on the interpretation of our results.

17.
Early phase 2 tuberculosis (TB) trials are conducted to characterize the early bactericidal activity (EBA) of anti-TB drugs. The EBA of anti-TB drugs has conventionally been calculated as the rate of decline in colony-forming unit (CFU) count during the first 14 days of treatment. The measurement of CFU count, however, is expensive and prone to contamination. As an alternative to CFU count, time to positivity (TTP), a potential biomarker for the long-term efficacy of anti-TB drugs, can be used to characterize EBA. The current Bayesian nonlinear mixed-effects (NLME) regression model for TTP data, however, lacks robustness to the gross outliers that are often present in such data. The conventional way of handling these outliers involves identifying them by visual inspection and excluding them from the analysis, a process that can be questioned because of its subjective nature. For this reason, we fitted robust versions of the Bayesian nonlinear mixed-effects regression model to a wide range of TTP datasets. The performance of the explored models was assessed through model comparison statistics and a simulation study. We conclude that fitting a robust model to TTP data obviates the need for explicit identification and subsequent "deletion" of outliers, while ensuring that gross outliers exert no undue influence on model fits. We recommend that the current practice of fitting conventional normal-theory models be abandoned in favor of fitting robust models to TTP data.

18.
In longitudinal studies, missing data are the rule, not the exception. We consider the analysis of longitudinal binary data with non-monotone missingness that is thought to be non-ignorable. In this setting, a full likelihood approach is algebraically complicated and can be computationally prohibitive when there are many measurement occasions. We propose a 'protective' estimator that assumes that the probability that a response is missing at any occasion depends, in a completely unspecified way, on the value of that variable alone. Relying on this 'protectiveness' assumption, we describe a pseudolikelihood estimator of the regression parameters under non-ignorable missingness, without having to model the missing data mechanism directly. The proposed method is applied to CD4 cell count data from two longitudinal clinical trials of patients infected with the human immunodeficiency virus.

19.
Regression methods for common data types such as measured, count and categorical variables are well understood, but statisticians increasingly need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, either as the response or as a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We call the transformation from a new observation to a score 'scoring', whereas 'backscoring' is a method for representing a score as an observation in the data space; both are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment studying the motion of children with cleft lip.
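A minimal sketch of the score-based workflow described above, using generic Python tools: pairwise distances between non-vector objects are reduced to low-dimensional scores by multidimensional scaling, and a standard regression links the scores to the response. The backscoring step and the shape- or curve-specific distance functions used in the paper are not reproduced here.

```python
# Minimal sketch: distance matrix -> MDS scores -> standard regression.
# Backscoring (mapping a score back to the data space) is omitted.
import numpy as np
from sklearn.manifold import MDS
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 60
curves = rng.normal(size=(n, 100)).cumsum(axis=1)   # stand-in "curve" objects
y = curves[:, -1] + rng.normal(scale=0.5, size=n)    # scalar response

# Pairwise distances between the objects (here a simple Euclidean distance).
D = np.linalg.norm(curves[:, None, :] - curves[None, :, :], axis=2)

# Low-dimensional scores from the precomputed distance matrix.
scores = MDS(n_components=3, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

# Internal model linking the scores to the response.
model = LinearRegression().fit(scores, y)
print(model.score(scores, y))
```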

20.
胡亚南, 田茂再. 《统计研究》 (Statistical Research), 2019, 36(1): 104-114

Zero-inflated count data break the variance–mean relationship of the Poisson distribution; they can be described by a mixture in which a proportion of the observations follow a Poisson distribution and the remaining proportion are zeros from a degenerate distribution. Based on the adaptive elastic net, this paper studies joint modelling and variable selection for zero-inflated count data. For the zero-inflated Poisson distribution, a latent variable is introduced to construct the complete likelihood of the zero-inflated Poisson model, which consists of a zero-inflation part and a Poisson part. Because the covariates may be collinear and the true coefficient vector sparse, the objective function is obtained by adding an adaptive elastic net penalty to the likelihood; the EM algorithm is then used to obtain sparse estimates of the regression coefficients, and the Bayesian information criterion (BIC) is used to choose the optimal tuning parameters. Theoretical results on the large-sample properties of the estimators and simulation studies are also provided, and the proposed method is finally applied to a real-data problem.
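A hedged sketch of the penalized objective described above (my notation): with ℓ_c(γ, β) the complete-data log-likelihood built from the zero-inflation part and the Poisson part, the estimates minimize −ℓ_c(γ, β) + λ₁Σ_j w_j|β_j| + λ₂Σ_j β_j², where the adaptive weights are typically w_j = 1/|β̃_j|^τ for an initial estimate β̃; the EM algorithm maximizes the penalized expected complete-data log-likelihood and BIC selects the tuning parameters (λ₁, λ₂).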
