
1.

Bivariate Time Series Modeling of Financial Count Data

A. M. M. Shahiduzzaman Quoreshi 《Communications in Statistics - Theory and Methods》2013,42(7):1343-1358

A bivariate integer-valued moving average (BINMA) model is proposed. The BINMA model allows for both positive and negative correlation between the counts. This model can be seen as an inverse of the conditional duration model in the sense that short durations in a time interval correspond to a large count and vice versa. The conditional mean, variance, and covariance of the BINMA model are given. Model extensions to include explanatory variables are suggested. Using the BINMA model for AstraZeneca and Ericsson B, it is found that there is positive correlation between the stock transactions series. Empirically, we find support for the use of long-lag bivariate moving average models for the two series.

2.

A Multivariate Generalized Poisson Regression Model

Felix Famoye 《Communications in Statistics - Theory and Methods》2013,42(3):497-511

A multivariate generalized Poisson regression model based on the multivariate generalized Poisson distribution is defined and studied. The regression model can be used to describe count data with any type of dispersion. The model allows for both positive and negative correlation between any pair of the response variables. The parameters of the regression model are estimated by using the maximum likelihood method. Some test statistics are discussed, and two numerical data sets are used to illustrate the applications of the multivariate count data regression model.

3.

A new discrete distribution: properties and applications in medical care

Emilio Gómez Déniz 《Journal of applied statistics》2013,40(12):2760-2770

This paper proposes a simple and flexible count data regression model which is able to incorporate overdispersion (the variance is greater than the mean) and which can be considered a competitor to the Poisson model. As is well known, this classical model imposes the restriction that the conditional mean of each count variable must equal the conditional variance. Nevertheless, for the common case of well-dispersed counts the Poisson regression may not be appropriate, while the count regression model proposed here is potentially useful. We consider an application to model counts of medical care utilization by the elderly in the USA using a well-known data set from the National Medical Expenditure Survey (1987), where the dependent variable is the number of stays after hospital admission, and where 10 explanatory variables are analysed.
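
The equidispersion restriction described in this abstract (conditional mean equal to conditional variance) can be checked informally by comparing the sample mean and sample variance of a count variable. The following is a minimal sketch in Python, not the model proposed in the paper; the function name `dispersion_index` and the sample counts are illustrative assumptions.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean.

    Values near 1 are consistent with the Poisson restriction;
    values well above 1 suggest overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical counts of hospital stays (illustrative only, not survey data).
stays = [0, 0, 0, 1, 0, 2, 0, 0, 5, 0, 1, 0]
index = dispersion_index(stays)
print(f"dispersion index: {index:.3f}")
```

An index well above 1, as for these made-up counts, is the situation in which a model that allows the variance to exceed the mean may be worth considering.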

4.

Multivariate models for correlated count data

Mariana Rodrigues-Motta Hildete P. Pinheiro Eduardo G. Martins Márcio S. Araújo Sérgio F. dos Reis 《Journal of applied statistics》2013,40(7):1586-1596

In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that can be correlated. Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions have been considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. Then we extend this model in the sense that correlated counts may follow different distributions. To accommodate correlation among counts, we have considered correlated random effects for each individual in the mean structure, thus inducing dependency among observations common to an individual. The method is applied to real data to investigate variation in food resource use in a species of marsupial in a locality of the Brazilian Cerrado biome.

5.

Quantile Regression Methods for Longitudinal Data with Drop-outs: Application to CD4 Cell Counts of Patients Infected with the Human Immunodeficiency Virus

Stuart R. Lipsitz Garrett M. Fitzmaurice Geert Molenberghs & Lue Ping Zhao 《Journal of the Royal Statistical Society. Series C, Applied statistics》1997,46(4):463-476

Patients infected with the human immunodeficiency virus (HIV) generally experience a decline in their CD4 cell count (a count of certain white blood cells). We describe the use of quantile regression methods to analyse longitudinal data on CD4 cell counts from 1300 patients who participated in clinical trials that compared two therapeutic treatments: zidovudine and didanosine. It is of scientific interest to determine any treatment differences in the CD4 cell counts over a short treatment period. However, the analysis of the CD4 data is complicated by drop-outs: patients with lower CD4 cell counts at the base-line appear more likely to drop out at later measurement occasions. Motivated by this example, we describe the use of 'weighted' estimating equations in quantile regression models for longitudinal data with drop-outs. In particular, the conventional estimating equations for the quantile regression parameters are weighted inversely proportionally to the probability of drop-out. This approach requires the process generating the missing data to be estimable but makes no assumptions about the distribution of the responses other than those imposed by the quantile regression model. This method yields consistent estimates of the quantile regression parameters provided that the model for drop-out has been correctly specified. The methodology proposed is applied to the CD4 cell count data and the results are compared with those obtained from an 'unweighted' analysis. These results demonstrate how an analysis that fails to account for drop-outs can mislead.

6.

Zero-inflated models for adjusting varying exposures: a cautionary note on the pitfalls of using offset

Cindy Feng 《Journal of applied statistics》2022,49(1):1

Zero-inflated count data are frequently encountered in public health and epidemiology research. A two-part model is often used to model the excessive zeros, which are a mixture of two components: a point mass at zero and a count distribution, such as a Poisson distribution. When the rate of events per unit exposure is of interest, an offset is commonly used to account for the varying extent of exposure, which is essentially a predictor whose regression coefficient is fixed at one. Such an assumption of exposure effect is, however, quite restrictive for many practical problems. Further, for zero-inflated models, the offset is often only included in the count component of the model. However, the probability of the excessive-zero component could also be affected by the amount of ‘exposure’. We, therefore, proposed incorporating the varying exposure as a covariate rather than an offset term in both the probability of excessive zeros and conditional counts components of the zero-inflated model. A real example is used to illustrate the usage of the proposed methods, and simulation studies are conducted to assess the performance of the proposed methods for a broad variety of situations.
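
The distinction drawn in this abstract between an offset and an exposure covariate can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the author's implementation: it assumes a log link for the count mean, a single covariate `x`, and hypothetical parameter names (`beta0`, `beta1`, `gamma`). With an offset, the coefficient on log(exposure) is fixed at one; with exposure as a covariate, that coefficient (`gamma` here) is free.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing count k when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, pi_zero, mu):
    """Zero-inflated Poisson: a point mass at zero with probability
    pi_zero, mixed with a Poisson(mu) count distribution."""
    base = (1.0 - pi_zero) * poisson_pmf(k, mu)
    return pi_zero + base if k == 0 else base


def mean_with_offset(exposure, x, beta0, beta1):
    """log(mu) = log(exposure) + beta0 + beta1 * x.
    The coefficient on log(exposure) is fixed at one."""
    return math.exp(math.log(exposure) + beta0 + beta1 * x)


def mean_with_exposure_covariate(exposure, x, beta0, beta1, gamma):
    """log(mu) = gamma * log(exposure) + beta0 + beta1 * x.
    gamma is a free coefficient; gamma = 1 recovers the offset model."""
    return math.exp(gamma * math.log(exposure) + beta0 + beta1 * x)


# Illustrative values only.
mu_offset = mean_with_offset(exposure=2.0, x=1.0, beta0=0.1, beta1=0.3)
mu_covariate = mean_with_exposure_covariate(
    exposure=2.0, x=1.0, beta0=0.1, beta1=0.3, gamma=0.5
)
print(mu_offset, mu_covariate, zip_pmf(0, pi_zero=0.2, mu=mu_covariate))
```

Under the offset form, doubling the exposure doubles the expected count; the covariate form lets the data determine how strongly the count scales with exposure. The abstract's further suggestion, letting exposure also enter the zero-inflation probability, is not sketched here.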

7.

Regression for non-Euclidean data using distance matrices

Julian J. Faraway 《Journal of applied statistics》2014,41(11):2342-2357

Regression methods for common data types such as measured, count and categorical variables are well understood but increasingly statisticians need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, whether as the response, a predictor, or both. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We refer to the transformation from a new observation to a score as scoring, whereas backscoring is a method to represent a score as an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment to study the motion of children with cleft lip.

8.

Testing homogeneity in clustered (longitudinal) count data regression model with over-dispersion

Sudhir Paul Kazi Azad 《Journal of statistical planning and inference》2012

Clustered (longitudinal) count data arise in many bio-statistical practices in which a number of repeated count responses are observed on a number of individuals. The repeated observations may also represent counts over time from a number of individuals. One important problem that arises in practice is to test homogeneity within clusters (individuals) and between clusters (individuals). As data within clusters are observations of repeated responses, the count data may be correlated and/or over-dispersed. For over-dispersed count data with unknown over-dispersion parameter we derive two score tests by assuming a random intercept model within the framework of (i) the negative binomial mixed effects model and (ii) the double extended quasi-likelihood mixed effects model (Lee and Nelder, 2001). These two statistics are much simpler than a statistic derived by Jacqmin-Gadda and Commenges (1995) under the framework of the over-dispersed generalized linear model. The first statistic takes the over-dispersion more directly into the model and therefore is expected to do well when the model assumptions are satisfied and the other statistic is expected to be robust. Simulations show superior level property of the statistics derived under the negative binomial and double extended quasi-likelihood model assumptions. A data set is analyzed and a discussion is given.

9.

Sample size estimation for a two-group comparison of repeated count outcomes using GEE

Ying Lou Jing Cao Song Zhang 《Communications in Statistics - Theory and Methods》2017,46(14):6743-6753

Randomized clinical trials with count measurements as the primary outcome are common in various medical areas such as seizure counts in epilepsy trials, or relapse counts in multiple sclerosis trials. Controlled clinical trials frequently use a conventional parallel-group design that assigns subjects randomly to one of two treatment groups and repeatedly evaluates them at baseline and intervals across a treatment period of a fixed duration. The primary interest is to compare the rates of change between treatment groups. Generalized estimating equations (GEEs) have been widely used to compare rates of change between treatment groups because of their robustness to misspecification of the true correlation structure. In this paper, we derive a sample size formula for comparing the rates of change between two groups in a repeatedly measured count outcome using GEE. The sample size formula incorporates general missing patterns such as independent missing and monotone missing, and general correlation structures such as AR(1) and compound symmetry (CS). The performance of the sample size formula is evaluated through simulation studies. Sample size estimation is illustrated by a clinical trial example from epilepsy.

10.

Regression for doubly inflated multivariate Poisson distributions

Ishapathik Das Sumen Sen Pooja Sengupta 《Journal of Statistical Computation and Simulation》2019,89(13):2549-2561

Dependent multivariate count data occur in several research studies. These data can be modelled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells is much larger than in other cells, then the copula-based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.

11.

Random effects regression models for count data with excess zeros in caries research

D. Todem Y. Zhang A. Ismail W. Sohn 《Journal of applied statistics》2010,37(10):1661-1679

We extend the family of Poisson and negative binomial models to derive the joint distribution of clustered count outcomes with extra zeros. Two random effects models are formulated. The first model assumes a shared random effects term between the conditional probability of perfect zeros and the conditional mean of the imperfect state. The second formulation relaxes the shared random effects assumption by relating the conditional probability of perfect zeros and the conditional mean of the imperfect state to two different but correlated random effects variables. Under the conditional independence and the missing data at random assumption, a direct optimization of the marginal likelihood and an EM algorithm are proposed to fit the proposed models. Our proposed models are fitted to dental caries counts of children under the age of six in the city of Detroit.

12.

Financial data modeling by Poisson mixture regression

S. Faria F. Gonçalves 《Journal of applied statistics》2013,40(10):2150-2162

In many financial applications, Poisson mixture regression models are commonly used to analyze heterogeneous count data. When fitting these models, the observed counts are supposed to come from two or more subpopulations and parameter estimation is typically performed by means of maximum likelihood via the Expectation–Maximization algorithm. In this study, we discuss briefly the procedure for fitting Poisson mixture regression models by means of maximum likelihood, the model selection and goodness-of-fit tests. These models are applied to a real data set for credit-scoring purposes. We aim to reveal the impact of demographic and financial variables in creating different groups of clients and to predict the group to which each client belongs, as well as his expected number of defaulted payments. The model's conclusions are very interesting, revealing that the population consists of three groups, contrasting with the traditional good versus bad categorization approach of the credit-scoring systems.

13.

Estimating the underlying change in unemployment in the UK (total citations: 2; self-citations: 0; citations by others: 2)

Andrew Harvey & Chia-Hui Chung 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2000,163(3):303-309

By setting up a suitable time series model in state space form, the latest estimate of the underlying current change in a series may be computed by the Kalman filter. This may be done even if the observations are only available in a time-aggregated form subject to survey sampling error. A related series, possibly observed more frequently, may be used to improve the estimate of change further. The paper applies these techniques to the important problem of estimating the underlying monthly change in unemployment in the UK measured according to the definition of the International Labour Organisation by the Labour Force Survey. The fitted models suggest a reduction in root-mean-squared error of around 10% over a simple estimate based on differences if a univariate model is used and a further reduction of 50% if information on claimant counts is taken into account. With seasonally unadjusted data, the bivariate model offers a gain of roughly 40% over the use of annual differences. For both adjusted and unadjusted data, there is a further gain of around 10% if the next month's figure on claimant counts is used. The method preferred is based on a bivariate model with unadjusted data. If the next month's claimant count is known, the root-mean-squared error for the estimate of change is just over 10000.

14.

Analyzing propensity matched zero-inflated count outcomes in observational studies

Stacia M. DeSantis Christos Lazaridis Shuang Ji Francis G. Spinale 《Journal of applied statistics》2014,41(1):127-141

Determining the effectiveness of different treatments from observational data, which are characterized by imbalance between groups due to lack of randomization, is challenging. Propensity matching is often used to rectify imbalances among prognostic variables. However, there are no guidelines on how appropriately to analyze group matched data when the outcome is a zero-inflated count. In addition, there is debate over whether to account for correlation of responses induced by matching and/or whether to adjust for variables used in generating the propensity score in the final analysis. The aim of this research is to compare covariate unadjusted and adjusted zero-inflated Poisson models that do and do not account for the correlation. A simulation study is conducted, demonstrating that it is necessary to adjust for potential residual confounding, but that accounting for correlation is less important. The methods are applied to a biomedical research data set.

15.

Bayesian spatial quantile regression for areal count data, with application on substitute care placements in Texas

Clay King 《Journal of applied statistics》2019,46(4):580-597

Quantile regression (QR) allows one to model the effect of covariates across the entire response distribution, rather than only at the mean, but QR methods have been almost exclusively applied to continuous response variables and without considering spatial effects. Of the few studies that have performed QR on count data, none have included random spatial effects, which is an integral facet of the Bayesian spatial QR model for areal counts that we propose. Additionally, we introduce a simplifying alternative to the response variable transformation currently employed in the QR for counts literature. The efficacy of the proposed model is demonstrated via simulation study and on a real data application from the Texas Department of Family and Protective Services (TDFPS). Our model outperforms a comparable non-spatial model in both instances, as evidenced by the deviance information criterion (DIC) and coverage probabilities. With the TDFPS data, we identify one of four covariates, along with the intercept, as having a nonconstant effect across the response distribution.

16.

Variance component models for longitudinal count data with baseline information: epilepsy data revisited

Marco Alfò Murray Aitkin 《Statistics and Computing》2006,16(3):231-238

Random effect models have often been used in longitudinal data analysis since they allow for association among repeated measurements due to unobserved heterogeneity. Various approaches have been proposed to extend mixed models for repeated count data to include dependence on baseline counts. Dependence between baseline counts and individual-specific random effects results in a complex form of the (conditional) likelihood. An approximate solution can be achieved by ignoring this dependence, but this approach could result in biased parameter estimates and in wrong inferences. We propose a computationally feasible approach to overcome this problem, leaving the random effect distribution unspecified. In this context, we show how the EM algorithm for nonparametric maximum likelihood (NPML) can be extended to deal with dependence of repeated measures on baseline counts.

17.

Biomedical applications for a generalized linear functional Poisson model

V. Barnett D. E. Wright 《Journal of applied statistics》1992,19(1):41-47

Many medical and biological studies involve response in the form of Poisson counts which can be modelled using explanatory variables which also arise from count data. If the explanatory variables are observable without error (also as Poisson counts) we have a generalized linear model with a logarithmic link function and Poisson error structure. If, however, some of the explanatory variables are not directly observable, but arise with superimposed errors (again of Poisson form), the model is of a new type: a generalised linear functional Poisson model. In this paper, maximum likelihood estimates of the parameters of this model are determined along with the information matrix which (on noting its particular patterned form) is amenable to inversion in explicit form. Methods are proposed of an iterative type for computing estimates of the parameters and of their variational properties (e.g. standard errors) for this model, which also has application in other fields such as road traffic studies.

18.

ESTIMATION OF A PROPORTION USING SEVERAL INDEPENDENT SAMPLES OF BINOMIAL MIXTURES

G.R. Wood C.D. Lai C.G. Qiao 《Australian & New Zealand Journal of Statistics》2005,47(4):441-448

This paper considers a sequence of independent counts, with each count arising from a mixture of binomial distributions; the mixing distribution is fixed but the number of trials varies from count to count. In this common situation, an estimate of the underlying mean binomial proportion is needed. Two estimators are in general use: the arithmetic average and a weighted average of the observed proportions. Variances of the two estimators are compared and used to decide which estimator is preferred in a given context. The relative merits depend on the distribution of the proportions and the numbers of trials used.
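
The two estimators named in this abstract can be written down in a few lines. The sketch below is illustrative only: the abstract does not state the weights used in the weighted average, so weighting each observed proportion by its number of trials (which gives the pooled proportion) is an assumption, as are the function names and the example data.

```python
def arithmetic_average(successes, trials):
    """Unweighted mean of the observed proportions s_i / n_i."""
    proportions = [s / n for s, n in zip(successes, trials)]
    return sum(proportions) / len(proportions)


def weighted_average(successes, trials):
    """Mean of the observed proportions weighted by the number of trials.

    Assumed weighting: weights n_i / sum(n), which reduces to the pooled
    proportion sum(s) / sum(n).
    """
    return sum(successes) / sum(trials)


# Hypothetical counts with very different numbers of trials.
successes = [1, 30]
trials = [2, 100]
print(arithmetic_average(successes, trials))  # (0.5 + 0.3) / 2
print(weighted_average(successes, trials))    # 31 / 102
```

The two estimates differ here because the small sample (1 success in 2 trials) counts as much as the large one in the arithmetic average but very little in the weighted average; which estimator has the smaller variance is the question the paper addresses.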

19.

Parameter-driven state-space model for integer-valued time series with application

Y. B. Koh N. A. Bukhari I. Mohamed 《Journal of Statistical Computation and Simulation》2019,89(8):1394-1409

Time series of counts occur in many different contexts, the counts being usually of certain events or objects in specified time intervals. In this paper we introduce a model called parameter-driven state-space model to analyse integer-valued time series data. A key property of such a model is that the distribution of the observed count data is independent, conditional on the latent process, although the observations are correlated marginally. Our simulation shows that the Monte Carlo Expectation Maximization (MCEM) algorithm and the particle method are useful for the parameter estimation of the proposed model. In the application to Malaysia dengue data, our model fits better when compared with several other models including that of Yang et al. (2015).

20.

Nonparametric Comparison for Multivariate Panel Count Data

Hui Zhao Kate Virkler 《Communications in Statistics - Theory and Methods》2014,43(3):644-655

Multivariate panel count data often occur when there exist several related recurrent events or response variables defined by occurrences of related events. For univariate panel count data, several nonparametric treatment comparison procedures have been developed. However, there does not seem to exist a nonparametric procedure for multivariate cases. Based on differences between estimated mean functions, this article proposes a class of nonparametric test procedures for multivariate panel count data. The asymptotic distribution of the new test statistics is established and a simulation study is conducted. Moreover, the new procedures are applied to a skin cancer problem that motivated this study.