Similar literature
 20 similar articles found (search time: 15 ms)
1.
Zero-inflated data are more frequent when the data represent counts. However, there are practical situations in which continuous data contain an excess of zeros. In these cases, the zero-inflated Poisson, binomial or negative binomial models are not suitable. To fill this gap, we propose the zero-spiked gamma-Weibull (ZSGW) model by mixing a distribution that is degenerate at zero with the gamma-Weibull distribution, which has positive support. The model estimates simultaneously the effects of explanatory variables on the response variable and on the zero spike. We consider a frequentist analysis and a non-parametric bootstrap for estimating the parameters of the ZSGW regression model. We derive the appropriate matrices for assessing local influence on the model parameters. We illustrate the performance of the proposed regression model by means of a real data set (copaiba oil resin production) from a study carried out at the Department of Forest Science of the Luiz de Queiroz School of Agriculture, University of São Paulo. Based on the ZSGW regression model, we determine the explanatory variables that can influence the excess of zeros of the resin oil production and identify influential observations. We also show empirically that the proposed regression model can be superior to the zero-adjusted inverse Gaussian regression model for fitting zero-inflated positive continuous data.
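The mixing construction described in this abstract can be sketched in a few lines. The block below is only an illustration under stated assumptions: the gamma-Weibull density is not available in the Python standard library, so a plain gamma density is used as a stand-in for the positive part, and the names `zero_spiked_loglik`, `gamma_logpdf` and `pi_zero` are hypothetical, not taken from the paper. Covariate effects are also omitted.

```python
import math

def gamma_logpdf(v, shape=2.0, scale=1.5):
    """Log-density of a gamma distribution at v > 0.
    Stand-in for the positive part (NOT the paper's gamma-Weibull)."""
    return ((shape - 1.0) * math.log(v) - v / scale
            - math.lgamma(shape) - shape * math.log(scale))

def zero_spiked_loglik(y, pi_zero, positive_logpdf):
    """Log-likelihood of a zero-spiked model: a point mass at zero with
    probability pi_zero, mixed with a continuous density on (0, inf)."""
    if not 0.0 < pi_zero < 1.0:
        raise ValueError("pi_zero must lie strictly between 0 and 1")
    total = 0.0
    for v in y:
        if v == 0:
            # Contribution of an exact zero: log P(Y = 0)
            total += math.log(pi_zero)
        elif v > 0:
            # Contribution of a positive value: log((1 - pi) * g(v))
            total += math.log1p(-pi_zero) + positive_logpdf(v)
        else:
            raise ValueError("observations must be non-negative")
    return total

# Toy data (made up for illustration): three zeros, three positive values
y = [0.0, 0.0, 1.2, 3.4, 0.0, 0.7]
print(zero_spiked_loglik(y, pi_zero=0.5, positive_logpdf=gamma_logpdf))
```

Because the log-likelihood splits into a term in `pi_zero` and a term in the positive density, the two parts can be maximized separately in this simplified, covariate-free setting.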

2.
Inverse Gaussian first hitting time regression models sometimes provide an attractive representation of lifetime data. Various authors comment that dependence of both parameters on the same covariate may imply multicollinearity. The frequent appearance of conflicting signs for the two coefficients of the same covariate may be related to this. We carry out simulation studies to examine the reality of this possible multicollinearity. Although there is some dependence between estimates, multicollinearity does not seem to be a major problem. Fitting this model to data generated by a Weibull regression suggests that conflicting signs of estimates may be due to model misspecification.

3.
Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships, and modelling covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with ‘spatially’ indexed data, i.e., the heterogeneity is dependent on factors such as region and individual patient’s information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follow a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model is dependent on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but here the focus is on clustering functional relationships between response curve and covariates, i.e. the clustering is based on the surface shape of the functional response against the set of functional covariates. The model is examined on simulated data and real data.

4.
This article discusses regression analysis of mixed interval-censored failure time data. Such data frequently occur across a variety of settings, including clinical trials, epidemiologic investigations, and many other biomedical studies with a follow-up component. For example, mixed failure times are commonly found in the two largest studies of long-term survivorship after childhood cancer, the datasets that motivated this work. However, most existing methods for failure time data consider only right-censored or only interval-censored failure times, not the more general case where times may be mixed. Additionally, among regression models developed for mixed interval-censored failure times, the proportional hazards formulation is generally assumed. It is well-known that the proportional hazards model may be inappropriate in certain situations, and alternatives are needed to analyze mixed failure time data in such cases. To fill this need, we develop a maximum likelihood estimation procedure for the proportional odds regression model with mixed interval-censored data. We show that the resulting estimators are consistent and asymptotically Gaussian. An extensive simulation study is performed to assess the finite-sample properties of the method, and this investigation indicates that the proposed method works well for many practical situations. We then apply our approach to examine the impact of age at cranial radiation therapy on risk of growth hormone deficiency in long-term survivors of childhood cancer.

5.
We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximate posterior for the model parameters based on the variational Bayes (VB) method. Compared to the EM algorithm, the VB method has the advantage that the variance for the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.

6.
A Bayesian approach to modelling binary data on a regular lattice is introduced. The method uses a hierarchical model where the observed data is the sign of a hidden conditional autoregressive Gaussian process. This approach essentially extends the familiar probit model to dependent data. Markov chain Monte Carlo simulations are used on real and simulated data to estimate the posterior distribution of the spatial dependency parameters and the method is shown to work well. The method can be straightforwardly extended to regression models.

7.
In this paper, we propose a defective model induced by a frailty term for modeling the proportion of cured individuals. Unlike most cure rate models, defective models have the advantage of modeling the cure rate without adding any extra parameter to the model. The introduction of unobserved heterogeneity among individuals has brought advantages for the estimated model. The influence of unobserved covariates is incorporated using a proportional hazards model. The frailty term, assumed to follow a gamma distribution, is introduced on the hazard rate to control the unobservable heterogeneity of the patients. We assume that the baseline distribution follows either a Gompertz or an inverse Gaussian defective distribution. Thus we propose and discuss two defective distributions: the defective gamma-Gompertz and gamma-inverse Gaussian regression models. Simulation studies are performed to verify the asymptotic properties of the maximum likelihood estimator. Lastly, in order to illustrate the proposed model, we present three applications to real data sets, one of which is used here for the first time and relates to a study of breast cancer at the A.C.Camargo Cancer Center, São Paulo, Brazil.

8.
Generalized estimating equations (GEE) is one of the most commonly used methods for regression analysis of longitudinal data, especially with discrete outcomes. The GEE method accounts for the association among the responses of a subject through a working correlation matrix and its correct specification ensures efficient estimation of the regression parameters in the marginal mean regression model. This study proposes a predicted residual sum of squares (PRESS) statistic as a working correlation selection criterion in GEE. A simulation study is designed to assess the performance of the proposed GEE PRESS criterion and to compare its performance with its counterpart criteria in the literature. The results show that the GEE PRESS criterion has better performance than the weighted error sum of squares SC criterion in all cases but is surpassed in performance by the Gaussian pseudo-likelihood criterion. Lastly, the working correlation selection criteria are illustrated with data from the Coronary Artery Risk Development in Young Adults study.

9.
Dependent multivariate count data occur in several research studies. These data can be modelled by a multivariate Poisson or negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells is much larger than in other cells, the copula-based multivariate Poisson (or negative binomial) distribution may not fit well and is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher than those of the other cells and develop a doubly inflated multivariate Poisson distribution function using a multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. To illustrate the proposed methodologies, we present real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation of the unknown parameters of the models.

10.
The paper considers a significance test of regression variables in the high-dimensional linear regression model when the dimension of the regression variables p, together with the sample size n, tends to infinity. Under two slightly different cases, we prove that the likelihood ratio test statistic converges in distribution to a Gaussian random variable, and explicit expressions for the asymptotic mean and covariance are also obtained. The simulations demonstrate that our high-dimensional likelihood ratio test outperforms traditional methods in analyzing high-dimensional data.

11.
We review and extend some statistical tools that have proved useful for analysing functional data. Functional data analysis primarily is designed for the analysis of random trajectories and infinite‐dimensional data, and there exists a need for the development of adequate statistical estimation and inference techniques. While this field is in flux, some methods have proven useful. These include warping methods, functional principal component analysis, and conditioning under Gaussian assumptions for the case of sparse data. The latter is a recent development that may provide a bridge between functional and more classical longitudinal data analysis. Besides presenting a brief review of functional principal components and functional regression, we develop some concepts for estimating functional principal component scores in the sparse situation. An extension of the so‐called generalized functional linear model to the case of sparse longitudinal predictors is proposed. This extension includes functional binary regression models for longitudinal data and is illustrated with data on primary biliary cirrhosis.

12.
Current status data frequently occur in failure time studies, particularly in demographical studies and tumorigenicity experiments. Although commonly used in this context, proportional hazards and odds models are inadequate when survival functions cross. The authors consider a class of two‐sample models which is suitable for this situation and encompasses the proportional hazards and odds models. The estimating equations they propose lead to consistent and asymptotically Gaussian estimates of regression parameters in the extended model. Their approach is assessed through simulations and illustrated using data from a tumorigenicity experiment.

13.
Count data analysis techniques have been developed in biological and medical research areas. In particular, zero-inflated versions of parametric count distributions have been used to model excessive zeros that are often present in these assays. The most common count distributions for analyzing such data are Poisson and negative binomial. However, a Poisson distribution can only handle equidispersed data and a negative binomial distribution can only cope with overdispersion. In contrast, a Conway–Maxwell–Poisson (CMP) distribution [4] can handle a wide range of dispersion. We show, with an illustrative data set on next-generation sequencing of maize hybrids, that both underdispersion and overdispersion can be present in genomic data. Furthermore, the maize data set consists of clustered observations and, therefore, we develop inference procedures for a zero-inflated CMP regression that incorporates a cluster-specific random effect term. Unlike in Gaussian models, the underlying likelihood is computationally challenging. We use a numerical approximation via Gaussian quadrature to circumvent this issue. A test for checking zero-inflation has also been developed in our setting. Finite sample properties of our estimators and test have been investigated by extensive simulations. Finally, the statistical methodology has been applied to analyze the maize data mentioned before.
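The claim that a CMP distribution covers both under- and overdispersion can be checked numerically. The sketch below uses the standard CMP form P(Y = y) ∝ λ^y / (y!)^ν with the normalizing constant approximated by a truncated series; the function names, the truncation rule and the parameter values are illustrative assumptions, and the zero-inflation and random-effect parts of the paper's model are not included.

```python
import math

def cmp_log_normalizer(lam, nu, max_terms=2000, tol=1e-14):
    """log Z(lam, nu), where Z = sum_{j>=0} lam**j / (j!)**nu,
    approximated by truncating the series (an assumed numerical scheme)."""
    if lam <= 0 or nu < 0 or (nu == 0 and lam >= 1):
        raise ValueError("need lam > 0 and nu > 0 (or nu == 0 with lam < 1)")
    log_lam = math.log(lam)
    log_terms = []
    largest = -math.inf
    for j in range(max_terms):
        term = j * log_lam - nu * math.lgamma(j + 1)
        log_terms.append(term)
        largest = max(largest, term)
        # Stop once terms are shrinking and negligible next to the largest one
        if j > 0 and term < log_terms[-2] and term - largest < math.log(tol):
            break
    # Log-sum-exp for numerical stability
    return largest + math.log(sum(math.exp(t - largest) for t in log_terms))

def cmp_pmf(y, lam, nu):
    """CMP probability mass at the non-negative integer y."""
    log_p = y * math.log(lam) - nu * math.lgamma(y + 1) - cmp_log_normalizer(lam, nu)
    return math.exp(log_p)

def cmp_mean_var(lam, nu, upper=300):
    """Mean and variance by direct summation over 0..upper-1."""
    probs = [cmp_pmf(k, lam, nu) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean * mean

for nu in (0.5, 1.0, 2.0):
    mean, var = cmp_mean_var(2.5, nu)
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With ν = 1 the distribution reduces to a Poisson with mean λ; ν < 1 gives variance above the mean and ν > 1 gives variance below it.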

14.
We propose a flexible semiparametric stochastic mixed effects model for bivariate cyclic longitudinal data. The model can handle either single cycle or, more generally, multiple consecutive cycle data. The approach models the mean of responses by parametric fixed effects and a smooth nonparametric function for the underlying time effects, and the relationship across the bivariate responses by a bivariate Gaussian random field and a joint distribution of random effects. The proposed model not only can model complicated individual profiles, but also allows for more flexible within-subject and between-response correlations. The fixed effects regression coefficients and the nonparametric time functions are estimated using maximum penalized likelihood, where the resulting estimator for the nonparametric time function is a cubic smoothing spline. The smoothing parameters and variance components are estimated simultaneously using restricted maximum likelihood. Simulation results show that the parameter estimates are close to the true values. The fit of the proposed model on a real bivariate longitudinal dataset of pre-menopausal women also performs well, both for a single cycle analysis and for a multiple consecutive cycle analysis. The Canadian Journal of Statistics 48: 471–498; 2020 © 2020 Statistical Society of Canada

15.
Typical panel data models make use of the assumption that the regression parameters are the same for each individual cross-sectional unit. We propose tests for slope heterogeneity in panel data models. Our tests are based on the conditional Gaussian likelihood function in order to avoid the incidental parameters problem induced by the inclusion of individual fixed effects for each cross-sectional unit. We derive the Conditional Lagrange Multiplier test that is valid in cases where N → ∞ and T is fixed. The test applies to both balanced and unbalanced panels. We expand the test to account for general heteroskedasticity where each cross-sectional unit has its own form of heteroskedasticity. The modification is possible if T is large enough to estimate regression coefficients for each cross-sectional unit by using the MINQUE unbiased estimator for regression variances under heteroskedasticity. All versions of the test have a standard Normal distribution under general assumptions on the error distribution as N → ∞. A Monte Carlo experiment shows that the test has very good size properties under all specifications considered, including heteroskedastic errors. In addition, the power of our test is very good relative to existing tests, particularly when T is not large.

16.
In this article, we have developed a Poisson-mixed inverse Gaussian (PMIG) distribution. The mixed inverse Gaussian distribution is a mixture of the inverse Gaussian distribution and its length-biased counterpart. A PMIG regression model is developed and the maximum likelihood estimation of the parameters is studied. A dataset dealing with the number of hospital stays among the elderly population is analyzed by using the PMIG and the PIG (Poisson-inverse Gaussian) regression models and it has been shown that the PMIG model fits the data better than the PIG model.

17.
Count data with excess zeros often occur in areas such as public health, epidemiology, psychology, sociology, engineering, and agriculture. Zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression are useful for modeling such data, but because of hierarchical study design or the data collection procedure, zero-inflation and correlation may occur simultaneously. To overcome these challenges ZIP or ZINB may still be used. In this paper, multilevel ZINB regression is used to overcome these problems. The method of parameter estimation is an expectation-maximization algorithm in conjunction with the penalized likelihood and restricted maximum likelihood estimates for variance components. Alternative modeling strategies, namely the ZIP distribution, are also considered. An application of the proposed model is shown on decayed, missing, and filled teeth of children aged 12 years.

18.
In many longitudinal studies multiple characteristics of each individual, along with time to occurrence of an event of interest, are often collected. In such data sets, some of the correlated characteristics may be discrete and some of them may be continuous. In this paper, a joint model for analysing multivariate longitudinal data comprising mixed continuous and ordinal responses and a time to event variable is proposed. We model the association structure between longitudinal mixed data and time to event data using a multivariate zero-mean Gaussian process. For modeling discrete ordinal data we assume that a continuous latent variable follows the logistic distribution, and for continuous data a Gaussian mixed effects model is used. For the event time variable, an accelerated failure time model is considered under different distributional assumptions. For parameter estimation, a Bayesian approach using Markov Chain Monte Carlo is adopted. The performance of the proposed methods is illustrated using some simulation studies. A real data set is also analyzed, where different model structures are used. Model comparison is performed using a variety of statistical criteria.

19.
Estimating equations which are not necessarily likelihood-based score equations are becoming increasingly popular for estimating regression model parameters. This paper is concerned with estimation based on general estimating equations when true covariate data are missing for all the study subjects, but surrogate or mismeasured covariates are available instead. The method is motivated by the covariate measurement error problem in marginal or partly conditional regression of longitudinal data. We propose to base estimation on the expectation of the complete data estimating equation conditioned on available data. The regression parameters and other nuisance parameters are estimated simultaneously by solving the resulting estimating equations. The expected estimating equation (EEE) estimator is equal to the maximum likelihood estimator if the complete data scores are likelihood scores and conditioning is with respect to all the available data. A pseudo-EEE estimator, which requires less computation, is also investigated. Asymptotic distribution theory is derived. Small sample simulations are conducted when the error process is an order 1 autoregressive model. Regression calibration is extended to this setting and compared with the EEE approach. We demonstrate the methods on data from a longitudinal study of the relationship between childhood growth and adult obesity.

20.
This paper considers quantile regression models using an asymmetric Laplace distribution from a Bayesian point of view. We develop a simple and efficient Gibbs sampling algorithm for fitting the quantile regression model based on a location-scale mixture representation of the asymmetric Laplace distribution. It is shown that the resulting Gibbs sampler can be implemented by sampling from either a normal or a generalized inverse Gaussian distribution. We also discuss some possible extensions of our approach, including the incorporation of a scale parameter, the use of a double exponential prior, and a Bayesian analysis of Tobit quantile regression. The proposed methods are illustrated by both simulated and real data.
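The location-scale mixture idea behind this sampler can be illustrated by simulation. The sketch below assumes the usual normal-exponential mixture form of the asymmetric Laplace distribution with unit scale: ε = θz + τ√z·u, with z exponential with mean 1, u standard normal, θ = (1 − 2p)/(p(1 − p)) and τ² = 2/(p(1 − p)). Whether the paper uses exactly this parameterization is an assumption, and the function name `simulate_ald_errors` is hypothetical. The block only draws errors and checks their p-th quantile; it is not the Gibbs sampler itself.

```python
import math
import random

def simulate_ald_errors(p, n, seed=0):
    """Draw n errors from an asymmetric Laplace distribution (location 0,
    scale 1, quantile level p) via its normal-exponential mixture form."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    rng = random.Random(seed)
    theta = (1.0 - 2.0 * p) / (p * (1.0 - p))
    tau = math.sqrt(2.0 / (p * (1.0 - p)))
    errors = []
    for _ in range(n):
        z = rng.expovariate(1.0)   # latent exponential mixing variable
        u = rng.gauss(0.0, 1.0)    # standard normal draw
        # Conditional on z, the error is normal with mean theta*z, variance tau^2*z
        errors.append(theta * z + tau * math.sqrt(z) * u)
    return errors

p = 0.25
errors = simulate_ald_errors(p, 100_000, seed=1)
share_below_zero = sum(e <= 0 for e in errors) / len(errors)
print(f"target quantile level {p}, empirical P(error <= 0) = {share_below_zero:.4f}")
```

Conditional on the latent z, the error is Gaussian, which is what makes a normal full conditional for the regression coefficients possible in a sampler of this kind.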
