Similar articles
20 similar articles found (search time: 46 ms)
1.
Poisson regression is a widely used technique for modeling count data in the applied sciences, with model parameters usually estimated by maximum likelihood. However, the presence of multicollinearity inflates the variance of the maximum likelihood (ML) estimator and renders the parameter estimates unstable. In this article, a new linearized ridge Poisson estimator is introduced to deal with the problem of multicollinearity. Based on the asymptotic properties of the ML estimator, the bias, covariance and mean squared error of the proposed estimator are obtained and the optimal choice of the shrinkage parameter is derived. The performance of the existing and proposed estimators is evaluated through Monte Carlo simulations and two real data applications. The results clearly show that the proposed estimator outperforms the existing estimators in the mean squared error sense.
Keywords: Poisson regression; multicollinearity; ridge Poisson estimator; linearized ridge regression estimator; mean squared error. Mathematics Subject Classification: 62J07, 62F10
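The article's linearized ridge estimator is not reproduced here; the sketch below only illustrates the mechanism it builds on, an L2-penalized Poisson fit by Newton iteration on a simulated collinear design (all names and numbers are illustrative, not from the paper):

```python
import numpy as np

def poisson_ridge(X, y, lam, n_iter=50):
    """Newton (IRLS) fit of Poisson regression with an L2 (ridge) penalty.
    lam = 0 gives the ordinary maximum likelihood estimator."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(np.clip(X @ beta, -30, 30))      # mean under the log link
        grad = X.T @ (y - mu) - 2.0 * lam * beta     # penalized score
        hess = -X.T @ (mu[:, None] * X) - 2.0 * lam * np.eye(p)
        beta = beta - np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)                  # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = rng.poisson(np.exp(0.2 + 0.3 * x1 + 0.3 * x2))

b_mle = poisson_ridge(X, y, lam=0.0)
b_ridge = poisson_ridge(X, y, lam=5.0)
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_mle))  # ridge shrinks the fit
```

Because the penalized objective is the concave log-likelihood minus a quadratic, the penalized optimum always has a smaller coefficient norm than the MLE, which is the stabilization the entry describes.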

2.
The Poisson regression model (PRM) is employed to model the relationship between a count variable (y) and one or more explanatory variables. The parameters of the PRM are commonly estimated with the Poisson maximum likelihood estimator (PMLE). Explanatory variables often move together, however, which results in multicollinearity, and the variance of the PMLE becomes inflated in its presence. The Poisson ridge regression estimator (PRRE) and the Poisson Liu estimator (PLE) have been suggested as alternatives to the PMLE. In this study, we propose a new estimator of the regression coefficients of the PRM when multicollinearity is a problem. We perform a simulation study under different specifications to assess the performance of the new estimator against the existing ones, evaluated by the scalar mean squared error (SMSE) and the mean squared prediction error. The aircraft damage data are used for the application study, with the estimators judged by the same two criteria. The theoretical comparison shows that the proposed estimator outperforms the other estimators, which is further supported by the simulation study and the application results.
Keywords: Poisson regression model; Poisson maximum likelihood estimator; multicollinearity; Poisson ridge regression; Liu estimator; simulation

3.
Count data may be described by a Poisson regression model. If random coefficients are involved, maximum likelihood estimation is not feasible and alternative estimation methods have to be employed. For the approach based on quasi-likelihood estimation, a characterization of design optimality is derived and optimal designs are determined numerically for an example with random slope parameters.

4.
We propose a hidden Markov model for longitudinal count data where sources of unobserved heterogeneity arise, making data overdispersed. The observed process, conditionally on the hidden states, is assumed to follow an inhomogeneous Poisson kernel, where the unobserved heterogeneity is modeled in a generalized linear model (GLM) framework by adding individual-specific random effects in the link function. Due to the complexity of the likelihood within the GLM framework, model parameters may be estimated by numerical maximization of the log-likelihood function or by simulation methods; we propose a more flexible approach based on the Expectation Maximization (EM) algorithm. Parameter estimation is carried out using a non-parametric maximum likelihood (NPML) approach in a finite mixture context. Simulation results and two empirical examples are provided.
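The NPML finite-mixture estimation in this entry can be illustrated, in a much-simplified covariate-free form, by EM for a two-component Poisson mixture (component count, starting values and simulated data are illustrative):

```python
import numpy as np
from scipy.special import gammaln

def em_poisson_mixture(y, k=2, n_iter=200):
    """EM for a k-support-point Poisson mixture, a discrete (NPML-style)
    mixing distribution estimated in a finite mixture context."""
    lam = np.quantile(y, np.linspace(0.25, 0.75, k)) + 1e-3   # spread-out starts
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | y_i)
        logp = (y[:, None] * np.log(lam) - lam - gammaln(y[:, None] + 1)
                + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and support points (rates)
        w = r.mean(axis=0)
        lam = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)
    order = np.argsort(lam)
    return w[order], lam[order]

rng = np.random.default_rng(1)
y = np.concatenate([rng.poisson(2.0, 300), rng.poisson(9.0, 200)])
w, lam = em_poisson_mixture(y)
print(w, lam)          # weights near (0.6, 0.4); rates near (2, 9)
```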

5.
The purpose of this paper is to develop a new linear regression model for count data, namely the generalized Poisson-Lindley (GPL) linear model. The GPL linear model is obtained by applying the generalized linear model framework to the GPL distribution, with model parameters estimated by maximum likelihood. We use the GPL linear model to fit two real data sets and compare it with the Poisson, negative binomial (NB) and Poisson-weighted exponential (P-WE) models for count data. The GPL linear model is found to fit over-dispersed count data well, showing the highest log-likelihood and the smallest AIC and BIC values. The GPL linear regression model is therefore a valuable alternative to the Poisson, NB, and P-WE models.
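The GPL model itself is not sketched here; the snippet below only illustrates the over-dispersion that motivates it and the log-likelihood/AIC computation used for the model comparison in this entry (the gamma-mixed Poisson data are simulated for illustration):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
# Gamma-mixed Poisson draws (a negative binomial marginal): over-dispersed counts
lam = rng.gamma(shape=2.0, scale=2.0, size=1000)
y = rng.poisson(lam)
print(y.mean(), y.var())          # variance well above the mean

# AIC of an intercept-only Poisson fit; the MLE of the rate is the sample mean
mu = y.mean()
loglik = np.sum(y * np.log(mu) - mu - gammaln(y + 1))
aic_poisson = 2 * 1 - 2 * loglik  # 2 * (number of parameters) - 2 * loglik
print(aic_poisson)
```

A model that accommodates the extra-Poisson variance, such as NB or GPL, would attain a higher log-likelihood and hence a smaller AIC on data like these.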

6.
We consider the problem of detecting a 'bump' in the intensity of a Poisson process or in a density. We analyze two types of likelihood-ratio-based statistics, which allow for exact finite-sample inference and asymptotically optimal detection: the maximum of the penalized square root of log likelihood ratios ('penalized scan'), evaluated over a certain sparse set of intervals, and a certain average of log likelihood ratios ('condensed average likelihood ratio'). We show that penalizing the square root of the log likelihood ratio, rather than the log likelihood ratio itself, leads to a simple penalty term that yields optimal power. The penalty derived in this way may prove useful for other problems that involve a Brownian bridge in the limit. The second key tool is an approximating set of intervals that is rich enough to allow for optimal detection, but sparse enough that the validity of the penalization scheme can be justified simply via the union bound. This results in a considerable simplification of the theoretical treatment compared with the usual approach for this type of penalization technique, which requires establishing an exponential inequality for the variation of the test statistic. Another advantage of the sparse approximating set is that it allows fast computation in nearly linear time. We present a simulation study that illustrates the superior performance of the penalized scan and of the condensed average likelihood ratio compared with the standard scan statistic.
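A minimal, unpenalized version of the scan idea (fixed window width, binned counts, no penalization and no sparse interval set, all of which the paper adds) might look like:

```python
import numpy as np

def scan_llr(counts, w):
    """Max log likelihood ratio over all windows of w bins:
    H0 constant Poisson rate vs H1 elevated rate on the window."""
    N, B = counts.sum(), len(counts)
    e = N * w / B                                 # expected window count under H0
    best = 0.0
    for i in range(B - w + 1):
        x = counts[i:i + w].sum()
        if x > e:
            llr = x * np.log(x / e) + (N - x) * np.log((N - x) / (N - e))
            best = max(best, llr)
    return best

rng = np.random.default_rng(0)
background = rng.poisson(5.0, size=100)
bump = background.copy()
bump[40:50] += rng.poisson(6.0, size=10)          # localized elevated intensity
print(scan_llr(background, 10), scan_llr(bump, 10))
```

The bump raises the maximal window log likelihood ratio far above the background level, which is the signal the (penalized) scan statistic detects.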

7.
We propose covariance-regularized regression, a family of methods for prediction in high dimensional settings that uses a shrunken estimate of the inverse covariance matrix of the features to achieve superior prediction. An estimate of the inverse covariance matrix is obtained by maximizing the log-likelihood of the data, under a multivariate normal model, subject to a penalty; it is then used to estimate coefficients for the regression of the response onto the features. We show that ridge regression, the lasso and the elastic net are special cases of covariance-regularized regression, and we demonstrate that certain previously unexplored forms of covariance-regularized regression can outperform existing methods in a range of situations. The covariance-regularized regression framework is extended to generalized linear models and linear discriminant analysis, and is used to analyse gene expression data sets with multiple class and survival outcomes.
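One concrete instance of the special-case claim: replacing the sample second-moment matrix of the features by a ridge-regularized version reproduces the ridge coefficients exactly (a minimal sketch with simulated data, not the paper's penalized-likelihood estimator of the inverse covariance):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

lam = 3.0
# Classical ridge: (X'X + lam I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Covariance-regularized view: regularize S = X'X/n, then regress
S = X.T @ X / n
c = X.T @ y / n
beta_cov = np.linalg.solve(S + (lam / n) * np.eye(p), c)

print(np.allclose(beta_ridge, beta_cov))   # True: ridge is a special case
```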

8.
The negative binomial (NB) model is frequently used for overdispersed Poisson count data. To study the effect of a continuous covariate of interest in an NB model, a flexible procedure is used to model the covariate effect by fixed-knot cubic B-splines, with a second-order difference penalty on the adjacent B-spline coefficients to avoid undersmoothing. A penalized likelihood is used to estimate the parameters of the model, and a penalized likelihood ratio test statistic is constructed for the null hypothesis that the continuous covariate effect is linear. When the number of knots is fixed, its limiting null distribution is that of a linear combination of independent chi-squared random variables, each with one degree of freedom. The smoothing parameter value is determined by setting a specified value equal to the asymptotic expectation of the test statistic under the null hypothesis. The power of the proposed test is studied with simulation experiments.

9.
The purpose of this study is to highlight the application of sparse logistic regression models to the prediction of tumour pathological subtypes from lung cancer patients' genomic information. We consider sparse logistic regression models to deal with the high dimensionality of, and correlation between, genomic regions. In the hierarchical likelihood (HL) method, the random effects are assumed to follow a normal distribution whose variance is in turn assumed to follow a gamma distribution; this formulation includes the ridge and lasso penalties as special cases. We extend the HL penalty by adding a ridge penalty (called 'HLnet'), in the same spirit as the elastic net penalty, which is constructed from the lasso penalty. The results indicate that the HL penalty produces sparser estimates than the lasso penalty with comparable prediction performance, while the HLnet and elastic net penalties have the best prediction performance on real data. We illustrate the methods in a lung cancer study.
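The HL and HLnet penalties are specific to this paper, but the baseline they are compared against, lasso-penalized logistic regression, can be sketched with a plain proximal-gradient (ISTA) solver; the design, penalty weight and step size below are illustrative:

```python
import numpy as np

def lasso_logistic(X, y, lam, step=0.01, n_iter=3000):
    """Proximal gradient (ISTA) for L1-penalized logistic regression:
    minimize -loglik/n + lam * ||beta||_1, intercept left unpenalized."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (prob - y) / n
        beta = beta - step * grad
        # soft-threshold every coordinate except the first (the intercept)
        beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - step * lam, 0.0)
    return beta

rng = np.random.default_rng(0)
n, p = 400, 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
true = np.zeros(p)
true[1:4] = [1.5, -2.0, 1.0]                         # sparse truth
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true)))

beta = lasso_logistic(X, y, lam=0.05)
print((np.abs(beta[1:]) > 1e-8).sum())               # only a few nonzero slopes
```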

10.
Ibrahim (1990) used the EM algorithm to obtain maximum likelihood estimates of the regression parameters in generalized linear models with partially missing covariates, a technique termed EM by the method of weights. In this paper, we generalize this technique to Cox regression with missing values in the covariates. We specify a full model, letting the unobserved covariate values be random, and then maximize the observed likelihood. The asymptotic covariance matrix is estimated by the inverse information matrix. The missing data are allowed to be missing at random, and in principle non-ignorable non-response may also be handled. Simulation studies indicate that the proposed method is more efficient than the method suggested by Paik & Tsai (1997). We apply the procedure to a clinical trial example with six covariates, three of which have missing values.

11.
We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed effects linear predictor with unknown smooth link, and variance–covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance–covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.

12.
Hu Yanan, Tian Maozai. Statistical Research (统计研究), 2019, 36(1): 104-114
Zero-inflated count data violate the variance-mean relationship of the Poisson distribution; they can be described by a mixture in which data following a Poisson distribution and data equal to zero (a degenerate distribution) each make up a certain proportion. Based on the adaptive elastic net, this paper studies joint modelling and variable selection for zero-inflated count data. For the zero-inflated Poisson distribution, a latent variable is introduced to construct the complete likelihood of the zero-inflated Poisson model, which consists of a zero-inflation part and a Poisson part. Since the covariates may exhibit collinearity and sparsity, the objective function is obtained by adding an adaptive elastic-net penalty to the likelihood; sparse estimates of the regression coefficients are then obtained with the EM algorithm, and the optimal tuning parameter is chosen by the Bayesian information criterion (BIC). Theoretical proofs of the large-sample properties of the estimators and a simulation study are given, and finally the proposed method is applied to a real problem.
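The paper's adaptive-elastic-net ZIP model involves covariates and penalization; the EM core it relies on can be sketched for the covariate-free case, where the zero-inflation probability pi and the Poisson rate lam are the only parameters (names and simulated data are illustrative):

```python
import numpy as np

def em_zip(y, n_iter=200):
    """EM for an intercept-only zero-inflated Poisson:
    y = 0 with probability pi (structural zero), else y ~ Poisson(lam)."""
    pi, lam = 0.5, max(y.mean(), 0.1)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero
        p0 = pi / (pi + (1.0 - pi) * np.exp(-lam))
        z = np.where(y == 0, p0, 0.0)
        # M-step: closed-form updates given the responsibilities z
        pi = z.mean()
        lam = ((1.0 - z) * y).sum() / (1.0 - z).sum()
    return pi, lam

rng = np.random.default_rng(0)
n = 2000
structural = rng.random(n) < 0.3                 # true zero-inflation prob. 0.3
y = np.where(structural, 0, rng.poisson(4.0, n))
pi_hat, lam_hat = em_zip(y)
print(round(pi_hat, 2), round(lam_hat, 2))       # close to 0.3 and 4.0
```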

13.
In this paper, a unified maximum marginal likelihood estimation procedure is proposed for the analysis of right-censored data using general partially linear varying-coefficient transformation models (GPLVCTM), which are flexible enough to include many survival models as special cases. The unknown functional coefficients in the models are approximated by cubic B-splines. We estimate the B-spline coefficients and regression parameters by maximizing the marginal likelihood function. One advantage of this procedure is that it is free of both the baseline and censoring distributions. Through simulation studies and a real data application (the VA data from the Veterans Administration Lung Cancer Study clinical trial), we illustrate that the proposed estimation procedure is accurate, stable and practical.

14.
This paper is concerned with ridge estimation of the fixed and random effects in Henderson's mixed model equations for the linear mixed model. For this purpose, a penalized likelihood method is proposed. A linear combination of the ridge estimators of the fixed and random effects is compared with a linear combination of the best linear unbiased estimators under the mean squared error (MSE) matrix criterion. Additionally, a method for choosing the biasing parameter by minimizing the MSE of the ridge estimator is given. A real data analysis illustrates the theoretical results, and a simulation study characterizes the performance of the ridge and best linear unbiased estimation approaches in the linear mixed model.
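Henderson's mixed model equations, and the effect of adding a ridge penalty to their fixed-effects block, can be sketched on a toy grouped design (variance components are treated as known; all numbers are illustrative, and this is not the paper's estimator or biasing-parameter rule):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear mixed model: y = X b + Z u + e, u ~ N(0, tau2 I), e ~ N(0, sigma2 I)
n, p, q = 60, 2, 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((n // q, 1)))      # group-membership design
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(0, 1.0, q) + rng.normal(0, 0.5, n)

sigma2, tau2 = 0.25, 1.0                          # variances treated as known
shrink_u = (sigma2 / tau2) * np.eye(q)            # = sigma2 * G^{-1}

# Henderson's mixed model equations (scaled through by sigma2)
C = np.block([[X.T @ X,            X.T @ Z],
              [Z.T @ X, Z.T @ Z + shrink_u]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(C, rhs)                     # BLUE of b, BLUP of u

# Ridge version: penalize the fixed-effects block as well
lam = 1.0
C_ridge = C.copy()
C_ridge[:p, :p] += lam * np.eye(p)
sol_ridge = np.linalg.solve(C_ridge, rhs)
print(np.linalg.norm(sol_ridge[:p]) <= np.linalg.norm(sol[:p]))  # shrinkage
```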

15.
The completely random character of radioactive disintegration provides the basis of a strong justification for a Poisson linear model for single-photon emission computed tomography data, which can be used to produce reconstructions of isotope densities, whether by maximum likelihood or Bayesian methods. However, such a model requires the construction of a matrix of weights, which represent the mean rates of arrival at each detector of photons originating from each point within the body space. Two methods of constructing these weights are discussed, and reconstructions resulting from phantom and real data are presented.


17.
Ecological Momentary Assessment is an emerging method of data collection in behavioral research. It can capture the times of repeated behavioral events on electronic devices, together with information on subjects' psychological states gathered through electronically administered questionnaires, both at the event times and at times selected from a probability-based design. A method is proposed for fitting a mixed Poisson point process model for the impact of partially observed, time-varying covariates on the timing of repeated behavioral events. A random frailty is included in the point process intensity to describe variation among subjects in baseline rates of event occurrence. Covariate coefficients are estimated from estimating equations constructed by replacing the integrated intensity in the Poisson score equations with a design-unbiased estimator. An estimator is also proposed for the variance of the random frailties. These estimators are robust in the sense that no assumptions are made about the distribution of the time-varying covariates or of the random effects; the subject effects are, however, estimated under gamma frailties using an approximate hierarchical likelihood. The proposed approach is illustrated using smoking data.

18.
Empirical Bayes is a versatile approach to "learn from a lot" in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss "formal" empirical Bayes methods that maximize the marginal likelihood, but also more informal approaches based on other data summaries. We contrast empirical Bayes with cross-validation and full Bayes, and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables, termed "co-data". In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates the inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes-full Bayes ridge regression approach for estimating the posterior predictive interval.
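For a feel of the "learn from a lot" effect, here is a minimal "formal" empirical Bayes sketch in a normal-means setting: the prior variance is estimated from the marginal distribution of all p observations and plugged into the posterior-mean shrinkage rule (the simulation setup is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 2000
tau2_true, sigma2 = 2.0, 1.0                     # prior and noise variances
theta = rng.normal(0.0, np.sqrt(tau2_true), p)   # true effects
x = theta + rng.normal(0.0, np.sqrt(sigma2), p)  # one noisy observation each

# 'Formal' empirical Bayes: marginally x_i ~ N(0, tau2 + sigma2), so the
# marginal likelihood is maximized by the moment estimate below
tau2_hat = max(np.mean(x ** 2) - sigma2, 0.0)
theta_eb = tau2_hat / (tau2_hat + sigma2) * x    # posterior-mean shrinkage

mse_raw = np.mean((x - theta) ** 2)
mse_eb = np.mean((theta_eb - theta) ** 2)
print(mse_eb < mse_raw)   # shrinkage with the estimated prior beats raw x
```

With p large, the estimated prior variance is accurate and the empirical Bayes rule nearly matches the oracle Bayes risk, which is the point of "learning from a lot" of variables.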

19.
The generalized cross-validation (GCV) method has been a popular technique for selecting tuning parameters for smoothing and penalization, and has become a standard tool for selecting tuning parameters in shrinkage models. Its computational ease and robustness compared with cross-validation also make it competitive for model selection. It is well known that GCV performs well for linear estimators, which are linear functions of the response variable, such as the ridge estimator. It may not perform well for nonlinear estimators, however, since GCV emphasizes linear characteristics by taking the trace of the projection matrix. This paper explores GCV for nonlinear estimators and further extends the results to correlated data in longitudinal studies. We expect that the nonlinear GCV and quasi-GCV developed in this paper will provide analogous tools for selecting tuning parameters in linear penalty models and penalized GEE models.
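The linear-estimator case the entry starts from can be made concrete with the classical GCV score for ridge regression, n * RSS / (n - tr(H))^2, minimized over a grid (data and grid are illustrative; the paper's nonlinear and quasi-GCV extensions are not shown):

```python
import numpy as np

def gcv(X, y, lam):
    """GCV score for ridge regression: n * RSS / (n - trace(H))^2,
    where H is the ridge hat (smoother) matrix."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return n * (resid @ resid) / (n - np.trace(H)) ** 2

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

lams = np.logspace(-3, 3, 25)
scores = np.array([gcv(X, y, l) for l in lams])
print(lams[scores.argmin()])          # GCV-selected penalty
```

Here tr(H) is the effective degrees of freedom of the linear smoother, which is exactly the quantity the entry says GCV leans on and which becomes problematic for nonlinear estimators.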

20.
We propose generalized linear models for time or age-time tables of seasonal counts, with the goal of better understanding seasonal patterns in the data. The linear predictor contains a smooth component for the trend and the product of a smooth component (the modulation) and a periodic time series of arbitrary shape (the carrier wave). To model rates, a population offset is added. Two-dimensional trends and modulations are estimated using a tensor product B-spline basis of moderate dimension, with further smoothness ensured by difference penalties on the rows and columns of the tensor product coefficients. The optimal penalty tuning parameters are chosen by minimizing a quasi-information criterion. Computationally efficient estimation is achieved using array regression techniques, avoiding excessively large matrices. The model is applied to female death rates in the US due to cerebrovascular and respiratory diseases.
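The difference-penalty idea at the heart of the model can be sketched in one dimension with a Whittaker-style smoother: the same second-order difference penalty the entry applies to tensor-product B-spline coefficients, applied here to an identity basis for brevity (signal and penalty weight are illustrative):

```python
import numpy as np

def whittaker(y, lam, d=2):
    """Difference-penalized smoother: minimize ||y - f||^2 + lam * ||D_d f||^2.
    The same penalty idea as P-splines, with an identity basis."""
    m = len(y)
    D = np.diff(np.eye(m), n=d, axis=0)       # d-th order difference matrix
    return np.linalg.solve(np.eye(m) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
y = np.sin(t) + rng.normal(0, 0.4, 200)       # noisy seasonal-looking signal
f = whittaker(y, lam=10.0)

# Smoothing reduces the error relative to the clean underlying signal
print(np.mean((f - np.sin(t)) ** 2) < np.mean((y - np.sin(t)) ** 2))
```

In the paper's two-dimensional setting the same penalty is applied along the rows and columns of the tensor-product coefficient array, with the tuning parameter chosen by the quasi-information criterion rather than fixed as here.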
