1.

Empirical Likelihood for Censored Linear Regression and Variable Selection

Tong Tong Wu Gang Li Chengyong Tang 《Scandinavian Journal of Statistics》2015,42(3):798-812

The linear regression model for right censored data, also known as the accelerated failure time model using the logarithm of survival time as the response variable, is a useful alternative to the Cox proportional hazards model. Empirical likelihood as a non‐parametric approach has been demonstrated to have many desirable merits thanks to its robustness against model misspecification. However, the linear regression model with right censored data cannot directly benefit from the empirical likelihood for inferences, mainly because of dependent elements in the estimating equations of the conventional approach. In this paper, we propose an empirical likelihood approach with a new estimating equation for linear regression with right censored data. A nested coordinate algorithm with majorization is used for solving the optimization problems with a non‐differentiable objective function. We show that Wilks' theorem holds for the new empirical likelihood. We also consider the variable selection problem with empirical likelihood when the number of predictors can be large. Because the new estimating equation is non‐differentiable, a quadratic approximation is applied to study the asymptotic properties of penalized empirical likelihood. We prove the oracle properties and evaluate the properties with simulated data. We apply our method to a Surveillance, Epidemiology, and End Results small intestine cancer dataset.

2.

Marginalized transition random effect models for multivariate longitudinal binary data

Ozlem Ilk Michael J. Daniels 《Revue canadienne de statistique》2007,35(1):105-123

Generalized linear models with random effects and/or serial dependence are commonly used to analyze longitudinal data. However, the computation and interpretation of marginal covariate effects can be difficult. This led Heagerty (1999, 2002) to propose models for longitudinal binary data in which a logistic regression is first used to explain the average marginal response. The model is then completed by introducing a conditional regression that allows for the longitudinal, within‐subject, dependence, either via random effects or regressing on previous responses. In this paper, the authors extend the work of Heagerty to handle multivariate longitudinal binary response data using a triple of regression models that directly model the marginal mean response while taking into account dependence across time and across responses. Markov Chain Monte Carlo methods are used for inference. Data from the Iowa Youth and Families Project are used to illustrate the methods.

3.

Estimation of a Semiparametric Recursive Bivariate Probit Model with Nonparametric Mixing

Giampiero Marra Georgios Papageorgiou Rosalba Radice 《Australian & New Zealand Journal of Statistics》2013,55(3):321-342

We consider an extension of the recursive bivariate probit model for estimating the effect of a binary variable on a binary outcome in the presence of unobserved confounders, nonlinear covariate effects and overdispersion. Specifically, the model consists of a system of two binary outcomes with a binary endogenous regressor which includes smooth functions of covariates, hence allowing for flexible functional dependence of the responses on the continuous regressors, and arbitrary random intercepts to deal with overdispersion arising from correlated observations on clusters or from the omission of non‐confounding covariates. We fit the model by maximizing a penalized likelihood using an Expectation‐Maximisation algorithm. The issues of automatic multiple smoothing parameter selection and inference are also addressed. The empirical properties of the proposed algorithm are examined in a simulation study. The method is then illustrated using data from a survey on health, aging and wealth.

4.

An omnibus lack of fit test in logistic regression with sparse data

Ying Liu Paul I. Nelson Shie-Shien Yang 《Statistical Methods and Applications》2012,21(4):437-452

The usefulness of logistic regression depends to a great extent on the correct specification of the relation between a binary response and characteristics of the unit on which the response is recorded. Currently used methods for testing for misspecification (lack of fit) of a proposed logistic regression model do not perform well when a data set contains almost as many distinct covariate vectors as experimental units, a condition referred to as sparsity. A new algorithm for grouping sparse data to create pseudo replicates and using them to test for lack of fit is developed. A simulation study illustrates settings in which the new test is superior to existing ones. Analysis of a dataset consisting of the ages of menarche of Warsaw girls is also used to compare the new and existing lack of fit tests.

5.

Phase II monitoring of binary profiles in the presence of within-profile autocorrelation based on Markov Model

Mohammad Reza Maleki Ali Reza Taheriyoun 《Communications in Statistics - Simulation and Computation》2017,46(10):7710-7732

This paper introduces a Markov model in Phase II profile monitoring with an autocorrelated binary response variable. In the proposed approach, a logistic regression model is extended to describe the within-profile autocorrelation. The likelihood function is constructed and then a particle swarm optimization (PSO) algorithm is tuned and utilized to estimate the model parameters. Furthermore, two control charts are extended in which the covariance matrix is derived based on the Fisher information matrix. Simulation studies are conducted to evaluate the detection capability of the proposed control charts. A numerical example is also given to illustrate the application of the proposed method.

6.

Generalized Additive Models for Zero‐Inflated Data with Partial Constraints

HAI LIU KUNG‐SIK CHAN 《Scandinavian Journal of Statistics》2011,38(4):650-665

Zero‐inflated data abound in ecological studies as well as in other scientific fields. Non‐parametric regression with zero‐inflated response may be studied via the zero‐inflated generalized additive model (ZIGAM) with a probabilistic mixture distribution of zero and a regular exponential family component. We propose the (partially) constrained ZIGAM, which assumes that some covariates affect the probability of non‐zero‐inflation and the regular exponential family distribution mean proportionally on the link scales. When the assumption holds, the new approach provides a unified framework for modelling zero‐inflated data, which is more parsimonious and efficient than the unconstrained ZIGAM. We develop an iterative estimation algorithm, and discuss the confidence interval construction of the estimator. Some asymptotic properties are derived. We also propose a Bayesian model selection criterion for choosing between the unconstrained and constrained ZIGAMs. The new methods are illustrated with both simulated data and a real application in jellyfish abundance data analysis.

7.

Sequential Designs for Binary Data with the Purpose to Maximize the Probability of Response

Ellinor Fackle Fornius 《Communications in Statistics - Simulation and Computation》2013,42(6):1219-1238

Two kinds of sequential designs are proposed for finding the point that maximizes the probability of response assuming a binary response variable and a quadratic logistic regression model. One is a parametric optimal design approach, and the other one is a nonparametric stochastic approximation approach. The suggested sequential designs are evaluated and compared in a simulation study. In summary, the parametric approach performed very well whereas its competitor failed in some cases.

8.

Simulation-based Inference in a Zero-inflated Bernoulli Regression Model

Aba Diop Aliou Diop 《Communications in Statistics - Simulation and Computation》2016,45(10):3597-3614

The logistic regression model has become a standard tool to investigate the relationship between a binary outcome and a set of potential predictors. When analyzing binary data, it often happens that the observed proportion of zeros is greater than expected under the postulated logistic model. Zero-inflated binomial (ZIB) models have been developed to fit binary data that contain too many zeros. Maximum likelihood estimators in these models have been proposed and their asymptotic properties established. Several aspects of ZIB models still deserve attention, however, such as the estimation of odds-ratios and event probabilities. In this article, we propose estimators of these quantities and we investigate their properties both theoretically and via simulations. Based on these results, we provide recommendations about the range of conditions (minimum sample size, maximum proportion of zeros in excess) under which a reliable statistical inference on the odds-ratios and event probabilities can be obtained in a ZIB regression model. A real-data example illustrates the proposed estimators.
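
As a rough illustration of the zero-inflation idea in this abstract, the sketch below assumes the usual mixture form, which the abstract itself does not spell out: with probability `omega` the outcome is a structural zero, and otherwise it follows a logistic model with linear predictor `eta`. The names `zib_event_prob`, `zib_loglik`, `omega` and `eta` are illustrative and are not taken from the paper.

```python
import math


def expit(t):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def zib_event_prob(eta, omega):
    """P(Y = 1) under the assumed mixture: structural zero with probability
    omega, otherwise Y ~ Bernoulli(expit(eta))."""
    return (1.0 - omega) * expit(eta)


def zib_loglik(ys, etas, omega):
    """Log-likelihood of binary outcomes ys under the assumed mixture."""
    total = 0.0
    for y, eta in zip(ys, etas):
        p1 = zib_event_prob(eta, omega)
        total += math.log(p1) if y == 1 else math.log(1.0 - p1)
    return total


# The event probability shrinks as the share of structural zeros grows.
probs = [zib_event_prob(0.0, w) for w in (0.0, 0.2, 0.5)]
```

With `omega = 0` the expression reduces to the ordinary logistic model, which is one way to see why excess zeros bias a plain logistic fit.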

9.

Bayesian adaptive Lasso for quantile regression models with nonignorably missing response data

Dengke Xu Niansheng Tang 《Communications in Statistics - Simulation and Computation》2013,42(9):2727-2742

Handling data with a nonignorable missingness mechanism is still a challenging problem in statistics. In this paper, we develop a fully Bayesian adaptive Lasso approach for quantile regression models with nonignorably missing response data, where the nonignorable missingness mechanism is specified by a logistic regression model. The proposed method extends the Bayesian Lasso by allowing different penalization parameters for different regression coefficients. Furthermore, a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is implemented to simulate the parameters from their posterior distributions, mainly the regression coefficients, the shrinkage coefficients, and the parameters of the nonignorable missingness model. Finally, some simulation studies and a real example are used to illustrate the proposed methodology.

10.

PARAMETRIC FRACTIONAL IMPUTATION FOR NON‐IGNORABLE CATEGORICAL MISSING DATA WITH FOLLOW‐UP

Ji Young Kim 《Australian & New Zealand Journal of Statistics》2012,54(2):239-250

Incomplete data subject to non‐ignorable non‐response are often encountered in practice and have a non‐identifiability problem. A follow‐up sample is randomly selected from the set of non‐respondents to avoid the non‐identifiability problem and get complete responses. Glynn, Laird, & Rubin analyzed non‐ignorable missing data with a follow‐up sample under a pattern mixture model. In this article, maximum likelihood estimation of parameters of the categorical missing data is considered with a follow‐up sample under a selection model. To estimate the parameters with non‐ignorable missing data, the EM algorithm with weighting, proposed by Ibrahim, is used. That is, in the E‐step, the weighted mean is calculated using the fractional weights for imputed data. Variances are estimated using the approximated jackknife method. Simulation results are presented to compare the proposed method with previously presented methods.

11.

Orthogonalized Residuals for Estimation of Marginally Specified Association Parameters in Multivariate Binary Data

BAHJAT F. QAQISH RICHARD C. ZINK JOHN S. PREISSER 《Scandinavian Journal of Statistics》2012,39(3):515-527

This paper focuses on marginal regression models for correlated binary responses when estimation of the association structure is of primary interest. A new estimating function approach based on orthogonalized residuals is proposed. A special case of the proposed procedure allows a new representation of the alternating logistic regressions method through marginal residuals. The connections between second‐order generalized estimating equations, alternating logistic regressions, pseudo‐likelihood and other methods are explored. Efficiency comparisons are presented, with emphasis on variable cluster size and on the role of higher‐order assumptions. The new method is illustrated with an analysis of data on impaired pulmonary function.

12.

Bias due to Ignoring the Sample Design in Case–Control Studies

John M. Neuhaus 《Australian & New Zealand Journal of Statistics》2002,44(3):285-293

Case–control studies allow efficient estimation of the associations of covariates with a binary response in settings where the probability of a positive response is small. It is well known that covariate–response associations can be consistently estimated using a logistic model by acting as if the case–control (retrospective) data were prospective, and that this result does not hold for other binary regression models. However, in practice an investigator may be interested in fitting a binary regression model with a non-logistic link, and this paper examines the magnitude of the bias resulting from ignoring the case–control sample design with such models. The paper presents an approximation to the magnitude of this bias in terms of the sampling rates of cases and controls, as well as simulation results that show that the bias can be substantial.
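
The well-known logistic result mentioned in this abstract can be checked numerically. The sketch below is a minimal illustration under assumed values: `a` and `b` are hypothetical prospective logistic coefficients, and `s1` and `s0` are hypothetical sampling rates for cases and controls. It shows that, among sampled subjects, the log-odds differ from the prospective log-odds by the constant log(s1/s0), so only the intercept changes and the slope is preserved. It does not reproduce the paper's bias approximation for other links.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def sampled_prob(p, s1, s0):
    """P(Y = 1 | x, subject was sampled) when cases are sampled with
    rate s1 and controls with rate s0 (Bayes' rule)."""
    return s1 * p / (s1 * p + s0 * (1.0 - p))


a, b = -4.0, 0.8      # hypothetical prospective intercept and slope
s1, s0 = 1.0, 0.02    # hypothetical sampling rates: all cases, 2% of controls

# Difference between retrospective and prospective log-odds at several x.
shifts = []
for x in (-1.0, 0.0, 2.5):
    eta = a + b * x
    shifts.append(logit(sampled_prob(expit(eta), s1, s0)) - eta)
```

Every entry of `shifts` equals log(s1/s0) up to rounding, independent of `x`. The same cancellation does not occur for a probit or complementary log-log link, which is the situation the paper studies.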

13.

Heteroscedastic and heavy-tailed regression with mixtures of skew Laplace normal distributions

Fatma Zehra Doğru Keming Yu Olcay Arslan 《Journal of Statistical Computation and Simulation》2019,89(17):3213-3240

Jointly modelling skewness and heterogeneity is challenging in data analysis, particularly in regression analysis which allows a random probability distribution to change flexibly with covariates. This paper, based on a skew Laplace normal (SLN) mixture of location, scale, and skewness, introduces a new regression model which provides flexible modelling of the location, scale and skewness parameters simultaneously. The maximum likelihood (ML) estimators of all parameters of the proposed model are obtained via the expectation-maximization (EM) algorithm, and their asymptotic properties are derived. Numerical analyses via a simulation study and a real data example are used to illustrate the performance of the proposed model.

14.

Simple Formula for Calculating Bias‐corrected AIC in Generalized Linear Models

Shinpei Imori Hirokazu Yanagihara Hirofumi Wakaki 《Scandinavian Journal of Statistics》2014,41(2):535-555

In real‐data analysis, deciding the best subset of variables in regression models is an important problem. Akaike's information criterion (AIC) is often used in order to select variables in many fields. When the sample size is not large, the AIC has a non‐negligible bias that will detrimentally affect variable selection. The present paper considers a bias correction of AIC for selecting variables in the generalized linear model (GLM). The GLM can express a number of statistical models by changing the distribution and the link function, such as the normal linear regression model, the logistic regression model, and the probit model, which are currently commonly used in a number of applied fields. In the present study, we obtain a simple expression for a bias‐corrected AIC (corrected AIC, or CAIC) in GLMs. Furthermore, we provide R code based on our formula. A numerical study reveals that the CAIC has better performance than the AIC for variable selection.
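
For orientation only, the sketch below computes the ordinary (uncorrected) AIC, defined as -2 log L + 2k, for two Bernoulli models on a small made-up sample. It is not the corrected criterion derived in the paper, whose formula is not given in the abstract. The data `ys` and the function names are hypothetical.

```python
import math


def bernoulli_loglik(ys, p):
    """Log-likelihood of binary data under a constant success probability p."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in ys)


def aic(loglik, n_params):
    """Ordinary AIC: -2 * maximized log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2.0 * n_params


ys = [1, 0, 0, 1, 1, 0, 1, 1]   # made-up sample of size 8

# Intercept-only logistic model: the MLE of p is the sample mean (1 parameter).
p_hat = sum(ys) / len(ys)
aic_fitted = aic(bernoulli_loglik(ys, p_hat), 1)

# Fixed model with p = 0.5 and no free parameters.
aic_fixed = aic(bernoulli_loglik(ys, 0.5), 0)
```

Here the fitted model has the higher log-likelihood, yet the penalty term leaves it with the larger AIC. In samples this small the AIC itself is a biased estimate of the risk, which is the motivation for the correction the abstract describes.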

15.

Estimation in Regressive Logistic Regression Analyses of Familial Data with Missing Outcomes

Patrick E.B. FitzGerald & Matthew W. Knuiman 《Australian & New Zealand Journal of Statistics》1998,40(3):305-316

This paper examines a number of methods of handling missing outcomes in regressive logistic regression modelling of familial binary data, and compares them with an EM algorithm approach via a simulation study. The results indicate that a strategy based on imputation of missing values leads to biased estimates, and that a strategy of excluding incomplete families has a substantial effect on the variability of the parameter estimates. Recommendations are made which depend, amongst other factors, on the amount of missing data and on the availability of software.

16.

Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

J. G. Ibrahim S. R. Lipsitz & M.-H. Chen 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):173-190

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

17.

Graphics for studying logistic regression models

Luca Scrucca 《Statistical Methods and Applications》2002,11(3):371-394

In this article we focus on logistic regression models for binary responses. An existing result shows that the log-odds can be modelled depending on the log of the ratio between the conditional densities of the predictors given the response variable. This suggests that relevant statistical information could be extracted by investigating the inverse problem. Thus, we present different methods for studying the log-density ratio through graphs, which allow us to select which predictors are needed, and how they should be included in a logistic regression model. We also discuss data analysis examples based on real datasets available in the literature in order to provide further insights into the methodology proposed.

18.

A SEMIPARAMETRIC REGRESSION MODEL WITH MISSING COVARIATES IN CONTINUOUS-TIME CAPTURE-RECAPTURE STUDIES

Yan Wang 《Australian & New Zealand Journal of Statistics》2005,47(3):287-297

Covariate data were missing when a semiparametric regression model was used to study bird abundance in the Mai Po Sanctuary, Hong Kong. This paper proposes an EM‐type algorithm to estimate the regression parameters for that study. Analytical calculation of the expectation in the EM method is difficult, or even impossible, especially when missing covariates are continuous. A Monte Carlo method is used in the EM algorithm to ease the calculation complexity. Asymptotic variances of the parameter estimates are also derived. Properties of the proposed estimators are assessed through numerical simulations and a real example.

19.

Relative Risk Regression for Binary Outcomes: Methods and Recommendations

Ian C. Marschner 《Australian & New Zealand Journal of Statistics》2015,57(4):437-462

Relative risks are often considered preferable to odds ratios for quantifying the association between a predictor and a binary outcome. Relative risk regression is an alternative to logistic regression where the parameters are relative risks rather than odds ratios. It uses a log link binomial generalised linear model, or log‐binomial model, which requires parameter constraints to prevent probabilities from exceeding 1. This leads to numerical problems with standard approaches for finding the maximum likelihood estimate (MLE), such as Fisher scoring, and has motivated various non‐MLE approaches. In this paper we discuss the roles of the MLE and its main competitors for relative risk regression. It is argued that reliable alternatives to Fisher scoring mean that numerical issues are no longer a motivation for non‐MLE methods. Nonetheless, non‐MLE methods may be worthwhile for other reasons and we evaluate this possibility for alternatives within a class of quasi‐likelihood methods. The MLE obtained using a reliable computational method is recommended, but this approach requires bootstrapping when estimates are on the parameter space boundary. If convenience is paramount, then quasi‐likelihood estimation can be a good alternative, although parameter constraints may be violated. Sensitivity to model misspecification and outliers is also discussed along with recommendations and priorities for future research.
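
The distinction between relative risks and odds ratios, and the reason a log link needs parameter constraints, can be seen in a few lines. The sketch below uses a made-up 2x2 table and made-up coefficients; the function names are illustrative and nothing here reproduces the estimation methods compared in the paper.

```python
import math


def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table: exposed group has a events and
    b non-events, unexposed group has c events and d non-events."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


def log_link_prob(intercept, slope, x):
    """Fitted 'probability' under a log link: exp(intercept + slope * x).
    Nothing in this formula keeps the value at or below 1."""
    return math.exp(intercept + slope * x)


# Made-up table: 30/100 events among exposed, 10/100 among unexposed.
rr = relative_risk(30, 70, 10, 90)
or_ = odds_ratio(30, 70, 10, 90)

# Made-up log-binomial coefficients: baseline risk 0.3, relative risk 3 per unit.
p_in_range = log_link_prob(math.log(0.3), math.log(3.0), 1.0)
p_out_of_range = log_link_prob(math.log(0.3), math.log(3.0), 2.0)
```

For this table the relative risk is about 3 while the odds ratio is about 3.86, so the two measures diverge when the outcome is not rare. The second fitted value exceeds 1, which is the boundary problem that makes unconstrained Fisher scoring unreliable for the log‐binomial model.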

20.

Inference in Semi‐Parametric Dynamic Models for Repeated Count Data

Brajendra C. Sutradhar K.V. Vineetha Warriyar Nan Zheng 《Australian & New Zealand Journal of Statistics》2016,58(3):397-434

This paper deals with a longitudinal semi‐parametric regression model in a generalised linear model setup for repeated count data collected from a large number of independent individuals. To accommodate the longitudinal correlations, we consider a dynamic model for repeated counts which has decaying auto‐correlations as the time lag increases between the repeated responses. The semi‐parametric regression function involved in the model contains a specified regression function in some suitable time‐dependent covariates and a non‐parametric function in some other time‐dependent covariates. As far as the inference is concerned, because the non‐parametric function is of secondary interest, we estimate this function consistently using the independence assumption‐based well‐known quasi‐likelihood approach. Next, the proposed longitudinal correlation structure and the estimate of the non‐parametric function are used to develop a semi‐parametric generalised quasi‐likelihood approach for consistent and efficient estimation of the regression effects in the parametric regression function. The finite sample performance of the proposed estimation approach is examined through an intensive simulation study based on both large and small samples. Both balanced and unbalanced cluster sizes are incorporated in the simulation study. The asymptotic performances of the estimators are given. The estimation methodology is illustrated by reanalysing the well‐known health care utilisation data consisting of counts of yearly visits to a physician by 180 individuals for four years and several important primary and secondary covariates.