1.

Bias Reduction in Logistic Regression with Missing Responses When the Missing Data Mechanism is Nonignorable

《The American statistician》2012,66(4):340-349

ABSTRACT

In logistic regression with nonignorable missing responses, Ibrahim and Lipsitz proposed a method for estimating regression parameters. It is known that the regression estimates obtained by using this method are biased when the sample size is small. Also, another complexity arises when the iterative estimation process encounters separation in estimating regression coefficients. In this article, we propose a method to improve the estimation of regression coefficients. In our likelihood-based method, we penalize the likelihood by multiplying it by a noninformative Jeffreys prior as a penalty term. The proposed method reduces bias and is able to handle the issue of separation. Simulation results show substantial bias reduction for the proposed method as compared to the existing method. Analyses using real world data also support the simulation findings. An R package called brlrmr is developed implementing the proposed method and the Ibrahim and Lipsitz method.
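The Jeffreys-prior penalty mentioned in this abstract can be illustrated on complete data, where it is commonly known as Firth's correction. The sketch below is only that complete-data case: it is not the nonignorable-missing-response method of the article and it does not reproduce the brlrmr package. The function name `firth_logistic`, the step cap, and the toy data are assumptions made for illustration.

```python
import numpy as np

def firth_logistic(X, y, max_iter=200, tol=1e-8):
    """Complete-data logistic regression with a Jeffreys-prior penalty.

    Solves the modified score equations
        X^T (y - p + h * (0.5 - p)) = 0,
    where h holds the diagonal of the weighted hat matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)              # Fisher information X^T W X
        info_inv = np.linalg.inv(info)
        # Leverages h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))      # penalized score
        step = info_inv @ score
        biggest = np.max(np.abs(step))
        if biggest > 5.0:                          # cap very large steps (illustrative safeguard)
            step *= 5.0 / biggest
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data with complete separation: y = 1 exactly when x > 0,
# so the ordinary maximum likelihood slope would diverge.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

beta_hat = firth_logistic(X, y)
print("penalized estimate (intercept, slope):", beta_hat)
```

On this separated toy sample the penalized estimate stays finite, which is the behavior the abstract refers to when it says the penalty handles separation.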

2.

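Semiparametric Estimation of Regression Models with Missing Data Based on GMM

邓明 《统计与信息论坛》2013,28(3):9-15

Missing response data arise widely in socioeconomic research. For regression models with missing responses, this paper proposes a single semiparametric estimator within a moment-estimation (GMM) framework. The estimator retains the properties of both the parametric regression estimator and the nonparametric matching estimator, so it fits well in the subsample where the response is observed while reducing estimation error in the subsample where the response is unobserved. The estimator is also proved to be consistent and asymptotically normal.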

3.

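Statistical Inference for Linear Regression Models with Missing Skewed Data (total citations: 1, self-citations: 2, citations by others: 1)

吴刘仓 张家茂 邱贻涛 《统计与信息论坛》2013,28(9):22-26

This paper studies parameter estimation for linear regression models with missing skewed data. To overcome the distortion of the sample distribution and to improve estimation of the regression coefficients, scale parameter, and skewness parameter, it proposes a modified regression imputation method suited to missing data in linear regression models with skewed data. Simulations and a real-data example, with comparisons against mean imputation, regression imputation, and stochastic regression imputation, show that the proposed modified regression imputation method is effective and feasible.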

4.

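Robust Estimation for Semiparametric Models with Missing Data

丁先文 张文 袁红 陈雪平 《统计与决策》2022,(1)

With responses missing at random, this article studies robust estimation for semiparametric models based on quantile regression. First, using B-spline basis function approximation, the problem of estimating the model's nonparametric function is converted into one of estimating a vector of spline coefficients. Second, with responses missing at random, a new imputation method is proposed that performs multiple imputation of the missing responses. Third, based on the imputed data sets, a new quantile objective function is constructed, yielding robust estimates of the model's nonparametric function and parameter vector. Finally, an efficient algorithm is given for computing the multiple imputation estimator. Simulation studies verify the effectiveness and robustness of the proposed method.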

5.

Resampling for Order Estimation of Autoregressive Models with Missing Data

Abdelaziz El Matouat Freedath Djibril Moussa Hassania Hamzaoui 《Communications in Statistics - Simulation and Computation》2015,44(5):1187-1196

In this article, we consider the order estimation of autoregressive models with incomplete data using the expectation–maximization (EM) algorithm-based information criteria. The criteria take the form of a penalization of the conditional expectation of the log-likelihood. The evaluation of the penalization term generally involves numerical differentiation and matrix inversion. We introduce a simplification of the penalization term for autoregressive model selection and we propose a penalty factor based on a resampling procedure in the criteria formula. The simulation results show the improvements yielded by the proposed method when compared with the classical information criteria for model selection with incomplete data.

6.

Mixed Graphical Models with Missing Data and the Partial Imputation EM Algorithm (total citations: 2, self-citations: 0, citations by others: 2)

Zhi Geng Kang Wan & Feng Tao 《Scandinavian Journal of Statistics》2000,27(3):433-444

In this paper we discuss graphical models for mixed types of continuous and discrete variables with incomplete data. We use a set of hyperedges to represent an observed data pattern. A hyperedge is a set of variables observed for a group of individuals. In a mixed graph with two types of vertices and two types of edges, dots and circles represent discrete and continuous variables respectively. A normal graph represents a graphical model and a hypergraph represents an observed data pattern. In terms of the mixed graph, we discuss decomposition of mixed graphical models with incomplete data, and we present a partial imputation method which can be used in the EM algorithm and the Gibbs sampler to speed their convergence. For a given mixed graphical model and an observed data pattern, we try to decompose a large graph into several small ones so that the original likelihood can be factored into a product of likelihoods with distinct parameters for small graphs. For the case that a graph cannot be decomposed due to its observed data pattern, we can impute missing data partially so that the graph can be decomposed.

7.

Infinite Parameter Estimates in Logistic Regression, with Application to Approximate Conditional Inference

John E. Kolassa 《Scandinavian Journal of Statistics》1997,24(4):523-530

This paper discusses recovery of information regarding logistic regression parameters in cases when maximum likelihood estimates of some parameters are infinite. An algorithm for detecting such cases and characterizing the divergence of the parameter estimates is presented. A method for fitting the remaining parameters is also presented. All of these methods rely only on sufficient statistics rather than less aggregated quantities, as required for inference according to the method of Kolassa & Tanner (1994). These results are applied to approximate conditional inference via saddlepoint methods. Specifically, the double saddlepoint method of Skovgaard (1987) is adapted to the case when the solution to the saddlepoint equations exists as a point at infinity.

8.

Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

J. G. Ibrahim S. R. Lipsitz & M.-H. Chen 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):173-190

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

9.

Maximum Likelihood Estimation of Logistic Regression Parameters under Two-phase, Outcome-dependent Sampling

Norman E. Breslow & Richard Holubkov 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1997,59(2):447-461

Outcome-dependent sampling increases the efficiency of studies of rare outcomes, examples being case-control studies in epidemiology and choice-based sampling in econometrics. Two-phase or double sampling is a standard technique for drawing efficient stratified samples. We develop maximum likelihood estimation of logistic regression coefficients for a hybrid two-phase, outcome-dependent sampling design. An algorithm is given for determining the estimates by repeated fitting of ordinary logistic regression models. Simulation results demonstrate the efficiency loss associated with alternative pseudolikelihood and weighted likelihood methods for certain data configurations. These results provide an efficient solution to the measurement error problem with validation sampling based on a discrete surrogate.

10.

Mixtures of Linear Regression with Measurement Errors

Weixin Yao Weixing Song 《Communications in Statistics - Theory and Methods》2013,42(8):1602-1614

Existing research on mixtures of regression models is limited to directly observed predictors. The estimation of mixtures of regression for measurement error data poses challenges for statisticians. For linear regression models with measurement error data, the naive ordinary least squares method, which directly substitutes the observed surrogates for the unobserved error-prone variables, yields an inconsistent estimate for the regression coefficients. The same inconsistency also affects the naive mixtures of regression estimate, which is based on the traditional maximum likelihood estimator and simply ignores the measurement error. To solve this inconsistency, we propose to use the deconvolution method to estimate the mixture likelihood of the observed surrogates. Then our proposed estimate is found by maximizing the estimated mixture likelihood. In addition, a generalized EM algorithm is also developed to find the estimate. The simulation results demonstrate that the proposed estimation procedures work well and perform much better than the naive estimates.
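The inconsistency of naive ordinary least squares under measurement error, which this abstract takes as its starting point, can be seen in a short simulation. The sketch below covers only a single linear regression with classical additive error, not the mixture model or the deconvolution estimator of the article. The variances, the sample size, and the helper `ols_slope` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta_true = 2.0

x = rng.normal(0.0, 1.0, n)                   # true predictor (unobserved in practice)
w = x + rng.normal(0.0, 1.0, n)               # observed surrogate with additive error
y = beta_true * x + rng.normal(0.0, 0.5, n)   # response generated from the true predictor

def ols_slope(pred, resp):
    """Slope of a simple least squares fit of resp on pred (with intercept)."""
    centered = pred - pred.mean()
    return float(centered @ (resp - resp.mean()) / (centered @ centered))

naive_slope = ols_slope(w, y)    # regress on the surrogate
oracle_slope = ols_slope(x, y)   # regress on the true predictor

# Under classical measurement error the naive slope converges to
# beta_true * var(x) / (var(x) + var(error)), here 2.0 * 1 / (1 + 1) = 1.0.
attenuated_target = beta_true * 1.0 / (1.0 + 1.0)
print(naive_slope, oracle_slope, attenuated_target)
```

With equal predictor and error variances, the naive slope settles near half the true coefficient however large the sample is, which is why increasing the sample size does not repair the naive estimate.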

11.

Estimation of regression parameters in missing data problems

Donald L. McLeish Cyntha A. Struthers 《Revue canadienne de statistique》2006,34(2):233-259

Let Y be a response variable, possibly multivariate, with a density function f(y|x, v; β) conditional on vectors x and v of covariates and a vector β of unknown parameters. The authors consider the problem of estimating β when the values taken by the covariate vector v are available for all observations while some of those taken by the covariate x are missing at random. They compare the profile estimator to several alternatives, both in terms of bias and standard deviation, when the response and covariates are discrete or continuous.

12.

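Regression Estimation Using a Linear Combination of Multiple Auxiliary Variables

卢静莉 丁昌江 闫在在 《统计与信息论坛》2010,25(5):14-17

To improve the precision of estimators, this paper defines a new regression estimation method that uses multiple auxiliary variables and studies theoretically how to choose the weights under this method. The resulting estimator is compared numerically, in terms of precision, with the Raj multiple-auxiliary-variable regression estimator and the Ghosh multiple linear regression estimator. The results show that the new method is more precise than both the Raj and the Ghosh estimators.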

13.

Ridge Regression for Estimation of Transition Probabilities from Aggregate Data

Inderdeep Kaur M. B. Rajarshi 《Communications in Statistics - Simulation and Computation》2013,42(4):524-530

When data from several independent Markov chains are aggregated over each time point, least squares estimation of transition probabilities faces the problem of multicollinearity. We propose here an estimation procedure which involves use of ridge regression for the ordinary least squares estimators. Performance of this estimator is then compared with that of the ordinary least squares.
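How a ridge penalty stabilizes least squares under multicollinearity can be sketched with the generic closed-form ridge estimator. This is not the transition-probability procedure of the article: the data are synthetic, and the function `ridge_fit` and the penalty value 1.0 are assumptions chosen for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam * I)^{-1} X^T y; lam = 0 gives OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)      # nearly a copy of x1, so the columns are collinear
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

beta_ols = ridge_fit(X, y, 0.0)          # unpenalized least squares
beta_ridge = ridge_fit(X, y, 1.0)        # penalized fit

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

For any positive penalty the ridge coefficient vector has a smaller Euclidean norm than the least squares one, because each component along a singular direction of X is shrunk. The effect is strongest along the nearly degenerate direction that collinearity creates.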

14.

Rank Estimation of Log-Linear Regression with Interval-Censored Data

Li L Pu Z 《Lifetime data analysis》2003,9(1):57-70

Interval-censored data arise in a wide variety of research and application fields such as cancer and AIDS studies. In this paper, we study a log-linear regression model when data are subject to interval censoring. We use a U-statistic based on ranks to estimate regression coefficients and establish large sample properties of the estimator. We illustrate the performance of the proposed estimate with simulations and a numerical example.

15.

Robust logistic regression of family data in the presence of missing genotypes

Yanping Qiu 《Journal of applied statistics》2019,46(5):926-945

Large cohort studies are commonly launched to study the risk effect of genetic variants or other risk factors on a chronic disorder. In these studies, family data are often collected to provide additional information for the purpose of improving the inference results. Statistical analysis of the family data can be very challenging due to the missing observations of genotypes, incomplete records of disease occurrences in family members, and the complicated dependence attributed to the shared genetic background and environmental factors. In this article, we investigate a class of logistic models with family-shared random effects to tackle these challenges, and develop a robust regression method based on the conditional logistic technique for statistical inference. An expectation–maximization (EM) algorithm with fast computation speed is developed to handle the missing genotypes. The proposed estimators are shown to be consistent and asymptotically normal. Additionally, a score test based on the proposed method is derived to test the genetic effect. Extensive simulation studies demonstrate that the proposed method performs well in finite samples in terms of estimate accuracy, robustness and computational speed. The proposed procedure is applied to an Alzheimer's disease study.

16.

Maximum Likelihood Estimation in a Semiparametric Logistic/Proportional-Hazards Mixture Model (total citations: 2, self-citations: 0, citations by others: 2)

HONG-BIN FANG GANG LI JIANGUO SUN 《Scandinavian Journal of Statistics》2005,32(1):59-75

Abstract. We consider large sample inference in a semiparametric logistic/proportional-hazards mixture model. This model has been proposed to model survival data where there exists a positive portion of subjects in the population who are not susceptible to the event under consideration. Previous studies of the logistic/proportional-hazards mixture model have focused on developing point estimation procedures for the unknown parameters. This paper studies large sample inferences based on the semiparametric maximum likelihood estimator. Specifically, we establish existence, consistency and asymptotic normality results for the semiparametric maximum likelihood estimator. We also derive consistent variance estimates for both the parametric and non-parametric components. The results provide a theoretical foundation for making large sample inference under the logistic/proportional-hazards mixture model.

17.

Guaranteed Local Maximum Likelihood Detection of a Change Point in Nonparametric Logistic Regression

A. Vexler G. Gurevich 《Communications in Statistics - Theory and Methods》2013,42(4):711-726

We consider nonparametric logistic regression and propose a generalized likelihood test for detecting a threshold effect that indicates a relationship between some risk factor and a defined outcome above the threshold but none below it. One important field of application is occupational medicine and in particular, epidemiological studies. In epidemiological studies, segmented fully parametric logistic regression models are often threshold models, where it is assumed that the exposure has no influence on a response up to a possible unknown threshold, and has an effect beyond that threshold. Finding efficient methods for detection and estimation of a threshold is a very important task in these studies. This article proposes such methods in a context of nonparametric logistic regression. We use a local version of unknown likelihood functions and show that under rather common assumptions the asymptotic power of our test is one. We present a guaranteed nonasymptotic upper bound for the significance level of the proposed test. If applying the test yields the acceptance of the conclusion that there was a change point (and hence a threshold limit value), we suggest using the local maximum likelihood estimator of the change point and consider the asymptotic properties of this estimator.

18.

Estimation of the mixtures of GLMs with covariate-dependent mixing proportions

Xing Wu 《Communications in Statistics - Theory and Methods》2013,42(24):7242-7257

ABSTRACT

In this article, we study the estimation for a class of semiparametric mixtures of generalized linear models where mixing proportions depend on a covariate nonparametrically. We investigate a backfitting estimation procedure and show the asymptotic normality of the proposed estimators under mild conditions. We conduct simulations to show the good performance of our methodology and give a real data analysis as an illustration.

19.

Empirical Likelihood Inference of the Partial Linear Isotonic Errors-in-variables Regression Models with Missing Data

Zhimeng Sun 《Communications in Statistics - Simulation and Computation》2016,45(2):671-688

This article is concerned with statistical inference for the partial linear isotonic regression model with missing responses and measurement errors in covariates. We propose an empirical likelihood ratio test statistic and show that it has a limiting weighted chi-square distribution. An adjusted empirical likelihood ratio statistic, which is shown to have a limiting standard central chi-square distribution, is then proposed. A maximum empirical likelihood estimator is also developed. A simulation study is conducted to examine the finite-sample properties of the proposed procedure.

20.

Regression Analysis of Length‐biased and Right‐censored Failure Time Data with Missing Covariates

Na Hu Xuerong Chen Jianguo Sun 《Scandinavian Journal of Statistics》2015,42(2):438-452

Length‐biased and right‐censored failure time data arise from many fields, and their analysis has recently attracted a great deal of attention. Two examples of the areas that often produce such data are epidemiological studies and cancer screening trials. In this paper, we discuss regression analysis of such data in the presence of missing covariates, for which no established inference procedure seems to exist. For the problem, we consider the data arising from the proportional hazards model and propose two inverse probability weighted estimation procedures. The asymptotic properties of the resulting estimators are established, and the extensive simulation study conducted for the evaluation of the proposed methods suggests that they work well for practical situations.
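The idea behind inverse probability weighting can be sketched in its simplest form: estimating a mean when some values are missing with known observation probabilities. This is far simpler than the weighted estimating equations for the proportional hazards model discussed in the article. The function `ipw_mean`, the toy data, and the assumption that the observation probabilities are known are all illustrative.

```python
import numpy as np

def ipw_mean(values, observed, obs_prob):
    """Weighted mean of the observed values, each weighted by 1 / P(observed)."""
    values = np.asarray(values, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    obs_prob = np.asarray(obs_prob, dtype=float)
    weights = 1.0 / obs_prob[observed]          # weights for observed units only
    return float(np.sum(weights * values[observed]) / np.sum(weights))

# Two groups: value 1 is observed with probability 0.5, value 3 with probability 1.
# The full-data mean is (4 * 1 + 4 * 3) / 8 = 2.
values = np.array([1.0, 1.0, np.nan, np.nan, 3.0, 3.0, 3.0, 3.0])
observed = ~np.isnan(values)
obs_prob = np.array([0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0])

complete_case = float(np.mean(values[observed]))   # ignores the missingness mechanism
ipw_estimate = ipw_mean(values, observed, obs_prob)
print(complete_case, ipw_estimate)
```

Each observed unit stands in for the units like it that went unobserved, so the under-sampled group is weighted back up. The complete-case average, by contrast, is pulled toward the group that is always observed.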