1.

Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

J. G. Ibrahim S. R. Lipsitz & M.-H. Chen 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):173-190

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.
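For the categorical-covariate case, the closed-form E-step mentioned in the abstract can be sketched as a set of weights over the possible values of the missing covariates. The notation below is an assumption made for illustration and is not taken from the abstract: $x^{(j)}$ ranges over the possible values of the missing covariates of subject $i$, $x_{i,\mathrm{obs}}$ are the observed covariates, $r_i$ is the vector of missingness indicators, and $\beta$, $\alpha$, $\phi$ index the response, covariate, and missingness models.

```latex
% E-step weight for subject i and candidate value x^{(j)} (illustrative notation)
w_{ij} =
\frac{p\bigl(y_i \mid x^{(j)}, x_{i,\mathrm{obs}}; \beta\bigr)\,
      p\bigl(x^{(j)}, x_{i,\mathrm{obs}}; \alpha\bigr)\,
      p\bigl(r_i \mid y_i, x^{(j)}, x_{i,\mathrm{obs}}; \phi\bigr)}
     {\sum_{k} p\bigl(y_i \mid x^{(k)}, x_{i,\mathrm{obs}}; \beta\bigr)\,
      p\bigl(x^{(k)}, x_{i,\mathrm{obs}}; \alpha\bigr)\,
      p\bigl(r_i \mid y_i, x^{(k)}, x_{i,\mathrm{obs}}; \phi\bigr)},
\qquad \sum_{j} w_{ij} = 1 .
```

Under this sketch the M-step reduces to weighted complete-data maximum likelihood, with each incomplete subject replicated once per candidate value $x^{(j)}$ and weighted by $w_{ij}$.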

2.

Estimation in the Cox cure model with covariates missing not at random, with application to disease screening/prediction

Lisha Guo Yi Xiong X. Joan Hu 《Revue canadienne de statistique》2020,48(4):608-632

In an attempt to provide a statistical tool for disease screening and prediction, we propose a semiparametric approach to analysis of the Cox proportional hazards cure model in situations where the observations on the event time are subject to right censoring and some covariates are missing not at random. To facilitate the methodological development, we begin with semiparametric maximum likelihood estimation (SPMLE) assuming that the (conditional) distribution of the missing covariates is known. A variant of the EM algorithm is used to compute the estimator. We then adapt the SPMLE to a more practical situation where the distribution is unknown and there is a consistent estimator based on available information. We establish the consistency and weak convergence of the resulting pseudo-SPMLE, and identify a suitable variance estimator. The application of our inference procedure to disease screening and prediction is illustrated via empirical studies. The proposed approach is used to analyze the tuberculosis screening study data that motivated this research. Its finite-sample performance is examined by simulation.

3.

Pseudo likelihood-based estimation and testing of missingness mechanism function in nonignorable missing data problems

Xuerong Chen Guoqing Diao Jing Qin 《Scandinavian Journal of Statistics》2020,47(4):1377-1400

In nonignorable missing response problems, we study a semiparametric model with an unspecified missingness mechanism model and an exponential family model for the response conditional density. Even though existing methods are available to estimate the parameters in the exponential family, nonparametric estimation or testing of the missingness mechanism model remains an open problem. By defining a “synthesis” density involving the unknown missingness mechanism model and the known baseline “carrier” density in the exponential family model, we treat this “synthesis” density as a legitimate one with a biased sampling version. We develop maximum pseudo likelihood estimation procedures, and the resultant estimators are consistent and asymptotically normal. Since the “synthesis” cumulative distribution is a functional of the missingness mechanism model and the known carrier density, the proposed method can be used to test the correctness of the missingness mechanism model nonparametrically and indirectly. Simulation studies and a real example demonstrate that the proposed methods perform very well.

4.

An efficient model-free estimation of multiclass conditional probability

Tu Xu Junhui Wang 《Journal of statistical planning and inference》2013

Conventional multiclass conditional probability estimation methods, such as Fisher's discriminant analysis and logistic regression, often require restrictive distributional model assumptions. In this paper, a model-free estimation method is proposed to estimate multiclass conditional probability through a series of conditional quantile regression functions. Specifically, the conditional class probability is formulated as a difference of corresponding cumulative distribution functions, where the cumulative distribution functions can be converted from the estimated conditional quantile regression functions. The proposed estimation method is also efficient as its computation cost does not increase exponentially with the number of classes. The theoretical and numerical studies demonstrate that the proposed estimation method is highly competitive against the existing competitors, especially when the number of classes is relatively large.
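The "difference of cumulative distribution functions" formulation can be written out as follows. This is a generic sketch, assuming the classes are coded $1, \dots, K$ and that $Q(\tau \mid x)$ denotes the conditional $\tau$-quantile of the class label; the symbols are illustrative and the authors' exact construction may differ.

```latex
% Class probability as a difference of conditional CDFs (classes coded 1..K)
p_k(x) = P(Y = k \mid X = x) = F(k \mid x) - F(k-1 \mid x),
\qquad F(0 \mid x) = 0,\quad F(K \mid x) = 1 .

% Converting a conditional quantile function into a conditional CDF
F(y \mid x) = \int_0^1 \mathbf{1}\{\, Q(\tau \mid x) \le y \,\}\, d\tau .
```

The second identity holds because $Q(\tau \mid x) \le y$ exactly when $\tau \le F(y \mid x)$, so estimated quantile functions on a grid of $\tau$ values give an approximate conditional CDF without a distributional model.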

5.

Efficient Robust Estimation for Linear Models with Missing Response at Random

《Scandinavian Journal of Statistics》2018,45(2):366-381

Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an IC_Q‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.
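The check function that underlies quantile regression, and the idea of weighting it, can be illustrated with a minimal Python sketch. This is not the authors' estimator: it fits only a single location parameter, and the function names and the weights (which stand in for inverse propensity scores) are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_check_objective(q, ys, weights, tau):
    """Weighted sum of check losses of the residuals y - q."""
    return sum(w * check_loss(y - q, tau) for y, w in zip(ys, weights))


def weighted_quantile(ys, weights, tau):
    """Minimise the weighted check objective over the observed values.

    The objective is piecewise linear in q, so a minimiser is attained
    at one of the data points; ties go to the smallest such point.
    """
    return min(sorted(ys),
               key=lambda q: weighted_check_objective(q, ys, weights, tau))


# Hypothetical data: the outlier 100 does not move the median.
responses = [1, 2, 3, 4, 100]
unit_weights = [1, 1, 1, 1, 1]
median = weighted_quantile(responses, unit_weights, 0.5)
```

With unit weights and `tau = 0.5` the minimiser is the sample median, which shows why check-function estimators resist heavy-tailed errors; unequal weights shift the minimiser toward the more heavily weighted observations.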

6.

Bivariate Kumaraswamy distribution: properties and a new method to generate bivariate classes

Wagner Barreto-Souza Artur J. Lemonte 《Statistics》2013,47(6):1321-1342

In this paper, we introduce a bivariate Kumaraswamy (BVK) distribution whose marginals are Kumaraswamy distributions. The cumulative distribution function of this bivariate model has absolutely continuous and singular parts. Representations for the cumulative and density functions are presented and properties such as marginal and conditional distributions, product moments and conditional moments are obtained. We show that the BVK model can be obtained from the Marshall and Olkin survival copula and obtain a tail dependence measure. The estimation of the parameters by maximum likelihood is discussed and the Fisher information matrix is determined. We propose an EM algorithm to estimate the parameters. Some simulations are presented to verify the performance of the direct maximum-likelihood estimation and the proposed EM algorithm. We also present a method to generate bivariate distributions from our proposed BVK distribution. Furthermore, we introduce a BVK distribution which has only an absolutely continuous part and discuss some of its properties. Finally, a real data set is analysed for illustrative purposes.

7.

Statistical inference for linear regression models with missing skewed data

Wu Liucang, Zhang Jiamao, Qiu Yitao 《统计与信息论坛》 (Statistics & Information Forum) 2013,28(9):22-26

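This paper studies parameter estimation for linear regression models with missing skewed data. To overcome the distortion of the sample distribution and to improve the estimation of the regression coefficients, scale parameter, and skewness parameter, a modified regression imputation method suited to missing data in linear regression models under skewed data is proposed. Simulation studies and a real-data example, with comparisons against mean imputation, regression imputation, and stochastic regression imputation, show that the proposed modified regression imputation method is effective and feasible.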

8.

Estimation of k-Factor GIGARCH Process: A Monte Carlo Study

Abdou Kâ Diongue Dominique Guégan 《Communications in Statistics - Simulation and Computation》2013,42(10):2037-2049

In this article, we discuss the parameter estimation for a k-factor generalized long-memory process with conditionally heteroskedastic noise. Two estimation methods are proposed. The first method is based on the conditional distribution of the process and the second is obtained as an extension of Whittle's estimation approach. For comparison purposes, Monte Carlo simulations are used to evaluate the finite sample performance of these estimation techniques, using four different conditional distribution functions.

9.

Variable selection for high-dimensional generalized linear model with block-missing data

Yifan He Yang Feng Xinyuan Song 《Scandinavian Journal of Statistics》2023,50(3):1279-1297

In modern scientific research, multiblock missing data arise when information is synthesized across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or heavily rely on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.

10.

Consistency of information criteria for model selection with missing data

Abdelaziz El Matouat Freedath Djibril Moussa Hassania Hamzaoui 《Communications in Statistics - Theory and Methods》2013,42(23):6900-6914

In this paper, we investigate the consistency of the Expectation Maximization (EM) algorithm-based information criteria for model selection with missing data. The criteria correspond to a penalization of the conditional expectation of the complete data log-likelihood given the observed data and with respect to the missing data conditional density. We present asymptotic properties related to maximum likelihood estimation in the presence of incomplete data and we provide sufficient conditions for the consistency of model selection by minimizing the information criteria. Their finite sample performance is illustrated through simulation and real data studies.

11.

Conditional mode estimation for functional stationary ergodic data with responses missing at random

Nengxiang Ling Yang Liu Philippe Vieu 《Statistics》2016,50(5):991-1013

In this paper, we investigate the asymptotic properties of a non-parametric conditional mode estimator given a functional explanatory variable, when functional stationary ergodic data and missing at random responses are observed. First, we establish asymptotic properties for a conditional density estimator, from which we derive almost sure convergence (with rate) and asymptotic normality of a conditional mode estimator. This new estimate takes missing data into account, and a simulation study is performed to illustrate how this yields higher predictive performance than that obtained with standard estimates.

12.

Cure rate survival models with missing covariates: a simulation study

Renata Santana Fonseca Heleno Bolfarine 《Journal of Statistical Computation and Simulation》2013,83(1):97-113

In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing-at-random situation, so that the missingness of the covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach that takes the missing covariates into consideration is much more efficient in terms of mean square error than the complete case analysis. Effects of increasing cured fraction and censored observations are also reported. We demonstrate the proposed methodology with two real data sets. One concerns the length of time to obtain a BS degree in Statistics, and the other the time to breast cancer recurrence.

13.

Skew-normal factor analysis models with incomplete data

M. Liu 《Journal of applied statistics》2015,42(4):789-805

Traditional factor analysis (FA) rests on the assumption of multivariate normality. However, in some practical situations, the data do not meet this assumption; thus, the statistical inference made from such data may be misleading. This paper aims at providing some new tools for the skew-normal (SN) FA model when missing values occur in the data. In such a model, the latent factors are assumed to follow a restricted version of multivariate SN distribution with additional shape parameters for accommodating skewness. We develop an analytically feasible expectation conditional maximization algorithm for carrying out parameter estimation and imputation of missing values under missing at random mechanisms. The practical utility of the proposed methodology is illustrated with two real data examples and the results are compared with those obtained from the traditional FA counterparts.

14.

Change-point detection in a linear model by adaptive fused quantile method

Gabriela Ciuperca Matúš Maciak 《Scandinavian Journal of Statistics》2020,47(2):425-463

A novel approach to quantile estimation in multivariate linear regression models with change-points is proposed: the change-point detection and the model estimation are both performed automatically, by adopting either the quantile-fused penalty or the adaptive version of the quantile-fused penalty. These two methods combine the idea of the check function used for quantile estimation and the L₁ penalization principle known from signal processing and, unlike some standard approaches, the presented methods go beyond typical assumptions usually required for the model errors, such as sub-Gaussian or normal distribution. They can effectively handle heavy-tailed random error distributions, and, in general, they offer a more complex view on the data as one can obtain any conditional quantile of the target distribution, not just the conditional mean. The consistency of detection is proved and proper convergence rates for the parameter estimates are derived. The empirical performance is investigated via an extensive comparative simulation study and practical utilization is demonstrated using a real data example.

15.

Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation

Johan H. Koskinen Garry L. Robins Philippa E. Pattison 《Statistical Methodology》2010,7(3):366-384

Missing data are often problematic in social network analysis since what is missing may potentially alter the conclusions about what we have observed as tie-variables need to be interpreted in relation to their local neighbourhood and the global structure. Some ad hoc methods for dealing with missing data in social networks have been proposed but here we consider a model-based approach. We discuss various aspects of fitting exponential family random graph (or p-star) models (ERGMs) to networks with missing data and present a Bayesian data augmentation algorithm for the purpose of estimation. This involves drawing from the full conditional posterior distribution of the parameters, something which is made possible by recently developed algorithms. With ERGMs already having complicated interdependencies, it is particularly important to provide inference that adequately describes the uncertainty, something that the Bayesian approach provides. To the extent that we wish to explore the missing parts of the network, the posterior predictive distributions, immediately available at the termination of the algorithm, are at our disposal, which allows us to explore the distribution of what is missing unconditionally on any particular parameter values. Some important features of treating missing data and of the implementation of the algorithm are illustrated using a well-known collaboration network and a variety of missing data scenarios.

16.

Robust nonparametric estimation with missing data

Graciela Boente Wenceslao González–Manteiga Ana Pérez–González 《Journal of statistical planning and inference》2009

In this paper, under a nonparametric regression model, we introduce two families of robust procedures to estimate the regression function when missing data occur in the response. The first proposal is based on a local M-functional applied to the conditional distribution function estimate adapted to the presence of missing data. The second proposal imputes the missing responses using the local M-smoother based on the observed sample and then estimates the regression function with the completed sample. We show that the robust procedures considered are consistent and asymptotically normally distributed. A robust procedure to select the smoothing parameter is also discussed.

17.

Semiparametric jump-preserving estimation for single-index models

Guoxiang Liu Mengmeng Wang Jinguan Lin Qibing Gao 《Journal of nonparametric statistics》2018,30(3):556-580

Estimation of the single-index model with a discontinuous unknown link function is considered in this paper. The existing refined minimum average variance estimation (rMAVE) method can estimate the single-index parameter and unknown link function simultaneously by minimising the average pointwise conditional variance, where the conditional variance can be estimated using the local linear fit method with a centred kernel function. When there are jumps in the link function, large biases can appear around the jumps. For this reason, we embed the jump-preserving technique in the rMAVE method and propose an adaptive jump-preserving estimation procedure for the single-index model. Concretely, the conditional variance is obtained from whichever of the local linear fits with centred, left-sided and right-sided kernel functions has the minimum weighted residual mean square. The resulting estimators preserve the jumps well and also give smooth estimates of the continuous parts. Asymptotic properties are established under some mild conditions. Simulations and real data analysis show that the proposed method works well.

18.

A Profile Conditional Likelihood Approach for the Semiparametric Transformation Regression Model with Missing Covariates

Hua Yun Chen Roderick J. Little 《Lifetime data analysis》2001,7(3):207-224

We propose a profile conditional likelihood approach to handle missing covariates in the general semiparametric transformation regression model. The method estimates the marginal survival function by the Kaplan-Meier estimator, and then estimates the parameters of the survival model and the covariate distribution from a conditional likelihood, substituting the Kaplan-Meier estimator for the marginal survival function in the conditional likelihood. This method is simpler than full maximum likelihood approaches, and yields a consistent and asymptotically normally distributed estimator of the regression parameter when censoring is independent of the covariates. The estimator demonstrates very high relative efficiency in simulations. When compared with complete-case analysis, the proposed estimator can be more efficient when the missing data are missing completely at random and can correct bias when the missing data are missing at random. The potential application of the proposed method to the generalized probit model with missing continuous covariates is also outlined.
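The Kaplan-Meier estimator that this approach plugs in for the marginal survival function can be sketched in a few lines of Python. This is the standard product-limit estimator only, not the profile conditional likelihood itself; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same observed time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Toy data: the subject at time 2 is censored.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The censored subject contributes to the risk set at time 1 but not at time 3, which is how the estimator uses partial information from censored observations.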

19.

Maximum likelihood estimation with missing spatial data and with an application to remotely sensed data

Robert Haining Daniel Griffith Robert Bennett 《Communications in Statistics - Theory and Methods》2013,42(5):1875-1894

The paper examines the small and large lattice properties of the exact maximum likelihood estimator for a spatial model where parameter estimation and missing data estimation are tackled simultaneously. A first-order conditional autoregressive model is examined in detail. The paper concludes with an empirical analysis of remotely sensed data.

20.

Statistical principles of the capture-recapture model

Hu Guihua, Liao Xin 《统计与信息论坛》 (Statistics & Information Forum) 2012,27(9):8-13

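The capture-recapture model was pioneered by researchers abroad. It was first used to estimate the size of wildlife populations and, after refinement, was gradually applied to census quality evaluation and other areas of statistics. To support correct use of the model, this paper takes a distinctive approach and examines it comprehensively in terms of the experimental background, the relationship between cell probabilities and marginal probabilities, conditional cell probabilities, the conditional multinomial distribution, and the conditional likelihood function. The study shows that using the capture-recapture model requires observing three theoretical principles: population closure, individual homogeneity, and independence. Where practice departs from these principles, three things must be done: identify every point of divergence between the practical problem and the theoretical principles, assess the severity of each divergence, and find ways to resolve them.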
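Under the three principles above (closure, homogeneity, independence), the simplest two-sample capture-recapture estimate of population size can be sketched in Python. The abstract does not give formulas, so the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant are used here as assumed illustrations; the function names and counts are hypothetical.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Lincoln-Petersen estimate N = n1 * n2 / m.

    n_first  : individuals counted in the first capture (or census)
    n_second : individuals counted in the second capture (or survey)
    n_both   : individuals found in both
    """
    if n_both <= 0:
        raise ValueError("at least one individual must appear in both samples")
    return n_first * n_second / n_both


def chapman(n_first, n_second, n_both):
    """Chapman's bias-corrected variant; defined even when n_both is 0."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1


# Hypothetical counts: 200 marked, 150 recaptured sample, 30 in both.
population_estimate = lincoln_petersen(200, 150, 30)
corrected_estimate = chapman(200, 150, 30)
```

The estimate rests on equating the recapture proportion `n_both / n_second` with the marked proportion `n_first / N`, which is exactly where violations of closure, homogeneity, or independence introduce bias.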