In a missing-data setting, we want to estimate the mean of a scalar outcome from a sample in which an explanatory variable is observed for every subject while responses are missing by happenstance for some of them. We consider two kinds of estimators of the mean response when the explanatory variable is functional. The first is based on the average of the predicted values, and the second is a functional adaptation of the Horvitz–Thompson estimator. We show that the infinite dimensionality of the problem does not affect the rates of convergence by proving that the estimators are root-n consistent under the missing at random (MAR) assumption. These asymptotic results are complemented by simulation experiments illustrating the ease of implementation and the good finite-sample behaviour of the method. This is the first paper to emphasize that the insensitivity of averaged estimates, well known in multivariate non-parametric statistics, remains true for an infinite-dimensional covariate. In this sense, this work opens the way for various other results of this kind in functional data analysis.
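The two estimators can be sketched as follows. This is a minimal illustration, not the authors' implementation. Every choice in it is hypothetical: curves observed on a common grid of [0, 1], a discretized L2 semi-metric between curves, a Gaussian kernel on that distance, a hand-picked bandwidth `h`, and a made-up simulation design. The first estimator averages kernel predictions of E[Y | X = X_i] over all n subjects. The second weights each observed response by the inverse of a kernel estimate of its response probability.

```python
import math
import random

def l2_distance(f, g, dt):
    # Discretized L2 distance between two curves sampled on the same grid
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dt)

def kernel_weights(x0, curves, h, dt):
    # Gaussian kernel applied to functional distances (assumed kernel choice)
    return [math.exp(-0.5 * (l2_distance(x0, c, dt) / h) ** 2) for c in curves]

def predict(x0, curves, y, delta, h, dt):
    # Nadaraya-Watson estimate of E[Y | X = x0] from complete cases only
    w = kernel_weights(x0, curves, h, dt)
    num = sum(wi * yi for wi, yi, di in zip(w, y, delta) if di)
    den = sum(wi for wi, di in zip(w, delta) if di)
    return num / den

def response_prob(x0, curves, delta, h, dt):
    # Kernel estimate of p(x0) = P(response observed | X = x0)
    w = kernel_weights(x0, curves, h, dt)
    return sum(wi * di for wi, di in zip(w, delta)) / sum(w)

def mean_by_prediction(curves, y, delta, h, dt):
    # Estimator 1: average of the predicted values over all n subjects
    return sum(predict(c, curves, y, delta, h, dt) for c in curves) / len(curves)

def mean_by_ipw(curves, y, delta, h, dt):
    # Estimator 2: Horvitz-Thompson-type weighting of the observed responses
    total = sum(yi / response_prob(c, curves, delta, h, dt)
                for c, yi, di in zip(curves, y, delta) if di)
    return total / len(curves)

def simulate(n=300, m=20, seed=1):
    # Hypothetical design: X_i(t) = a_i sin(pi t) + b_i t, Y_i = 2 + a_i + noise,
    # response probability depends on a_i only (MAR given the curve), so E[Y] = 2
    rng = random.Random(seed)
    grid = [k / (m - 1) for k in range(m)]
    curves, y, delta = [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 0.3)
        curves.append([a * math.sin(math.pi * t) + b * t for t in grid])
        p = 1.0 / (1.0 + math.exp(-(0.5 + a)))
        d = 1 if rng.random() < p else 0
        y_full = 2.0 + a + rng.gauss(0, 0.5)
        y.append(y_full if d else None)  # nonrespondents' outcomes are hidden
        delta.append(d)
    return curves, y, delta, 1.0 / (m - 1)
```

In this design, subjects with larger a_i respond more often. A naive complete-case average therefore tends to overshoot the true mean of 2, while both estimators above correct for the MAR mechanism.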

In this article, based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained when the responses of linear models are missing at random. It is proved that the proposed estimators are asymptotically normal. In simulation studies and a real example, the proposed estimators show improved performance relative to the usual augmented inverse probability weighted estimators.
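Here is a minimal sketch of the covariate balancing idea. It rests on several assumptions: a logistic response model π(x) = expit(b0 + b1 x) with a scalar covariate, the just-identified version of CBPS, and a hypothetical simulation design. The just-identified version uses only the balance conditions Σ (δ_i/π_i − 1)(1, x_i) = 0, solved by Newton's method with backtracking on an equivalent convex loss. The sketch estimates only the population mean by inverse probability weighting. It does not reproduce the article's regression-coefficient estimators.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def cbps_fit(x, delta, max_iter=50, tol=1e-9):
    """Just-identified CBPS for pi(x) = expit(b0 + b1*x).

    The balance equations sum_i (delta_i/pi_i - 1)*(1, x_i) = 0 are the
    first-order conditions of the convex loss
    L(b) = sum_i [delta_i*exp(-eta_i) + (1 - delta_i)*eta_i], eta_i = b0 + b1*x_i.
    """
    def loss(c0, c1):
        return sum(d * math.exp(-(c0 + c1 * xi)) + (1 - d) * (c0 + c1 * xi)
                   for xi, d in zip(x, delta))

    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, d in zip(x, delta):
            e = math.exp(-(b0 + b1 * xi))
            r = d * (1.0 + e) - 1.0      # delta_i / pi_i - 1
            g0 += r
            g1 += r * xi
            w = d * e                    # Hessian weight of L
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        if abs(g0) + abs(g1) < tol:
            break
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det  # Newton step H^{-1} g
        s1 = (h00 * g1 - h01 * g0) / det
        current, t = loss(b0, b1), 1.0
        # Backtrack only on a clear increase; the slack absorbs rounding noise
        while loss(b0 + t * s0, b1 + t * s1) > current + 1e-12 * abs(current) and t > 1e-8:
            t /= 2.0
        b0, b1 = b0 + t * s0, b1 + t * s1
    return b0, b1

def ipw_mean(x, y, delta, b0, b1):
    # Horvitz-Thompson mean; outcomes of nonrespondents are never read
    return sum(yi / expit(b0 + b1 * xi)
               for xi, yi, d in zip(x, y, delta) if d) / len(x)

def simulate(n=1000, seed=7):
    # Hypothetical design: y = 1 + 2x + noise, P(observed | x) = expit(0.3 + x)
    rng = random.Random(seed)
    x, y, delta = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        x.append(xi)
        y.append(1.0 + 2.0 * xi + rng.gauss(0, 1))
        delta.append(1 if rng.random() < expit(0.3 + xi) else 0)
    return x, y, delta
```

The intercept is among the balanced functions, so the fitted weights satisfy Σ δ_i/π̂_i = n up to solver tolerance. As a result, the Horvitz–Thompson and normalized (Hájek) forms of the mean estimate coincide.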

In this paper, we investigate the asymptotic properties of the kernel estimator of the non-parametric regression operator for functional stationary ergodic data with randomly censored responses. More precisely, we introduce a kernel-type estimator of the non-parametric regression operator when the responses are randomly censored, and obtain its almost sure convergence with rate as well as its asymptotic normality. As an application, an asymptotic (1 − ζ) confidence interval for the regression operator is also presented (0 < ζ < 1). Finally, a simulation study is carried out to show the finite-sample performance of the estimator.
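Here is a minimal sketch of one common way to handle randomly censored responses in kernel regression. Each response is replaced by the inverse-probability-of-censoring-weighted synthetic value δ_i Z_i / Ĝ(Z_i), and these values are smoothed with a Nadaraya–Watson estimator. Here Ĝ is the Kaplan–Meier estimator of the censoring survival function. The sketch departs from the article's setting in several assumed ways: a scalar covariate instead of a functional one, i.i.d. rather than ergodic data, censoring independent of (X, T), a Gaussian kernel, and a hand-picked bandwidth. Whether it matches the article's exact estimator is also an assumption.

```python
import math
import random

def km_censoring_survival(z, delta):
    """Kaplan-Meier estimate of G(t) = P(C > t), evaluated at each Z_i.

    Censored observations (delta == 0) play the role of 'events'.
    Assumes no tied observation times.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    g, surv = [0.0] * n, 1.0
    for rank, i in enumerate(order):
        if delta[i] == 0:
            surv *= 1.0 - 1.0 / (n - rank)   # n - rank subjects still at risk
        g[i] = surv
    return g

def ipcw_kernel_regression(x0, x, z, delta, g, h):
    """Nadaraya-Watson estimate of E[T | X = x0] from synthetic responses delta*Z/G(Z)."""
    num = den = 0.0
    for xi, zi, di, gi in zip(x, z, delta, g):
        w = math.exp(-0.5 * ((x0 - xi) / h) ** 2)   # Gaussian kernel (assumed)
        den += w
        if di:
            num += w * zi / gi
    return num / den

def simulate(n=1000, seed=3):
    # Hypothetical design: m(x) = 1 + 2x, exponential censoring independent of (X, T)
    rng = random.Random(seed)
    x, z, delta = [], [], []
    for _ in range(n):
        xi = rng.random()
        t = 1.0 + 2.0 * xi + rng.gauss(0, 0.3)
        c = rng.expovariate(0.15)
        x.append(xi)
        z.append(min(t, c))
        delta.append(1 if t <= c else 0)
    return x, z, delta
```

Given T, an uncensored observation occurs with probability G(T). The weight 1/G(Z) therefore makes the synthetic response conditionally unbiased for T when G is known. Plugging in the Kaplan–Meier estimate keeps this approximately true.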

When responses are missing at random, we propose a semiparametric direct estimator for the missing-probability- and density-weighted average derivatives of a general nonparametric multiple regression function. An estimator for the normalized version of the weighted average derivatives is also constructed using instrumental variables regression. The proposed estimators are computationally simple and asymptotically normal, and they provide a solution to the problem of estimating the index coefficients of single-index models with responses missing at random. The developed theory generalizes the density-weighted average derivative estimation method that Powell et al. (1989) developed for complete data. Monte Carlo simulation studies are conducted to study the performance of the methods.

In this paper, we define a nonlinear wavelet estimator of the density for the right censoring model with the censoring indicator missing at random (MAR), and derive an asymptotic expression for its mean integrated squared error (MISE). Unlike that of the kernel estimator, the MISE expression of this estimator is not affected by the presence of discontinuities in the curve. Asymptotic normality of the estimator is also established. The proposed estimator reduces to the estimator defined by Li [Non-linear wavelet-based density estimators under random censorship. J Statist Plann Inference. 2003;117(1):35–58] when no censoring indicators are missing and a bandwidth in the non-parametric estimation is close to zero. We also define two further nonlinear wavelet estimators of the density. A simulation study shows the performance of the three proposed estimators.

To estimate parameters defined by estimating equations with covariates missing at random, we consider three bias-corrected nonparametric approaches based on inverse probability weighting, regression and augmented inverse probability weighting. However, when the covariates are not low-dimensional, estimation efficiency suffers from the curse of dimensionality. To address this issue, we propose a two-stage estimation procedure that combines dimension-reduced kernel estimation with bias-corrected estimating equations. We show that the three resulting estimators are asymptotically equivalent and have desirable asymptotic properties. The impact of dimension reduction on nonparametric estimation of the parameters is also investigated. The finite-sample performance of the proposed estimators is studied through simulation, and an application to an automobile data set is also presented.

In this paper, a semi-parametric regression model is considered in which responses are assumed to be missing at random. From the empirical likelihood function based on a rank-based estimating equation, robust confidence intervals/regions for the true regression coefficients are derived. Monte Carlo simulation experiments show that the proposed approach provides more accurate confidence intervals/regions than its normal approximation counterpart under different model error structures. The approach is also compared with the least squares approach, and its superiority is shown whenever the error distribution in the simulation study is heavy-tailed or contaminated. Finally, a real data example is given to illustrate the proposed method.

In this paper, we consider the problem of hazard rate estimation in the presence of covariates, for survival data with censoring indicators missing at random. In the context usually denoted by MAR (missing at random, as opposed to MCAR, missing completely at random, which requires an additional independence assumption), we propose nonparametric adaptive strategies based on model selection methods, for estimators admitting finite-dimensional expansions in functional orthonormal bases. Theoretical risk bounds are provided; they show that the estimators behave well in terms of mean integrated squared error (MISE). Simulation experiments illustrate the statistical procedure.

This paper considers nonparametric inverse probability weighted estimation for functional data with responses missing at random. Under mild conditions, the asymptotic properties of the proposed estimation method are established. An estimator of the asymptotic variance of the proposed estimator is obtained using a resampling method. Finally, the finite-sample properties of the proposed estimation method are investigated via Monte Carlo simulation studies, and a real data analysis illustrates its use.

This paper investigates the estimation of regression parameters and the response mean in nonlinear regression models when responses are missing with probabilities that depend on the covariates. We propose four empirical likelihood (EL)-based estimators for the regression parameters and the response mean. The resulting estimators are shown to be consistent and asymptotically normal under general assumptions. To construct confidence regions for the regression parameters and the response mean, we develop four EL ratio statistics, which are proven to be asymptotically χ²-distributed. Simulation studies and an artificial data set are used to illustrate the proposed methodologies. Empirical results show that the EL method performs better than the normal approximation method and that the coverage probabilities and average lengths depend on the selection probability function.
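The χ² calibration of an EL ratio can be illustrated with its basic building block, Owen's empirical likelihood for a univariate mean. The sketch below assumes complete data. It computes −2 log R(μ) by solving for the Lagrange multiplier with bisection, then inverts the statistic against the 0.95 quantile of χ² with one degree of freedom to obtain a confidence interval. In the missing-response setting, the observations would be replaced by imputed or weighted quantities, and the calibration may then need adjustment. That step is not shown here.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def el_stat(xs, mu, iters=100):
    """-2 log empirical likelihood ratio for the mean mu (Owen's construction)."""
    d = [xi - mu for xi in xs]
    if min(d) >= 0 or max(d) <= 0:
        return math.inf          # mu lies outside the convex hull of the data
    # lambda must keep every 1 + lambda * d_i positive
    lo, hi = -1.0 / max(d), -1.0 / min(d)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        # f(lambda) = sum d_i / (1 + lambda d_i) is strictly decreasing in lambda
        f = sum(di / (1.0 + lam * di) for di in d)
        if f > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)

def el_confidence_interval(xs, cutoff=CHI2_1_95, iters=60):
    """Interval {mu : el_stat(xs, mu) <= cutoff}, found by bisection on each side."""
    mean = sum(xs) / len(xs)

    def boundary(far):
        inside, outside = mean, far   # el_stat(inside) <= cutoff < el_stat(outside)
        for _ in range(iters):
            mid = 0.5 * (inside + outside)
            if el_stat(xs, mid) <= cutoff:
                inside = mid
            else:
                outside = mid
        return inside

    return boundary(min(xs)), boundary(max(xs))
```

For roughly normal data the EL interval is close to the Wald interval. Its advantage is that its shape is determined by the data rather than being forced to be symmetric about the mean.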

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputing the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses a whole range of techniques developed to make inferences about incomplete data, from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Routines, procedures, and packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may well lie in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
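The individual-level view can be illustrated with a small simulation. It assumes a linear outcome, MAR missingness driven by the covariate, and two simple imputation rules: mean imputation and deterministic regression imputation from a complete-case least-squares fit. Because the data are simulated, the true values of the missing responses are kept, so the root-mean-square error of each imputed value against its true value can be computed. This is a hypothetical design, not the article's simulation study.

```python
import math
import random

def ols(xs, ys):
    # Simple least-squares fit y = b0 + b1 * x
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def simulate(n=2000, seed=5):
    # Hypothetical MAR design: P(observed) depends on x only
    rng = random.Random(seed)
    x, y_true, observed = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        x.append(xi)
        y_true.append(1.0 + 2.0 * xi + rng.gauss(0, 1))
        p_obs = 1.0 / (1.0 + math.exp(-(0.5 + xi)))
        observed.append(rng.random() < p_obs)
    return x, y_true, observed

def individual_rmse(x, y_true, observed):
    """RMSE of imputed vs. true values over the missing units, for two imputation rules."""
    obs_x = [xi for xi, o in zip(x, observed) if o]
    obs_y = [yi for yi, o in zip(y_true, observed) if o]
    b0, b1 = ols(obs_x, obs_y)
    mean_obs = sum(obs_y) / len(obs_y)
    missing = [i for i, o in enumerate(observed) if not o]
    err_mean = sum((mean_obs - y_true[i]) ** 2 for i in missing)          # mean imputation
    err_reg = sum((b0 + b1 * x[i] - y_true[i]) ** 2 for i in missing)     # regression imputation
    k = len(missing)
    return math.sqrt(err_mean / k), math.sqrt(err_reg / k)
```

Mean imputation ignores the covariate entirely. Its individual-level error therefore includes the full spread of the outcome plus the MAR bias of the complete-case mean. Regression imputation leaves only the residual noise.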

In a missing data setting, we have a sample in which a vector of explanatory variables ${\bf x}_i$ is observed for every subject i, while scalar responses $y_i$ are missing by happenstance on some individuals. In this work we propose robust estimators of the distribution of the responses assuming missing at random (MAR) data, under a semiparametric regression model. Our approach allows the consistent estimation of any weakly continuous functional of the response's distribution. In particular, strongly consistent estimators of any continuous location functional, such as the median, L‐functionals and M‐functionals, are proposed. A robust fit for the regression model combined with the robust properties of the location functional gives rise to a robust recipe for estimating the location parameter. Robustness is quantified through the breakdown point of the proposed procedure. The asymptotic distribution of the location estimators is also derived. The proofs of the theorems are presented in Supplementary Material available online. The Canadian Journal of Statistics 41: 111–132; 2013 © 2012 Statistical Society of Canada

Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing multiple models for the outcome and/or the response probability, and are consistent if at least one of the multiple models is correctly specified. We propose a robust quasi-randomization-based model approach that provides more protection against model misspecification than the existing DR and MR estimators, and in which any multiple semiparametric, nonparametric or machine learning models can be used for the outcome variable. Given cell-homogeneous response, the proposed estimator achieves unbiasedness through a subsampling Rao–Blackwell method, regardless of the working models for the outcome. An unbiased variance estimation formula is proposed that does not require replicate jackknife or bootstrap methods. A simulation study shows that our proposed method outperforms the existing multiply robust estimators.
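The doubly robust property that MR estimators build on can be checked numerically with the standard augmented inverse probability weighted (AIPW) estimator of the mean, (1/n) Σ [m̂(x_i) + δ_i (y_i − m̂(x_i)) / π̂(x_i)]. This is a sketch of that DR baseline only, not of the quasi-randomization or Rao–Blackwell method described in the abstract. For clarity, it plugs in known working models, either true or deliberately wrong, rather than fitted ones. The simulation design is hypothetical.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def aipw_mean(y, delta, m_hat, pi_hat):
    """AIPW mean: outcome-model prediction plus inverse-probability-weighted residual."""
    total = 0.0
    for yi, d, m, p in zip(y, delta, m_hat, pi_hat):
        total += m
        if d:                      # residual correction uses observed responses only
            total += (yi - m) / p
    return total / len(y)

def simulate(n=5000, seed=9):
    # Hypothetical design: E[Y | x] = 1 + 2x, P(observed | x) = expit(0.3 + x), E[Y] = 1
    rng = random.Random(seed)
    x, y, delta = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        d = 1 if rng.random() < expit(0.3 + xi) else 0
        x.append(xi)
        delta.append(d)
        y.append(1.0 + 2.0 * xi + rng.gauss(0, 1) if d else None)  # None = missing
    return x, y, delta
```

With a correct outcome model and a wrong constant response probability, or with a wrong outcome model and the correct response probability, the estimate stays near the true mean of 1. When both working models are wrong, the estimator is generally biased. MR estimators widen the safety net by allowing several candidate models for each.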

Xu Guo, Yiping Yang, Wangli Xu. Statistics, 2015, 49(3): 588–601
In this paper, we investigate empirical-likelihood-based inference for constructing confidence intervals and regions for the parameters of interest in single index models with covariates missing at random. An augmented inverse probability weighted-type empirical likelihood ratio for the parameters of interest is defined such that this ratio is asymptotically standard chi-squared. Our approach directly calibrates the empirical log-likelihood ratio and does not require multiplying the original ratio by an adjustment factor. Our bias-corrected empirical likelihood is self-scale invariant, and no plug-in estimator of the limiting variance is needed. Simulation studies are carried out to assess the performance of the proposed method.

In this paper, a nonlinear model with response variables missing at random is studied. To improve the coverage accuracy for the model parameters, the empirical likelihood (EL) ratio method is considered. On the complete data, the EL statistic for the parameters and its approximation have an asymptotic χ² distribution. When the responses are reconstituted using a semi-parametric method, the empirical log-likelihood on the response variables associated with the imputed data is also asymptotically χ². The Wilks theorem for EL on the parameters, based on reconstituted data, also holds. These results can be used to construct confidence regions for the model parameters and the response variables. Monte Carlo simulations show that the EL methods outperform the normal approximation-based method in terms of coverage probability for the unknown parameter, including on the reconstituted data. The advantages of the proposed method are illustrated on real data.

Missing values are common in longitudinal data studies. The missing data mechanism is termed non-ignorable (NI) if the probability of missingness depends on the non-response (missing) observations. This paper presents a model for ordinal categorical longitudinal data with NI non-monotone missing values. We assume two separate models for the response and the missingness process. The response is modeled as ordinal logistic, whereas a binary logistic model is used for the missingness process. We employ these models in the context of so-called shared-parameter models, where the outcome and missing-data models are connected by a common set of random effects. The random effects are commonly assumed to follow a normal distribution in longitudinal data with or without missing data. This can be extremely restrictive in practice and may result in misleading statistical inferences. In this paper, we instead adopt a more flexible alternative, the skew-normal distribution. The methodology is illustrated through an application to Schizophrenia Collaborative Study data [19 D. Hedeker, Generalized linear mixed models, in Encyclopedia of Statistics in Behavioral Science, B. Everitt and D. Howell, eds., John Wiley, London, 2005, pp. 729–738.] and a simulation.
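As a minimal sketch of the random-effects distribution, a standard skew-normal variate with shape α can be drawn through Azzalini's stochastic representation Z = δ|U0| + √(1 − δ²) U1. Here δ = α/√(1 + α²), and U0 and U1 are independent standard normals. Setting α = 0 recovers the normal case, and negative α gives left skew. The mean of Z is δ√(2/π), so a mean-zero random effect is obtained by subtracting it. The centring and the scale parameter below are assumed conventions. The sketch does not fit the ordinal or missingness models themselves.

```python
import math
import random

def skew_normal(rng, alpha):
    """Draw from the standard skew-normal SN(alpha) via Azzalini's representation."""
    d = alpha / math.sqrt(1.0 + alpha * alpha)
    u0, u1 = rng.gauss(0, 1), rng.gauss(0, 1)
    return d * abs(u0) + math.sqrt(1.0 - d * d) * u1

def centred_random_effect(rng, alpha, scale):
    """Skew-normal random effect shifted to mean zero (assumed convention), then scaled."""
    d = alpha / math.sqrt(1.0 + alpha * alpha)
    return scale * (skew_normal(rng, alpha) - d * math.sqrt(2.0 / math.pi))
```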

When nonresponse is outcome-dependent, pseudo-likelihood yields consistent regression coefficients without specifying the missing data mechanism. However, it is onerous to derive parameter estimators, including their standard errors, from the regression coefficients under pseudo-likelihood (PL). The present study applies an imputation method to compute the asymptotic standard errors of parameter estimators. The proposed method is simpler than the delta method, and in simulation and application studies it yielded standard errors of similar magnitude to those from bootstrapping.

The multivariate t linear mixed model (MtLMM) has recently been proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research, even in well-controlled situations. As a powerful alternative to the traditional expectation–maximization (EM)-based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM that accounts for the uncertainty in model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with a Metropolis-within-Gibbs scheme is used to draw efficiently from the posterior distributions of the latent data and model parameters. Techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are also investigated. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.

Penalized methods for variable selection, such as the smoothly clipped absolute deviation (SCAD) penalty, have been increasingly applied to aid variable selection in regression analysis. Much of the literature has focused on parametric models, while a few recent studies have shifted the focus and developed applications for the popular semi-parametric, or distribution-free, generalized estimating equations (GEE) and weighted GEE (WGEE). However, although the WGEE is composed of a main module and a missing-data module, available methods focus only on the main module, with no variable selection for the missing-data module. In this paper, we develop a new approach that extends the existing methods to enable variable selection for both modules. The approach is illustrated with both real and simulated data.
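For reference, the smoothly clipped absolute deviation (SCAD) penalty of Fan and Li (2001) and its derivative can be written as below, with a = 3.7 as the default they suggested. This is a minimal sketch of the penalty function only, not of the article's WGEE selection procedure for the main and missing-data modules. In penalized estimating equations the derivative typically enters through a local quadratic approximation.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|): linear near 0, quadratic blend, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the penalty with respect to |theta|; zero beyond a*lambda."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)
```

The penalty is continuous at |θ| = λ and |θ| = aλ and flat beyond aλ, so large coefficients are not shrunk. This is what distinguishes SCAD from the lasso penalty λ|θ|.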

