首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A general nonparametric imputation procedure, based on kernel regression, is proposed to estimate points as well as set- and function-indexed parameters when the data are missing at random (MAR). The proposed method works by imputing a specific function of a missing value (and not the missing value itself), where the form of this specific function is dictated by the parameter of interest. Both single and multiple imputations are considered. The associated empirical processes provide the right tool to study the uniform convergence properties of the resulting estimators. Our estimators include, as special cases, the imputation estimator of the mean, the estimator of the distribution function proposed by Cheng and Chu [1996. Kernel estimation of distribution functions and quantiles with missing data. Statist. Sinica 6, 63–78], imputation estimators of a marginal density, and imputation estimators of regression functions.  相似文献   

2.
It is quite appealing to extend existing theories in classical linear models to correlated responses where linear mixed-effects models are utilized and the dependency in the data is modeled by random effects. In the mixed modeling framework, missing values occur naturally due to dropouts or non-responses, which is frequently encountered when dealing with real data. Motivated by such problems, we aim to investigate the estimation and model selection performance in linear mixed models when missing data are present. Inspired by the property of the indicator function for missingness and its relation to missing rates, we propose an approach that records missingness in an indicator-based matrix and derive the likelihood-based estimators for all parameters involved in the linear mixed-effects models. Based on the proposed method for estimation, we explore the relationship between estimation and selection behavior over missing rates. Simulations and a real data application are conducted for illustrating the effectiveness of the proposed method in selecting the most appropriate model and in estimating parameters.  相似文献   

3.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied and a Monte Carlo EM (MCEM) algorithm with penalized-spline (P-spline) technique is developed to estimate the regression coefficients and nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for our considered models with incomplete data, some new model selection criterions for GPLMs with missing covariates are proposed under two different missingness mechanism, say, missing at random (MAR) and missing not at random (MNAR). The most attractive point of our method is that it is rather general and can be extended to various situations with missing observations based on EM algorithm, especially when no missing data involved, our new model selection criterions are reduced to classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also can compare missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criterions are investigated. A simulation study and a real example are used to illustrate the proposed methodology.  相似文献   

4.
The objective of this paper is to present a method which can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when we can make first and second moment assumptions only; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density which has the same mean and variance structure of the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. Then the loss function is minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.  相似文献   

5.
Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model.  相似文献   

6.
We propose a profile conditional likelihood approach to handle missing covariates in the general semiparametric transformation regression model. The method estimates the marginal survival function by the Kaplan-Meier estimator, and then estimates the parameters of the survival model and the covariate distribution from a conditional likelihood, substituting the Kaplan-Meier estimator for the marginal survival function in the conditional likelihood. This method is simpler than full maximum likelihood approaches, and yields consistent and asymptotically normally distributed estimator of the regression parameter when censoring is independent of the covariates. The estimator demonstrates very high relative efficiency in simulations. When compared with complete-case analysis, the proposed estimator can be more efficient when the missing data are missing completely at random and can correct bias when the missing data are missing at random. The potential application of the proposed method to the generalized probit model with missing continuous covariates is also outlined.  相似文献   

7.
In this paper, a regression semi-parametric model is considered where responses are assumed to be missing at random. From the empirical likelihood function defined based on the rank-based estimating equation, robust confidence intervals/regions of the true regression coefficient are derived. Monte Carlo simulation experiments show that the proposed approach provides more accurate confidence intervals/regions compared to its normal approximation counterpart under different model error structure. The approach is also compared with the least squares approach, and its superiority is shown whenever the error distribution in the simulation study is heavy tailed or contaminated. Finally, a real data example is given to illustrate our proposed method.  相似文献   

8.
In this paper, we develop Bayesian methodology and computational algorithms for variable subset selection in Cox proportional hazards models with missing covariate data. A new joint semi-conjugate prior for the piecewise exponential model is proposed in the presence of missing covariates and its properties are examined. The covariates are assumed to be missing at random (MAR). Under this new prior, a version of the Deviance Information Criterion (DIC) is proposed for Bayesian variable subset selection in the presence of missing covariates. Monte Carlo methods are developed for computing the DICs for all possible subset models in the model space. A Bone Marrow Transplant (BMT) dataset is used to illustrate the proposed methodology.  相似文献   

9.
In this paper, we develop a semiparametric regression model for longitudinal skewed data. In the new model, we allow the transformation function and the baseline function to be unknown. The proposed model can provide a much broader class of models than the existing additive and multiplicative models. Our estimators for regression parameters, transformation function and baseline function are asymptotically normal. Particularly, the estimator for the transformation function converges to its true value at the rate n ? 1 ∕ 2, the convergence rate that one could expect for a parametric model. In simulation studies, we demonstrate that the proposed semiparametric method is robust with little loss of efficiency. Finally, we apply the new method to a study on longitudinal health care costs.  相似文献   

10.
Abstract

A method is proposed for the estimation of missing data in analysis of covariance models. This is based on obtaining an estimate of the missing observation that minimizes the error sum of squares. Specific derivation of this estimate is carried out for the one-factor analysis of covariance, and numerical examples are given to show the nature of the estimates produced. Parameter estimates of the imputed data are then compared with those of the incomplete data.  相似文献   

11.
In longitudinal data, missing observations occur commonly with incomplete responses and covariates. Missing data can have a ‘missing not at random’ mechanism, a non‐monotone missing pattern, and moreover response and covariates can be missing not simultaneously. To avoid complexities in both modelling and computation, a two‐stage estimation method and a pairwise‐likelihood method are proposed. The two‐stage estimation method enjoys simplicities in computation, but incurs more severe efficiency loss. On the other hand, the pairwise approach leads to estimators with better efficiency, but can be cumbersome in computation. In this paper, we develop a compromise method using a hybrid pairwise‐likelihood framework. Our proposed approach has better efficiency than the two‐stage method, but its computational cost is still reasonable compared to the pairwise approach. The performance of the methods is evaluated empirically by means of simulation studies. Our methods are used to analyse longitudinal data obtained from the National Population Health Study.  相似文献   

12.
Estimating a Convex Function in Nonparametric Regression   总被引:1,自引:0,他引:1  
Abstract.  A new nonparametric estimate of a convex regression function is proposed and its stochastic properties are studied. The method starts with an unconstrained estimate of the derivative of the regression function, which is firstly isotonized and then integrated. We prove asymptotic normality of the new estimate and show that it is first order asymptotically equivalent to the initial unconstrained estimate if the regression function is in fact convex. If convexity is not present, the method estimates a convex function whose derivative has the same L p -norm as the derivative of the (non-convex) underlying regression function. The finite sample properties of the new estimate are investigated by means of a simulation study and it is compared with a least squares approach of convex estimation. The application of the new method is demonstrated in two data examples.  相似文献   

13.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

14.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approach is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weigh check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve nice robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ ‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.  相似文献   

15.
The EM algorithm is a popular method for computing maximum likelihood estimates or posterior modes in models that can be formulated in terms of missing data or latent structure. Although easy implementation and stable convergence help to explain the popularity of the algorithm, its convergence is sometimes notoriously slow. In recent years, however, various adaptations have significantly improved the speed of EM while maintaining its stability and simplicity. One especially successful method for maximum likelihood is known as the parameter expanded EM or PXEM algorithm. Unfortunately, PXEM does not generally have a closed form M-step when computing posterior modes, even when the corresponding EM algorithm is in closed form. In this paper we confront this problem by adapting the one-step-late EM algorithm to PXEM to establish a fast closed form algorithm that improves on the one-step-late EM algorithm by insuring monotone convergence. We use this algorithm to fit a probit regression model and a variety of dynamic linear models, showing computational savings of as much as 99.9%, with the biggest savings occurring when the EM algorithm is the slowest to converge.  相似文献   

16.
A class of bivariate continuous-discrete distributions is proposed to fit Poisson dynamic models in a single unified framework via bivariate mixture transition distributions (BMTDs). Potential advantages of this class over the current models include its ability to capture stretches, bursts and nonlinear patterns characterized by Internet network traffic, high-frequency financial data and many others. It models the inter-arrival times and the number of arrivals (marks) in a single unified model which benefits from the dependence structure of the data. The continuous marginal distributions of this class include as special cases the exponential, gamma, Weibull and Rayleigh distributions (for the inter-arrival times), whereas the discrete marginal distributions are geometric and negative binomial. The conditional distributions are Poisson and Erlang. Maximum-likelihood estimation is discussed and parameter estimates are obtained using an expectation–maximization algorithm, while the standard errors are estimated using the missing information principle. It is shown via real data examples that the proposed BMTD models appear to capture data features better than other competing models.  相似文献   

17.
In this paper, we study estimation of linear models in the framework of longitudinal data with dropouts. Under the assumptions that random errors follow an elliptical distribution and all the subjects share the same within-subject covariance matrix which does not depend on covariates, we develop a robust method for simultaneous estimation of mean and covariance. The proposed method is robust against outliers, and does not require to model the covariance and missing data process. Theoretical properties of the proposed estimator are established and simulation studies show its good performance. In the end, the proposed method is applied to a real data analysis for illustration.  相似文献   

18.
We propose an 1-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in presence of missing data. Our method is based on the assumption that the data are missing at random (MAR) which entails also the completely missing at random case. The implementation of the method is non-trivial as the observed negative log-likelihood generally is a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.  相似文献   

19.
In recent years much effort has been devoted to maximum likelihood estimation of generalized linear mixed models. Most of the existing methods use the EM algorithm, with various techniques in handling the intractable E-step. In this paper, a new implementation of a stochastic approximation algorithm with Markov chain Monte Carlo method is investigated. The proposed algorithm is computationally straightforward and its convergence is guaranteed. A simulation and three real data sets, including the challenging salamander data, are used to illustrate the procedure and to compare it with some existing methods. The results indicate that the proposed algorithm is an attractive alternative for problems with a large number of random effects or with high dimensional intractable integrals in the likelihood function.  相似文献   

20.
Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because MNAR mechanism is not verifiable based on the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data. Discrepancy in the parameter estimates obtained from the the two data sets is used as a measure of sensitivity to missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap type method, that relies on heuristic input from the researcher, to test for the discrepancy of the parameter estimates. Instead of using bootstrap, the current article obtains confidence interval for parameter differences on two samples based on an asymptotic approximation. Because it does not use bootstrap, the developed procedure avoids likely convergence problems with the bootstrap methods. It does not require heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test missing at random in addition to MCAR. An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies are performed as well, using two methods of missing data generation, which show promise for the proposed sensitivity method. One method of missing data generation is also new and interesting in its own right.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号