Similar Documents
20 similar documents retrieved.
1.
Cui, Ruifei; Groot, Perry; Heskes, Tom. Statistics and Computing, 2019, 29(2): 311-333.

We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the ‘Rank PC’ algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. When the data are missing completely at random (MCAR), we provide an error bound on the accuracy of ‘Rank PC’ and show its high-dimensional consistency. However, when the data are missing at random (MAR), ‘Rank PC’ fails dramatically. Therefore, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data that still works correctly under MAR. These samples are translated into an average correlation matrix and an effective sample size, resulting in the ‘Copula PC’ algorithm for incomplete data. Simulation study shows that: (1) ‘Copula PC’ estimates a more accurate correlation matrix and causal structure than ‘Rank PC’ under MCAR and, even more so, under MAR and (2) the usage of the effective sample size significantly improves the performance of ‘Rank PC’ and ‘Copula PC.’ We illustrate our methods on two real-world datasets: riboflavin production data and chronic fatigue syndrome data.
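As a rough illustration of the pairwise-complete rank-correlation step described above, the following Python sketch estimates a latent Gaussian correlation matrix from incomplete data via Kendall's tau and the sine transform commonly used for nonparanormal models. Using the smallest pairwise-complete count as the "effective sample size" is a deliberate simplification, and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau

def pairwise_rank_correlation(X):
    """Estimate a latent Gaussian correlation matrix from data with missing
    values (NaN): compute Kendall's tau on pairwise-complete observations and
    map it to a correlation via the sine transform.  The minimum number of
    pairwise-complete cases is returned as a crude stand-in for an
    'effective sample size'."""
    n, p = X.shape
    R = np.eye(p)
    n_eff = n
    for i in range(p):
        for j in range(i + 1, p):
            mask = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            n_ij = int(mask.sum())
            if n_ij < 2:
                continue  # leave R[i, j] at 0 if there is nothing to estimate
            n_eff = min(n_eff, n_ij)
            tau, _ = kendalltau(X[mask, i], X[mask, j])
            R[i, j] = R[j, i] = np.sin(np.pi * tau / 2)
    return R, n_eff
```

The resulting correlation matrix and effective sample size would then be supplied to the conditional independence tests of a PC-type algorithm in place of the complete-data Pearson correlation and nominal sample size.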


2.
For the Poisson distribution, a posterior distribution for the complete sample size, N, is derived from an incomplete sample in which any specified subset of the classes is missing. Means as well as other posterior characteristics of N are obtained for two examples with various classes removed. For the special case of a truncated 'missing zero class' Poisson sample, a simulation experiment is performed for the small-sample situation (N = 25), applying both Bayesian and maximum likelihood methods of estimation.
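A minimal numerical sketch of the zero-class-missing special case: given the Poisson rate, the number of observed non-zero counts is binomial in N, so a posterior over N can be evaluated on a grid. The flat prior, the fixed rate, and the grid bound are illustrative assumptions rather than the paper's choices.

```python
import numpy as np
from scipy.stats import binom

def posterior_N_zero_truncated(n_obs, lam, N_max=200):
    """Posterior over the complete sample size N when the zero class of a
    Poisson(lam) sample is missing: given N, the number of observed (non-zero)
    counts is Binomial(N, 1 - exp(-lam)); a flat prior on N is assumed."""
    p_nonzero = 1.0 - np.exp(-lam)
    N_grid = np.arange(n_obs, N_max + 1)
    lik = binom.pmf(n_obs, N_grid, p_nonzero)
    post = lik / lik.sum()
    return N_grid, post

# Example: 20 non-zero observations from a Poisson sample with rate 1.5
N_grid, post = posterior_N_zero_truncated(n_obs=20, lam=1.5)
print("posterior mean of N:", float((N_grid * post).sum()))
```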

3.
We propose a new regression-based filter for extracting signals online from multivariate high frequency time series. It separates relevant signals of several variables from noise and (multivariate) outliers.

Unlike parallel univariate filters, the new procedure takes into account the local covariance structure between the single time series components. It is based on high-breakdown estimates, which makes it robust against (patches of) outliers in one or several of the components as well as against outliers with respect to the multivariate covariance structure. Moreover, the trade-off problem between bias and variance for the optimal choice of the window width is approached by choosing the size of the window adaptively, depending on the current data situation.

Furthermore, we present an extended algorithm for our filtering procedure that includes the replacement of missing observations in real time. Thus, the new procedure can be applied in online-monitoring practice. Applications to physiological time series from intensive care demonstrate the practical benefit of the proposed filtering technique.
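The univariate sketch below conveys only the moving-window idea: at each time point a robust Theil-Sen line is fitted to the most recent observations and evaluated at the window end. The multivariate high-breakdown estimation, the adaptive window width, and the real-time replacement of missing values described above are not reproduced here, and the window width is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import theilslopes

def moving_robust_level(y, width=31):
    """Illustrative robust signal extraction: fit a Theil-Sen regression line
    to the last `width` observations and take its fitted value at the current
    time point as the level estimate."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    level = np.full(len(y), np.nan)
    for i in range(width - 1, len(y)):
        win = slice(i - width + 1, i + 1)
        slope, intercept, _, _ = theilslopes(y[win], t[win])
        level[i] = intercept + slope * t[i]
    return level
```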

4.
A method is proposed for the estimation of missing data in analysis of covariance models. This is based on obtaining an estimate of the missing observation that minimizes the error sum of squares. Specific derivation of this estimate is carried out for the one-factor analysis of covariance, and numerical examples are given to show the nature of the estimates produced. Parameter estimates of the imputed data are then compared with those of the incomplete data.

5.
Models for geostatistical data introduce spatial dependence in the covariance matrix of location-specific random effects. This is usually defined to be a parametric function of the distances between locations. Bayesian formulations of such models overcome asymptotic inference and estimation problems involved in maximum likelihood-based approaches and can be fitted using Markov chain Monte Carlo (MCMC) simulation. The MCMC implementation, however, requires repeated inversions of the covariance matrix, which makes the problem computationally intensive, especially for a large number of locations. In the present work, we propose to convert the spatial covariance matrix to a sparse matrix and compare a number of numerical algorithms especially suited to the MCMC framework in order to accelerate the inversion of large matrices. The algorithms are assessed empirically on simulated datasets of different size and sparsity. We conclude that the band solver applied after ordering the distance matrix substantially reduces the computational time of inverting covariance matrices.
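To make the band-solver idea concrete, the sketch below thresholds a covariance matrix to a sparse one, reorders it with reverse Cuthill-McKee to reduce the bandwidth, and solves a linear system with SciPy's symmetric banded solver. The thresholding rule and the particular ordering algorithm are illustrative assumptions, not necessarily the ones compared in the paper, and the thresholded matrix is assumed to stay positive definite.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.linalg import solveh_banded

def banded_solve_after_ordering(Sigma, b, tol=1e-8):
    """Solve Sigma x = b by (1) zeroing near-zero entries, (2) reordering with
    reverse Cuthill-McKee to reduce bandwidth, and (3) calling a symmetric
    banded solver on the permuted system.  Assumes the thresholded matrix is
    still symmetric positive definite."""
    S = np.where(np.abs(Sigma) > tol, Sigma, 0.0)
    perm = reverse_cuthill_mckee(csc_matrix(S), symmetric_mode=True)
    Sp = S[np.ix_(perm, perm)]
    bp = b[perm]
    i, j = np.nonzero(Sp)
    k = int(np.max(np.abs(i - j)))          # half-bandwidth after reordering
    ab = np.zeros((k + 1, Sp.shape[0]))     # LAPACK upper banded storage
    for d in range(k + 1):
        ab[k - d, d:] = np.diag(Sp, d)
    xp = solveh_banded(ab, bp)
    x = np.empty_like(xp)
    x[perm] = xp                            # undo the permutation
    return x
```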

6.
A. Roy & D. Klein. Statistics, 2018, 52(2): 393-408.
Testing hypotheses about the structure of a covariance matrix for doubly multivariate data is often considered in the literature. In this paper, Rao's score test (RST) is derived to test the block exchangeable covariance matrix, or block compound symmetry (BCS), covariance structure under the assumption of multivariate normality. It is shown that the empirical distribution of the RST statistic under the null hypothesis is independent of the true values of the mean and the matrix components of a BCS structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Simulation studies are performed to examine sample size requirements and to estimate the empirical quantiles of the null distribution of the test statistic. The RST procedure is illustrated on a real data set from a medical study.

7.
We consider the study of censored survival times in the situation where the available data consist of both eligible and ineligible subjects, and information distinguishing the two groups is sometimes missing. A complete-case analysis in this context would use only subjects known to be eligible, resulting in inefficient and potentially biased estimators. We propose a two-step procedure which resembles the EM algorithm but is computationally much faster. In the first step, one estimates the conditional expectation of the missing eligibility indicators given the observed data using a logistic regression based on the complete cases (i.e., subjects with non-missing eligibility indicator). In the second step, maximum likelihood estimators are obtained from a weighted Cox proportional hazards model, with the weights being either observed eligibility indicators or estimated conditional expectations thereof. Under ignorable missingness, the estimators from the second step are proven to be consistent and asymptotically normal, with explicit variance estimators. We demonstrate through simulation that the proposed methods perform well for moderate sized samples and are robust in the presence of eligibility indicators that are missing not at random. The proposed procedure is more efficient and more robust than the complete case analysis and, unlike the EM algorithm, does not require time-consuming iteration. Although the proposed methods are applicable generally, they would be most useful for large data sets (e.g., administrative data), for which the computational savings outweigh the price one has to pay for making various approximations in avoiding iteration. We apply the proposed methods to national kidney transplant registry data.
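The two-step procedure lends itself to a short sketch, shown below under the assumption of a pandas DataFrame with columns 'time', 'event', and 'eligible' (1, 0, or NaN when eligibility is unknown) plus covariate columns; the column names and the use of scikit-learn and lifelines are illustrative choices, not the authors' implementation.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def two_step_weighted_cox(df, covariates):
    """Step 1: estimate P(eligible | covariates) by logistic regression on the
    complete cases (eligibility observed).  Step 2: fit a Cox proportional
    hazards model weighted by the observed eligibility indicator or, when it
    is missing, by its estimated conditional expectation."""
    cc = df.dropna(subset=["eligible"])
    logit = LogisticRegression().fit(cc[covariates], cc["eligible"].astype(int))
    w = df["eligible"].astype(float).copy()
    miss = w.isna()
    w.loc[miss] = logit.predict_proba(df.loc[miss, covariates])[:, 1]
    fit_df = df[["time", "event"] + covariates].copy()
    fit_df["w"] = w
    cph = CoxPHFitter()
    # robust (sandwich) standard errors are advisable with non-integer weights
    cph.fit(fit_df, duration_col="time", event_col="event",
            weights_col="w", robust=True)
    return cph
```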

8.
Inverse probability weighting (IPW) can deal with confounding in non-randomized studies. The inverse weights are probabilities of treatment assignment (propensity scores), estimated by regressing assignment on predictors. Problems arise, however, if the predictors themselves can be missing. Solutions previously proposed include assuming that assignment depends only on observed predictors, and multiple imputation (MI) of missing predictors. For the MI approach, it was recommended that missingness indicators be used together with the other predictors. We determine when the two MI approaches (with and without missingness indicators) yield consistent estimators and compare their efficiencies. We find that, although including indicators can reduce bias when predictors are missing not at random, it can induce bias when they are missing at random. We propose a consistent variance estimator and investigate the performance of the simpler Rubin's rules variance estimator. In simulations we find that both estimators perform well. IPW is also used to correct bias when an analysis model is fitted to incomplete data by restricting to complete cases. Here, weights are inverse probabilities of being a complete case. We explain how the same MI methods can be used in this situation to deal with missing predictors in the weight model, and illustrate this approach using data from the National Child Development Survey.
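As a simplified sketch of using multiple imputation for missing predictors in the weight model: the predictors are imputed m times, a propensity model is fitted to each completed data set, and the resulting inverse-probability weights are averaged. Averaging the weights is an illustrative shortcut (the paper analyses each imputed data set and combines results, e.g. with Rubin's rules), and the choice of scikit-learn's IterativeImputer is an assumption.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression

def mi_propensity_weights(X, treat, m=5, seed=0):
    """Impute missing predictors (NaN in X) m times, estimate the propensity
    score on each completed data set, and return the averaged
    inverse-probability-of-treatment weights."""
    weights = np.zeros((m, X.shape[0]))
    for k in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed + k)
        Xk = imputer.fit_transform(X)
        ps = LogisticRegression().fit(Xk, treat).predict_proba(Xk)[:, 1]
        weights[k] = np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    return weights.mean(axis=0)
```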

9.
This article is concerned with statistically and computationally efficient estimation in a hierarchical data setting with unequal cluster sizes and an AR(1) covariance structure. Maximum likelihood estimation for AR(1) requires numerical iteration when cluster sizes are unequal. A near-optimal non-iterative procedure is proposed. Pseudo-likelihood and split-sample methods are used, yielding weights that combine cluster-size-specific parameter estimates. Results show that the method is statistically nearly as efficient as maximum likelihood but offers great savings in computation time.

10.
A maximum likelihood estimation procedure is presented for the frailty model. The procedure is based on a stochastic Expectation Maximization algorithm which converges quickly to the maximum likelihood estimate. The usual expectation step is replaced by a stochastic approximation of the complete log-likelihood using simulated values of the unobserved frailties, whereas the maximization step follows the same lines as those of the Expectation Maximization algorithm. The procedure yields, at the same time, estimates of the marginal likelihood and of the observed Fisher information matrix. Moreover, this stochastic Expectation Maximization algorithm requires less computation time. A wide variety of multivariate frailty models, without any assumption on the covariance structure, can be studied. To illustrate this procedure, a Gaussian frailty model with two frailty terms is introduced. The numerical results, based on simulated data and on real bladder cancer data, are more accurate than those obtained using the Expectation Maximization Laplace algorithm and the Monte Carlo Expectation Maximization algorithm. Finally, since frailty models are used in many fields such as ecology, biology, and economics, the proposed algorithm has a wide spectrum of applications.

11.
Clustered longitudinal data feature cross-sectional associations within clusters, serial dependence within subjects, and associations between responses at different time points from different subjects within the same cluster. Generalized estimating equations are often used for inference with data of this sort since they do not require full specification of the response model. When data are incomplete, however, they require data to be missing completely at random unless inverse probability weights are introduced based on a model for the missing data process. The authors propose a robust approach for incomplete clustered longitudinal data using composite likelihood. Specifically, pairwise likelihood methods are described for conducting robust estimation with minimal model assumptions made. The authors also show that the resulting estimates remain valid for a wide variety of missing data problems, including missing at random mechanisms, and so in such cases there is no need to model the missing data process. In addition to describing the asymptotic properties of the resulting estimators, it is shown that the method performs well empirically through simulation studies for complete and incomplete data. Pairwise likelihood estimators are also compared with estimators obtained from inverse probability weighted alternating logistic regression. An application to data from the Waterloo Smoking Prevention Project is provided for illustration. The Canadian Journal of Statistics 39: 34-51; 2011 © 2010 Statistical Society of Canada

12.
The maximum likelihood estimator (MLE) for the proportional hazards model with partly interval-censored data is studied. Under appropriate regularity conditions, the MLEs of the regression parameter and the cumulative hazard function are shown to be consistent and asymptotically normal. Two methods to estimate the variance-covariance matrix of the MLE of the regression parameter are considered, based on a generalized missing information principle and on a generalized profile information procedure. Simulation studies show that both methods work well in terms of the bias and variance for samples of moderate size. An example illustrates the methods.

13.
This paper presents missing data methods for repeated measures data in small samples. Most methods currently available are for large samples. In particular, no studies have compared the performance of multiple imputation methods to that of non-imputation incomplete analysis methods. We first develop a strategy for multiple imputation for repeated measures data under a cell-means model that is applicable to any multivariate data with small samples. Multiple imputation inference procedures are applied to the resulting multiply imputed complete data sets. Comparisons to other available non-imputation incomplete data methods are made via simulation studies, leading to the conclusion that there is not much gain, in terms of the power of testing hypotheses about parameters of interest, in using the computer-intensive multiple imputation methods for small-sample repeated measures data analysis.
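For reference, the pooling step applied to the multiply imputed data sets (Rubin's rules) can be written in a few lines; this is the standard combination rule and is shown only to make the comparison in the abstract concrete.

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool m point estimates and their within-imputation variances: the pooled
    estimate is the mean, and the total variance adds (1 + 1/m) times the
    between-imputation variance to the average within-imputation variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    qbar = estimates.mean()                  # pooled point estimate
    ubar = variances.mean()                  # within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    total_var = ubar + (1.0 + 1.0 / m) * b
    return qbar, total_var
```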

14.
The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed to analyze incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally-distributed repeated measures data using linear mixed models, while Park and Davis (1992) derived their test for incomplete repeated categorical data in the framework of Grizzle, Starmer, and Koch (1969). The proposed procedure can be applied easily to any other multivariate general linear model that allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).

15.
We consider regression analysis in generalized linear models when some covariates are incomplete. The incomplete covariates may be subject to measurement error or missing for some study subjects. We assume there exists a validation sample in which the data are complete and which is a simple random subsample of the whole sample. Based on the idea of the projection-solution method in Heyde (1997, Quasi-Likelihood and its Applications: A General Approach to Optimal Parameter Estimation. Springer, New York), a class of estimating functions is proposed to estimate the regression coefficients from the whole data. This method does not need to specify a correct parametric model for the incomplete covariates in order to yield a consistent estimate, and it avoids the 'curse of dimensionality' encountered in existing semiparametric methods. Simulation results show that the finite-sample performance and efficiency properties of the proposed estimators are satisfactory. Moreover, this approach is computationally convenient and hence can be applied in routine data analysis.

16.
Ibrahim (1990) used the EM algorithm to obtain maximum likelihood estimates of the regression parameters in generalized linear models with partially missing covariates. The technique was termed EM by the method of weights. In this paper, we generalize this technique to Cox regression analysis with missing values in the covariates. We specify a full model letting the unobserved covariate values be random and then maximize the observed likelihood. The asymptotic covariance matrix is estimated by the inverse information matrix. The missing data are allowed to be missing at random, but the non-ignorable non-response situation may in principle also be considered. Simulation studies indicate that the proposed method is more efficient than the method suggested by Paik & Tsai (1997). We apply the procedure to a clinical trial example with six covariates, three of which have missing values.

17.
In modeling complex longitudinal data, semiparametric nonlinear mixed-effects (SNLME) models are very flexible and useful. Covariates are often introduced in the models to partially explain inter-individual variation. In practice, the data are often incomplete, in the sense that longitudinal studies frequently involve measurement errors and missing data. The likelihood method is a standard approach for inference in these models, but it can be computationally very challenging, so computationally efficient approximate methods are quite valuable. However, the performance of these approximate methods is often assessed only through limited simulation studies, and theoretical results are unavailable for many of them. In this article, we consider a computationally efficient approximate method for a class of SNLME models with incomplete data and investigate its theoretical properties. We show that the estimates based on the approximate method are consistent and asymptotically normally distributed.

18.
A general framework is proposed for joint modelling of mixed correlated ordinal and continuous responses with missing values, where the missing mechanism for both kinds of responses is also considered. Considering the posterior distribution of the unknowns given all available information, a Markov chain Monte Carlo sampling algorithm implemented via WinBUGS is used for estimating the posterior distribution of the parameters. For sensitivity analysis investigating the perturbation from missing at random to missing not at random, it is shown how one can use certain elements of the covariance structure; these elements associate the responses with their missing mechanisms. The influence of small perturbations of these elements on the posterior displacement and on the posterior estimates is also studied. The model is illustrated using data from a foreign language achievement study.

19.
The asymptotic variance of the maximum likelihood estimate is proved to decrease when the maximization is restricted to a subspace that contains the true parameter value. Maximum likelihood estimation allows a systematic fitting of covariance models to the sample, which is important in data assimilation. The hierarchical maximum likelihood approach is applied to the spectral diagonal covariance model with different parameterizations of eigenvalue decay, and to the sparse inverse covariance model with specified parameter values on different sets of nonzero entries. It is shown computationally that using smaller sets of parameters can substantially decrease the sampling noise in high dimensions.
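A small sketch of one of the parameterizations mentioned above: if the samples are expressed in the spectral basis, so that the coordinates are independent with variances lambda_k, a power-law eigenvalue decay can be fitted by maximum likelihood. The power-law form and the optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_eigenvalue_decay(Z):
    """ML fit of a spectral diagonal covariance model with power-law decay
    lambda_k = c * k**(-alpha).  Rows of Z are zero-mean samples already
    expressed in the spectral basis, so coordinates are independent Gaussians
    with variances lambda_k."""
    n, p = Z.shape
    s2 = np.mean(Z ** 2, axis=0)            # empirical variance per coordinate
    k = np.arange(1, p + 1)

    def negloglik(theta):                   # -2/n * log-likelihood, up to a constant
        log_c, alpha = theta
        lam = np.exp(log_c) * k ** (-alpha)
        return np.sum(np.log(lam) + s2 / lam)

    res = minimize(negloglik, x0=np.array([0.0, 1.0]), method="Nelder-Mead")
    return np.exp(res.x[0]), res.x[1]       # fitted (c, alpha)
```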

20.
Previous simulations have reported second-order missing data estimators to be superior to the more straightforward first-order procedures such as mean value replacement. These simulations, however, were based on deterministic comparisons between regression criteria even though simulated sampling is a random procedure. In this paper, a simulation structured as an experimental design allows statistical testing of the various missing data estimators for the various regression criteria as well as for different regression specifications. Our results indicate that, although no missing data estimator is globally best, many of the computationally simpler first-order methods perform as well as the more expensive higher-order estimators, contrary to some previous findings.
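For concreteness, the simplest first-order procedure referred to above, mean value replacement, amounts to the following; the column-wise NaN handling via NumPy is an illustrative choice.

```python
import numpy as np

def mean_value_replacement(X):
    """First-order missing data procedure: replace each missing entry (NaN) by
    the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X
```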
