
1.

Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations

Cédric Béguin Beat Hulliger 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2004,167(2):275-294

Summary. As a part of the EUREDIT project new methods to detect multivariate outliers in incomplete survey data have been developed. These methods are the first to work with sampling weights and to be able to cope with missing values. Two of these methods are presented here. The epidemic algorithm simulates the propagation of a disease through a population and uses extreme infection times to find outlying observations. Transformed rank correlations are robust estimates of the centre and the scatter of the data. They use a geometric transformation that is based on the rank correlation matrix. The estimates are used to define a Mahalanobis distance that reveals outliers. The two methods are applied to a small data set and to one of the evaluation data sets of the EUREDIT project.
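The last step of the abstract above, turning centre and scatter estimates into a Mahalanobis distance, can be sketched as follows. This is only an illustration under loud assumptions: it uses the classical sample mean and covariance of complete bivariate data, not the transformed rank correlation estimator, and it ignores sampling weights and missing values. The function names and the toy data are hypothetical.

```python
import math

def mean_and_cov(points):
    """Classical (non-robust) centre and 2x2 sample covariance of bivariate points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, centre, cov):
    """Mahalanobis distance of one bivariate point from the given centre and scatter."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    # Quadratic form with the closed-form inverse of the 2x2 covariance matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Toy data (made up): five roughly collinear points plus one point far off the trend.
points = [(1, 2), (2, 3), (3, 4.5), (4, 5), (5, 6.5), (10, -3)]
centre, cov = mean_and_cov(points)
distances = [mahalanobis(p, centre, cov) for p in points]
most_outlying = max(range(len(points)), key=lambda i: distances[i])
print(most_outlying, distances[most_outlying])
```

Because the classical estimates are themselves pulled towards outliers, a robust centre and scatter (the role the abstract gives to transformed rank correlations) would normally replace `mean_and_cov` in practice.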

2.

Evaluation of missing data mechanisms in two and three dimensional incomplete tables

Sayan Ghosh Palaniappan Vellaisamy 《Journal of the Korean Statistical Society》2019,48(2):297-313

The analysis of incomplete contingency tables is a practical and an interesting problem. In this paper, we provide characterizations for the various missing mechanisms of a variable in terms of response and non-response odds for two and three dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for the above tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed based on simulation studies.

3.

Estimation from incomplete data in longitudinal surveys

Randhir Singh 《Journal of statistical planning and inference》1985,11(2):163-170

In longitudinal surveys where a number of observations have to be made on the same sampling unit at specified time intervals, it is not uncommon that observations for some of the time stages for some of the sampled units are found missing. In the present investigation an estimation procedure for estimating the population total based on such incomplete data from multiple observations is suggested which makes use of all the available information and is seen to be more efficient than the one based on only completely observed units. Estimators are also proposed for two other situations; firstly when data is collected only for a sample of time stages and secondly when data is observed for only one time stage per sampled unit.

4.

Learning causal structure from mixed data with missing values using Gaussian copula models

Cui Ruifei Groot Perry Heskes Tom 《Statistics and Computing》2019,29(2):311-333

We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the ‘Rank PC’ algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. When the data are missing completely at random (MCAR), we provide an error bound on the accuracy of ‘Rank PC’ and show its high-dimensional consistency. However, when the data are missing at random (MAR), ‘Rank PC’ fails dramatically. Therefore, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data that still works correctly under MAR. These samples are translated into an average correlation matrix and an effective sample size, resulting in the ‘Copula PC’ algorithm for incomplete data. Simulation study shows that: (1) ‘Copula PC’ estimates a more accurate correlation matrix and causal structure than ‘Rank PC’ under MCAR and, even more so, under MAR and (2) the usage of the effective sample size significantly improves the performance of ‘Rank PC’ and ‘Copula PC.’ We illustrate our methods on two real-world datasets: riboflavin production data and chronic fatigue syndrome data.

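The idea of applying rank correlation to pairwise complete observations, mentioned in the abstract above, can be sketched as follows. This is a minimal illustration, not the 'Rank PC' or 'Copula PC' algorithm: missing values are encoded as `None`, and the number of complete pairs is used as a stand-in for the effective sample size, which is an assumption since the abstract does not give the paper's definition. All function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_spearman(x, y):
    """Spearman correlation over pairs where both values are observed.

    Returns (rho, number of complete pairs).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys)), len(pairs)

# Toy data (made up): None marks a missing value.
x = [1.0, None, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 5.0, None, 8.0, 9.0, 20.0]
rho, n_pairs = pairwise_spearman(x, y)
print(rho, n_pairs)
```

Each pair of variables may be computed from a different subset of rows, which is why a correlation matrix assembled this way calls for an adjusted sample size in downstream tests.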

5.

Estimation of parameters of a polynomial model under intra class correlation structure for incomplete longitudinal data

Anand K. Seth Sati Mazumdar 《Communications in Statistics - Theory and Methods》2013,42(5):1549-1559

The problem of analyzing and modeling incomplete longitudinal data arising from clinical and epidemiological studies is discussed. A method for handling arbitrarily missing observations under the intra class correlation structure and a polynomial model is developed. Explicit expressions for the likelihood equations and the information matrix for a second degree polynomial model are provided. The method is illustrated through an example.

6.

The “Caterpillar”-SSA method for analysis of time series with missing values (cited by: 1; self-citations: 0; citations by others: 1)

N. Golyandina E. Osipov 《Journal of statistical planning and inference》2007

The paper concerns the problem of applying singular spectrum analysis to time series with missing data. A method of filling in the missing data is proposed and is applied to time series of finite rank. Conditions of exact reconstruction of missing data are developed and versions of the algorithm applicable to real-life time series are presented. The proposed algorithms result in the extraction of additive components of time series such as trends and periodic components, with simultaneous filling in of the missing data. An example is presented.

7.

Probability density estimation with data missing at random when covariables are present

Qihua Wang 《Journal of statistical planning and inference》2008

This paper addresses the problem of the probability density estimation in the presence of covariates when data are missing at random (MAR). The inverse probability weighted method is used to define a nonparametric and a semiparametric weighted probability density estimators. A regression calibration technique is also used to define an imputed estimator. It is shown that all the estimators are asymptotically normal with the same asymptotic variance as that of the inverse probability weighted estimator with known selection probability function and weights. Also, we establish the mean squared error (MSE) bounds and obtain the MSE convergence rates. A simulation is carried out to assess the proposed estimators in terms of the bias and standard error.
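The inverse probability weighting idea in the abstract above can be sketched with a generic weighted kernel density estimate. This is a minimal illustration, assuming the selection probabilities are known (the reference case named in the abstract) and a Gaussian kernel with a fixed bandwidth. It is not the paper's nonparametric or semiparametric estimator, in which those probabilities would be estimated from covariates. Function and parameter names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def ipw_density(y, values, observed, probs, bandwidth):
    """Inverse probability weighted kernel density estimate at the point y.

    values[i]   -- response of unit i (ignored when not observed)
    observed[i] -- True if the response of unit i was observed
    probs[i]    -- assumed known probability that unit i is observed
    """
    n = len(values)
    total = 0.0
    for value, is_observed, prob in zip(values, observed, probs):
        if is_observed:
            # Each observed unit is up-weighted by 1 / prob so that it also
            # stands in for similar units whose response is missing.
            total += gaussian_kernel((y - value) / bandwidth) / prob
    return total / (n * bandwidth)

# Toy data (made up): two of five responses are missing.
values = [0.2, None, 1.1, 0.7, None]
observed = [True, False, True, True, False]
probs = [0.8, 0.5, 0.6, 0.6, 0.5]
print(ipw_density(0.5, values, observed, probs, bandwidth=0.4))
```

When every response is observed and every probability equals 1, the function reduces to an ordinary kernel density estimate.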

8.

Efficiency gains due to using missing data procedures in regression models

Theo Nijman Franz Palm 《Statistical Papers》1988,29(1):249-256

The problem of missing observations in regression models is often solved by using imputed values to complete the sample. As an alternative for static models, it has been suggested to limit the analysis to the periods or units for which all relevant variables are observed. The choice of an imputation procedure affects the asymptotic efficiency of the method used to subsequently estimate the parameters of the model. In this note, we show that the relative asymptotic efficiency of three estimators designed to handle incomplete samples depends on parameters that have a straightforward statistical interpretation. In terms of a gain of asymptotic efficiency, the use of these estimators is equivalent to the observation of a percentage of the values which are actually missing. This percentage depends on three R²-measures only, which can be straightforwardly computed in applied work. Therefore it should be easy in practice to check whether it is worthwhile to use a more elaborate estimator.

9.

Percentage points of the statistics for testing hypotheses on mean vectors of multivariate normal distributions with missing observations

《Journal of Statistical Computation and Simulation》2012,82(3):211-224

A problem of testing of hypotheses on the mean vector of a multivariate normal distribution with unknown and positive definite covariance matrix is considered when a sample with a special, though not unusual, pattern of missing observations from that population is available. The approximate percentage points of the test statistic are obtained and their accuracy has been checked by comparing them with some exact percentage points which are calculated for complete samples and some special incomplete samples. The approximate percentage points are in good agreement with exact percentage points. The above work is extended to the problem of testing the hypothesis of equality of two mean vectors of two multivariate normal distributions with the same, unknown covariance matrix.

10.

Effect of correlation on the estimation of a mean in the presence of spurious observations

Irwin Guttman G. C. Tiao 《Revue canadienne de statistique》1978,6(2):229-247

This paper examines the effect of correlation of observations on estimators of a mean which are designed to guard against the possibility of spurious observations (that is, observations generated in a manner not intended). The mean squared error, premium and protection of these estimators are evaluated and discussed for some specific correlation structures.

11.

Optimum covariate designs in partially balanced incomplete block (PBIB) design set-ups

Ganesh Dutta Premadhis Das Nripes K. Mandal 《Journal of statistical planning and inference》2009

The use of covariates in block designs is necessary when the covariates cannot be controlled like the blocking factor in the experiment. In this paper, we consider the situation where there is some flexibility for selection in the values of the covariates. The choice of values of the covariates for a given block design attaining minimum variance for estimation of each of the parameters has attracted attention in recent times. Optimum covariate designs in simple set-ups such as completely randomised design (CRD), randomised block design (RBD) and some series of balanced incomplete block design (BIBD) have already been considered. In this paper, optimum covariate designs have been considered for the more complex set-ups of different partially balanced incomplete block (PBIB) designs, which are popular among practitioners. The optimum covariate designs depend much on the methods of construction of the basic PBIB designs. Different combinatorial arrangements and tools such as orthogonal arrays, Hadamard matrices and different kinds of products of matrices viz. Khatri–Rao product, Kronecker product have been conveniently used to construct optimum covariate designs with as many covariates as possible.

12.

A comparison of various software tools for dealing with missing data via imputation

《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations has frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect.
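The simplest strategy named in the abstract above, mean imputation, can be sketched in a few lines. This is a toy illustration with made-up numbers and a hypothetical function name, not one of the software routines the paper reviews. Missing values are encoded as `None`.

```python
import statistics

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Toy data (made up): one of three values is missing.
raw = [1.0, None, 3.0]
completed = mean_impute(raw)

# The mean is preserved, but the spread of the completed data is smaller
# than the spread of the values that were actually observed.
observed_variance = statistics.variance([v for v in raw if v is not None])
completed_variance = statistics.variance(completed)
print(completed, observed_variance, completed_variance)
```

The shrinking variance is one reason the quality of individual imputed values, which the abstract highlights, can differ from the quality of the resulting parameter estimates.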

13.

A minimax approach to missing values in linear regression

M. Jänner P. Stahlecker 《Statistical Papers》1993,34(1):247-261

We consider the problem of estimating the parameter vector in the linear model when observations on the independent variables are partially missing or incorrect. A new estimator is developed which systematically combines prior restrictions on the exogenous variables with the incomplete data. We compare this method with the alternative strategy of deleting missing values.

14.

On equality of ordinary least squares estimator, best linear unbiased estimator and best linear unbiased predictor in the general linear model

Yonghui Liu 《Journal of statistical planning and inference》2009

The equality of the ordinary least squares estimator (OLSE), the best linear unbiased estimator (BLUE) and the best linear unbiased predictor (BLUP) in the general linear model with new observations is investigated through the matrix rank method, and some new necessary and sufficient conditions are given.

15.

A note on robustness of D-optimal block designs for two-colour microarray experiments

R.A. Bailey Katharina Schiffl Ralf-Dieter Hilgers 《Journal of statistical planning and inference》2013

Two-colour microarray experiments form an important tool in gene expression analysis. Due to the high risk of missing observations in microarray experiments, it is fundamental to concentrate not only on optimal designs but also on designs which are robust against missing observations. As an extension of Latif et al. (2009), we define the optimal breakdown number for a collection of designs to describe the robustness, and we calculate the breakdown number for various D-optimal block designs. We show that, for certain values of the numbers of treatments and arrays, the designs which are D-optimal have the highest breakdown number. Our calculations use methods from graph theory.

16.

The two-sample problem with multivariate censored data: a new rank test family

Eve Leconte Thierry Moreau Joseph Lellouch 《Communications in Statistics - Simulation and Computation》2013,42(4):1061-1076

A new rank test family is proposed to test the equality of two multivariate failure time distributions with censored observations. The tests are very simple: they are based on a transformation of the multivariate rank vectors to a univariate rank score and the resulting statistics belong to the familiar class of the weighted logrank test statistics. The new procedure is also applicable to multivariate observations in general, such as repeated measures, some of which may be missing. To investigate the performance of the proposed tests, a simulation study was conducted with bivariate exponential models for various censoring rates. The size and power of these tests against Lehmann alternatives were compared to the size and power of two other tests (Wei and Lachin, 1984 and Wei and Knuiman, 1987). In all simulations the new procedures provide a relatively good power and an accurate control over the size of the test. A real example from the National Cooperative Gallstone Study is given.

17.

TESTS FOR INDEPENDENCE IN THE PRESENCE OF MISSING VALUES

L. J. Wei 《Australian & New Zealand Journal of Statistics》1983,25(1):85-90

The locally most powerful rank test is derived for testing independence of two random variables with possible missing observations on both responses. The test statistic has a simple form and can be easily obtained. For the purpose of practical use, the asymptotic null distribution of the test statistic is also provided.

18.

Optimal saturated block designs when observations are correlated

Bo Jin J.P. Morgan 《Journal of statistical planning and inference》2008

A- and MV-optimal block designs are identified in the class of minimally connected designs when the observations within blocks are spatially correlated. All connected designs are shown to be D-equal regardless of the correlation structure, and a sufficient condition for E-optimality is presented. Earlier results for the uncorrelated case are strengthened.

19.

Model-based clustering of multivariate skew data with circular components and missing values

Francesco Lagona Marco Picone 《Journal of applied statistics》2012,39(5):927-945

Motivated by classification issues that arise in marine studies, we propose a latent-class mixture model for the unsupervised classification of incomplete quadrivariate data with two linear and two circular components. The model integrates bivariate circular densities and bivariate skew normal densities to capture the association between toroidal clusters of bivariate circular observations and planar clusters of bivariate linear observations. Maximum-likelihood estimation of the model is facilitated by an expectation maximization (EM) algorithm that treats unknown class membership and missing values as different sources of incomplete information. The model is exploited on hourly observations of wind speed and direction and wave height and direction to identify a number of sea regimes, which represent specific distributional shapes that the data take under environmental latent conditions.

20.

Model selection of generalized partially linear models with missing covariates

Ying-Zi Fu Xue-Dong Chen 《Journal of statistical planning and inference》2012,142(1):126-138

In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with a penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for the considered models with incomplete data, some new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). The most attractive point of our method is that it is rather general and can be extended to various situations with missing observations based on the EM algorithm; in particular, when no missing data are involved, our new model selection criteria reduce to the classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also compare missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.