Similar Documents
20 similar documents were retrieved.
1.
Summary. The study of human immunodeficiency virus dynamics is one of the most important areas in research into acquired immune deficiency syndrome in recent years. Non-linear mixed effects models have been proposed for modelling viral dynamic processes. A challenging problem in the modelling is to identify repeatedly measured (time-dependent), but possibly missing, immunologic or virologic markers (covariates) for viral dynamic parameters. For missing time-dependent covariates in non-linear mixed effects models, the commonly used complete-case, mean imputation and last value carried forward methods may give misleading results. We propose a three-step hierarchical multiple-imputation method, implemented by Gibbs sampling, which imputes the missing data at the individual level but can pool information across individuals. We compare various methods by Monte Carlo simulations and find that the multiple-imputation method proposed performs the best in terms of bias and mean-squared errors in the estimates of covariate coefficients. By applying the favoured multiple-imputation method to clinical data, we conclude that there is a negative correlation between the viral decay rate (a virological response parameter) and CD4 or CD8 cell counts during the treatment; this is counter-intuitive, but biologically interpretable on the basis of findings from other clinical studies. These results may have an important influence on decisions about treatment for acquired immune deficiency syndrome patients.
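The three-step hierarchical imputation via Gibbs sampling is specific to the non-linear mixed effects model, but any multiple-imputation analysis ends by pooling the per-imputation estimates of the covariate coefficients. A minimal sketch of that pooling step with Rubin's combining rules (the helper name and the numbers are illustrative, not taken from the paper):

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine point estimates and variances from m imputed datasets with
    Rubin's rules: pooled estimate plus within-, between- and total variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()                 # pooled point estimate
    u_bar = variances.mean()                 # average within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b          # total variance
    return q_bar, t

# illustrative numbers for a CD4-count coefficient estimated on m = 5 imputations
q_hat, t_var = pool_rubin([-0.31, -0.28, -0.35, -0.30, -0.33],
                          [0.012, 0.011, 0.013, 0.012, 0.012])
print(q_hat, np.sqrt(t_var))
```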

2.
In this paper we discuss a new theoretical basis for perturbation methods. In developing this new theoretical basis, we define the ideal measures of data utility and disclosure risk. Maximum data utility is achieved when the statistical characteristics of the perturbed data are the same as those of the original data. Disclosure risk is minimized if providing users with microdata access does not result in any additional information. We show that when the perturbed values of the confidential variables are generated as independent realizations from the distribution of the confidential variables conditioned on the non-confidential variables, they satisfy the data utility and disclosure risk requirements. We also discuss the relationship between the theoretical basis and some commonly used methods for generating perturbed values of confidential numerical variables.
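The key construction is to replace each confidential value by an independent draw from the distribution of the confidential variables conditional on the non-confidential ones. A minimal sketch under an assumed joint normal model (the helper name, toy mean vector and covariance matrix are illustrative; the paper's result is not tied to normality):

```python
import numpy as np

def conditional_mvn_draw(y_nonconf, mu, Sigma, idx_c, idx_n, rng):
    """Draw perturbed confidential values from the conditional distribution of
    the confidential block given the observed non-confidential block, assuming
    a joint multivariate normal model with mean mu and covariance Sigma."""
    mu_c, mu_n = mu[idx_c], mu[idx_n]
    S_cc = Sigma[np.ix_(idx_c, idx_c)]
    S_cn = Sigma[np.ix_(idx_c, idx_n)]
    S_nn = Sigma[np.ix_(idx_n, idx_n)]
    A = S_cn @ np.linalg.inv(S_nn)
    cond_mean = mu_c + A @ (np.asarray(y_nonconf) - mu_n)
    cond_cov = S_cc - A @ S_cn.T
    return rng.multivariate_normal(cond_mean, cond_cov)

# toy example: variable 0 is confidential, variables 1-2 are non-confidential
rng = np.random.default_rng(0)
mu = np.array([50.0, 10.0, 5.0])
Sigma = np.array([[9.0, 3.0, 1.0],
                  [3.0, 4.0, 0.5],
                  [1.0, 0.5, 2.0]])
print(conditional_mvn_draw([11.0, 4.5], mu, Sigma, [0], [1, 2], rng))
```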

3.
Summary. The paper establishes a correspondence between statistical disclosure control and forensic statistics regarding their common use of the concept of 'probability of identification'. The paper then seeks to investigate what lessons for disclosure control can be learnt from the forensic identification literature. The main lesson that is considered is that disclosure risk assessment cannot, in general, ignore the search method that is employed by an intruder seeking to achieve disclosure. The effects of using several search methods are considered. Through consideration of the plausibility of assumptions and 'worst case' approaches, the paper suggests how the impact of search method can be handled. The paper focuses on foundations of disclosure risk assessment, providing some justification for some modelling assumptions underlying some existing record level measures of disclosure risk. The paper illustrates the effects of using various search methods in a numerical example based on microdata from a sample from the 2001 UK census.

4.
Three-mode analysis is a generalization of principal component analysis to three-mode data. While two-mode data consist of cases that are measured on several variables, three-mode data consist of cases that are measured on several variables at several occasions. As with any other statistical technique, the results of three-mode analysis may be influenced by missing data. Three-mode software packages generally use the expectation–maximization (EM) algorithm for dealing with missing data. However, there are situations in which the EM algorithm is expected to break down. Alternatively, multiple imputation may be used for dealing with missing data. In this study we investigated the influence of eight different multiple-imputation methods on the results of three-mode analysis, more specifically, a Tucker2 analysis, and compared the results with those of the EM algorithm. Results of the simulations show that multilevel imputation with the mode with the most levels nested within cases and the mode with the least levels represented as variables gives the best results for a Tucker2 analysis. Thus, this may be a good alternative to the EM algorithm in handling missing data in a Tucker2 analysis.

5.
Self-reported income information particularly suffers from an intentional coarsening of the data, which is called heaping or rounding. If it does not occur completely at random – which is usually the case – heaping and rounding have detrimental effects on the results of statistical analysis. Conventional statistical methods do not consider this kind of reporting bias, and thus might produce invalid inference. We describe a novel statistical modeling approach that allows us to deal with self-reported heaped income data in an adequate and flexible way. We suggest modeling heaping mechanisms and the true underlying model in combination. To describe the true net income distribution, we use the zero-inflated log-normal distribution. Heaping points are identified from the data by applying a heuristic procedure comparing a hypothetical income distribution and the empirical one. To determine heaping behavior, we employ two distinct models: either we assume piecewise constant heaping probabilities, or heaping probabilities are considered to increase steadily with proximity to a heaping point. We validate our approach by some examples. To illustrate the capacity of the proposed method, we conduct a case study using income data from the German National Educational Panel Study.
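As a rough illustration of the data-generating side of such a model, the sketch below simulates zero-inflated log-normal incomes and a simple reporting mechanism with a constant heaping probability and a single grid of heaping points; the paper's fitted models (piecewise constant or distance-dependent heaping probabilities, data-driven heaping points) are richer than this, and all parameter values here are made up:

```python
import numpy as np

def simulate_reported_income(n, p_zero=0.10, mu=7.5, sigma=0.6,
                             heap_grid=100.0, p_heap=0.4, rng=None):
    """Illustrative heaping mechanism (not the paper's fitted model): true net
    income is zero-inflated log-normal; with constant probability p_heap a
    positive income is reported rounded to the nearest multiple of heap_grid
    (a 'heaping point'), otherwise it is reported exactly."""
    rng = rng or np.random.default_rng()
    true = np.where(rng.random(n) < p_zero, 0.0, rng.lognormal(mu, sigma, n))
    heaped = rng.random(n) < p_heap
    reported = np.where(heaped & (true > 0),
                        np.round(true / heap_grid) * heap_grid, true)
    return true, reported

true_inc, reported_inc = simulate_reported_income(10_000, rng=np.random.default_rng(1))
# share of positive reports sitting exactly on a heaping point
on_heap = (reported_inc > 0) & (np.mod(reported_inc, 100.0) == 0)
print(on_heap.mean())
```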

6.
The article’s topic is logistic regression for direct data on the covariates, but indirect data on the endogenous variable. The indirect data may result from a privacy-protecting survey procedure for sensitive characteristics or from statistical disclosure control. Various procedures to generate the indirect data exist. However, we show that it is possible to develop a general approach for logistic regression analyses with indirect data that covers many procedures. We first derive a general algorithm for the maximum likelihood estimation and a general procedure for variance estimation. Subsequently, numerous examples demonstrate the broad applicability of our general framework.
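One simple instance of "indirect data" is a binary randomization scheme with known design probabilities, for which the observed-data likelihood can be maximized directly. The sketch below assumes P(y*=1 | y=1) = p11 and P(y*=1 | y=0) = p10 are known from the survey design; it is only a special case of the paper's general framework, and the variable names and data are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_loglik(beta, X, y_star, p11, p10):
    """Negative log-likelihood when the binary outcome is observed only through
    a known randomization scheme: P(y*=1|x) = p11*pi(x) + p10*(1 - pi(x)),
    with pi(x) = expit(x'beta)."""
    pi = expit(X @ beta)
    p_obs = np.clip(p11 * pi + p10 * (1.0 - pi), 1e-12, 1 - 1e-12)
    return -np.sum(y_star * np.log(p_obs) + (1 - y_star) * np.log(1 - p_obs))

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
y = (rng.random(2000) < expit(X @ np.array([0.5, -1.0]))).astype(int)
p11, p10 = 0.8, 0.2                                     # design probabilities
y_star = (rng.random(2000) < np.where(y == 1, p11, p10)).astype(int)
fit = minimize(neg_loglik, x0=np.zeros(2), args=(X, y_star, p11, p10))
print(fit.x)                                            # roughly recovers (0.5, -1.0)
```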

7.
Statistical agencies have conflicting obligations to protect confidential information provided by respondents to surveys or censuses and to make data available for research and planning activities. When the microdata themselves are to be released, in order to achieve these conflicting objectives, statistical agencies apply statistical disclosure limitation (SDL) methods to the data, such as noise addition, swapping or microaggregation. Some of these methods do not preserve important structure and constraints in the data, such as positivity of some attributes or inequality constraints between attributes. Failure to preserve constraints is not only problematic in terms of data utility, but also may increase disclosure risk. In this paper, we describe a method for SDL that preserves both positivity of attributes and the mean vector and covariance matrix of the original data. The basis of the method is to apply multiplicative noise with the proper, data-dependent covariance structure.
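To illustrate the basic ingredient, the sketch below applies independent mean-one lognormal multiplicative noise, which keeps masked attributes positive and preserves the mean vector in expectation; the paper's actual method additionally chooses a data-dependent noise covariance so that the mean vector and covariance matrix are preserved exactly, which is not reproduced here:

```python
import numpy as np

def multiplicative_noise(X, cv=0.1, rng=None):
    """Mask positive attributes with independent lognormal noise of mean 1 and
    coefficient of variation cv: positivity is preserved and the mean vector is
    preserved in expectation. The exact moment-preserving, data-dependent noise
    covariance of the paper is NOT implemented here."""
    rng = rng or np.random.default_rng()
    sigma2 = np.log(1.0 + cv ** 2)                     # so that E[noise] = 1
    noise = rng.lognormal(mean=-sigma2 / 2.0, sigma=np.sqrt(sigma2), size=X.shape)
    return X * noise

rng = np.random.default_rng(3)
X = np.abs(rng.normal(100.0, 20.0, size=(1000, 3)))    # toy positive microdata
X_masked = multiplicative_noise(X, cv=0.1, rng=np.random.default_rng(4))
print(X.mean(axis=0))
print(X_masked.mean(axis=0))                           # close, not identical
```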

8.
Before releasing survey data, statistical agencies usually perturb the original data to keep each survey unit's information confidential. One significant concern in releasing survey microdata is identity disclosure, which occurs when an intruder correctly identifies the records of a survey unit by matching the values of some key (or pseudo-identifying) variables. We examine a recently developed post-randomization method for a strict control of identification risks in releasing survey microdata. While that procedure preserves the observed frequencies well, and hence statistical estimates in the case of simple random sampling, we show that in general surveys it may induce considerable bias in commonly used survey-weighted estimators. We propose a modified procedure that better preserves weighted estimates. The procedure is illustrated and empirically assessed with an application to a publicly available US Census Bureau data set.
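A generic post-randomization (PRAM) step looks like the sketch below: each observed category of a key variable is replaced according to a transition probability matrix. The matrix `P` here is an arbitrary illustration; the paper's contribution concerns how such a mechanism is calibrated to control identification risk and how survey-weighted estimates are preserved, which the sketch does not cover:

```python
import numpy as np

def pram(categories, P, rng=None):
    """Generic post-randomization (PRAM): category k of a key variable is
    replaced by category j with transition probability P[k, j]."""
    rng = rng or np.random.default_rng()
    categories = np.asarray(categories)
    return np.array([rng.choice(len(P), p=P[k]) for k in categories])

# toy 3-category key variable with an 80% chance of keeping the true category
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
x = np.random.default_rng(5).integers(0, 3, size=20)
print(x)
print(pram(x, P, rng=np.random.default_rng(6)))
```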

9.
Summary: One specific problem statistical offices and research institutes are faced with when releasing microdata is the preservation of confidentiality. Traditional methods to avoid disclosure often destroy the structure of the data, and information loss is potentially high. In this paper an alternative technique of creating scientific-use files is discussed, which reproduces the characteristics of the original data quite well. It is based on Fienberg (1997, 1994) who estimates and resamples from the empirical multivariate cumulative distribution function of the data in order to get synthetic data. The procedure creates data sets – the resample – which have the same characteristics as the original survey data. The paper includes some applications of this method with (a) simulated data and (b) innovation survey data, the Mannheim Innovation Panel (MIP), and a comparison between resampling and a common method of disclosure control (disturbance with multiplicative error) with regard to confidentiality on the one hand and the appropriateness of the disturbed data for different kinds of analyses on the other. The results show that univariate distributions can be better reproduced by unweighted resampling. Parameter estimates can be reproduced quite well if the resampling procedure implements the correlation structure of the original data as a scale or if the data is multiplicatively perturbed and a correction term is used. On average, anonymization of data with multiplicatively perturbed values protects better against re-identification than the various resampling methods used.
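A stripped-down stand-in for the resampling idea is sketched below: each variable is resampled from its empirical distribution (so univariate margins are reproduced), and an optional rank-reordering step re-imposes the rank correlation structure of the original data. This is only an illustrative approximation of the Fienberg-style procedure described in the paper; smoothing and weighting details are omitted:

```python
import numpy as np

def resample_synthetic(data, keep_rank_correlation=True, rng=None):
    """Variable-wise resampling from the empirical distributions. If requested,
    each resampled column is reordered to follow the rank order of the original
    column, which restores the rank correlation structure of the data."""
    rng = rng or np.random.default_rng()
    n, k = data.shape
    synth = np.column_stack([rng.choice(data[:, j], size=n, replace=True)
                             for j in range(k)])
    if keep_rank_correlation:
        ranks = np.argsort(np.argsort(data, axis=0), axis=0)
        synth = np.take_along_axis(np.sort(synth, axis=0), ranks, axis=0)
    return synth

rng = np.random.default_rng(12)
original = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
synthetic = resample_synthetic(original, rng=rng)
print(np.corrcoef(original.T)[0, 1], np.corrcoef(synthetic.T)[0, 1])
```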

10.
Summary. Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two 'similar' established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data.

11.
This paper describes data-swapping as an approach to disclosure control for statistical databases. Data-swapping is a data transformation technique where the underlying statistics of the data are preserved. It can be used as a basis for microdata release or to justify the release of tabulations.
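A minimal data-swapping sketch, assuming a single numeric attribute and purely random pairing (practical swaps typically match the pairs on control variables so that selected tabulations are preserved); all names and data are illustrative:

```python
import numpy as np

def random_swap(values, frac=0.2, rng=None):
    """Minimal data swap: a random fraction of records is paired and the paired
    records exchange their value of the chosen attribute, so the attribute's
    univariate statistics are preserved exactly (same multiset of values)."""
    rng = rng or np.random.default_rng()
    out = np.array(values, copy=True)
    n_pairs = int(len(out) * frac / 2)
    chosen = rng.choice(len(out), size=2 * n_pairs, replace=False)
    a, b = chosen[:n_pairs], chosen[n_pairs:]
    out[a], out[b] = out[b].copy(), out[a].copy()
    return out

income = np.random.default_rng(7).lognormal(10.0, 0.5, size=1000)
swapped = random_swap(income)
print(np.allclose(np.sort(income), np.sort(swapped)))   # True: same values, reassigned
```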

12.
In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risks of disclosure of sensitive information and at the same time maintaining analytic utility of the data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table since the latter support statistical inferences for a large set of parametric tests and models. Yet, not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. We describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies by using tools from computational algebra. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to the traditionally released synthetic tables.

13.
This article describes the effects on estimates of the size distribution of family-unit money income produced by adjusting CPS estimates for 1972 by adding several other data sources. Income estimates were adjusted on an individual-observation basis to make them consistent with independent control totals. As a result of these adjustments, mean income for all units rose 12 percent. The relative share of the top 5 percent increased substantially. Property income increased and wage income decreased in relative importance. The adjustment to mean income was largest for the oldest age group and smallest for the youngest age group.

14.
胡宗义, 李毅. 《统计研究》 (Statistical Research), 2020, 37(4): 59-74.
Using the formal implementation of China's environmental information disclosure system in 2008 as an exogenous shock, this paper constructs a quasi-natural experiment and, based on panel data for 285 Chinese cities from 2004 to 2017, systematically evaluates the effect of environmental information disclosure on industrial pollutant emissions with a difference-in-differences design. The approach overcomes the measurement difficulties and endogeneity problems associated with environmental information disclosure, provides the first assessment of its emission-reduction effect, and uses a mathematical model to formalize the underlying mechanism. The results show that environmental information disclosure significantly reduces industrial pollutant emissions, and that the effect is both lagged and long-lasting; moreover, the emission-reduction effect increases with the region's level of environmental pollution and the stringency of environmental regulation. Mechanism analysis indicates that the effect is transmitted mainly through industrial structure transformation and progress in abatement technology. To verify the robustness of the conclusions, the paper reports parallel-trend, instrumental-variable, and placebo tests, among others. Empirically, the study enriches the discussion of the relationship between environmental information disclosure and environmental pollution control, and offers useful policy implications for improving China's environmental governance and winning the battle against pollution.
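The identification strategy is a difference-in-differences comparison of treated and control cities before and after the 2008 disclosure rule. The sketch below shows only the canonical 2x2 DID contrast on simulated data; the paper estimates a richer two-way fixed-effects specification on the full 285-city panel with controls and robustness checks, and the simulated effect size is made up:

```python
import numpy as np
import pandas as pd

def did_2x2(df, outcome, treated, post):
    """Canonical 2x2 difference-in-differences contrast:
    (treated post - treated pre) - (control post - control pre)."""
    m = df.groupby([treated, post])[outcome].mean()
    return (m.loc[(1, 1)] - m.loc[(1, 0)]) - (m.loc[(0, 1)] - m.loc[(0, 0)])

# toy data: the disclosure rule lowers emissions of treated cities by 0.3 after 2008
rng = np.random.default_rng(8)
n = 4000
df = pd.DataFrame({"treated": rng.integers(0, 2, n), "post": rng.integers(0, 2, n)})
df["emissions"] = (1.0 + 0.2 * df["treated"] + 0.1 * df["post"]
                   - 0.3 * df["treated"] * df["post"] + rng.normal(0.0, 0.1, n))
print(did_2x2(df, "emissions", "treated", "post"))       # approximately -0.3
```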

15.
Disseminating microdata to the public that provide a high level of data utility while at the same time guaranteeing the confidentiality of survey respondents is a difficult task. Generating multiply imputed synthetic datasets is an innovative statistical disclosure limitation technique with the potential of enabling the data disseminating agency to achieve this twofold goal. So far, the approach has been successfully implemented only for a limited number of datasets in the U.S. In this paper, we present the first successful implementation outside the U.S.: the generation of partially synthetic datasets for an establishment panel survey at the German Institute for Employment Research. We describe the whole evolution of the project: from the early discussions concerning variables at risk to the final synthesis. We also present our disclosure risk evaluations and provide some first results on the data utility of the generated datasets. A variance-inflated imputation model is introduced that incorporates additional variability in the model for records that are not sufficiently protected by the standard synthesis.

16.
Researchers have been developing various extensions and modified forms of the Weibull distribution to enhance its capability for modeling and fitting different data sets. In this note, we investigate the potential usefulness of the new modification to the standard Weibull distribution called the odd Weibull distribution in income economic inequality studies. Some mathematical and statistical properties of this model are proposed. We obtain explicit expressions for the first incomplete moment, quantile function, Lorenz and Zenga curves and related inequality indices. In addition to the well-known stochastic order based on the Lorenz curve, the stochastic order based on the Zenga curve is considered. Since the new generalized Weibull distribution seems to be suitable to model wealth, financial, actuarial and especially income distributions, these findings are fundamental in understanding how parameter values are related to inequality. Also, the estimation of parameters by maximum likelihood and moment methods is discussed. Finally, this distribution has been fitted to United States and Austrian income data sets and has been found to fit remarkably well compared with other widely used income models.
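The closed-form Lorenz and Zenga expressions are specific to the odd Weibull model; their distribution-free sample counterparts, against which a fitted model would be compared, can be computed as in the sketch below (the lognormal sample is purely illustrative, not one of the paper's data sets):

```python
import numpy as np

def lorenz_points(income):
    """Empirical Lorenz curve ordinates L(k/n) for a positive income sample."""
    x = np.sort(np.asarray(income, dtype=float))
    return np.insert(np.cumsum(x) / x.sum(), 0, 0.0)

def gini(income):
    """Sample Gini index via the standard rank-based formula."""
    x = np.sort(np.asarray(income, dtype=float))
    n = len(x)
    return 2.0 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1.0) / n

sample = np.random.default_rng(9).lognormal(10.0, 0.8, size=50_000)
L = lorenz_points(sample)
print(L[len(L) // 2])          # share of total income held by the poorer half
print(gini(sample))            # for lognormal with sigma = 0.8, Gini is about 0.43
```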

17.
The performance of Statistical Disclosure Control (SDC) methods for microdata (also called masking methods) is measured in terms of the utility and the disclosure risk associated with the protected microdata set. Empirical disclosure risk assessment based on record linkage stands out as a realistic and practical disclosure risk assessment methodology which is applicable to every conceivable masking method. The intruder is assumed to know an external data set, whose records are to be linked to those in the protected data set; the percentage of correctly linked record pairs is a measure of disclosure risk. This paper reviews conventional record linkage, which assumes shared variables between the external and the protected data sets, and then shows that record linkage—and thus disclosure—is still possible without shared variables.
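Conventional distance-based record linkage on shared variables can be sketched as below: each external record is matched to its nearest protected record, and the share of correct matches is read off as the empirical re-identification risk. Noise addition is used here only as a stand-in masking method, and the paper's point about linkage without shared variables is not shown:

```python
import numpy as np

def linkage_risk(external, protected):
    """Distance-based record linkage: each external record is linked to its
    nearest protected record (Euclidean distance on standardized variables).
    With both files in the same record order, the share of i -> i links is the
    empirical re-identification rate."""
    mu, sd = external.mean(axis=0), external.std(axis=0)
    E, P = (external - mu) / sd, (protected - mu) / sd
    d = np.linalg.norm(E[:, None, :] - P[None, :, :], axis=2)   # n x n distances
    return (d.argmin(axis=1) == np.arange(len(E))).mean()

rng = np.random.default_rng(10)
original = rng.normal(size=(500, 3))
masked = original + rng.normal(scale=0.3, size=original.shape)  # noise addition
print(linkage_risk(original, masked))    # proportion of records correctly re-identified
```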

18.
Summary. We apply multivariate shrinkage to estimate local area rates of unemployment and economic inactivity by using UK Labour Force Survey data. The method exploits the similarity of the rates of claiming unemployment benefit and the unemployment rates as defined by the International Labour Organisation. This is done without any distributional assumptions, merely relying on the high correlation of the two rates. The estimation is integrated with a multiple-imputation procedure for missing employment status of subjects in the database (item non-response). The hot deck method that is used in the imputations is adapted to reflect the uncertainty in the model for non-response. The method is motivated as a development (improvement) of the current operational procedure in which the imputed value is a non-stochastic function of the data. An extension of the procedure to subjects who are absent from the database (unit non-response) is proposed.

19.
For micro-datasets considered for release as scientific or public use files, statistical agencies have to face the dilemma of guaranteeing the confidentiality of survey respondents on the one hand and offering sufficiently detailed data on the other hand. For that reason, a variety of methods to guarantee disclosure control is discussed in the literature. In this paper, we present an application of Rubin’s (J. Off. Stat. 9, 462–468, 1993) idea to generate synthetic datasets from existing confidential survey data for public release. We use a set of variables from the 1997 wave of the German IAB Establishment Panel and evaluate the quality of the approach by comparing results from an analysis by Zwick (Ger. Econ. Rev. 6(2), 155–184, 2005) with the original data with the results we achieve for the same analysis run on the dataset after the imputation procedure. The comparison shows that valid inferences can be obtained using the synthetic datasets in this context, while confidentiality is guaranteed for the survey participants.

20.
In order to guarantee confidentiality and privacy of firm-level data, statistical offices apply various disclosure limitation techniques. However, each anonymization technique has its protection limits such that the probability of disclosing the individual information for some observations is not minimized. To overcome this problem, we propose combining two separate disclosure limitation techniques, blanking and multiplication of independent noise, in order to protect the original dataset. The proposed approach yields a decrease in the probability of reidentifying/disclosing individual information and can be applied to linear and nonlinear regression models. We show how to combine the blanking method with the multiplicative measurement error method and how to estimate the model by combining the multiplicative Simulation-Extrapolation (M-SIMEX) approach from Nolte (2007) on the one side with the Inverse Probability Weighting (IPW) approach going back to Horvitz and Thompson (J. Am. Stat. Assoc. 47:663–685, 1952) and on the other side with matching methods, as an alternative to IPW, like the semiparametric M-Estimator proposed by Flossmann (2007). Based on Monte Carlo simulations, we show that multiplicative measurement error combined with blanking as a masking procedure does not necessarily lead to a severe reduction in the estimation quality, provided that its effects on the data generating process are known.
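The IPW building block can be sketched as below for a blanking mechanism whose observation probabilities are taken as known (Horvitz-Thompson-style weighting); the paper's estimators additionally estimate these probabilities and correct for the multiplicative measurement error via M-SIMEX, neither of which is shown, and all data here are simulated for illustration:

```python
import numpy as np

def ipw_mean(y, observed, p_obs):
    """Inverse-probability-weighted mean: non-blanked values are weighted by the
    inverse of their (here: known) probability of being observed, in the spirit
    of Horvitz-Thompson weighting."""
    w = observed / p_obs
    return np.sum(w * np.where(observed == 1, y, 0.0)) / np.sum(w)

rng = np.random.default_rng(11)
x = rng.normal(size=20_000)
y = 2.0 + 1.5 * x + rng.normal(size=x.size)
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))                 # blanking depends on x
observed = (rng.random(x.size) < p_obs).astype(int)
print(y.mean(), y[observed == 1].mean(), ipw_mean(y, observed, p_obs))
# the naive complete-case mean is biased upwards; the IPW mean recovers the full mean
```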

