Similar Documents
20 similar documents found.
1.
There is growing demand for public-use data and, at the same time, increasing concern about the privacy of personal information. One proposed method for accomplishing both goals is to release data sets that do not contain real values but yield the same inferences as the actual data. The idea is to view the confidential data as missing and use multiple-imputation techniques to create synthetic data sets. In this article, we compare techniques for creating synthetic data sets in simple scenarios with a binary variable.
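To make the idea concrete, the following minimal Python sketch treats a confidential binary variable as missing and draws fully synthetic copies from the posterior predictive distribution of a conjugate Beta-Binomial model. The data, the Beta(1, 1) prior and all names are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confidential data: n respondents with one binary attribute.
n = 500
confidential = rng.binomial(1, 0.3, size=n)

def synthesize(data, m=5, a=1.0, b=1.0):
    """Draw m fully synthetic copies of a binary variable.

    The confidential values are treated as missing and imputed from the
    posterior predictive distribution under a Beta(a, b) prior on the
    success probability (a conjugate Beta-Binomial model).
    """
    k, size = data.sum(), data.size
    copies = []
    for _ in range(m):
        p = rng.beta(a + k, b + size - k)             # posterior draw of p
        copies.append(rng.binomial(1, p, size=size))  # predictive draw
    return copies

# Analyses are run on each copy and combined with Rubin-style rules.
print([c.mean() for c in synthesize(confidential)])
```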

2.
A Preliminary Study of the Theory and Techniques of Online Surveys (cited 3 times; 0 self-citations)
尹恩山. Statistical Research (《统计研究》), 1999, 16(4): 35-37
Information media are an important marker of the progress of human civilization: from the letter to the television and telephone, the popularization of each new medium has in turn produced new modes of survey-taking. The newest medium, the Internet, knocked at the door of the information age at the turn of the century, with its electro-optical speed, multimedia content, fast two-way form of information exchange, and globally integrated…

3.
For any specific government administrative department, data supply and data demand are often asymmetric. How can a statistical system be designed so that the collection, processing, application, and release of the relevant statistical information run through a single information system, satisfying both external data supply and internal data demand? Taking the Beijing Municipal Commission of Housing and Urban-Rural Development (北京市住建委) as an example, this paper sets out four basic steps for designing the statistical system of a government administrative department: analyze the administrative functions, construct the basic framework of the statistical system, design the specific statistical content module by module, and settle the reporting cycle, data sources, and transmission channels.

4.
For micro-datasets considered for release as scientific- or public-use files, statistical agencies face the dilemma of guaranteeing the confidentiality of survey respondents on the one hand and offering sufficiently detailed data on the other. For that reason, a variety of methods to guarantee disclosure control is discussed in the literature. In this paper, we present an application of Rubin's (J. Off. Stat. 9, 462-468, 1993) idea of generating synthetic datasets from existing confidential survey data for public release. We use a set of variables from the 1997 wave of the German IAB Establishment Panel and evaluate the quality of the approach by comparing the results of an analysis by Zwick (Ger. Econ. Rev. 6(2), 155-184, 2005) on the original data with the results of the same analysis run on the dataset produced by the imputation procedure. The comparison shows that valid inferences can be obtained from the synthetic datasets in this context, while confidentiality is guaranteed for the survey participants.
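The sketch below illustrates the kind of utility check the abstract describes: run the same regression on the original and the synthetic data and compare the confidence intervals. The data-generating process and the use of statsmodels are illustrative assumptions; a full synthesis would also redraw the model coefficients from their posterior before imputing.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Stand-in for the confidential survey data (the IAB panel itself is not public).
n = 1000
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)
X = sm.add_constant(x)

# Crude synthetic y: predictions from a model fitted to the original data
# plus fresh residual noise.
fit = sm.OLS(y, X).fit()
y_syn = fit.predict(X) + rng.normal(scale=np.sqrt(fit.scale), size=n)

# Utility check: the same regression on both datasets should give
# heavily overlapping 95% confidence intervals for the slope.
ci_orig = sm.OLS(y, X).fit().conf_int()[1]
ci_syn = sm.OLS(y_syn, X).fit().conf_int()[1]
print("original slope CI: ", ci_orig)
print("synthetic slope CI:", ci_syn)
```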

5.
毛盛勇. Statistical Research (《统计研究》), 2012, 29(7): 14-18
This paper analyzes and summarizes the main methods, data sources, and data-release practices of quarterly expenditure-based GDP accounting in the 34 member countries of the Organisation for Economic Co-operation and Development (OECD) and in four BRICS countries (Brazil, Russia, India, and South Africa), and argues that China should further improve the quality of its source data and move faster to formally establish a quarterly expenditure-based GDP accounting system.

6.
The European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) believes access to clinical trial data should be implemented in a way that supports good research, avoids misuse of such data, lies within the scope of the original informed consent and fully protects patient confidentiality. In principle, EFSPI supports responsible data sharing. EFSPI acknowledges that it is in patients' interest that their data are handled in a strictly confidential manner, to avoid misuse under all possible circumstances. It also serves the altruism of patients who participate in trials that such data be used as fully as possible for the further development of science, applying good statistical principles. This paper summarises EFSPI's position on access to clinical trial data. The position was developed during the European Medicines Agency (EMA) advisory process and before the draft EMA policy on publication of and access to clinical trial data was released for consultation; EFSPI's position remains unchanged following the release of the draft policy. Finally, EFSPI supports the need for further guidance on important technical aspects of re-analyses and additional analyses of clinical trial data, for example multiplicity, meta-analysis, subgroup analyses and publication bias. Copyright © 2013 John Wiley & Sons, Ltd.

7.
A common task in quality control is to determine a control limit for a product at the time of release that incorporates its risk of degradation over time. Such a limit for a given quality measurement is based on empirical stability data, the intended shelf life of the product and the stability specification. The task is particularly important when the registered specifications for release and stability are equal. We discuss two relevant formulations and their implementations in both a frequentist and a Bayesian framework. The first ensures that the risk of a batch failing the specification is comparable at release and at the end of shelf life. The second screens out, at release time, batches that are at high risk of failing the stability specification at the end of their shelf life. Although the second formulation seems more natural from a quality-assurance perspective, it usually renders a control limit that is too stringent. In this paper we provide theoretical insight into this phenomenon, and introduce a heat-map visualisation that may help practitioners assess the feasibility of implementing a limit under the second formulation. We also suggest a solution when it is infeasible. In addition, the current industrial benchmark is reviewed and contrasted with the two formulations. Computational algorithms for both formulations are laid out in detail and illustrated on a dataset.
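A minimal Monte Carlo sketch of the second formulation, screening out batches whose end-of-shelf-life failure risk is too high, might look as follows. Every number here (specification, shelf life, degradation slope, assay noise, 5% risk tolerance) is an illustrative assumption, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative inputs, not values from the paper.
spec_lower = 95.0                    # stability specification (lower limit)
shelf_life = 36.0                    # intended shelf life in months
slope_mean, slope_sd = -0.05, 0.02   # degradation rate per month
assay_sd = 0.3                       # analytical noise on the end measurement
tolerance = 0.05                     # maximum acceptable risk at expiry

def risk_at_expiry(release_value, n=20_000):
    """Monte Carlo risk that a batch with this release value fails the
    stability specification at the end of its shelf life."""
    slopes = rng.normal(slope_mean, slope_sd, n)
    end_values = release_value + slopes * shelf_life + rng.normal(0, assay_sd, n)
    return (end_values < spec_lower).mean()

# The release limit is the smallest release value whose predicted
# end-of-shelf-life failure risk is within tolerance.
grid = np.arange(spec_lower, spec_lower + 5, 0.05)
risks = np.array([risk_at_expiry(v) for v in grid])
print("release limit ~", grid[np.argmax(risks <= tolerance)])
```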

8.
When preparing data for public release, information organizations face the challenge of preserving the quality of data while protecting the confidentiality of both data subjects and sensitive data attributes. Without knowing what type of analyses will be conducted by data users, it is often hard to alter data without sacrificing data utility. In this paper, we propose a new approach to mitigate this difficulty, which entails using Bayesian additive regression trees (BART), in connection with existing methods for statistical disclosure limitation, to help preserve data utility while meeting confidentiality requirements. We illustrate the performance of our method through both simulation and a data example. The method works well when the targeted relationship underlying the original data is not weak, and the performance appears to be robust to the intensity of alteration.
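The following sketch conveys the flavour of the approach; for self-containedness it substitutes scikit-learn's GradientBoostingRegressor for BART, so it is a stand-in rather than the authors' method, and all data are invented. A flexible tree ensemble models the sensitive attribute given the other attributes, and altered values are released as model predictions plus fresh noise.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)

# Illustrative data: X holds non-sensitive attributes, y a sensitive one.
n = 1000
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=n)

# Flexible tree ensemble for the conditional mean of the sensitive
# attribute (standing in for BART in this sketch).
model = GradientBoostingRegressor().fit(X, y)
resid_sd = np.std(y - model.predict(X))

# Released values: predictions plus fresh noise, preserving the joint
# relationship while publishing no original y value.
y_released = model.predict(X) + rng.normal(scale=resid_sd, size=n)
print("corr(original, released):", np.corrcoef(y, y_released)[0, 1])
```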

9.

A maximum likelihood procedure is given for estimating parameters in a germination-growth process, based on germination times only or on both times and locations. The process is assumed to be driven by a Poisson process whose intensity is of known analytical form. The procedure is shown to perform well on simulated data with unnormalised gamma intensity and is also applied to data on release of neurotransmitter at a synapse.
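A minimal sketch of such a likelihood, based on germination times only: for an inhomogeneous Poisson process with intensity lambda(t) observed on [0, T], the log-likelihood is the sum of log lambda(t_i) over the event times minus the integral of lambda over [0, T]. The unnormalised-gamma intensity a * t^(k-1) * exp(-b*t) follows the abstract; the data and starting values below are made up.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Made-up germination times (days) on an observation window [0, T].
times = np.array([1.2, 1.9, 2.3, 2.8, 3.1, 3.7, 4.0, 4.9, 5.5, 7.2])
T = 10.0

def neg_log_lik(theta):
    a, k, b = np.exp(theta)            # log-parameterised for positivity
    lam = lambda t: a * t ** (k - 1) * np.exp(-b * t)  # unnormalised gamma
    compensator, _ = quad(lam, 0, T)   # integral of the intensity over [0, T]
    return -(np.log(lam(times)).sum() - compensator)

res = minimize(neg_log_lik, x0=np.log([1.0, 2.0, 0.5]), method="Nelder-Mead")
print("MLE of (a, k, b):", np.exp(res.x))
```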

10.
In this article, we use U.S. real-time data to produce combined density nowcasts of quarterly Gross Domestic Product (GDP) growth, using a system of three commonly used model classes. We update the density nowcast with every new data release throughout the quarter, highlighting the importance of new information for nowcasting. Our results show that the logarithmic score of the predictive densities for U.S. GDP growth increases almost monotonically as new information arrives during the quarter. While the ranking of the model classes changes during the quarter, the combined density nowcasts always perform well relative to the individual model classes in terms of both logarithmic scores and calibration tests. The density combination approach is superior to a simple model selection strategy and also performs better, in terms of point-forecast evaluation, than standard point-forecast combinations.
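A stylized sketch of density combination with log-score-based weights follows; the three "model classes" and all numbers are placeholders, and the actual procedure recomputes the combination at every data release.

```python
import numpy as np
from scipy.stats import norm

# Placeholder predictive densities for quarterly GDP growth from three
# model classes, each summarised as a normal (mean, sd).
nowcasts = {"bridge": (2.1, 0.6), "factor": (1.8, 0.5), "var": (2.4, 0.8)}
past_log_scores = {"bridge": -1.10, "factor": -0.95, "var": -1.30}

# Weights proportional to exp(average past log score); in a real-time
# setting these are recomputed after every data release.
w = np.array([np.exp(past_log_scores[m]) for m in nowcasts])
w /= w.sum()

def combined_pdf(y):
    """Finite mixture of the individual predictive densities."""
    return sum(wi * norm.pdf(y, mu, sd)
               for wi, (mu, sd) in zip(w, nowcasts.values()))

outturn = 2.0  # later-released GDP growth figure (illustrative)
print("log score of the combination:", np.log(combined_pdf(outturn)))
```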

11.
Summary.  We consider a Bayesian forecasting system to predict the dispersal of contamination on a large-scale grid in the event of an accidental release of radioactivity. The statistical model is built on a physical model for atmospheric dispersion and transport called MATCH. Our spatiotemporal model is a dynamic linear model in which the state parameters are the (essentially deterministic) predictions of MATCH; their distributions are updated sequentially in the light of monitoring data. A distinguishing feature of the model is that the number of state parameters is very large (typically several hundred thousand), and we discuss practical issues arising in its implementation as a real-time model. Our procedures have been checked against a variational approach that is used widely in the atmospheric sciences. The results of the model are applied to test data from a tracer experiment.
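At its core, the sequential updating is the observation step of a Kalman filter for a dynamic linear model, sketched below on a toy grid. In the application described, the state vector has hundreds of thousands of components, so the dense covariance algebra shown here would not be feasible as written; the example is illustrative only.

```python
import numpy as np

def kalman_update(m, P, y, H, R):
    """One observation update of a dynamic linear model:
    prior x ~ N(m, P), observation y = H x + e with e ~ N(0, R)."""
    S = H @ P @ H.T + R              # innovation covariance
    K = np.linalg.solve(S, H @ P).T  # Kalman gain, P H' S^{-1}
    m_post = m + K @ (y - H @ m)
    P_post = P - K @ H @ P
    return m_post, P_post

# Toy illustration: 5 grid cells, 2 monitoring stations observing cells 1 and 3.
rng = np.random.default_rng(4)
m, P = rng.normal(size=5), np.eye(5)
H = np.zeros((2, 5))
H[0, 1] = H[1, 3] = 1.0
y = H @ m + rng.normal(scale=0.1, size=2)
print(kalman_update(m, P, y, H, 0.01 * np.eye(2))[0])
```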

12.
Ricker's two-release method is a simplified version of the Jolly-Seber method (see Seber's Estimation of Animal Abundance, 1982), used to estimate survival rate and abundance in animal populations. The method assumes a single recapture sample and no immigration, emigration or recruitment. In this paper, we propose a Bayesian analysis of the method, estimating the survival rate and the capture probability with Markov chain Monte Carlo methods and a latent-variable analysis. The performance of the proposed method is illustrated with a simulation study as well as a real data set. The results show that the proposed method provides favourable inference for the survival rate compared with the modified maximum likelihood method.
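The sketch below runs a random-walk Metropolis sampler for a simplified two-release likelihood (the first release group recaptured with probability phi*p after the survival interval, the second with probability p, flat priors on both parameters). The data are invented and the paper's latent-variable scheme is not reproduced.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(5)

# Invented two-release data: R1 animals released before the survival
# interval, R2 just before the single recapture sample.
R1, m1 = 200, 30    # group 1 recaptured with probability phi * p
R2, m2 = 150, 45    # group 2 recaptured with probability p

def log_post(phi, p):
    if not (0 < phi < 1 and 0 < p < 1):
        return -np.inf
    # Flat priors on (0, 1) for both parameters.
    return binom.logpmf(m1, R1, phi * p) + binom.logpmf(m2, R2, p)

# Random-walk Metropolis over (phi, p).
phi, p, draws = 0.5, 0.5, []
for _ in range(20_000):
    phi_new, p_new = phi + rng.normal(0, 0.05), p + rng.normal(0, 0.05)
    if np.log(rng.uniform()) < log_post(phi_new, p_new) - log_post(phi, p):
        phi, p = phi_new, p_new
    draws.append((phi, p))
post = np.array(draws[5_000:])
print("posterior means (phi, p):", post.mean(axis=0))
```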

13.
Bayesian calibration of computer models (cited 5 times; 0 self-citations)
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
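A heavily simplified sketch of the calibration idea: a grid posterior over a single calibration parameter theta, given field observations and a cheap stand-in simulator. The Gaussian-process emulator and the model-discrepancy term that distinguish the authors' method are deliberately omitted, and all values are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)

def simulator(x, theta):
    """Cheap stand-in for the computer code eta(x, theta)."""
    return theta * np.exp(-0.5 * x)

# Field observations of the physical system; the true theta is unknown.
x_obs = np.linspace(0, 4, 12)
y_obs = simulator(x_obs, 2.3) + rng.normal(scale=0.1, size=x_obs.size)

# Grid posterior over theta with a N(2, 1) prior and known noise sd 0.1.
grid = np.linspace(0.5, 4.0, 400)
log_post = norm.logpdf(grid, 2.0, 1.0) + np.array(
    [norm.logpdf(y_obs, simulator(x_obs, t), 0.1).sum() for t in grid]
)
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, grid)
print("posterior mode of theta:", grid[np.argmax(post)])
```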

14.
The investor's order-submission decision process is the logical starting point for studying information release in incomplete markets, and the distribution of information across orders plays an important role in market-structure design. We build a model of investors' order-choice strategies in an asymmetric-information environment and test its theoretical conclusions with a high-frequency error-correction model. The results show that orders in the Chinese securities market exhibit a decreasing information distribution.

15.
The von Bertalanffy growth model is extended to incorporate explanatory variables. The generalized model includes the switched growth model and the seasonal growth model as special cases, and can also be used to assess the effect of tagging on growth. Distribution-free and consistent estimating functions are constructed for estimating growth parameters from tag-recapture data in which age at release is unknown. This generalizes the work of James (1991, Biometrics 47, 1519–1530), who considered the classical model and allowed for individual variability in growth. A real dataset from barramundi (Lates calcarifer) is analysed to estimate the growth parameters and a possible effect of tagging on growth.
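For orientation, the classical special case: when age at release is unknown, the von Bertalanffy model is usually fitted in its Fabens increment form, where the expected recapture length is L1 + (L_inf - L1) * (1 - exp(-K * dt)). The sketch below fits that form to simulated tag-recapture data by least squares; the paper's extension with explanatory variables and distribution-free estimating functions is not reproduced here.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(7)

# Simulated tag-recapture records: length at release L1, length at
# recapture L2, and time at liberty dt (age at release unknown).
Linf_true, K_true, n = 100.0, 0.3, 80
L1 = rng.uniform(20, 80, n)
dt = rng.uniform(0.5, 3.0, n)
L2 = L1 + (Linf_true - L1) * (1 - np.exp(-K_true * dt)) + rng.normal(0, 2, n)

def residuals(theta):
    Linf, K = theta
    # Fabens increment form of the von Bertalanffy model.
    return L2 - (L1 + (Linf - L1) * (1 - np.exp(-K * dt)))

fit = least_squares(residuals, x0=[80.0, 0.1])
print("estimates of (L_inf, K):", fit.x)
```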

16.
Summary: Panel data offer a unique opportunity to identify records that interviewers clearly faked, by comparing data across waves. In the German Socio-Economic Panel (SOEP), only 0.5 percent of all records of raw data have been detected as faked. These fakes are used here to analyze the potential impact of fakes on survey results. Our central finding is that the faked records have no impact on means or proportions; however, we show that there may be a serious bias in the estimation of correlations and regression coefficients. In all but one year (1998), the detected faked data were never disseminated within the widely used SOEP study; the fakes are removed prior to data release.* We are grateful to participants in the workshop on Item Nonresponse and Data Quality on Large Social Surveys for useful critique and comments, especially Rainer Schnell and our outstanding discussant Regina Riphahn. The usual disclaimer applies.

17.
To protect public-use microdata, one approach is not to allow users access to the microdata at all. Instead, users submit analyses to a remote computer that reports back basic output from the fitted model, such as coefficients and standard errors. To be most useful, this remote server should also provide some way for users to check the fit of their models without disclosing actual data values. This paper discusses regression diagnostics for remote servers. The proposal is to release synthetic diagnostics, i.e. simulated values of the residuals and of the dependent and independent variables, constructed to mimic the relationships among the real-data residuals and independent variables. Using simulations, it is shown that the proposed synthetic diagnostics can reveal model inadequacies without a substantial increase in the risk of disclosure. The approach can also be used to develop remote-server diagnostics for generalized linear models.
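A toy version of the idea: fit the requested regression on the server, then release draws of (predictor, residual) pairs from a distribution fitted to the real pairs. The multivariate-normal choice below is a crude illustrative assumption; the paper's construction is designed so that model inadequacies (for example, nonlinearity) also survive in the synthetic diagnostics.

```python
import numpy as np

rng = np.random.default_rng(8)

# Confidential data held on the remote server (illustrative).
n = 400
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([0.8, -0.4]) + rng.normal(scale=0.5, size=n)

# Fit the regression the user requested and compute the real residuals.
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta

# Synthetic diagnostics: draw (predictors, residual) jointly from a
# normal fitted to the real pairs, so residual-versus-predictor plots
# keep their broad shape while no actual record is released.
Z = np.column_stack([X, resid])
Z_syn = rng.multivariate_normal(Z.mean(axis=0), np.cov(Z, rowvar=False), size=n)
X_syn, resid_syn = Z_syn[:, :2], Z_syn[:, 2]
print("real corr(resid, x1):     ", np.corrcoef(resid, X[:, 0])[0, 1])
print("synthetic corr(resid, x1):", np.corrcoef(resid_syn, X_syn[:, 0])[0, 1])
```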

18.
Summary.  We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set.
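A single-model sketch of the predictive-probability calculation: under a symmetric Dirichlet(alpha) prior on the cell probabilities, the posterior predictive probability of each cell is (count + alpha) / (n + alpha * number_of_cells), which stabilizes estimates for the small cells that drive identification risk. The counts and alpha below are illustrative, and the paper's key ingredient, averaging over model uncertainty, is not shown.

```python
import numpy as np

# Illustrative contingency-table counts; small cells carry the risk.
counts = np.array([54, 23, 9, 3, 1, 0])
alpha = 0.5                 # symmetric Dirichlet hyperparameter
n = counts.sum()

# Posterior predictive cell probabilities under a Dirichlet-multinomial
# model: smoother, and far more stable for rare cells, than counts / n.
p_pred = (counts + alpha) / (n + alpha * counts.size)

for c, p in zip(counts, p_pred):
    print(f"count={c:2d}  raw={c / n:.4f}  predictive={p:.4f}")
```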

19.
Summary: The paper illustrates the value of broad researcher access to survey and administrative microdata using examples drawn from the U.S. experience, outlining how analyses of trends in earnings inequality, poverty and employment dynamics using such data have benefited policy makers and contributed to improvements in statistical agency data products. Methods of facilitating researcher access, including the release of public-use files, the use of licensing agreements, and the establishment of research data centers, are discussed. * The author thanks Anne Polivka, Marilyn Seastrom and two anonymous referees for helpful comments on an earlier draft of the paper.

20.
The Points to Consider Document on Missing Data was adopted by the Committee for Medicinal Products for Human Use (CHMP) in December 2001. In September 2007 the CHMP issued a recommendation to review the document, with particular emphasis on summarizing and critically appraising the pattern of drop-outs, explaining the role and limitations of the 'last observation carried forward' method and describing the CHMP's cautionary stance on the use of mixed models. In preparation for the release of the updated guidance document, statisticians in the pharmaceutical industry held a one-day expert group meeting in September 2008. Topics debated included minimizing the extent of missing data, understanding the missing-data mechanism, defining principles for handling missing data, and understanding the assumptions underlying different analysis methods. A clear message from the meeting was that, at present, biostatisticians tend only to react to missing data; limited pro-active planning is undertaken when designing clinical trials. Missing-data mechanisms need to be considered during the planning phase of a trial and their impact on the objectives assessed. Another area for improvement is understanding the pattern of missing data observed during a trial, and thus the missing-data mechanism, by plotting the data; for example, using Kaplan-Meier curves of time to withdrawal. Copyright © 2009 John Wiley & Sons, Ltd.
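As an example of the kind of plotting recommended, the sketch below computes a Kaplan-Meier curve for time to withdrawal by hand, with withdrawal as the event and trial completion treated as censoring; the data are invented.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier curve with withdrawal as the event and trial
    completion treated as censoring."""
    order = np.lexsort((-event, time))  # by time; events before censorings
    time, event = time[order], event[order]
    at_risk, surv, curve = len(time), 1.0, []
    for t, d in zip(time, event):
        if d:                           # a withdrawal at time t
            surv *= 1 - 1 / at_risk
        at_risk -= 1
        curve.append((t, surv))
    return curve

# Invented data: weeks on study; 1 = withdrew, 0 = completed (censored).
weeks = np.array([4, 8, 8, 12, 16, 20, 24, 24, 24, 24])
withdrew = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0, 0])
for t, s in kaplan_meier(weeks, withdrew):
    print(t, round(s, 3))
```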

