Similar literature
20 similar documents found.
1.
The case for small area microdata
Summary.  Census data are available in aggregate form for local areas and, through the samples of anonymized records (SARs), as samples of microdata for households and individuals. In 1991 there were two SAR files: a household file and an individual file. These have a high degree of detail on the census variables but little geographical detail, a situation that will be exacerbated for the 2001 SAR owing to the loss of district level geography on the individual SAR. The paper puts forward the case for an additional sample of microdata, also drawn from the census, that has much greater geographical detail. Small area microdata (SAM) are individual level records with local area identifiers and, to maintain confidentiality, reduced detail on the census variables. Population data from seven local authorities, including rural and urban areas, are used to define prototype samples of SAM. The rationale for SAM is given, with examples that demonstrate the role of local area information in the analysis of census data. Since there is a trade-off between the extent of local detail and the extent of detail on variables that can be made available, the confidentiality risk of SAM is assessed empirically. An indicative specification of the SAM is given, having taken into account the results of the confidentiality analysis.

2.
Tao Ran, Huang Hengjun. Statistical Research (《统计研究》), 2016, 33(2): 10-17
Under the "Internet Plus" model, the impact that geographically referenced statistical data can have on government statistical capacity cannot be ignored. Drawing on the practical experience of the US MAF/TIGER system, and by analyzing how unit addresses in the business register match buildings on electronic maps, we propose using the unit location coordinates collected in the economic census to attach geographic identifiers to the register, refreshed through the full update conducted every five years. We then discuss how location coordinates could be obtained in non-census years to maintain the register's geographic information, focusing on two technical approaches: collecting point-of-interest (POI) data from internet sources and reverse geocoding. Building and developing geographic information for the basic unit register of the statistical system will further broaden the application areas of China's business register.

3.
In the area of diagnostics, it is common practice to leverage external data to augment a traditional study of diagnostic accuracy consisting of prospectively enrolled subjects to potentially reduce the time and/or cost needed for the performance evaluation of an investigational diagnostic device. However, the statistical methods currently being used for such leveraging may not clearly separate study design and outcome data analysis, and they may not adequately address possible bias due to differences in clinically relevant characteristics between the subjects constituting the traditional study and those constituting the external data. This paper is intended to draw attention in the field of diagnostics to the recently developed propensity score-integrated composite likelihood approach, which originally focused on therapeutic medical products. This approach applies the outcome-free principle to separate study design and outcome data analysis and can mitigate bias due to imbalance in covariates, thereby increasing the interpretability of study results. While this approach was conceived as a statistical tool for the design and analysis of clinical studies for therapeutic medical products, here, we will show how it can also be applied to the evaluation of sensitivity and specificity of an investigational diagnostic device leveraging external data. We consider two common scenarios for the design of a traditional diagnostic device study consisting of prospectively enrolled subjects, which is to be augmented by external data. The reader will be taken through the process of implementing this approach step-by-step following the outcome-free principle that preserves study integrity.

4.
Perceptive readers may have observed, and regretted, the absence of any discussion of card catalog closings, abolition of serials departments, changes in clerical procedures, or any of the other minutiae and concrete details of library operations so dear to the hearts of us all. The craft of serials management has scarcely been mentioned. Those topics discussed — the nature of organizational change, influencing change, communication, planning for the optimum impact of automation and for flexibility throughout the development and implementation phases — are concepts difficult to introduce into the pragmatic world of libraries. It is difficult to identify causes, to predict effects, and to deal with them. These are, however, critical elements in the success or failure of a serials automation project, and a library administration which addresses these issues will manage the changes brought about by automation rather than passively submitting to its consequences.

5.
The National Sample Survey Organisation (NSSO) surveys are the main source of official statistics in India, and generate a range of invaluable data at the macro level (e.g. state and national levels). However, the NSSO data cannot be used directly to produce reliable estimates at the micro level (e.g. district or further disaggregate level) due to small sample sizes. There is a rapidly growing demand for such micro-level statistics in India, as the country is moving from a centralized to a more decentralized planning system. In this article, we employ small-area estimation (SAE) techniques to derive model-based estimates of the proportion of indebted households at district or at other small-area levels in the state of Uttar Pradesh in India by linking data from the Debt–Investment Survey 2002–2003 of NSSO, the Population Census 2001 and the Agriculture Census 2003. Our results show that the model-based estimates are precise and representative. For many small areas, it is not even possible to produce estimates using sample data alone; the model-based estimates generated using SAE are still reliable for such areas. The estimates are expected to provide invaluable information to policy analysts and decision-makers.
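As a concrete illustration of the area-level shrinkage idea behind SAE (not the paper's actual model, which links several data sources), here is a minimal sketch of a basic Fay-Herriot estimator with a Prasad-Rao moment estimate of the model variance. The covariates, sampling variances and data below are invented:

```python
import numpy as np

def fay_herriot(y, X, D):
    """Shrink direct estimates y toward the synthetic fit X @ beta.

    y: direct survey estimates per area; X: area-level covariates;
    D: known sampling variances of the direct estimates.
    """
    m, p = X.shape
    # Prasad-Rao moment estimator of the model variance A (OLS residuals)
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    resid = y - X @ beta_ols
    A = max(0.0, (resid @ resid - np.sum(D * (1 - np.diag(H)))) / (m - p))
    # GLS for beta, then area-specific shrinkage weights
    W = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
    gamma = A / (A + D)  # closer to 1 when the direct estimate is precise
    return gamma * y + (1 - gamma) * (X @ beta)

rng = np.random.default_rng(0)
m = 12
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.2, 1.0, m)  # sampling variances, treated as known
y = X @ np.array([1.0, 2.0]) + rng.normal(0, np.sqrt(0.5), m) + rng.normal(0, np.sqrt(D))
est = fay_herriot(y, X, D)
```

Each model-based estimate is a precision-weighted compromise between the noisy direct estimate and the regression-synthetic prediction, which is why such estimates remain usable in areas with little or no sample.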

6.
The development of computer systems that support the geography of the census in Canada is described. The authors "present proposals for the 1991 Census that increase the level of automation, whether the geography changes or not. In addition, opportunities for change and improvement, should the geography change, are outlined. To minimize the risks involved, it is important to give these different proposals serious consideration early in the planning of a census."


8.
Summary:  The census and similar sources of data have been published for two centuries so the information that they contain should provide an unparalleled insight into the changing population of Britain over this time period. To date, however, the seemingly trivial problem of changes in boundaries has seriously hampered the use of these sources as they make it impossible to create long run time series of spatially detailed data. The paper reviews methodologies that attempt to resolve this problem by using geographical information systems and areal interpolation to allow the reallocation of data from one set of administrative units onto another. This makes it possible to examine change over time for a standard geography and thus it becomes possible to unlock the spatial detail and the temporal depth that are held in the census and in related sources.
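The simplest of the reallocation methods the review covers, area weighting, can be sketched in a few lines: counts for old administrative units are split across new units in proportion to the area of overlap. The overlap matrix and counts below are invented for illustration:

```python
import numpy as np

# overlap[i, j]: area of intersection of old zone i with new zone j
overlap = np.array([[4.0, 1.0],
                    [0.0, 5.0]])
old_counts = np.array([100.0, 60.0])

# Each old zone's count is split in proportion to how much of its
# area falls inside each new zone; totals are preserved exactly.
weights = overlap / overlap.sum(axis=1, keepdims=True)
new_counts = old_counts @ weights
```

Real implementations replace simple area weighting with ancillary-data methods (e.g. dasymetric weighting), but the reallocation step has this same matrix form.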

9.
Under given concrete exogenous conditions, the fraction of identifiable records in a microdata file without positive identifiers such as name and address is estimated. The effect of possible noise in the data, as well as the sample property of microdata files, is taken into account. Using real microdata files, it is shown that there is no risk of disclosure if the information content of characteristics known to the investigator (additional knowledge) is limited. Files with additional knowledge of large information content yield a high risk of disclosure. This can be eliminated only by massive modifications of the data records, which, however, involve large biases for complex statistical evaluations. In this case, the requirement for privacy protection and high-quality data perhaps may be fulfilled only if the linkage of such files with extensive additional knowledge is prevented by appropriate organizational and legal restrictions.
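A toy illustration of the kind of risk measure involved (not the paper's estimator): the share of records that are unique on the key variables an intruder is assumed to know. The records and key variables below are invented, but they show how risk grows with the information content of the additional knowledge:

```python
from collections import Counter

# invented microdata: (sex, age, occupation)
records = [
    ("F", 34, "teacher"), ("M", 34, "teacher"), ("F", 34, "teacher"),
    ("M", 61, "engineer"), ("F", 45, "nurse"),
]

def risk(records, keys):
    """Fraction of records unique on the intruder's assumed key variables."""
    combos = Counter(tuple(r[i] for i in keys) for r in records)
    return sum(1 for r in records
               if combos[tuple(r[i] for i in keys)] == 1) / len(records)

low_info = risk(records, keys=(0,))        # intruder knows sex only
high_info = risk(records, keys=(0, 1, 2))  # intruder knows sex, age, occupation
```

With sex alone no record is unique (`low_info` is 0.0), while the full key combination singles out three of the five records, which mirrors the abstract's finding that limited additional knowledge carries no disclosure risk but rich additional knowledge does.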

10.
Consider a finite population of size N with T possible realizations for each population unit. In reality the realizations may represent temporal, geographic or physical variations of the population unit. The paper provides design-based unbiased estimates for several population parameters of interest. Both simple random sampling and stratified sampling are considered. Some comparisons are given. An empirical study is also included with natural population data.

11.
Summary.  Census data are vital components of epidemiological studies, but the issues that are involved in using these data in such studies are often not fully appreciated. The paper describes some of the problems and uncertainties that arise, and some of the approaches that can be used to address them, based on experience in the Small Area Health Statistics Unit at Imperial College London. Issues considered include the geography of census data (zone design systems, recasting and the role of postcodes), temporal aspects of census data (especially in relation to migration and population change) and information content (especially in relation to characterization of socio-economic status). In the light of these issues, opportunities to improve the resolution and utility of census data for epidemiological studies are discussed.

12.
The paper proposes a cross-validation method to address the question of specification search in a multiple nonlinear quantile regression framework. Linear parametric, spline-based partially linear and kernel-based fully nonparametric specifications are contrasted as competitors using cross-validated weighted L1-norm based goodness-of-fit and prediction error criteria. The aim is to provide a fair comparison with respect to estimation accuracy and/or predictive ability for different semi- and nonparametric specification paradigms. This is challenging as the model dimension cannot be estimated for all competitors and the meta-parameters such as kernel bandwidths, spline knot numbers and polynomial degrees are difficult to compare. General issues of specification comparability and automated data-driven meta-parameter selection are discussed. The proposed method further allows us to assess the balance between fit and model complexity. An extensive Monte Carlo study and an application to a well-known data set provide empirical illustration of the method.
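The scoring idea is a cross-validated check (pinball) loss: each candidate specification is fit on training folds and scored by its weighted L1 quantile loss on held-out folds. The sketch below compares two deliberately simple stand-in specifications (a constant quantile and a coarsely binned quantile); the data, the fold scheme and both "specifications" are invented for illustration:

```python
import numpy as np

def pinball_loss(y, yhat, tau):
    """Weighted L1 (check) loss for quantile level tau."""
    u = y - yhat
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

def cv_score(fit, predict, x, y, tau, k=5, seed=0):
    """K-fold cross-validated pinball loss of a candidate specification."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    losses = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(x[train], y[train], tau)
        losses.append(pinball_loss(y[fold], predict(model, x[fold]), tau))
    return float(np.mean(losses))

# two toy "specifications" standing in for the paper's competitors
const_fit = lambda xt, yt, tau: np.quantile(yt, tau)
const_pred = lambda m, xs: np.full(len(xs), m)
bin_fit = lambda xt, yt, tau: [np.quantile(yt[xt <= 0.5], tau),
                               np.quantile(yt[xt > 0.5], tau)]
bin_pred = lambda m, xs: np.where(xs <= 0.5, m[0], m[1])

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = 2 * x + rng.normal(0, 0.3, 200)   # median of y depends on x
s_const = cv_score(const_fit, const_pred, x, y, tau=0.5)
s_bin = cv_score(bin_fit, bin_pred, x, y, tau=0.5)
```

Because the same held-out loss is computed for every competitor, specifications of very different dimension can be ranked on a common scale, which is the point of the paper's criterion.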

13.
"This article presents and implements a new method for making stochastic population forecasts that provide consistent probability intervals. We blend mathematical demography and statistical time series methods to estimate stochastic models of fertility and mortality based on U.S. data back to 1900 and then use the theory of random-matrix products to forecast various demographic measures and their associated probability intervals to the year 2065. Our expected total population sizes agree quite closely with the Census medium projections, and our 95 percent probability intervals are close to the Census high and low scenarios. But Census intervals in 2065 for ages 65+ are nearly three times as broad as ours, and for 85+ are nearly twice as broad. In contrast, our intervals for the total dependency and youth dependency ratios are more than twice as broad as theirs, and our interval for the elderly dependency ratio is 12 times as great as theirs. These items have major implications for policy, and these contrasting indications of uncertainty clearly show the limitations of the conventional scenario-based methods."
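The contrast with scenario-based projection can be illustrated with a deliberately simplified simulation (not the paper's fitted random-matrix model): draw many trajectories under random annual growth rates and read the probability interval directly from the simulated distribution, rather than from fixed high/medium/low scenarios. All numbers below are invented:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, years = 10_000, 30
# random annual log-growth rates (mean 1%, sd 0.5%) - an assumption
growth = rng.normal(0.01, 0.005, size=(n_sims, years))
pop = 100.0 * np.exp(growth.sum(axis=1))   # end-of-horizon population sizes

# a genuine 95 percent probability interval from the simulated paths
lo, hi = np.percentile(pop, [2.5, 97.5])
```

Unlike a high/low scenario pair, the interval (lo, hi) has an explicit probability attached, which is what makes the comparison of interval widths in the abstract meaningful.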

14.
The United States Census Bureau Library, acting on a study commissioned in 2002, has undertaken to improve electronic access for its patrons. The library provides e-journal and database access. The staff relies on usage statistics from these resources to sharpen the focus of the collection. Through an aggressive outreach program, Census Bureau employees learn about these new products and services. Future plans include adding more e-journals, an open OPAC, and a new library building.

15.
Over the past five years the Artificial Intelligence Center at SRI has been developing a new technology to address the problem of automated information management within real-world contexts. The result of this work is a body of techniques for automated reasoning from evidence that we call evidential reasoning. The techniques are based upon the mathematics of belief functions developed by Dempster and Shafer and have been successfully applied to a variety of problems including computer vision, multisensor integration, and intelligence analysis.

We have developed both a formal basis and a framework for implementing automated reasoning systems based upon these techniques. Both the formal and the practical approach can be divided into four parts: (1) specifying a set of distinct propositional spaces, (2) specifying the interrelationships among these spaces, (3) representing bodies of evidence as belief distributions, and (4) establishing paths for the bodies of evidence to move through these spaces by means of evidential operations, eventually converging on spaces where the target questions can be answered. These steps specify a means for arguing from multiple bodies of evidence toward a particular (probabilistic) conclusion. Argument construction is the process by which such evidential analyses are constructed and is the analogue of constructing proof trees in a logical context.

This technology features the ability to reason from uncertain, incomplete, and occasionally inaccurate information based upon seven evidential operations: fusion, discounting, translation, projection, summarization, interpretation, and gisting. These operations are theoretically sound but have intuitive appeal as well.
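The fusion operation, for instance, is Dempster's rule of combination from the belief-function theory the abstract cites. A minimal sketch, with an invented two-element frame of discernment and invented sensor masses (the names are illustrative, not from the Gister system):

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: combine two mass functions (dicts frozenset -> mass)."""
    raw = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # product mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    k = 1.0 / (1.0 - conflict)  # renormalize over non-empty intersections
    return {s: k * w for s, w in raw.items()}

frame = frozenset({"hostile", "friendly"})
m1 = {frozenset({"hostile"}): 0.6, frame: 0.4}  # sensor 1: partial belief
m2 = {frozenset({"hostile"}): 0.5, frame: 0.5}  # sensor 2: partial belief
fused = combine(m1, m2)
```

Mass left on the whole frame represents ignorance rather than belief in either alternative, which is what lets the framework reason from incomplete information.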

In implementing this formal approach, we have found that evidential arguments can be represented as graphs. To support the construction, modification, and interrogation of evidential arguments, we have developed Gister. Gister provides an interactive, menu-driven, graphical interface that allows these graphical structures to be easily manipulated.

Our goal is to provide effective automated aids to domain experts for argument construction. Gister represents our first attempt at such an aid.


16.
This paper describes an application of small area estimation (SAE) techniques under area-level spatial random effect models when only area (or district or aggregated) level data are available. In particular, the SAE approach is applied to produce district-level model-based estimates of crop yield for paddy in the state of Uttar Pradesh in India using the data on crop-cutting experiments supervised under the Improvement of Crop Statistics scheme and the secondary data from the Population Census. The diagnostic measures are illustrated to examine the model assumptions as well as the reliability and validity of the generated model-based small area estimates. The results show a considerable gain in precision in model-based estimates produced by applying SAE. Furthermore, the model-based estimates obtained by exploiting spatial information are more efficient than those obtained by ignoring this information. However, both of these model-based estimates are more efficient than the direct survey estimate. In many districts there are no survey data, and therefore it is not possible to produce direct survey estimates for these districts; the model-based estimates generated using SAE are still reliable for such districts. These estimates produced by using SAE will provide invaluable information to policy analysts and decision-makers.


18.
Changes in product quality are a major source of CPI bias, and quality adjustment is one of the difficult problems in CPI theory and practice. In the big data era, both the necessity and the feasibility of CPI quality adjustment have risen markedly: several commonly used quality adjustment methods become more practicable, and the effectiveness of quality-bias adjustment improves noticeably. With the support of big data, the classical hedonic method can impute prices for sampled items whose quality has changed using weighted time-dummy hedonic indices and weighted time-product-dummy hedonic indices, yielding more reasonable and accurate results while keeping the CPI quality adjustment timely and dynamic. China's CPI statistics have not yet introduced quality-bias adjustment; to further improve data quality, CPI quality adjustment should be studied and implemented as soon as possible.
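The time-dummy hedonic idea can be shown in a few lines: regress log price on product characteristics plus a period dummy, so the dummy coefficient captures pure price change with quality held constant. The characteristics and (noiseless) prices below are invented for illustration:

```python
import numpy as np

chars  = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])   # quality characteristic
period = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # 0 = base, 1 = current
logp = 1.0 + 0.5 * chars + 0.1 * period             # toy log prices

# time-dummy hedonic regression: intercept, characteristic, period dummy
X = np.column_stack([np.ones(6), chars, period])
beta, *_ = np.linalg.lstsq(X, logp, rcond=None)
index = float(np.exp(beta[2]))   # quality-adjusted price relative
```

Because the characteristic enters the regression, a higher-quality replacement item raises the fitted price through `chars` rather than through the period dummy, which is exactly the quality-bias correction the method delivers.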

19.
"This paper examines attempts to collect data on a politically controversial topic, race and ethnicity, in the British Census of Population in the post-war period. It discusses an indirect, proxy method of inferring race or ethnicity by asking for the country of birth of the respondent and of his parents, and a direct question where the respondent is asked to identify his racial or ethnic group. Different versions of the direct question are examined, as is the 1979 Census test, which resulted in considerable public resistance to the question. Following the exclusion of the direct question from the 1981 Census, the subject was reviewed by the Parliamentary Home Affairs Committee, the results of whose report--including practical suggestions as to question wording--are discussed."

20.
Small area statistics obtained from sample survey data provide a critical source of information used to study health, economic, and sociological trends. However, most large-scale sample surveys are not designed for the purpose of producing small area statistics. Moreover, data disseminators are prevented from releasing public-use microdata for small geographic areas for disclosure reasons, thus limiting the utility of the data they collect. This research evaluates a synthetic data method, intended for data disseminators, for releasing public-use microdata for small geographic areas based on complex sample survey data. The method replaces all observed survey values with synthetic (or imputed) values generated from a hierarchical Bayesian model that explicitly accounts for complex sample design features, including stratification, clustering, and sampling weights. The method is applied to restricted microdata from the National Health Interview Survey and synthetic data are generated for both sampled and non-sampled small areas. The analytic validity of the resulting small area inferences is assessed by direct comparison with the actual data, a simulation study, and a cross-validation study.
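A toy sketch of the synthesis idea (a stand-in for the paper's far richer design-aware model): observed small-area values are replaced with draws from the posterior predictive distribution of a simple Bayesian model, here a beta-binomial with invented counts and an assumed uniform prior:

```python
import numpy as np

rng = np.random.default_rng(7)
successes = np.array([12, 30, 5])    # observed counts per small area (invented)
trials    = np.array([40, 80, 20])

a0, b0 = 1.0, 1.0                    # beta prior parameters (assumption)
# draw each area's rate from its posterior, then synthesize new counts
theta = rng.beta(a0 + successes, b0 + trials - successes)
synthetic = rng.binomial(trials, theta)
```

Released values are thus model-generated rather than actual responses, which is what protects confidentiality; analytic validity then hinges on how faithfully the model reflects the design, as the abstract's evaluation studies test.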
