Similar documents
20 similar documents found.
1.
Can we find some common principle in the three comparisons? Lacking adequate time for a thorough exploration, let me suggest that representation is that common principle. I suggested (section 4) that judgmental selection of spatial versus temporal extensions distinguishes “longitudinal” local studies from “cross-section” population sampling. We had noted (section 3) that censuses are taken for detailed representation of the spatial dimension but depend on judgmental selection of the temporal. Survey sampling lacks spatial detail but is spatially representative with randomization, and it can be made timely. Periodic samples can be designed that are representative of temporal extension. Furthermore, spatial and temporal detail can be obtained either through estimation or through cumulated samples [Purcell and Kish 1979, 1980; Kish 1979b, 1981, 1986, sec. 6.6]. Registers and administrative records can have good spatial and temporal representation, but representation may be lacking in population content, and surely in representation of variables. Representation of variables, and of the relations between variables and over the population, are the issues in conflict between surveys, experiments, and observations. This is a deep subject, too deep to be explored again here, as it was in section 2. A final point about the limits of randomization for achieving representation through sampling: randomization for selecting samples of variables is generally beyond me, because I cannot conceive of frames for defined populations of variables. Yet we can find attempts at randomized selection of variables: in the selection of items for the consumer price index, and also of items for tests of IQ or of achievement. Generally I believe that randomization is the way to achieve representation without complete coverage, and that it can be applied and practised in many dimensions.

2.
This keynote address at the 7th Australian Statistical Conference (1984) briefly discusses seven modifications of sample design for improving the usefulness and timeliness of surveys relevant to public policy. These are: better estimates for small domains, cumulating rolling samples, more panel surveys, multipurpose designs for periodic samples, split panel designs (SPD) combining panels with nonoverlapping samples, and frequent collections cumulated for less frequent reporting periods.

3.
Model-based estimators are becoming very popular in statistical offices because governments require accurate estimates for small domains that were not planned when the study was designed, since including them in the design would have increased the cost of the study. The sample sizes in these domains are very small or even zero; consequently, traditional direct design-based estimators lead to unacceptably large standard errors. In this regard, model-based estimators that 'borrow information' from related areas by using auxiliary information are appropriate. This paper reviews, under the model-based approach, a BLUP synthetic and an EBLUP estimator. The goal is to obtain estimators of domain totals when there are several domains with very small sample sizes or without sampled units. We also provide detailed expressions for the mean squared error at different levels of aggregation. The results are illustrated with real data from the Basque Country Business Survey.
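The way BLUP-type estimators "borrow information" can be seen in a minimal sketch. It assumes a unit-level nested error model y = b0 + b1·x + u_d + e with a single auxiliary variable, treats the regression coefficients and the variance components σ_u² and σ_e² as known (an EBLUP would plug in estimates of them), and ignores the finite-population correction. The function `blup_domain_mean` is hypothetical and is not the paper's exact estimator.

```python
def blup_domain_mean(y_sample, x_sample_mean, x_pop_mean, beta, sigma2_u, sigma2_e):
    """Predict a domain mean under y = b0 + b1*x + u_d + e (hypothetical helper).

    y_sample      -- observed y values in the domain (may be empty)
    x_sample_mean -- mean of x over the sampled units of the domain
    x_pop_mean    -- known population mean of x in the domain
    beta          -- (b0, b1), treated as known in this sketch
    sigma2_u, sigma2_e -- area-effect and unit-error variances, treated as known
    """
    b0, b1 = beta
    synthetic = b0 + b1 * x_pop_mean  # regression-synthetic part
    n = len(y_sample)
    if n == 0:
        return synthetic  # no sampled units: the prediction is purely synthetic
    # Shrinkage factor between 0 and 1; it grows with the domain sample size.
    gamma = sigma2_u / (sigma2_u + sigma2_e / n)
    y_bar = sum(y_sample) / n
    # Predicted area effect: shrunken mean residual of the sampled units.
    u_hat = gamma * (y_bar - (b0 + b1 * x_sample_mean))
    return synthetic + u_hat
```

With no sampled units the prediction reduces to the synthetic estimate; multiplying the predicted mean by the domain size N_d turns it into a domain-total estimate.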

4.
"The census of population represents a rich source of social data. Other countries have released samples of anonymized records from their censuses to the research community for secondary analysis. So far this has not been done in Britain. The areas of research which might be expected to benefit from such microdata are outlined, and support is drawn from considering experience overseas. However, it is essential to protect the confidentiality of the data. The paper therefore considers the risks, both real and perceived, of identification of individuals from census microdata. The conclusion of the paper is that the potential benefits from census microdata are large and that the risks in terms of disclosure are very small. The authors therefore argue that the Office of Population Censuses and Surveys and the General Register Office of Scotland should release samples of anonymized records from the 1991 census for secondary analysis."

5.
The authors propose to estimate nonlinear small area population parameters by using the empirical Bayes (best) method, based on a nested error model. They focus on poverty indicators as particular nonlinear parameters of interest, but the proposed methodology is applicable to general nonlinear parameters. They use a parametric bootstrap method to estimate the mean squared error of the empirical best estimators. They also study small sample properties of these estimators by model‐based and design‐based simulation studies. Results show large reductions in mean squared error relative to direct area‐specific estimators and other estimators obtained by “simulated” censuses. The authors also apply the proposed method to estimate poverty incidences and poverty gaps in Spanish provinces by gender with mean squared errors estimated by the mentioned parametric bootstrap method. For the Spanish data, results show a significant reduction in coefficient of variation of the proposed empirical best estimators over direct estimators for practically all domains. The Canadian Journal of Statistics 38: 369–385; 2010 © 2010 Statistical Society of Canada
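Poverty incidence and poverty gap are commonly defined as members of the Foster–Greer–Thorbecke (FGT) family. This sketch assumes that definition and shows only the target parameters, not the empirical best predictor or the bootstrap. The function name and the income values are illustrative.

```python
def fgt(incomes, z, alpha):
    """FGT poverty measure for poverty line z.

    alpha = 0 gives the poverty incidence (share of units below z);
    alpha = 1 gives the poverty gap (mean relative shortfall below z).
    """
    n = len(incomes)
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / n


incomes = [5.0, 10.0, 20.0, 40.0]  # made-up values
print(fgt(incomes, 10.0, 0), fgt(incomes, 10.0, 1))
```

Both indicators are nonlinear in the incomes, which is why the abstract treats them as nonlinear small area parameters rather than simple domain means.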

6.
Trends in family size and composition in the USSR over the past 30 years are analyzed, based on data from censuses conducted since 1959. Trends are examined according to territory and urban or rural area. Aspects considered include changes in the number of families, family size, incidence of divorce, single-parent families, and ethnic composition.

7.
"Net undercount rates in the U.S. decennial census have been steadily declining over the last several censuses. Differential undercounts among race groups and geographic areas, however, appear to persist. In the following, we examine and compare several methodologies for providing small area estimates of census coverage by constructing artificial populations. Measures of performance are also introduced to assess the various small area estimates. Synthetic estimation in combination with regression modelling provides the best results of the methods considered. Sampling error effects are also simulated. The results form the basis for determining coverage evaluation survey small area estimates of the 1990 decennial census."
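In its simplest form, synthetic estimation assumes that coverage rates estimated for broad post-strata (for example, race groups) hold inside every small area. Here is a minimal sketch under that assumption. The post-stratum labels and correction factors (estimated true count divided by census count) are hypothetical, and the sketch includes no regression modelling.

```python
def synthetic_estimate(area_counts, correction_factors):
    """Synthetic coverage estimate for one small area.

    area_counts        -- census count of the area in each post-stratum
    correction_factors -- post-stratum factor (estimated true / census count),
                          assumed to apply unchanged within the area
    Returns the adjusted population and the implied net undercount rate.
    """
    adjusted = sum(c * correction_factors[g] for g, c in area_counts.items())
    census_total = sum(area_counts.values())
    undercount_rate = 1 - census_total / adjusted
    return adjusted, undercount_rate


print(synthetic_estimate({"A": 400, "B": 500}, {"A": 1.25, "B": 1.0}))
```

The weakness the abstract points to is visible here: every area with the same post-stratum mix gets the same undercount rate, whatever its local conditions.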

8.
Summary. Origin–destination statistics have been produced from the last three UK censuses. The paper describes what is new about the 2001 census interaction data on migration and commuting, considers the disclosure control methods that were applied to cells containing small values and demonstrates the problems that are associated with making comparisons with 1991 data. The effect of small cell adjustment procedures on the interaction data sets is investigated by means of selective analyses at different spatial scales. Some recommendations are made in light of the problems that were manifest in 2001.

9.
In many surveys, the domains of study are small and the samples that carry information on a domain can be very small indeed. If the survey is conducted repeatedly there is often a high degree of overlap in samples over time. We show how to use the richness of information over time to compensate for the paucity of cross‐sectional information. We propose a model‐based estimator of the population total which makes use of stabilised parameter estimates that combine information from different survey periods that are adjacent in time. The motivating example for this research was the ProdCom survey as implemented in the UK.

10.
The estimation or prediction of population characteristics based on sample information is the key issue in survey sampling. If the sample sizes in subpopulations (domains) are large enough, the same methods as those used for the whole population can be used to estimate or predict subpopulation characteristics as well. To estimate or predict characteristics of domains with small or even zero sample sizes, small area estimation methods that “borrow strength” from other subpopulations or time periods are widely used. We extend this problem and study methods for predicting future population and subpopulation characteristics based on longitudinal data.

11.
Small area estimation (SAE) is concerned with how to reliably estimate population quantities of interest when some areas or domains have very limited samples. This is an important issue in large population surveys, because the geographical areas or groups with only small samples or even no samples are often of interest to researchers and policy-makers. For example, large population health surveys, such as the Behavioural Risk Factor Surveillance System and the Ohio Medicaid Assessment Survey (OMAS), are regularly conducted for monitoring insurance coverage and healthcare utilization. Classic approaches usually provide accurate estimators at the state level or large geographical region level, but they fail to provide reliable estimators for many rural counties where the samples are sparse. Moreover, a systematic evaluation of the performance of SAE methods in real-world settings is lacking in the literature. In this paper, we propose a Bayesian hierarchical model with constraints on the parameter space and show that it provides superior estimators for county-level adult uninsured rates in Ohio based on the 2012 OMAS data. Furthermore, we perform extensive simulation studies to compare our methods with a collection of common SAE strategies, including direct estimators, synthetic estimators, composite estimators, and the Bayesian hierarchical model-based estimators of Datta, Ghosh, Steorts and Maples [Bayesian benchmarking with applications to small area estimation. Test 2011;20(3):574–588]. To set a fair basis for comparison, we generate our simulation data with characteristics mimicking the real OMAS data, so that neither model-based nor design-based strategies use the true model specification. The estimators based on our proposed model are shown to outperform other estimators for small areas in both the simulation study and the real data analysis.

12.
The Greek writer Phlegon (80–140 AD) from Tralles in Asia Minor wrote a book entitled On Long-lived Persons that contains a long list of people over a hundred years old. He collected data from the Roman censuses. With respect to the history of statistics, Phlegon's book is the earliest surviving text to use the Stem-and-Leaf display of collected data.
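A short sketch can show how a stem-and-leaf display is built. It assumes non-negative integer ages, takes the stem as value // 10 and the leaf as the last digit, and uses made-up ages over 100. These are not Phlegon's data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines: stem = value // 10, leaf = value % 10."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


for line in stem_and_leaf([101, 104, 110, 125, 103]):  # made-up ages
    print(line)
```

The display keeps every individual value while also showing the shape of the distribution. That property makes it suitable for a list of individual census entries.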

13.
Modern systems of official statistics require the estimation and publication of business statistics for disaggregated domains, for example, industry domains and geographical regions. Outlier robust methods have proven to be useful for small‐area estimation. Recently proposed outlier robust model‐based small‐area methods assume, however, uncorrelated random effects. Spatial dependencies, resulting from similar industry domains or geographic regions, often occur. In this paper, we propose an outlier robust small‐area methodology that allows for the presence of spatial correlation in the data. In particular, we present a robust predictive methodology that incorporates the potential spatial impact from other areas (domains) on the small area (domain) of interest. We further propose two parametric bootstrap methods for estimating the mean‐squared error. Simulations indicate that the proposed methodology may lead to efficiency gains. The paper concludes with an illustrative application by using business data for estimating average labour costs in Italian provinces.

14.
Sample surveys are usually designed and analysed to produce estimates for larger areas. Nevertheless, sample sizes are often not large enough to give adequate precision for small area estimates of interest. To overcome such difficulties, borrowing strength from related small areas via modelling becomes essential. In line with this, we propose components of variance models with power transformations for small area estimation. This paper reports the results of a study aimed at incorporating the power transformation in small area estimation for improving the quality of small area predictions. The proposed methods are demonstrated on satellite data in conjunction with survey data to estimate mean acreage under a specified crop for counties in Iowa.
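The abstract does not name its power transformation. A common choice is the Box–Cox family, and this sketch assumes it. A components-of-variance model would be fitted to the transformed values and predictions mapped back with the inverse.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a positive value y (assumed family)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lam -> 0
    return (y ** lam - 1) / lam


def inverse_box_cox(t, lam):
    """Map a value on the transformed scale back to the original scale."""
    if lam == 0:
        return math.exp(t)
    return (lam * t + 1) ** (1 / lam)


print(box_cox(4.0, 0.5), inverse_box_cox(2.0, 0.5))
```

Back-transforming a predicted mean this way gives a biased estimate of the mean on the original scale, so a bias correction is usually considered. How the paper handles this is not stated in the abstract.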

15.
A method of calculating the composition of the Soviet population by nationality between censuses is described. The method uses both census and vital statistics data and is designed to produce estimates for the USSR, the Union Republics, and the rural and urban population.
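Combining census counts with vital statistics between censuses is often based on the demographic balancing equation P(t+1) = P(t) + B − D + M. This is a minimal sketch that applies it group by group. It is not necessarily the method of the paper, which the abstract does not detail. The group labels and counts are hypothetical.

```python
def carry_forward(base, births, deaths, net_migration):
    """Advance group populations one period: P + births - deaths + net migration.

    base          -- population by group at the start (e.g. last census)
    births, deaths, net_migration -- flows by group; missing groups count as 0
    """
    return {
        g: base[g] + births.get(g, 0) - deaths.get(g, 0) + net_migration.get(g, 0)
        for g in base
    }


pop = carry_forward({"A": 1000, "B": 500}, {"A": 30, "B": 20},
                    {"A": 10, "B": 5}, {"B": -15})
print(pop)
```

Applying the function year by year gives intercensal estimates. A real application must also deal with flows not recorded by nationality and with discrepancies at the next census.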

16.
Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data becomes increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) smaller than that of either pooling or not pooling, and it sometimes improves substantially over both extremes. The bias introduced is small, as is shown by a simulation study that is carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation we are likely to encounter in a genome-wide association study.

17.
DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) is a classifier combination technique that constructs a set of diverse base classifiers using additional, artificially generated training instances. The predictions of the base classifiers are then combined by the mean combination rule. To gain more insight into its effectiveness and advantages, this paper uses a large experiment to carry out a bias–variance analysis of DECORATE and of other widely used ensemble methods (such as bagging, AdaBoost and random forest) at different training sample sizes. The experimental results yield the following conclusions. For small training sets, DECORATE has a dominant advantage over its rivals, which is attributed to its larger bias reduction compared with the other algorithms. As the training data grow, AdaBoost benefits most: its bias reduction gradually becomes significant, while its variance reduction is moderate. Thus, AdaBoost performs best with large training samples. Random forest consistently ranks second regardless of training-set size, mainly decreasing variance while maintaining low bias. Bagging lies in between, since it primarily reduces variance.

18.
Summary. The paper analyses a time series of infant mortality rates in the north of England from 1921 to the early 1970s at a spatial scale that is more disaggregated than in previous studies of infant mortality trends in this period. The paper describes regression methods to obtain mortality gradients over socioeconomic indicators from the censuses of 1931, 1951, 1961 and 1971 and to assess whether there is any evidence for widening spatial inequalities in infant mortality outcomes against a background of an overall reduction in the infant mortality rate. Changes in the degree of inequality are also formally assessed by inequality measures such as the Gini and Theil indices, for which sampling densities are obtained and significant changes assessed. The analysis concerns a relatively infrequent outcome (especially towards the end of the period that is considered) and a high proportion of districts with small populations, so necessitating the use of appropriate methods for deriving indices of inequality and for regression modelling.
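The Gini and Theil indices named above can be computed with the standard point formulas. This sketch allows optional weights, for example district births, which is an assumption here. It does not derive the sampling densities the paper obtains. Zero rates, which are plausible for small districts, add nothing to the Theil sum because x·ln x → 0.

```python
import math


def _weights(values, weights):
    return [1.0] * len(values) if weights is None else list(weights)


def gini(values, weights=None):
    """Weighted Gini: sum_ij w_i w_j |x_i - x_j| / (2 W^2 mean)."""
    w = _weights(values, weights)
    total_w = sum(w)
    mean = sum(wi * x for wi, x in zip(w, values)) / total_w
    diff = sum(wi * wj * abs(xi - xj)
               for wi, xi in zip(w, values) for wj, xj in zip(w, values))
    return diff / (2 * total_w ** 2 * mean)


def theil(values, weights=None):
    """Weighted Theil T: sum_i (w_i / W) (x_i / mean) ln(x_i / mean)."""
    w = _weights(values, weights)
    total_w = sum(w)
    mean = sum(wi * x for wi, x in zip(w, values)) / total_w
    t = 0.0
    for wi, x in zip(w, values):
        if x > 0:  # zero values contribute nothing in the limit
            r = x / mean
            t += (wi / total_w) * r * math.log(r)
    return t


print(gini([0, 0, 0, 1]), theil([0, 0, 0, 1]))
```

Both indices are 0 when all districts have equal rates. With n districts and all of the outcome concentrated in one, the Theil index reaches its maximum, ln n.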

19.
Some coverage error models for census data
"Alternative models are presented for representing coverage error in surveys and censuses of human populations. The models are related to the capture-recapture models used in wildlife applications and to the dual-system models employed in the vital events literature. Estimation methodologies are discussed for one of the coverage error models." After a discussion of the theory underlying the methodology, "distinctions are made between two kinds of error: (a) sampling error and (b) error associated with the model. An example involving data from the 1980 U.S. census is presented. The problem of adjusting census and survey data for coverage error is also discussed."
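The classical dual-system (Lincoln–Petersen) estimator makes the link to capture–recapture concrete. This minimal sketch treats the census and an independent post-enumeration survey as two "captures". It assumes independent, homogeneous inclusion within the stratum and error-free matching. It is not one of the specific models in the paper, and the counts are invented.

```python
def lincoln_petersen(n_census, n_survey, n_matched):
    """Dual-system estimate N = n1 * n2 / m (assumes independent lists)."""
    if n_matched <= 0:
        raise ValueError("need at least one matched case")
    return n_census * n_survey / n_matched


def chapman(n_census, n_survey, n_matched):
    """Chapman's variant, less biased when the match count is small."""
    return (n_census + 1) * (n_survey + 1) / (n_matched + 1) - 1


print(lincoln_petersen(900, 800, 720), chapman(900, 800, 720))
```

If people missed by the census are also more likely to be missed by the survey, the independence assumption fails. The estimate is then biased downward, one source of the "error associated with the model" distinguished in the abstract.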

20.
This article shows how to construct simple numerical exercises in balanced and unequally replicated one-way analysis of variance (ANOVA) (and experimental design) such that the estimated effects and residual standard deviations are preassigned whole numbers. Methods for generating single samples and simple linear regressions with exact estimates are already available (see Edwards 1959, Searle and Firey 1980, Posten 1982, and Read and Riley 1983). In this article the basic method of Read and Riley is extended to one-way ANOVA. For small numbers of treatments and replications and small residual standard deviations, tables of basic data are supplied that greatly expedite the construction of ANOVA layouts; the same methods may easily be applied, and the tables extended, to generate exercises involving larger numbers if required. The sets of estimated effects are essentially arbitrary, and so the significance or insignificance of main effects, or of contrasts thereof, can be illustrated by data sets designed for the purpose. This facility is a new and helpful aid to instruction.
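The goal can be illustrated without reproducing Read and Riley's tables. The unequally replicated data set below was built by hand from two rules. First, residuals within each group sum to zero and their pooled sum of squares divided by the residual degrees of freedom is a perfect square. Second, the effects, weighted by replication, sum to zero. The checker assumes effects are measured from the overall (weighted) grand mean, which is one convention among several.

```python
import math


def one_way_anova_estimates(groups):
    """Grand mean, treatment effects and residual SD for one-way ANOVA.

    Effects are group means minus the overall mean of all observations,
    so with unequal replication they sum to zero when weighted by n_i.
    """
    all_obs = [y for g in groups for y in g]
    n_total = len(all_obs)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    effects = [m - grand_mean for m in means]
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    residual_sd = math.sqrt(sse / (n_total - len(groups)))
    return grand_mean, effects, residual_sd


# Hand-built exercise: mean 20, effects (3, 2, -3), residual SD 2.
# Residual patterns (-3, 3), (-1, 0, 1), (-1, -1, 1, 1) give SSE = 24 on 6 df.
groups = [[20, 26], [21, 22, 23], [16, 16, 18, 18]]
print(one_way_anova_estimates(groups))
```

Replacing the residual patterns or the effects gives new exercises, provided the two zero-sum rules and the perfect-square condition still hold.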


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417