Similar literature
1.
In 1991 Marsh and co-workers made the case for a sample of anonymized records (SAR) from the 1991 census of population. The case was accepted by the Office for National Statistics (then the Office of Population Censuses and Surveys) and a request was made by the Economic and Social Research Council to purchase the SARs. Two files were released for Great Britain: a 2% sample of individuals and a 1% sample of households. Subsequently similar samples were released for Northern Ireland. Since their release, the files have been heavily used for research and there has been no known breach of confidentiality. There is a considerable demand for similar files from the 2001 census, with specific requests for a larger sample size and lower population threshold for the individual SAR. This paper reassesses the analysis of Marsh and co-workers of the risk of identification of an individual or household in a sample of microdata from the 1991 census and also uses alternative ways of assessing risks with the 1991 SARs. The results of both the reassessment and the new analyses are reassuring and allow us to take the 1991 SARs as a base-line against which to assess proposals for changes to the size and structure of samples from the 2001 census.

2.
Summary.  We apply multivariate shrinkage to estimate local area rates of unemployment and economic inactivity by using UK Labour Force Survey data. The method exploits the similarity of the rates of claiming unemployment benefit and the unemployment rates as defined by the International Labour Organisation. This is done without any distributional assumptions, merely relying on the high correlation of the two rates. The estimation is integrated with a multiple-imputation procedure for missing employment status of subjects in the database (item non-response). The hot deck method that is used in the imputations is adapted to reflect the uncertainty in the model for non-response. The method is motivated as a development (improvement) of the current operational procedure in which the imputed value is a non-stochastic function of the data. An extension of the procedure to subjects who are absent from the database (unit non-response) is proposed.

3.
Summary.  The paper evaluates the effect of a recent change to unemployment benefit in the UK which requires both partners in a couple (rather than just one) to search for work. The difference-in-differences estimator is extended in two ways. First, variations in when the change was implemented are exploited to test and adjust for bias resulting from differential trends among the control group. Second, the approach is combined with matching to relax functional form restrictions. After several months, positive effects on exiting from benefits were detected but effects on entry to jobs were less apparent.
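As a rough illustration of the basic difference-in-differences idea that the abstract above builds on, the following sketch computes the estimator from four groups of outcomes. It does not reproduce the paper's extensions (staggered implementation, matching); the function name `did_estimate` and all data values are hypothetical.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic difference-in-differences: change among the treated
    minus change among the controls."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical indicators (1 = couple left benefit, 0 = still claiming).
treated_pre = [0, 0, 1, 0]    # exit rate 0.25 before the change
treated_post = [1, 0, 1, 1]   # exit rate 0.75 after the change
control_pre = [0, 1, 0, 0]    # exit rate 0.25 before
control_post = [0, 1, 1, 0]   # exit rate 0.50 after

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # (0.75 - 0.25) - (0.50 - 0.25)
```

The estimate is only unbiased if the two groups would have followed parallel trends without the policy change, which is the assumption the paper tests and adjusts for.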

4.
In the context of estimating regression coefficients of an ill-conditioned binary logistic regression model, we develop a new biased estimator with two parameters for estimating the regression vector parameter β when β is subject to the linear restriction Hβ = h. The matrix mean squared error and mean squared error (MSE) functions of the newly defined estimator are derived. Moreover, a method to choose the two parameters is proposed. Then, the performance of the proposed estimator is compared to that of the restricted maximum likelihood estimator and some other existing estimators in the sense of MSE via a Monte Carlo simulation study. According to the simulation results, the performance of the estimators depends on the sample size, number of explanatory variables, and degree of correlation. The region in which the proposed estimator is superior is identified numerically in terms of the biasing parameters. It is concluded that the new estimator is superior to the others in most of the situations considered, and it is recommended to researchers.

5.
This paper examines trends in the participation in higher education by disadvantaged social groups over the recent period of higher education expansion and reform. It has been suggested that disadvantaged groups can recoup by participation at mature ages and this question is examined. The data sources used are the Labour Force Survey (1986–1995), which yielded 13384 students (6747 men and 6637 women) and the General Household Survey (1984–1992) which yielded 1936 students (982 men and 954 women). From a perspective of equal opportunities, the relative participation of young people from manual and non-manual origins does not appear to have changed over the period considered, but there is some evidence of increased relative participation by people from manual class origins as mature students. Mature students from such origins were older than those from non-manual class origins, as were mature women than mature men, with consequences for employability. From a perspective of lifelong learning, the recent expansion has been successful, with more entrants from the unemployed. Considerable percentages of women also enter from full-time housework, and increasing percentages from manual work. However, as in the past, many entrants had been successful in becoming employed before entry, some being seconded by employers. Despite these changes, the greatest absolute take-up has been from middle class youth. Early employment outcomes were examined and suggest some discrimination against mature students. It is possible that the increased cost of higher education, in the context of an expanded labour market of graduates, may deter some mature students.

6.
Summary. This study investigates whether there was evidence of increasing risk of still-birth with increasing paternal exposure to ionizing radiation received during employment at the Sellafield nuclear installation before the child was conceived. A significant positive association is found between the total paternal preconceptional exposure to external ionizing radiation and the risk of still-birth (after adjustment for year of birth, social class, birth order and paternal age, odds ratio at 100 mSv 1.24 (95% confidence interval 1.04–1.45)). A summary of the principal scientific findings of this study has been published in the Lancet. This paper describes in detail the statistical methods that were used in the investigation and presents the results in full.

7.
Summary.  Previous research has proposed a design-based analysis procedure for experiments that are embedded in complex sampling designs in which the ultimate sampling units of an on-going sample survey are randomized over different treatments according to completely randomized designs or randomized block designs. Design-based Wald and t-statistics are applied to test whether sample means that are observed under various survey implementations are significantly different. This approach is generalized to experimental designs in which clusters of sampling units are randomized over the different treatments. Furthermore, test statistics are derived to test differences between ratios of two sample estimates that are observed under alternative survey implementations. The methods are illustrated with a simulation study and real life applications of experiments that are embedded in the Dutch Labour Force Survey. The functionality of a software package that was developed to conduct these analyses is described.

8.
We analyse the patterns of 6564 suicides in Hong Kong and 23671 suicides in Australia for the period 1981–1993. Within the unifying framework of logistic regression we investigate how suicide rates vary with marital status and age and how these patterns vary over time and between the two cultures. The main significant differences between the two cultures are that rates are higher in Australia; rates for males are much higher than for females in Australia but only slightly higher in Hong Kong; in Hong Kong the oldest age group has the highest suicide rate, unlike in Australia; and the protective effects of marriage are larger in Australia.

9.
In many panel studies, bivariate ordinal–nominal responses are measured and the aim is to investigate the effects of explanatory variables on these responses. A regression analysis for these types of data must allow for the correlation among responses of the same individual. To analyse such ordinal–nominal responses using a proper weighting approach, an ordinal–nominal bivariate transition model is proposed and maximum likelihood is used to find the parameter estimates. We propose a method in which the likelihood function can be partitioned to make possible the use of existing software. The approach is applied to the Labour Force Survey data in Iran, where the ordinal response, in the first period, is the duration of unemployment for unemployed people and the nominal response, in the second period, is the economic activity status of these individuals. The aim is to identify the reasons for staying unemployed or moving to another status of economic activity.

10.
The lymphocyte proliferative assay (LPA) of immune competence was conducted on 52 subjects, with up to 36 processing conditions per subject, to evaluate whether samples could be shipped or stored overnight, rather than being processed on fresh blood as currently required. The LPA study resulted in clustered binary data, with both cluster level and cluster-varying covariates. Two modelling strategies for the analysis of such clustered binary data are through the cluster-specific and population-averaged approaches. Whereas most research in this area has focused on the analysis of matched pairs data, in many situations, such as the LPA study, cluster sizes are naturally larger. Through considerations of interpretation and efficiency of these models when applied to large clusters, the mixed effect cluster-specific model was selected as most appropriate for the analysis of the LPA data. The model confirmed that the LPA response is significantly impaired in individuals infected with the human immunodeficiency virus (HIV). The LPA response was found to be significantly lower for shipped and overnight samples than for fresh samples, and this effect was significantly stronger among HIV-infected individuals. Surprisingly, an anticoagulant effect was not detected.

11.
The logistic regression model has been widely used in the social and natural sciences and results from studies using this model can have significant policy impacts. Thus, confidence in the reliability of inferences drawn from these models is essential. The robustness of such inferences is dependent on sample size. The purpose of this article is to examine the impact of alternative data sets on the mean estimated bias and efficiency of parameter estimation and inference for the logistic regression model with observational data. A number of simulations are conducted examining the impact of sample size, nonlinear predictors, and multicollinearity on substantive inferences (e.g. odds ratios, marginal effects) when using logistic regression models. Findings suggest that small sample size can negatively affect the quality of parameter estimates and inferences in the presence of rare events, multicollinearity, and nonlinear predictor functions, but marginal effects estimates are relatively more robust to sample size.

12.
Whittemore (1981) proposed an approach for calculating the sample size needed to test hypotheses with specified significance and power against a given alternative for logistic regression with small response probability. Based on the distribution of the covariate, which could be either discrete or continuous, this approach first provides a simple closed-form approximation to the asymptotic covariance matrix of the maximum likelihood estimates, and then uses it to calculate the sample size needed to test a hypothesis about the parameter. Self et al. (1992) described a general approach for power and sample size calculations within the framework of generalized linear models, which include logistic regression as a special case. Their approach is based on an approximation to the distribution of the likelihood ratio statistic. Unlike the Whittemore approach, their approach is not limited to situations of small response probability. However, it is restricted to models with a finite number of covariate configurations. This study compares these two approaches to see how accurate they would be for the calculations of power and sample size in logistic regression models with various response probabilities and covariate distributions. The results indicate that the Whittemore approach has a slight advantage in achieving the nominal power only for one case with small response probability. It is outperformed for all other cases with larger response probabilities. In general, the approach proposed in Self et al. (1992) is recommended for all values of the response probability. However, its extension for logistic regression models with an infinite number of covariate configurations involves an arbitrary decision for categorization and leads to a discrete approximation. As shown in this paper, the examined discrete approximations appear to be sufficiently accurate for practical purposes.

13.
In previous work, non-response adjustments based on calibration weighting have been proposed for estimating gross flows in economic activity status from the quarterly Labour Force Survey. However, even after adjustment there may be residual non-response bias. The weighting is based on estimates of cross-sectional distributions and so cannot adjust for bias if non-response is associated with individual flows between quarters. To investigate this possibility, it was decided to apply models for estimating gross flows when non-response depends on the flows. This paper has two aims: first to describe the many problems encountered when attempting to implement these models; and second to outline a solution to the major problem that arose, namely, that comparing the model results directly with the weighting results was not possible. A simulation study was used to compare the results indirectly and it was tentatively concluded that non-response is not strongly associated with the flows and that the weighting provides an adequate adjustment.

14.
The demand for reliable statistics in subpopulations, when only reduced sample sizes are available, has promoted the development of small area estimation methods. In particular, an approach that is now widely used is based on the seminal work by Battese et al. [An error-components model for prediction of county crop areas using survey and satellite data, J. Am. Statist. Assoc. 83 (1988), pp. 28–36] that uses linear mixed models (MM). We investigate alternatives when a linear MM does not hold because, on the one hand, linearity may not be assumed and/or, on the other, normality of the random effects may not be assumed. In particular, Opsomer et al. [Nonparametric small area estimation using penalized spline regression, J. R. Statist. Soc. Ser. B 70 (2008), pp. 265–283] propose an estimator that extends the linear MM approach to the case in which a linear relationship may not be assumed using penalized splines regression. From a very different perspective, Chambers and Tzavidis [M-quantile models for small area estimation, Biometrika 93 (2006), pp. 255–268] have recently proposed an approach for small-area estimation that is based on M-quantile (MQ) regression. This allows for models robust to outliers and to distributional assumptions on the errors and the area effects. However, when the functional form of the relationship between the qth MQ and the covariates is not linear, it can lead to biased estimates of the small area parameters. Pratesi et al. [Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US, Environmetrics 19(7) (2008), pp. 687–701] apply an extended version of this approach for the estimation of the small area distribution function using a non-parametric specification of the conditional MQ of the response variable given the covariates [M. Pratesi, M.G. Ranalli, and N. Salvati, Nonparametric m-quantile regression using penalized splines, J. Nonparametric Stat. 21 (2009), pp. 287–304]. We will derive the small area estimator of the mean under this model, together with its mean-squared error estimator, and compare its performance to the other estimators via simulations on both real and simulated data.

15.
Summary.  Origin–destination statistics have been produced from the last three UK censuses. The paper describes what is new about the 2001 census interaction data on migration and commuting, considers the disclosure control methods that were applied to cells containing small values and demonstrates the problems that are associated with making comparisons with 1991 data. The effect of small cell adjustment procedures on the interaction data sets is investigated by means of selective analyses at different spatial scales. Some recommendations are made in light of the problems that were manifest in 2001.

16.
17.
Summary.  In the Netherlands, there is a research tradition that measures fraud against regulations by interviewing eligible individuals using a survey. In these studies the sensitive questions about fraud are posed by using a randomized response method. The paper describes the results of a Dutch study into the consequences of replacing home interviews by trained interviewers with Internet-delivered interviews in a survey on fraud in the area of disability benefits. Both surveys used computer-assisted self-interviews with randomized response questions. This study has three goals: first to present the research tradition that makes use of randomized response, second to compare the results of home interviews and the Internet survey and finally to introduce an adapted weighted logistic regression method to test the relationship between the probability of fraud and explanatory variables. The results show that there are no systematic differences between modes of interview, either for estimates of the prevalence of fraud or for the identification of associated variables. These outcomes result in the conclusion that the Internet survey is a useful and cost-effective instrument for measuring fraud in a population, and that it is unlikely that replacing home interviews with the Internet survey will result in a significant break with tradition.
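The abstract above does not say which randomized response design was used. As a minimal sketch, assuming a forced-response design (each respondent answers truthfully with probability `p_truth` and otherwise gives a forced answer that is "yes" with probability `p_forced_yes`), the prevalence of the sensitive attribute can be recovered from the observed share of "yes" answers. The function name and all numbers are hypothetical.

```python
def forced_response_prevalence(yes_count, n, p_truth, p_forced_yes):
    """Moment estimator of prevalence under an assumed forced-response design.

    Observed P(yes) = p_truth * prevalence + (1 - p_truth) * p_forced_yes,
    so the prevalence is obtained by inverting this relation.
    """
    observed_yes = yes_count / n
    estimate = (observed_yes - (1 - p_truth) * p_forced_yes) / p_truth
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical survey: 1000 respondents, 200 "yes" answers,
# 75% answer truthfully, forced answers are "yes" half of the time.
estimate = forced_response_prevalence(200, 1000, p_truth=0.75, p_forced_yes=0.5)
print(round(estimate, 3))
```

The same inversion is what makes ordinary logistic regression unsuitable for such data without adaptation, since the observed response is a noisy version of the true status.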

18.
In order to estimate the effective dose such as the 0.5 quantile ED50 in a bioassay problem various parametric and semiparametric models have been used in the literature. If the true dose–response curve deviates significantly from the model, the estimates will generally be inconsistent. One strategy is to analyze the data making only a minimal assumption on the model, namely, that the dose–response curve is non-decreasing. In the present paper we first define an empirical dose–response curve based on the estimated response probabilities by using the “pool-adjacent-violators” (PAV) algorithm, then estimate effective doses ED100p for a large range of p by taking the inverse of this empirical dose–response curve. The consistency and asymptotic distribution of these estimated effective doses are obtained. The asymptotic results can be extended to the estimated effective doses proposed by Glasbey [1987. Tolerance-distribution-free analyses of quantal dose–response data. Appl. Statist. 36 (3), 251–259] and Schmoyer [1984. Sigmoidally constrained maximum likelihood estimation in quantal bioassay. J. Amer. Statist. Assoc. 79, 448–453] under the additional assumption that the dose–response curve is symmetric or sigmoidal. We give some simulations on constructing confidence intervals using different methods.
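The two steps described in the abstract above (PAV smoothing, then inversion of the fitted curve) can be sketched as follows. This is an illustration only: the abstract does not say how the empirical curve is defined between observed doses, so linear interpolation is assumed here, and the function names and data are hypothetical.

```python
def pav(responders, totals):
    """Pool-adjacent-violators: non-decreasing response probabilities.

    Adjacent dose levels whose observed rates violate monotonicity are
    pooled and replaced by their combined rate.
    """
    blocks = []  # each block: [responders, subjects, dose levels pooled]
    for r, n in zip(responders, totals):
        blocks.append([r, n, 1])
        # Merge backwards while the previous block's rate exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            r2, n2, k2 = blocks.pop()
            blocks[-1][0] += r2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    fitted = []
    for r, n, k in blocks:
        fitted.extend([r / n] * k)
    return fitted


def effective_dose(doses, fitted, p):
    """Invert the fitted curve at level p (linear interpolation assumed)."""
    if p < fitted[0] or p > fitted[-1]:
        return None  # level p is not reached within the observed dose range
    for i in range(len(doses) - 1):
        lo, hi = fitted[i], fitted[i + 1]
        if lo <= p <= hi:
            if hi == lo:
                return doses[i]
            return doses[i] + (p - lo) * (doses[i + 1] - doses[i]) / (hi - lo)
    return None


# Hypothetical quantal data: 10 subjects per dose, non-monotone raw rates.
doses = [1, 2, 3, 4, 5]
responders = [1, 3, 2, 6, 9]
totals = [10, 10, 10, 10, 10]

fitted = pav(responders, totals)          # doses 2 and 3 are pooled
ed50 = effective_dose(doses, fitted, 0.5)
print(fitted, ed50)
```

Because only monotonicity is imposed, the estimate of ED50 depends on the data near the 0.5 level rather than on a global parametric shape.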

19.
The risk of an individual woman having a pregnancy associated with Down's syndrome is estimated given her age, α-fetoprotein, human chorionic gonadotropin, and pregnancy-specific β1-glycoprotein levels. The classical estimation method is based on discriminant analysis under the assumption of lognormality of the marker values, but logistic regression is also applied for data classification. In the present work, we compare the performance of the two methods using a dataset containing the data of almost 89,000 unaffected and 333 affected pregnancies. Assuming lognormality of the marker values, we also calculate the theoretical detection and false positive rates for both methods.

20.
The present investigation was undertaken to study the gillnet catch efficiency of sardines in the coastal waters of Sri Lanka using commercial catch and effort data. Commercial catch and effort data of the small mesh gillnet fishery were collected in five fisheries districts during the period May 1999–August 2002. Gillnet catch efficiency of sardines was investigated by developing predictive models of catch rates using data on commercial fisheries and environmental variables. Three statistical techniques [multiple linear regression, generalized additive model and regression tree model (RTM)] were employed to predict the catch rates of trenched sardine Amblygaster sirm (key target species of the small mesh gillnet fishery) and other sardines (Sardinella longiceps, S. gibbosa, S. albella and S. sindensis). The data collection programme was conducted for another six months and the models were tested on new data. RTMs were found to be the strongest in terms of reliability and accuracy of the predictions. The two operational characteristics used here for model formulation (i.e. depth of fishing and number of gillnet pieces used per fishing operation) were more useful as predictor variables than the environmental variables. The study revealed that catch rates of A. sirm increase rapidly with sea depth up to around 32 m.
