
1.

Bayesian structural equation modeling for the health index

Ferra Yanuar Kamarulzaman Ibrahim Abdul Aziz Jemain 《Journal of applied statistics》2013,40(6):1254-1269

There are many factors that could influence the level of health of an individual. These factors interact, and their overall effect on health is usually measured by an index called the health index. The health index can also be used as an indicator of the health level of a community. Because the health index is important, much research has been done on its determinants. The main purpose of this study is to model the health index of an individual using classical structural equation modeling (SEM) and Bayesian SEM. To estimate the parameters in the measurement and structural equation models, the classical SEM applies the robust weighted least squares approach, while the Bayesian SEM implements the Gibbs sampler algorithm. The Bayesian SEM approach allows the user to use prior information to update the current information on the parameters. Both methods are applied to data gathered from a survey conducted in Hulu Langat, a district in Malaysia. Both the classical and the Bayesian SEM find that demographic status and lifestyle are significantly related to the health index, whereas mental health has no significant relation to it.
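The Gibbs sampler mentioned above alternates draws from each parameter's full conditional distribution. A full Bayesian SEM is beyond a short example, so the sketch below applies the same idea to the simplest case: a normal mean and variance with semi-conjugate priors. The priors, their hyperparameters and the data are assumptions for illustration, not the authors' model.

```python
import math
import random


def gibbs_normal(y, n_iter=5000, burn_in=1000, mu0=0.0, tau0_sq=100.0,
                 a0=0.01, b0=0.01, seed=1):
    """Gibbs sampler for y_i ~ N(mu, sigma^2) with semi-conjugate priors
    mu ~ N(mu0, tau0_sq) and 1/sigma^2 ~ Gamma(shape=a0, rate=b0).
    Returns the post-burn-in draws of (mu, sigma^2)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, prec = ybar, 1.0  # starting values
    draws = []
    for it in range(n_iter):
        # Full conditional of mu given the precision is normal.
        post_var = 1.0 / (1.0 / tau0_sq + n * prec)
        post_mean = post_var * (mu0 / tau0_sq + prec * n * ybar)
        mu = rng.gauss(post_mean, math.sqrt(post_var))
        # Full conditional of the precision given mu is Gamma(shape, rate).
        shape = a0 + n / 2.0
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        prec = rng.gammavariate(shape, 1.0 / rate)  # second argument is the scale
        if it >= burn_in:
            draws.append((mu, 1.0 / prec))
    return draws


y = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4]  # hypothetical index scores
draws = gibbs_normal(y)
post_mean_mu = sum(d[0] for d in draws) / len(draws)
```

With a diffuse prior on the mean, the posterior mean of `mu` should sit close to the sample mean (5.05). An informative `mu0`/`tau0_sq` would pull it toward the prior, which is the "updating" the abstract refers to.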

2.

Source Data Perturbation and consistent sets of safe tables

Cuppen Menno Willenborg Leon 《Statistics and Computing》2003,13(4):355-362

When tables are generated from a data file, their release should not reveal overly detailed information about individual respondents. Disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level (by cell suppression or cell perturbation), but this may create inconsistencies with other tables based on the same data file. Alternatively, disclosure control methods can be applied at the microdata level, but these methods may change the data permanently and do not account for specific table properties. These problems can be circumvented by assigning a (single and fixed) weight factor to each respondent/record in the microdata file. Normally this weight factor is equal to 1 for each record and is not explicitly incorporated in the microdata file. Upon tabulation, each contribution of a respondent is multiplied by the respondent's weight factor. This approach is called Source Data Perturbation (SDP) because the data are perturbed at the microdata level, not at the table level. Note, however, that the original microdata are not changed; only a weight variable is added. The weight factors can be chosen in accordance with the SDC (statistical disclosure control) paradigm, i.e. such that the tables generated from the microdata are safe and the information loss is minimized. The paper indicates how this can be done. Moreover, it is shown that the SDP approach is very suitable for use in data warehouses, as the weights can conveniently be stored in the fact tables. The data can then still be accessed, sliced and diced up to a certain level of detail, and tables generated from the data warehouse are mutually consistent and safe.
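The tabulation step of SDP can be sketched as follows. The records and weight values are hypothetical (choosing safe weights is the part the paper addresses and is not reproduced here). The sketch shows why tables built from the same weighted microdata stay mutually consistent: collapsing a two-way table gives exactly the one-way table.

```python
from collections import defaultdict


def tabulate(records, weights, keys, value_var):
    """Weighted table of value_var by the variables in `keys`.

    Each record contributes value * weight; the microdata values
    themselves are never modified, only the weight column is consulted."""
    table = defaultdict(float)
    for rec, w in zip(records, weights):
        cell = tuple(rec[k] for k in keys)
        table[cell] += w * rec[value_var]
    return dict(table)


records = [
    {"region": "N", "sector": "A", "turnover": 120.0},
    {"region": "N", "sector": "B", "turnover": 80.0},
    {"region": "S", "sector": "A", "turnover": 200.0},
    {"region": "S", "sector": "A", "turnover": 50.0},
]
# Hypothetical weights: 1.0 means "unperturbed"; the others are illustrative.
weights = [1.0, 1.1, 0.9, 1.0]

by_region = tabulate(records, weights, ("region",), "turnover")
by_region_sector = tabulate(records, weights, ("region", "sector"), "turnover")

# Consistency check: collapsing the two-way table over sector
# reproduces the one-way table by region.
collapsed = defaultdict(float)
for (region, _sector), value in by_region_sector.items():
    collapsed[(region,)] += value
```

Because every table is a sum over the same weighted records, no cross-table inconsistencies can arise, unlike suppressing or perturbing cells table by table.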

3.

A statistical framework for ecological and aggregate studies (total citations: 6; self-citations: 2; citations by others: 4)

Jonathan Wakefield & Ruth Salway 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(1):119-137

Inference from studies that use data at the level of the area, rather than at the level of the individual, is more difficult for a variety of reasons. Some of these difficulties arise because exposures (including confounders) frequently vary within areas. In the most basic form of ecological study the outcome measure is regressed against a simple area level summary of exposure. In the aggregate data approach a survey of exposures and confounders is taken within each area. An alternative approach is to assume a parametric form for the within-area exposure distribution. We provide a framework within which ecological and aggregate data studies may be viewed, and we review some approaches to inference in such studies, clarifying the assumptions on which they are based. General strategies for analysis are provided including an estimator based on Monte Carlo integration that allows inference in the case of a general risk–exposure model. We also consider the implications of the introduction of random effects, and the existence of confounding and errors in variables.
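The idea behind a Monte Carlo integration estimator can be illustrated by averaging an individual-level risk over a within-area exposure distribution. The sketch below assumes a log-linear risk model and normal within-area exposure purely because that case has a closed form to check against. It is not the paper's general estimator, and the parameter values are hypothetical.

```python
import math
import random


def mc_area_risk(beta0, beta1, mu, sigma, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the area-level mean risk
    E[exp(beta0 + beta1 * X)] when within-area exposure X ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(mu, sigma)
        total += math.exp(beta0 + beta1 * x)
    return total / n_draws


beta0, beta1, mu, sigma = -3.0, 0.5, 2.0, 1.0
mc = mc_area_risk(beta0, beta1, mu, sigma)
# Closed form for this special case (mean of a lognormal variable).
exact = math.exp(beta0 + beta1 * mu + 0.5 * beta1 ** 2 * sigma ** 2)
# The "basic ecological" shortcut: plug in only the area mean exposure.
naive = math.exp(beta0 + beta1 * mu)
```

The gap between `naive` and `exact` is the kind of bias that arises when a non-linear risk model is fed only an area-level summary of exposure; Monte Carlo integration handles risk models without a closed form in the same way.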

4.

A hazard model of the probability of medical school drop-out in the UK (total citations: 2; self-citations: 0; citations by others: 2)

Wiji Arulampalam Robin A. Naylor Jeremy P. Smith 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2004,167(1):157-178

Summary. From individual level longitudinal data for two entire cohorts of medical students in UK universities, we use multilevel models to analyse the probability that an individual student will drop out of medical school. We find that academic preparedness—both in terms of previous subjects studied and levels of attainment therein—is the major influence on withdrawal by medical students. Additionally, males and more mature students are more likely to withdraw than females or younger students respectively. We find evidence that the factors influencing the decision to transfer course differ from those affecting the decision to drop out for other reasons.
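In a hazard model the quantity modelled is the probability of leaving in a period given enrolment at its start. Assuming a discrete-time formulation (an assumption here, not a detail confirmed by the abstract), cumulative drop-out follows from the hazards as sketched below. The hazard values are hypothetical, and the covariates and multilevel structure of the paper are omitted.

```python
def survival_from_hazards(hazards):
    """Given per-period drop-out hazards h_t = P(leave in t | enrolled at start of t),
    return the probability of still being enrolled at the end of each period."""
    surv = []
    s = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("hazards must be probabilities")
        s *= 1.0 - h  # must survive this period as well as all earlier ones
        surv.append(s)
    return surv


# Hypothetical yearly hazards for a five-year course
hazards = [0.05, 0.03, 0.02, 0.01, 0.01]
surv = survival_from_hazards(hazards)
prob_drop_out = 1.0 - surv[-1]  # probability of leaving before completion
```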

5.

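An Empirical Study of the Micro-level Factors Influencing the Degree of Non-agricultural Employment of the Rural Labor Force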


Zhang Wuwei Zhang Fuming Yang Xuecheng 《Statistical Research》2012,29(1):106-109

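Based on survey data for 1,674 rural workers in Shandong Province, this paper uses optimal scaling regression analysis to study the individual and household factors that influence the degree of non-agricultural employment of the rural labor force. The results show that, among individual factors, the three with the greatest influence on the degree of non-agricultural employment are whether the worker has a technical specialty, whether the worker has received vocational training, and age; among household factors, per-capita cultivated land area of the household has the greatest influence.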

6.

The case for small area microdata (total citations: 3; self-citations: 2; citations by others: 1)

Mark Tranmer Andrew Pickles Ed Fieldhouse Mark Elliot Angela Dale Mark Brown David Martin David Steel Chris Gardiner 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2005,168(1):29-49

Summary. Census data are available in aggregate form for local areas and, through the samples of anonymized records (SARs), as samples of microdata for households and individuals. In 1991 there were two SAR files: a household file and an individual file. These have a high degree of detail on the census variables but little geographical detail, a situation that will be exacerbated for the 2001 SAR owing to the loss of district level geography on the individual SAR. The paper puts forward the case for an additional sample of microdata, also drawn from the census, that has much greater geographical detail. Small area microdata (SAM) are individual level records with local area identifiers and, to maintain confidentiality, reduced detail on the census variables. Population data from seven local authorities, including rural and urban areas, are used to define prototype samples of SAM. The rationale for SAM is given, with examples that demonstrate the role of local area information in the analysis of census data. Since there is a trade-off between the extent of local detail and the extent of detail on variables that can be made available, the confidentiality risk of SAM is assessed empirically. An indicative specification of the SAM is given, having taken into account the results of the confidentiality analysis.

7.

Overcoming biases and misconceptions in ecological studies (total citations: 2; self-citations: 1; citations by others: 1)

Katherine A. Guthrie & Lianne Sheppard 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(1):141-154

The aggregate data study design provides an alternative group level analysis to ecological studies in the estimation of individual level health risks. An aggregate model is derived by aggregating a plausible individual level relative rate model within groups, such that population-based disease rates are modelled as functions of individual level covariate data. We apply an aggregate data method to a series of fictitious examples from a review paper by Greenland and Robins which illustrated the problems that can arise when using the results of ecological studies to make inference about individual health risks. We use simulated data based on their examples to demonstrate that the aggregate data approach can address many of the sources of bias that are inherent in typical ecological analyses, even though the limited between-region covariate variation in these examples reduces the efficiency of the aggregate study. The aggregate method has the potential to estimate exposure effects of interest in the presence of non-linearity, confounding at individual and group levels, effect modification, classical measurement error in the exposure and non-differential misclassification in the confounder.

8.

Dropping out of university: A statistical analysis of the probability of withdrawal for UK university students (total citations: 4; self-citations: 0; citations by others: 4)

Jeremy P. Smith & Robin A. Naylor 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(2):389-405

From individual level data for an entire cohort of undergraduate students in the 'old' universities in the UK, we use a binomial probit model to estimate the probability that an individual will 'drop out' of university before the completion of their degree course. We examine the cohort of students enrolling full time for a 3- or 4-year degree in the academic year 1989–1990. We find evidence to support both the hypothesis that the completion of courses by students is influenced by the extent of prior academic preparedness and the hypothesis that social integration at university is important. We also find an influence of unemployment in the county of prior residence, especially for poorer male students. Finally, we draw conclusions regarding the public policy of constructing university performance indicators in this area.
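A binomial probit model sets P(drop out) = Φ(x′β), where Φ is the standard normal CDF. The sketch below evaluates that probability and the marginal effect φ(x′β)β_k of a continuous covariate. The coefficients and covariates are hypothetical, not the paper's estimates.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(x, beta):
    """P(drop out = 1 | x) = Phi(x' beta) in a binomial probit model."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return std_normal_cdf(eta)


def probit_marginal_effect(x, beta, k):
    """dP/dx_k = phi(x' beta) * beta_k for a continuous covariate x_k."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    phi = math.exp(-0.5 * eta ** 2) / math.sqrt(2.0 * math.pi)
    return phi * beta[k]


# Hypothetical coefficients: intercept, standardized prior attainment, male indicator
beta = [-1.5, -0.4, 0.1]
x = [1.0, 0.0, 1.0]  # a male student with average prior attainment
p = probit_prob(x, beta)
```

A negative coefficient on prior attainment gives a negative marginal effect, matching the abstract's finding that better-prepared students are less likely to withdraw.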

9.

Determinants of Contraceptive Use in Egypt: A Multilevel Approach

Caterina Giusti Daniele Vignoli 《Statistical Methods and Applications》2006,15(1):89-106

The increasing use of family planning methods appears to be the intermediate determinant that most influences the fertility decline in developing countries, particularly in countries that are in an advanced phase of the demographic transition, such as Egypt. Moreover, large countries like Egypt are characterized by very different geographical realities and even by strong regional heterogeneity. The aim of this study is to analyse the determinants of contraceptive use in Egypt, with particular reference to the differentials due to the socio-economic context and to the area of residence. To estimate the effects of individual and regional factors on contraceptive use, a logistic two-level random intercept model is fitted to EDHS 2000 data. A multilevel analysis is suggested by the two-level data structure: the first-level units are the women and the second-level units are their regions of residence.
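In a logistic two-level random intercept model, each region j adds an intercept u_j ~ N(0, σ_u²) to the linear predictor. A common summary of how much variation lies between regions is the latent-variable intra-class correlation σ_u² / (σ_u² + π²/3). The sketch below computes both quantities. Whether the paper reports this ICC is not stated in the abstract, and the inputs are hypothetical.

```python
import math


def logistic_icc(sigma_u2):
    """Latent-variable intra-class correlation for a two-level logistic
    random-intercept model: var(u_j) / (var(u_j) + pi^2 / 3)."""
    return sigma_u2 / (sigma_u2 + math.pi ** 2 / 3.0)


def conditional_prob(eta_fixed, u_j):
    """P(use = 1) for a woman in region j: logistic(fixed part + region intercept)."""
    return 1.0 / (1.0 + math.exp(-(eta_fixed + u_j)))


# Hypothetical between-region variance and linear predictors
icc = logistic_icc(0.25)
p_low = conditional_prob(0.2, -0.5)   # region with a below-average intercept
p_high = conditional_prob(0.2, 0.5)   # region with an above-average intercept
```

The π²/3 term is the variance of the standard logistic distribution, which plays the role of the level-1 variance on the latent scale.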

10.

Alleviating linear ecological bias and optimal design with subsample data

Adam N. Glynn Jon Wakefield Mark S. Handcock Thomas S. Richardson 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(1):179-202

Summary. We illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides two main benefits. First, by including the individual level subsample data, the biases that are associated with linear ecological inference can be eliminated. Second, available ecological data can be used to design optimal subsampling schemes that maximize information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree on wages, showing that small, optimally chosen subsamples can be combined with ecological data to generate precise estimates relative to a simple random subsample.

11.

Reducing bias in ecological studies: an evaluation of different methodologies

Gillian A. Lancaster Mick Green Steven Lane 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(4):681-700

Summary. Statistical methods of ecological analysis that attempt to reduce ecological bias are empirically evaluated to determine in which circumstances each method might be practicable. The method that is most successful at reducing ecological bias is stratified ecological regression. It allows individual level covariate information to be incorporated into a stratified ecological analysis, as well as the combination of disease and risk factor information from two separate data sources, e.g. outcomes from a cancer registry and risk factor information from the census sample of anonymized records data set. The aggregated individual level model compares favourably with this model but has convergence problems. In addition, it is shown that the large areas that are covered by local authority districts seem to reduce between-area variability and may therefore not be as informative as conducting a ward level analysis. This has policy implications because access to ward level data is restricted.

12.

Modelling bias in combining small area prevalence estimates from multiple surveys

Manzi G Spiegelhalter DJ Turner RM Flowers J Thompson SG 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2011,174(1):31-50


13.

What Level of Statistical Model Should We Use in Small Area Estimation?


Mohammad‐Reza Namazi‐Rad David Steel 《Australian & New Zealand Journal of Statistics》2015,57(2):275-298

If unit‐level data are available, small area estimation (SAE) is usually based on models formulated at the unit level, but they are ultimately used to produce estimates at the area level and thus involve area‐level inferences. This paper investigates the circumstances under which using an area‐level model may be more effective. Linear mixed models (LMMs) fitted using different levels of data are applied in SAE to calculate synthetic estimators and empirical best linear unbiased predictors (EBLUPs). The performance of area‐level models is compared with unit‐level models when both individual and aggregate data are available. A key factor is whether there are substantial contextual effects. Ignoring these effects in unit‐level working models can cause biased estimates of regression parameters. The contextual effects can be automatically accounted for in the area‐level models. Using synthetic and EBLUP techniques, small area estimates based on different levels of LMMs are investigated in this paper by means of a simulation study.
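For the area-level case, the standard Fay–Herriot form of the predictor is a weighted compromise between the direct survey estimate and the synthetic estimate x_i′β. The weight is γ_i = σ_u² / (σ_u² + ψ_i), where ψ_i is the sampling variance of the direct estimate. The sketch treats σ_u² and the synthetic value as known, which makes it a BLUP; an EBLUP plugs in estimates. The numbers are hypothetical, and this is not the paper's simulation setup.

```python
def fay_herriot_blup(direct, synthetic, sampling_var, sigma_u2):
    """Area-level composite predictor gamma * direct + (1 - gamma) * synthetic,
    with shrinkage weight gamma = sigma_u2 / (sigma_u2 + sampling_var)."""
    gamma = sigma_u2 / (sigma_u2 + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical areas: (direct estimate, synthetic x_i' beta, sampling variance)
areas = [(10.0, 6.0, 1.0), (12.0, 11.0, 9.0), (7.5, 8.0, 0.1)]
sigma_u2 = 1.0  # assumed known between-area variance
estimates = [fay_herriot_blup(d, s, v, sigma_u2) for d, s, v in areas]
```

Areas with noisy direct estimates (large sampling variance) are pulled toward the synthetic value; well-measured areas keep most of their direct estimate.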

14.

Small Area Estimation Using Estimated Population Level Auxiliary Data

Hukum Chandra U. C. Sud Yogita Gharde 《Communications in Statistics - Simulation and Computation》2015,44(5):1197-1209

Unit level linear mixed models are often used in small area estimation (SAE), and the empirical best linear unbiased predictor (EBLUP) is widely used to estimate small area means under such models. However, the EBLUP requires population level auxiliary data, or at least area-specific aggregated values. Sometimes population level auxiliary data are either not available or not consistent with the survey data. We describe an SAE method that uses estimated population auxiliary information. Empirical results show that the proposed method produces an efficient set of small area estimates.
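The dependence on population-level auxiliary data is visible in the usual unit-level (Battese–Harter–Fuller type) predictor of an area mean. Ignoring the finite-population correction and treating β and the variance components as known, it is X̄_i′β + γ_i(ȳ_i − x̄_i′β), with γ_i = σ_u² / (σ_u² + σ_e²/n_i). Here X̄_i is the population mean of the auxiliaries, which is the input the paper proposes to replace with an estimate. All values below are hypothetical.

```python
def bhf_area_mean(ybar_s, xbar_s, xbar_pop, beta, sigma_u2, sigma_e2, n_i):
    """Unit-level predictor of an area mean with known variance components:
        xbar_pop' beta + gamma_i * (ybar_s - xbar_s' beta),
    gamma_i = sigma_u2 / (sigma_u2 + sigma_e2 / n_i).
    xbar_pop: population means of the auxiliaries in the area;
    ybar_s, xbar_s: sample means in the area."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_i)
    return dot(xbar_pop, beta) + gamma * (ybar_s - dot(xbar_s, beta))


beta = [1.0, 2.0]      # intercept and slope (assumed known)
xbar_pop = [1.0, 3.0]  # population mean of (1, x) in the area
xbar_s = [1.0, 2.5]    # sample mean of (1, x) in the area
estimate = bhf_area_mean(8.0, xbar_s, xbar_pop, beta,
                         sigma_u2=1.0, sigma_e2=4.0, n_i=4)
```

If `xbar_pop` is unavailable or inconsistent with the survey, the first term cannot be computed as written, which is the gap the abstract describes.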

15.

An elastic net penalized small area model combining unit- and area-level data for regional hypertension prevalence estimation

J. P. Burgard J. Krause R. Münnich 《Journal of applied statistics》2021,48(9):1659

Hypertension is a highly prevalent cardiovascular disease. It is a considerable cost factor for many national health systems. Despite its prevalence, regional disease distributions are often unknown and must be estimated from survey data. However, health surveys frequently lack regional observations due to limited resources. The resulting prevalence estimates suffer from unacceptably large sampling variances and are not reliable. Small area estimation solves this problem by linking auxiliary data from multiple regions in suitable regression models. Typically, either unit- or area-level observations are considered for this purpose. With respect to hypertension, however, both levels should be used. Hypertension has characteristic comorbidities and is strongly related to lifestyle features, which are unit-level information. It is also correlated with socioeconomic indicators that are usually measured at the area level. Combining the levels is challenging, as it requires multi-level model parameter estimation from small samples. We use a multi-level small area model with level-specific penalization to overcome this issue. Model parameter estimation is performed via stochastic coordinate gradient descent. A jackknife estimator of the mean squared error is presented. The methodology is applied to combine health survey data and administrative records to estimate regional hypertension prevalence in Germany.
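The jackknife idea behind the MSE estimator can be illustrated with the generic delete-one jackknife variance, (n − 1)/n · Σ(θ̂_(−i) − θ̄)². This is only the basic resampling principle. The paper's estimator re-fits the penalized multi-level model and targets the MSE of small area predictions, and it is not reproduced here. The data are hypothetical.

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance of estimator(data)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Re-compute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)


def mean(xs):
    return sum(xs) / len(xs)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical regional prevalences (%)
v = jackknife_variance(data, mean)
```

For the sample mean the jackknife reproduces the textbook variance s²/n exactly, a convenient sanity check before applying it to estimators without a closed-form variance.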

16.

Bayesian Portfolio Selection: An Empirical Analysis of the S&P 500 Index 1970–1996

Nicholas G. Polson Bernard V. Tew 《Journal of Business & Economic Statistics》2013,31(2):164-173

In this article we present a technique for implementing large-scale optimal portfolio selection. We use high-frequency daily data to capture valuable statistical information in asset returns. We describe several statistical issues involved in quantitative approaches to portfolio selection. Our methodology applies to large-scale portfolio-selection problems in which the number of possible holdings is large relative to the estimation period provided by historical data. We illustrate our approach on an equity database that consists of stocks from the Standard and Poor's index, and we compare our portfolios to this benchmark index. Our methodology differs from the usual quadratic programming approach to portfolio selection in three ways: (1) We employ informative priors on the expected returns and variance-covariance matrices, (2) we use daily data for estimation purposes, with upper and lower holding limits for individual securities, and (3) we use a dynamic asset-allocation approach that is based on reestimating and then rebalancing the portfolio weights on a prespecified time window. The key inputs to the optimization process are the predictive distributions of expected returns and the predictive variance-covariance matrix. We describe the statistical issues involved in modeling these inputs for high-dimensional portfolio problems in which our data frequency is daily. In our application, we find that our optimal portfolio outperforms the underlying benchmark.

17.

A two-phase sampling scheme with applications to auditing or sed quis custodiet ipsos custodes?

V. Barnett J. Haworth & T. M. F. Smith 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(2):407-422

External auditors such as the National Audit Office (NAO) are the final arbiters on the level of error in accounts presented to them by their clients, and on the accuracy or otherwise of individual transactions. In coming to a view on the level of error, they are expected to carry out the audit effectively and efficiently, and therefore need to make the best possible use of all the information at their disposal, even when some of the information may not be totally accurate. We consider the particular situation where the NAO is given access to the results of tests on a relatively large random sample of transactions, typically conducted by the client's internal auditors. A two-phase sampling scheme arises when the NAO subsequently assesses the quality of the client's data by retesting a subsample of these transactions. The paper discusses methodologies for combining the two sets of data to produce optimum estimates of the proportion of transactions in error (the error rate) and of the level of monetary error in the account. Although a maximum likelihood approach yields a relatively straightforward solution to the error rate problem, there is no uniformly optimum way to estimate the monetary error. Three possible methods are proposed, and the results of a series of simulation experiments comparing their performance under a variety of audit conditions are described.
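For the error rate, a standard double-sampling estimator combines the two phases. It takes the proportion of transactions the internal auditors classify as "in error" or "clear" from the large first-phase sample, and the proportion truly in error within each class from the NAO's retested subsample. Under a simple model in which the retested subsample is a random subset within each client classification, this is the maximum likelihood estimate. The paper's exact formulation may differ, and the counts below are hypothetical.

```python
def two_phase_error_rate(n_flagged, n_clear,
                         sub_flagged_err, sub_flagged,
                         sub_clear_err, sub_clear):
    """Double-sampling estimate of the true error rate.

    Phase 1: n_flagged / n_clear transactions classified by the client's
    internal auditors as in error / clear.
    Phase 2: of sub_flagged retested 'flagged' items, sub_flagged_err are truly
    in error; of sub_clear retested 'clear' items, sub_clear_err are in error."""
    n = n_flagged + n_clear
    p_flagged = n_flagged / n
    p_clear = n_clear / n
    # P(true error) = sum over client classes of P(class) * P(true error | class)
    return (p_flagged * sub_flagged_err / sub_flagged
            + p_clear * sub_clear_err / sub_clear)


rate = two_phase_error_rate(n_flagged=100, n_clear=900,
                            sub_flagged_err=45, sub_flagged=50,
                            sub_clear_err=5, sub_clear=100)
```

Here the internal audit alone would report a 10% error rate; the retest shows that some "clear" items are in fact wrong, raising the estimate to 13.5%.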

18.

Estimation and Convergence Analysis of the Information Consumption Level of Urban Residents in China

Zhang Su 《Statistics & Information Forum》2016,(9)

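Using panel data, this paper estimates the information consumption level of urban residents in China for 2002–2013 and further examines its convergence. The results show that urban residents' information consumption levels exhibit fairly significant regional differences, and the gap between the consumption levels of the highest-income and lowest-income provinces is widening. Information consumption levels show neither σ-convergence nor absolute β-convergence. However, unobservable individual heterogeneity variables promote convergence, so conditional β-convergence exists. After spatial correlation is introduced, convergence becomes faster, and error shocks to the growth rate of information consumption in neighbouring regions have a positive effect.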
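The two convergence concepts can be computed directly. σ-convergence asks whether the cross-regional dispersion of log levels falls over time. Absolute β-convergence asks whether the slope from regressing average growth on the initial log level is negative. The sketch below uses hypothetical toy data constructed to converge (unlike the paper's finding) and omits the conditional and spatial extensions.

```python
import math


def sigma_dispersion(levels):
    """Standard deviation of log levels across regions (sigma-convergence statistic)."""
    logs = [math.log(v) for v in levels]
    m = sum(logs) / len(logs)
    return math.sqrt(sum((x - m) ** 2 for x in logs) / len(logs))


def beta_slope(initial, final, years):
    """OLS slope of average annual log growth on the initial log level.
    A negative slope indicates absolute beta-convergence."""
    x = [math.log(v) for v in initial]
    y = [(math.log(f) - math.log(i)) / years for i, f in zip(initial, final)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical per-capita information consumption for four regions
initial = [1.0, 2.0, 4.0, 8.0]
final = [4.0, 6.0, 8.0, 10.0]
slope = beta_slope(initial, final, years=11)
sigma_start, sigma_end = sigma_dispersion(initial), sigma_dispersion(final)
```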

19.

Bayes methods in the ecological fallacy context: estimation of individual correlation from aggregate data

Robert B. Bendel Bradley P. Carlin 《Communications in Statistics - Theory and Methods》2013,42(7):2595-2623

The ecological fallacy is related to Simpson's paradox (1951) where relationships among group means may be counterintuitive and substantially different from relationships within groups, where the groups are usually geographic entities such as census tracts. We consider the problem of estimating the correlation between two jointly normal random variables where only ecological data (group means) are available. Two empirical Bayes estimators and one fully Bayesian estimator are derived and compared with the usual ecological estimator, which is simply the Pearson correlation coefficient of the group sample means. We simulate the bias and mean squared error performance of these estimators, and also give an example employing a dataset where the individual level data are available for model checking. The results indicate superiority of the empirical Bayes estimators in a variety of practical situations where, though we lack individual level data, other relevant prior information is available.
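The "usual ecological estimator" described above is the Pearson correlation of group means. The sketch below uses hypothetical grouped data in which the two variables move in opposite directions within each group while the group means rise together. The ecological correlation is then 1 even though the individual-level correlation is only 1/3. The Bayes estimators of the paper are not reproduced.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: five groups, two individuals each, (x, y) pairs.
groups = {g: [(g - 1.0, g + 1.0), (g + 1.0, g - 1.0)] for g in range(5)}

# Individual-level correlation uses every (x, y) pair.
xs = [p[0] for pts in groups.values() for p in pts]
ys = [p[1] for pts in groups.values() for p in pts]
individual_r = pearson(xs, ys)

# Ecological correlation uses only the group means.
x_means = [sum(p[0] for p in pts) / len(pts) for pts in groups.values()]
y_means = [sum(p[1] for p in pts) / len(pts) for pts in groups.values()]
ecological_r = pearson(x_means, y_means)
```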

20.

M-quantile models with application to poverty mapping (total citations: 1; self-citations: 0; citations by others: 1)

Nikos Tzavidis Nicola Salvati Monica Pratesi Ray Chambers 《Statistical Methods and Applications》2008,17(3):393-411

Over the last decade there has been growing demand for estimates of population characteristics at small area level. Unfortunately, cost constraints in the design of sample surveys lead to small sample sizes within these areas, and as a result direct estimation, using only the survey data, is inappropriate since it yields estimates with unacceptable levels of precision. Small area models are designed to tackle the small sample size problem. The most popular class of models for small area estimation consists of random effects models that include random area effects to account for between-area variation. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier robust inference. An alternative approach to small area estimation that is based on the use of M-quantile models was recently proposed by Chambers and Tzavidis (Biometrika 93(2):255–268, 2006) and Tzavidis and Chambers (Robust prediction of small area means and distributions. Working paper, 2007). Unlike traditional random effects models, M-quantile models do not depend on strong distributional assumptions and automatically provide outlier robust inference. In this paper we illustrate for the first time how M-quantile models can be practically employed for deriving small area estimates of poverty and inequality. The methodology we propose improves on traditional poverty mapping methods in the following ways: (a) it enables the estimation of the distribution function of the study variable within the small area of interest under both an M-quantile and a random effects model, (b) it provides analytical, instead of empirical, estimation of the mean squared error of the M-quantile small area mean estimates and (c) it employs an estimation method that is robust to outliers.
The methodology is applied to data from the 2002 Living Standards Measurement Survey (LSMS) in Albania to obtain (a) district level estimates of the incidence of poverty in Albania, (b) district level inequality measures and (c) the distribution function of household per-capita consumption expenditure in each district. Small area estimates of poverty and inequality show that the poorest Albanian districts are in the mountainous regions (north and north-east), while the wealthiest districts, which are also associated with high levels of inequality, are in the coastal (south-west) and southern parts of the country. We discuss the practical advantages of our methodology and note that our results are consistent with those of previous studies. We further demonstrate the usefulness of the M-quantile estimation framework through design-based simulations based on two realistic survey data sets containing small area information, and show that the M-quantile approach may be preferable when the aim is to estimate the small area distribution function.
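M-quantiles replace the squared-error loss of expectiles with a Huber-type influence function and weight positive and negative residuals asymmetrically, by q and 1 − q. The sketch below covers only the covariate-free case, the M-quantile of order q of a single sample, solved by iteratively reweighted least squares. The tuning constant c = 1.345 and the MAD scale are common choices assumed here. The sketch does not compute area-level M-quantile coefficients or the poverty estimators of the paper.

```python
import statistics


def huber_psi(u, c=1.345):
    """Huber influence function: identity in [-c, c], clipped outside."""
    return max(-c, min(c, u))


def m_quantile(y, q, c=1.345, tol=1e-10, max_iter=200):
    """M-quantile of order q of a sample (no covariates), via IRLS.

    Solves sum_i psi_q((y_i - theta) / s) = 0, where psi_q(u) equals
    2q * psi(u) for u > 0 and 2(1 - q) * psi(u) otherwise."""
    med = statistics.median(y)
    s = statistics.median([abs(v - med) for v in y]) / 0.6745  # MAD scale
    if s == 0:
        raise ValueError("zero scale: too many tied values")
    theta = med
    for _ in range(max_iter):
        num = den = 0.0
        for v in y:
            u = (v - theta) / s
            w = huber_psi(u, c) / u if u != 0 else 1.0  # psi(u)/u, limit 1 at 0
            w *= 2.0 * q if u > 0 else 2.0 * (1.0 - q)  # asymmetric weighting
            num += w * v
            den += w
        new_theta = num / den
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
low, mid, high = (m_quantile(sample, q) for q in (0.1, 0.5, 0.9))
```

At q = 0.5 the weights are symmetric and the result is the Huber M-estimate of location (3 for this symmetric sample). Raising q shifts the solution toward the upper tail, much as ordinary quantiles do.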