Similar documents
20 similar documents retrieved (search time: 31 ms)
2.
Patterns of consent: evidence from a general household survey
Summary.  We analyse patterns of consent and consent bias in the context of a large general household survey, the 'Improving survey measurement of income and employment' survey, also addressing issues that arise when there are multiple consent questions. A multivariate probit regression model for four binary outcomes with two incidental truncations is used. We show that there are biases in consent to data linkage with benefit and tax credit administrative records that are held by the Department for Work and Pensions, and with wage and employment data held by employers. There are also biases in respondents' willingness and ability to supply their national insurance number. The biases differ according to the question that is considered. We also show that modelling questions on consent independently rather than jointly may lead to misleading inferences about consent bias. A positive correlation between unobservable individual factors affecting consent to Department for Work and Pensions record linkage and consent to employer record linkage is suggestive of a latent individual consent propensity.
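The latent-propensity idea can be illustrated with a small simulation: two binary consent outcomes driven by correlated normal latents, a toy stand-in for the bivariate probit structure. The means, thresholds, and correlation below are invented for illustration, not estimates from the survey.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000

# Correlated latent consent propensities for two linkage questions.
# rho and the latent means are assumptions for this sketch only.
rho = 0.5
z = rng.multivariate_normal([0.2, -0.1], [[1.0, rho], [rho, 1.0]], n)
consent_dwp = (z[:, 0] > 0).astype(int)   # consent to DWP record linkage
consent_emp = (z[:, 1] > 0).astype(int)   # consent to employer record linkage

# Joint consent probability vs. the product implied by independent modelling
p_joint = consent_dwp @ consent_emp / n
p_indep = consent_dwp.mean() * consent_emp.mean()
```

With a positive latent correlation, `p_joint` exceeds `p_indep`, which is exactly the dependence that modelling the consent questions independently would miss.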

3.
Summary.  Policy decisions often require synthesis of evidence from multiple sources, and the source studies typically vary in rigour and in relevance to the target question. We present simple methods of allowing for differences in rigour (or lack of internal bias) and relevance (or lack of external bias) in evidence synthesis. The methods are developed in the context of reanalysing a UK National Institute for Clinical Excellence technology appraisal in antenatal care, which includes eight comparative studies. Many were historically controlled, only one was a randomized trial and doses, populations and outcomes varied between studies and differed from the target UK setting. Using elicited opinion, we construct prior distributions to represent the biases in each study and perform a bias-adjusted meta-analysis. Adjustment had the effect of shifting the combined estimate away from the null by approximately 10%, and the variance of the combined estimate was almost tripled. Our generic bias modelling approach allows decisions to be based on all available evidence, with less rigorous or less relevant studies downweighted by using computationally simple methods.
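The bias-adjustment mechanics are computationally simple, as the abstract notes: subtract the elicited mean bias from each study estimate and add the elicited bias variance to its sampling variance before pooling. A minimal numpy sketch with invented numbers (not the appraisal data):

```python
import numpy as np

# Hypothetical study estimates (e.g. log odds ratios) and standard errors
theta = np.array([-0.50, -0.35, -0.60, -0.20])
se = np.array([0.20, 0.25, 0.30, 0.15])

# Elicited bias priors per study: mean shift and additional variance (assumed)
bias_mean = np.array([-0.10, 0.00, -0.15, -0.05])
bias_var = np.array([0.04, 0.09, 0.04, 0.01])

def pooled(est, var):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    w = 1.0 / var
    return np.sum(w * est) / np.sum(w), 1.0 / np.sum(w)

# Conventional synthesis vs. bias-adjusted synthesis
m0, v0 = pooled(theta, se**2)
m1, v1 = pooled(theta - bias_mean, se**2 + bias_var)
```

Bias adjustment both shifts the pooled estimate and inflates its variance, mirroring the qualitative effect the authors report.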

4.
Multiple-bias modelling for analysis of observational data
Summary.  Conventional analytic results do not reflect any source of uncertainty other than random error, and as a result readers must rely on informal judgments regarding the effect of possible biases. When standard errors are small these judgments often fail to capture sources of uncertainty and their interactions adequately. Multiple-bias models provide alternatives that allow one systematically to integrate major sources of uncertainty, and thus to provide better input to research planning and policy analysis. Typically, the bias parameters in the model are not identified by the analysis data and so the results depend completely on priors for those parameters. A Bayesian analysis is then natural, but several alternatives based on sensitivity analysis have appeared in the risk assessment and epidemiologic literature. Under some circumstances these methods approximate a Bayesian analysis and can be modified to do so even better. These points are illustrated with a pooled analysis of case–control studies of residential magnetic field exposure and childhood leukaemia, which highlights the diminishing value of conventional studies conducted after the early 1990s. It is argued that multiple-bias modelling should become part of the core training of anyone who will be entrusted with the analysis of observational data, and should become standard procedure when random error is not the only important source of uncertainty (as in meta-analysis and pooled analysis).
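One common ingredient of such models, nondifferential exposure misclassification, can be probed by drawing sensitivity and specificity from priors and back-correcting the observed 2×2 table. This is a generic Monte Carlo sensitivity-analysis sketch with made-up counts and priors, not the magnetic-field pooled analysis itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed 2x2 counts (exposed/unexposed, cases and controls)
a, b = 60, 40    # cases
c, d = 100, 100  # controls

n_draws = 5000
ors = []
for _ in range(n_draws):
    # Draw classification parameters from assumed, nondifferential priors
    se_ = rng.uniform(0.80, 0.95)   # sensitivity
    sp_ = rng.uniform(0.90, 0.99)   # specificity
    # Matrix back-correction: observed a = se*A + (1 - sp)*(N - A)
    A = (a - (1 - sp_) * (a + b)) / (se_ + sp_ - 1)
    C = (c - (1 - sp_) * (c + d)) / (se_ + sp_ - 1)
    B, D = (a + b) - A, (c + d) - C
    if min(A, B, C, D) > 0:
        ors.append((A * D) / (B * C))

ors = np.array(ors)
interval = np.percentile(ors, [2.5, 50, 97.5])
```

The spread of `ors` reflects the prior uncertainty in the bias parameters rather than random error alone, which is the point of the multiple-bias approach.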

5.
Bias has different sources. Measurement errors create “bad” data and biased estimates. But selection biases occur even with “good” data and can be both subtle and large in magnitude.

Selection biases are not easily detected by internal examination of the data. Detection is more likely by comparison with external data sources.

6.
The data collection process in most observational and experimental studies yields different types of variables, leading to the use of joint models that are capable of handling multiple data types. Evaluating the various statistical techniques that have been developed for mixed data in simulated environments requires the concurrent generation of multiple variables. In this article, I present an important augmentation to a unified framework proposed in our previously published work for simultaneously generating binary and nonnormal continuous data given the marginal characteristics and correlation structure, via fifth-order power polynomials, which are known to extend the area covered in the skewness–elongation plane and to provide a better approximation to the probability density function of the continuous variables. I evaluate how well the improved methodology performs in comparison with the original one, in a simulated setting with illustrations of the algorithmic steps. Although the relative gains for the associational quantities are not substantial, the augmented version appears to better capture the marginal quantities pertinent to the higher-order moments, as indicated by the very close resemblance, on average, between the specified and empirically computed quantities.
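The basic recipe — dichotomize one correlated normal at a quantile, push the other through a power polynomial — can be sketched as follows. The polynomial coefficients here are illustrative (close to a third-order Fleishman transform with tiny higher-order terms), not values computed from the cited framework:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 200_000

# Targets (invented): binary margin p = 0.3, positively skewed continuous margin
p, rho_z = 0.3, 0.5   # rho_z: correlation of the underlying standard normals

z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho_z], [rho_z, 1.0]], n)

# Binary margin: threshold the first normal so that P(b = 1) = p
b = (z[:, 0] > NormalDist().inv_cdf(1 - p)).astype(int)

# Continuous margin: fifth-order power polynomial in the second normal
coef = np.array([-0.16, 0.95, 0.16, 0.01, 0.0, 0.001])  # illustrative only
x = sum(ci * z[:, 1] ** k for k, ci in enumerate(coef))

skew = float(np.mean(((x - x.mean()) / x.std()) ** 3))
corr_bx = float(np.corrcoef(b, x)[0, 1])
```

The empirical binary proportion, the positive skew, and the induced binary–continuous correlation can then be checked against the specified targets, which is the kind of marginal/associational comparison the article performs.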

7.
While most of epidemiology is observational rather than experimental, the culture of epidemiology is still derived from agricultural experiments rather than from other observational fields, such as astronomy or economics. The mismatch is made greater as focus has turned to continuous risk factors, multifactorial outcomes, and outcomes with large variation unexplainable by available risk factors. The analysis of such data is often viewed as hypothesis testing, with statistical control replacing randomization. However, such approaches often test restricted forms of the hypothesis being investigated, such as the hypothesis of a linear association, when there is no prior empirical or theoretical reason to believe that if an association exists, it is linear. In combination with the large nonstochastic sources of error in such observational studies, this suggests the more flexible alternative of exploring the association. Conclusions on the possible causal nature of any discovered association will rest on the coherence and consistency of multiple studies. Nonparametric smoothing in general, and generalized additive models in particular, represent an attractive approach to such problems. This is illustrated using data examining the relationship between particulate air pollution and daily mortality in Birmingham, Alabama; between particulate air pollution, ozone, and SO2 and daily hospital admissions for respiratory illness in Philadelphia; and between ozone and particulate air pollution and coughing episodes in children in six eastern U.S. cities. The results indicate that airborne particles and ozone are associated with adverse health outcomes at very low concentrations, and that there are likely no thresholds for these relationships.
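The case for flexible exploration over a forced linear fit is easy to demonstrate: when the true dose–response is curved, even a simple nonparametric smoother recovers it while a straight line cannot. A generic sketch on simulated data (not the pollution series; a Nadaraya–Watson smoother stands in for the GAM machinery):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated curved dose-response with noise (illustrative)
x = rng.uniform(0, 10, 500)
y = np.sin(x / 2) + rng.normal(0, 0.3, 500)

def kernel_smooth(xs, ys, grid, bandwidth=0.8):
    """Nadaraya-Watson estimator with a Gaussian kernel."""
    out = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = np.exp(-0.5 * ((xs - g) / bandwidth) ** 2)
        out[i] = np.sum(w * ys) / np.sum(w)
    return out

grid = np.linspace(0.5, 9.5, 50)
fit = kernel_smooth(x, y, grid)

# A forced straight line misses the curvature the smoother recovers
resid_lin = y - np.polyval(np.polyfit(x, y, 1), x)
resid_smooth = y - kernel_smooth(x, y, x)
```

The smoother's fitted curve tracks the true nonlinear shape, and its residual variance is well below that of the linear fit.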

8.
Highly skewed, non-negative data in fisheries research can often be modeled by the delta-lognormal distribution. However, the coverage probabilities of existing interval estimation procedures are unsatisfactory for small sample sizes and highly skewed data. We propose a heuristic method for estimating confidence intervals for the mean of the delta-lognormal distribution, based on an asymptotic generalized pivotal quantity used to construct a generalized confidence interval for the mean. Simulation results show that the proposed interval estimation procedure yields satisfactory coverage probabilities, expected interval lengths, and reasonable relative biases. Finally, the proposed method is applied to red cod density data for demonstration.
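A generalized-pivotal-quantity construction for this mean can be sketched in a few lines: pivot σ² through a chi-squared draw, μ through a normal draw, and the zero proportion through a beta pivot, then push the draws through the delta-lognormal mean formula E[Y] = p·exp(μ + σ²/2). Everything below (the data, the Jeffreys-style beta pivot for p) is an illustrative assumption, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sample: zero catches mixed with lognormal positive densities
data = np.concatenate([np.zeros(12), rng.lognormal(1.0, 0.8, 28)])

n = len(data)
pos = data[data > 0]
n1 = len(pos)
ybar, s2 = np.mean(np.log(pos)), np.var(np.log(pos), ddof=1)
p_hat = n1 / n
est = p_hat * np.exp(ybar + s2 / 2)   # point estimate of the mean

# Generalized pivotal quantities (sketch)
B = 10_000
sig2_g = (n1 - 1) * s2 / rng.chisquare(n1 - 1, B)
mu_g = ybar - rng.standard_normal(B) * np.sqrt(sig2_g / n1)
p_g = rng.beta(n1 + 0.5, n - n1 + 0.5, B)   # assumed pivot for the zero rate
mean_g = p_g * np.exp(mu_g + sig2_g / 2)

lo, hi = np.percentile(mean_g, [2.5, 97.5])
```

The 2.5% and 97.5% quantiles of the pivotal draws give a generalized 95% confidence interval around the point estimate.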

9.
In practice, members of a committee often make different recommendations despite a common goal and shared sources of information. We study the nonparametric identification and estimation of a structural model, where such discrepancies are rationalized by the members’ unobserved types, which consist of ideological bias while weighing different sources of information, and tastes for multiple objectives announced in the policy target. We consider models with and without strategic incentives for members to make recommendations that conform to the final committee decision. We show that pure-strategy Bayesian Nash equilibria exist in both cases, and that the variation in common information recorded in the data helps us to recover the distribution of private types from the members’ choices. Building on the identification result, we estimate a structural model of interest rate decisions by the Monetary Policy Committee (MPC) at the Bank of England. We find some evidence that the external committee members are less affected by strategic incentives for conformity in their recommendations than the internal members. We also find that the difference in ideological bias between external and internal members is statistically insignificant. Supplementary materials for this article are available online.

10.
Relational learning addresses problems where the data come from multiple sources and are linked together through complex relational networks. Two important goals are pattern discovery (e.g. by (co)-clustering) and predicting unknown values of a relation, given a set of entities and observed relations among entities. In the presence of multiple relations, combining information from different but related relations can lead to better insights and improved prediction. For this purpose, we propose a nonparametric hierarchical Bayesian model that improves on existing collaborative factorization models and frames a large number of relational learning problems. The proposed model naturally incorporates (co)-clustering and prediction analysis in a single unified framework, and allows for the estimation of entire missing row or column vectors. We develop an efficient Gibbs algorithm and a hybrid Gibbs using Newton's method to enable fast computation in high dimensions. We demonstrate the value of our framework on simulated experiments and on two real-world problems: discovering kinship systems and predicting the authors of certain articles based on article–word co-occurrence features.

11.
In practice, we often have a number of base classification methods and cannot clearly determine which is optimal in the sense of the smallest error rate. A combined method then allows us to consolidate information from multiple sources into a better classifier. I propose a different, sequential approach. Sequentiality is understood here as adding posterior probabilities to the original data set, with the data so created used during the classification process. We combine the posterior probabilities obtained from the base classifiers using all combining methods, and finally combine these probabilities using a mean combining method. The obtained posterior probabilities are added to the original data set as additional features. In each step we update the additional probabilities to achieve the minimum error rate among the base methods. Experimental results on different data sets demonstrate that the method is efficient and that this approach outperforms the base methods, reducing the mean classification error rate.
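The payoff from mean-combining posterior probabilities is easy to reproduce: averaging posteriors from base classifiers whose errors are independent reduces the error rate below either base method. A toy sketch with two synthetic base classifiers (invented data, not the article's experiments):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# True binary labels and two noisy base-classifier posterior probabilities
y = rng.binomial(1, 0.5, n)
signal = np.where(y == 1, 0.7, 0.3)
p1 = np.clip(signal + rng.normal(0, 0.25, n), 0.01, 0.99)
p2 = np.clip(signal + rng.normal(0, 0.25, n), 0.01, 0.99)

def err(p):
    """Error rate when classifying at the 0.5 posterior threshold."""
    return float(np.mean((p > 0.5).astype(int) != y))

# Mean combining averages out the independent errors of the base methods
p_mean = (p1 + p2) / 2
e1, e2, e_mean = err(p1), err(p2), err(p_mean)
```

The averaged posteriors misclassify less often than either base classifier, the same qualitative effect the sequential approach exploits when it feeds combined posteriors back in as features.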

12.
Given the very large amount of data obtained every day through population surveys, much new research could use this information instead of collecting new samples. Unfortunately, relevant data are often dispersed across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining logistic regression with an Expectation–Maximization algorithm. Results show that, despite the lack of data, this procedure can perform better than standard matching procedures.

13.
Propensity score matching (PSM) has been widely used to reduce confounding biases in observational studies, and its properties for statistical inference have been investigated and well documented. However, some recent publications have raised concerns about PSM, especially its tendency to increase post-matching covariate imbalance, leading to debate over whether PSM should be used at all. We review empirical and theoretical evidence for and against its use in practice, revisit the property of equal percent bias reduction and adapt it to more practical situations, showing that PSM has some additional desirable properties. With a small simulation, we explore the impact of caliper width on biases due to mismatching within matched samples and due to the difference between the matched and target populations, and show that some issues with PSM may stem from inadequate caliper selection. In summary, we argue that the right question is when and how to use PSM, rather than whether to use it at all, and we give suggestions accordingly.
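The role of the caliper is easy to see in a toy simulation: 1:1 nearest-neighbour matching on the propensity score within a caliper removes most of the confounding that a naive group comparison suffers. This sketch uses a single confounder and the true propensity in place of an estimated one, an assumption made only to keep the example short:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000

# One confounder x drives both treatment assignment and the outcome
x = rng.normal(0, 1, n)
p_treat = 1 / (1 + np.exp(-(x - 0.5)))        # propensity (true, assumed known)
t = rng.binomial(1, p_treat)
y = 2.0 * t + 1.5 * x + rng.normal(0, 1, n)   # true treatment effect = 2.0

# Greedy 1:1 nearest-neighbour matching on the propensity score with a caliper
caliper = 0.05
treated = np.where(t == 1)[0]
controls = np.where(t == 0)[0]
used, pairs = set(), []
for i in treated:
    d = np.abs(p_treat[controls] - p_treat[i])
    for j in np.argsort(d):
        if d[j] > caliper:
            break                              # no acceptable match in caliper
        if controls[j] not in used:
            used.add(controls[j])
            pairs.append((i, controls[j]))
            break

att = float(np.mean([y[i] - y[j] for i, j in pairs]))
naive = float(y[t == 1].mean() - y[t == 0].mean())
```

The matched estimate sits far closer to the true effect of 2.0 than the naive difference in means; widening the caliper trades more matches (less difference from the target population) against worse within-pair mismatch, which is the tension the article's simulation examines.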

14.
Summary.  Realistic statistical modelling of observational data often suggests a statistical model which is not fully identified, owing to potential biases that are not under the control of study investigators. Bayesian inference can be implemented with such a model, ideally with the most precise prior knowledge that can be ascertained. However, as a consequence of the non-identifiability, inference cannot be made arbitrarily accurate by choosing the sample size to be sufficiently large. In turn, this has consequences for sample size determination. The paper presents a sample size criterion that is based on a quantification of how much Bayesian learning can arise in a given non-identified model. A global perspective is adopted, whereby choosing larger sample sizes for some studies necessarily implies that some other potentially worthwhile studies cannot be undertaken. This suggests that smaller sample sizes should be selected with non-identified models, as larger sample sizes constitute a squandering of resources in making estimator variances very small compared with their biases. Particularly, consider two investigators planning the same study, one of whom admits to the potential biases at hand and consequently uses a non-identified model, whereas the other pretends that there are no biases, leading to an identified but less realistic model. It is seen that the former investigator always selects a smaller sample size than the latter, with the difference being quite marked in some illustrative cases.
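The core point — that with a non-identified bias the posterior cannot sharpen indefinitely — reduces to a one-line variance calculation: with an additive bias of prior variance τ², the posterior standard deviation is bounded below by τ no matter how large n is. The numbers below are illustrative, not from the paper:

```python
import math

tau = 0.3      # prior sd of the unidentified additive bias (elicited, assumed)
sigma = 1.0    # sampling sd of a single observation

# Approximate posterior sd of the target parameter at several sample sizes:
# the sampling component sigma^2 / n shrinks, the bias component tau^2 does not
ns = [100, 10_000, 1_000_000]
sds = [math.sqrt(sigma**2 / n + tau**2) for n in ns]
```

Past a moderate n the posterior sd is essentially τ, so further sampling buys almost nothing — the squandering-of-resources argument in quantitative form.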

15.
The traditional tests for rationality, the regression and volatility tests, have often rejected the hypothesis of rationality for survey data on expectations. It has been argued that these tests are not valid in the presence of unit roots and hence cointegration tests should be applied. The cointegration tests have often failed to reject the hypothesis of rationality. The present article argues that errors in variables affect tests of rationality. We use multiple sources of expectations to correct for the errors-in-variables bias but find that the hypothesis of rationality is rejected even after this correction. The article uses survey data on interest rates, stock prices, and exchange rates.
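The errors-in-variables correction via multiple expectation sources can be sketched as an instrumental-variables regression: a second noisy measure of the same expectation, with independent error, instruments the first and undoes the attenuation. The numbers are simulated, not the survey data, and rationality (slope one) holds by construction here, so only the bias mechanics are shown:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

# Invented setup: two independent noisy survey measures of the same expectation
e_true = rng.normal(0.0, 1.0, n)
m1 = e_true + rng.normal(0.0, 0.7, n)   # expectations from survey source 1
m2 = e_true + rng.normal(0.0, 0.7, n)   # expectations from survey source 2
# Under rationality the realized series loads on the true expectation with slope 1
actual = e_true + rng.normal(0.0, 0.5, n)

# OLS on one noisy measure is attenuated toward zero (errors-in-variables bias)
b_ols = float(np.cov(actual, m1)[0, 1] / np.var(m1))
# Instrumenting m1 with m2 removes the attenuation
b_iv = float(np.cov(actual, m2)[0, 1] / np.cov(m1, m2)[0, 1])
```

`b_ols` lands well below one even though rationality is true, while `b_iv` recovers a slope near one; the article's finding is that real survey expectations still fail the test after this kind of correction.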

16.
The integration of technological advances into research studies often raises an issue of incompatibility of data. This problem is common to longitudinal and multicentre studies, taking the form of changes in the definitions, acquisition of data or measuring instruments of some study variables. In our case of studying the relationship between a marker of immune response to human immunodeficiency virus and human immunodeficiency virus infection status, using data from the Multi-Center AIDS Cohort Study, changes in the manufactured tests used for both variables occurred throughout the study, resulting in data with different manufactured scales. In addition, the latent nature of the immune response of interest necessitated a further consideration of a measurement error component. We address the general issue of incompatibility of data, together with the issue of covariate measurement error, in a unified, generalized linear model setting with inferences based on the generalized estimating equation framework. General conditions are constructed to ensure consistent estimates and their variances for the primary model of interest, with the asymptotic behaviour of resulting estimates examined under a variety of modelling scenarios. The approach is illustrated by modelling a repeated ordinal response with incompatible formats, as a function of a covariate with incompatible formats and measurement error, based on the Multi-Center AIDS Cohort Study data.

17.
Determining the effectiveness of different treatments from observational data, which are characterized by imbalance between groups due to lack of randomization, is challenging. Propensity matching is often used to rectify imbalances among prognostic variables. However, there are no guidelines on how to appropriately analyze group-matched data when the outcome is a zero-inflated count. In addition, there is debate over whether to account for the correlation of responses induced by matching and/or whether to adjust for variables used in generating the propensity score in the final analysis. The aim of this research is to compare covariate-unadjusted and -adjusted zero-inflated Poisson models that do and do not account for the correlation. A simulation study is conducted, demonstrating that it is necessary to adjust for potential residual confounding, but that accounting for correlation is less important. The methods are applied to a biomedical research data set.
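A zero-inflated Poisson outcome, and a quick method-of-moments recovery of its parameters, makes the model concrete. The data and parameters are invented; the article's models additionally handle covariates and matching-induced correlation:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 20_000

# ZIP: structural zero with probability pi, otherwise Poisson(lam)
pi, lam = 0.4, 2.5
structural_zero = rng.binomial(1, pi, n)
y = np.where(structural_zero == 1, 0, rng.poisson(lam, n))

# Method-of-moments sketch: E[y] = (1-pi)*lam, E[y^2] = (1-pi)*(lam + lam^2)
m1, m2 = float(y.mean()), float(np.mean(y.astype(float) ** 2))
lam_hat = m2 / m1 - 1.0
pi_hat = 1.0 - m1 / lam_hat
```

Solving the two moment equations recovers both the Poisson mean and the zero-inflation probability, which a plain Poisson fit would conflate into a single deflated mean.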

18.
The various forms of rotation pattern used in current multi-level sampling surveys have been widely applied in Western countries, but a series of problems remain. In view of this, through a unifying synthesis and theoretical review of the various forms of rotation pattern, we derive a three-dimensional balanced multi-level rotation pattern design method, which unifies the design of multi-level rotation patterns with the subsequent study of sampling estimation methods. It can not only reduce the negative impact of the various rotation biases, but also accurately measure the correlation between rotated samples and yield more accurate continuous sampling data in multi-level surveys. This design method has great value for wider application.

19.
This article presents a continuous-time Bayesian model for analyzing durations of behavior displays in social interactions. Duration data of social interactions are often complex because of repeated behaviors (events) at individual or group (e.g. dyad) level, multiple behaviors (multistates), and several choices of exit from a current event (competing risks). A multilevel, multistate model is proposed to adequately characterize the behavioral processes. The model incorporates dyad-specific and transition-specific random effects to account for heterogeneity among dyads and interdependence among competing risks. The proposed method is applied to child–parent observational data derived from the School Transitions Project to assess the relation of emotional expression in child–parent interaction to risk for early and persisting child conduct problems.

20.
We investigate the estimation of dynamic models of criminal activity, when there is significant under-recording of crime. We give a theoretical analysis and use simulation techniques to investigate the resulting biases in conventional regression estimates. We find the biases to be of little practical significance. We develop and apply a new simulated maximum likelihood procedure that estimates simultaneously the measurement error and crime processes, using extraneous survey data. This also confirms that measurement error biases are small. Our estimation results for data from England and Wales imply a significant response of crime to both the economic and the enforcement environment.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号