Found 20 similar documents (search time: 15 ms)
1.
Colm Art O'Cinneide 《The American Statistician》2013,67(4):292-293
Statistics education is often restricted to teaching the mathematical equations and theories that form the foundation of statistical analysis. It can be argued, however, that the interpretation of the analysis and the communication of the results are equally important. The interdisciplinary role of statisticians requires us to examine real-life problems critically and communicate analytical results to nonstatisticians in a clear and concise manner. This article discusses the importance of including writing assignments as a routine part of statistics courses and presents benefits of the increased use of writing.
2.
A number of articles have discussed the way lower order polynomial and interaction terms should be handled in linear regression models. Only if all lower order terms are included in the model will the regression model be invariant with respect to coding transformations of the variables. If lower order terms are omitted, the regression model will not be well formulated. In this paper, we extend this work to examine the implications of the ordering of variables in the linear mixed-effects model. We demonstrate how linear transformations of the variables affect the model and tests of significance of fixed effects in the model. We show how the transformations modify the random effects in the model, as well as their covariance matrix and the value of the restricted log-likelihood. We suggest a variable selection strategy for the linear mixed-effects model.
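The invariance point in this abstract can be illustrated numerically for the ordinary linear model (the article's mixed-effects extension is not shown). The sketch below, with simulated data and an arbitrary location shift, checks that fitted values from a well-formulated model are unchanged by recoding a variable, while a model that drops the main effects below an interaction is not invariant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + 0.8 * x2 + 0.3 * x1 * x2 + 0.1 * rng.normal(size=n)

def fitted(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

ones = np.ones(n)
# Well-formulated model: every lower-order term accompanies the interaction.
full = np.column_stack([ones, x1, x2, x1 * x2])
# Ill-formulated model: the interaction appears without its main effects.
miss = np.column_stack([ones, x1 * x2])

# Recode x1 by a location shift, a simple coding transformation.
z1 = x1 - 5.0
full_shifted = np.column_stack([ones, z1, x2, z1 * x2])
miss_shifted = np.column_stack([ones, z1 * x2])

full_invariant = np.allclose(fitted(full), fitted(full_shifted))  # True
miss_invariant = np.allclose(fitted(miss), fitted(miss_shifted))  # False
```

The full model is invariant because the shifted columns span the same subspace; once the main effects are dropped, the recoding changes the column space and hence the fit.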
3.
Miki Aoyagi 《Communications in Statistics - Theory and Methods》2013,42(15):2667-2687
The coefficient of the main term of the generalization error in Bayesian estimation is called a Bayesian learning coefficient. In this article, we first introduce Vandermonde matrix type singularities and establish certain orthogonality conditions for them. It has recently been recognized that Vandermonde matrix type singularities are related to Bayesian learning coefficients for several hierarchical learning models. By applying these orthogonality conditions, we show that their log canonical threshold also corresponds to the Bayesian learning coefficient for normal mixture models, and we obtain explicit computational results in dimension one.
4.
Small area estimation is an important research direction in survey sampling; many sample surveys bearing on matters of national and social importance, such as unemployment rates, infectious disease incidence, and opinion polls, require different small area estimation methods. This paper reviews the theory, methods, and latest developments in small area estimation, organized along the historical development of the estimation methods and focusing on hierarchical Bayesian approaches. Using data from the Australian Survey of Disability, Ageing and Carers (SDAC 2003), the Australian disability rate is then estimated from a hierarchical Bayesian perspective, and the resulting estimates are compared and discussed.
5.
6.
Computer models are widely used in scientific research to study and predict the behaviour of complex systems. The run times of computer-intensive simulators are often such that it is impractical to make the thousands of model runs that are conventionally required for sensitivity analysis, uncertainty analysis or calibration. In response to this problem, highly efficient techniques have recently been developed based on a statistical meta-model (the emulator) that is built to approximate the computer model. The approach, however, is less straightforward for dynamic simulators, designed to represent time-evolving systems. Generalisations of the established methodology to allow for dynamic emulation are here proposed and contrasted. Advantages and difficulties are discussed and illustrated with an application to the Sheffield Dynamic Global Vegetation Model, developed within the UK Centre for Terrestrial Carbon Dynamics. 相似文献
7.
In this study, we propose a multivariate stochastic model for Web site visit duration, page views, purchase incidence, and the sale amount for online retailers. The model is constructed by composition from carefully selected distributions and involves copula components. It allows for the strong nonlinear relationships between the sales and visit variables to be explored in detail, and can be used to construct sales predictions. The model is readily estimated using maximum likelihood, making it an attractive choice in practice given the large sample sizes that are commonplace in online retail studies. We examine a number of top-ranked U.S. online retailers, and find that the visit duration and the number of pages viewed are both related to sales, but in very different ways for different products. Using Bayesian methodology, we show how the model can be extended to a finite mixture model to account for consumer heterogeneity via latent household segmentation. The model can also be adjusted to accommodate a more accurate analysis of online retailers like apple.com that sell products at a very limited number of price points. In a validation study across a range of different Web sites, we find that the purchase incidence and sales amount are both forecast more accurately using our model, when compared to regression, probit regression, a popular data-mining method, and a survival model employed previously in an online retail study. Supplementary materials for this article are available online.
8.
Double hierarchical generalized linear models (with discussion) (cited 2 times)
Youngjo Lee & John A. Nelder 《Journal of the Royal Statistical Society. Series C, Applied statistics》2006,55(2):139-185
Summary. We propose a class of double hierarchical generalized linear models in which random effects can be specified for both the mean and dispersion. Heteroscedasticity between clusters can be modelled by introducing random effects in the dispersion model, as is heterogeneity between clusters in the mean model. This class will, among other things, enable models with heavy-tailed distributions to be explored, providing robust estimation against outliers. The h-likelihood provides a unified framework for this new class of models and gives a single algorithm for fitting all members of the class. This algorithm does not require quadrature or prior probabilities.
9.
Donald A. Berry 《The American Statistician》2013,67(3):241-246
University courses in elementary statistics are usually taught from a frequentist perspective. In this paper I suggest how such courses can be taught using a Bayesian approach, and I indicate why beginning students are well served by a Bayesian course. A principal focus of any good elementary course is the application of statistics to real and important scientific problems. The Bayesian approach fits neatly with a scientific focus. Bayesians take a larger view, and one not limited to data analysis. In particular, the Bayesian approach is subjective, and requires assessing prior probabilities. This requirement forces users to relate current experimental evidence to other available information, including previous experiments of a related nature, where "related" is judged subjectively. I discuss difficulties faced by instructors and students in elementary Bayesian courses, and provide a sample syllabus for an elementary Bayesian course.
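The prior-to-posterior update for a proportion is the workhorse of the kind of elementary Bayesian course described here. A minimal sketch of the conjugate Beta-binomial update, with purely illustrative numbers (not from the article):

```python
# A Beta(a, b) prior combined with s successes in n binomial trials gives a
# Beta(a + s, b + n - s) posterior; the posterior mean lies between the prior
# mean and the sample proportion.
a, b = 2, 2            # prior: weakly centred on 0.5
s, n = 7, 10           # observed data: 7 successes in 10 trials

post_a, post_b = a + s, b + (n - s)
prior_mean = a / (a + b)                  # 0.5
sample_prop = s / n                       # 0.7
post_mean = post_a / (post_a + post_b)    # 9/14, about 0.643
```

The posterior mean being a compromise between prior opinion and data is exactly the kind of point such a course would make before turning to real scientific examples.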
10.
Business analytics continues to grow in importance in business and therefore in business education. We surveyed faculty who teach statistics, or whose institutions offer statistics, to business students, and conducted web searches of business analytics and data science programs offered by faculties associated with schools of business. The intent of the survey and web searches was to gain insight into the current landscape of business analytics, how it may work synergistically with data science at institutions of higher education, and the role that statistics education plays in the era of big data. The study presents an analysis of subject areas (Statistics, Operations Research, Management Information Systems, Data Analytics, and Soft Skills) covered in courses offered by institutions whose undergraduate degrees in business analytics or data science influence the statistics taught to business students. Given the notable contribution of statistics to business analytics and data science, and the importance of statistics-based courses both for students pursuing a major or minor in the discipline and for all business majors entering today's data-centric business environment, we present findings about who is teaching what in business statistics education.
11.
When the target variable exhibits a semicontinuous behavior (a point mass at a single value and a continuous distribution elsewhere), parametric "two-part models" have been extensively used and investigated. The applications have mainly been related to nonnegative variables with a point mass at zero (zero-inflated data). In this article, a semiparametric Bayesian two-part model for dealing with such variables is proposed. The model allows a semiparametric expression for the two parts of the model by using Dirichlet processes. A motivating example, based on grape wine production in Tuscany (an Italian region), is used to show the capabilities of the model. Finally, two simulation experiments evaluate the model. Results show a satisfactory performance of the suggested approach for modeling and predicting semicontinuous data when parametric assumptions are not reasonable.
12.
Multiparameter evidence synthesis in epidemiology and medical decision-making: current approaches (cited 1 time)
A. E. Ades & A. J. Sutton 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(1):5-35
Summary. Alongside the development of meta-analysis as a tool for summarizing research literature, there is renewed interest in broader forms of quantitative synthesis that are aimed at combining evidence from different study designs or evidence on multiple parameters. These have been proposed under various headings: the confidence profile method, cross-design synthesis, hierarchical models and generalized evidence synthesis. Models that are used in health technology assessment are also referred to as representing a synthesis of evidence in a mathematical structure. Here we review alternative approaches to statistical evidence synthesis, and their implications for epidemiology and medical decision-making. The methods include hierarchical models, models informed by evidence on different functions of several parameters and models incorporating both of these features. The need to check for consistency of evidence when using these powerful methods is emphasized. We develop a rationale for evidence synthesis that is based on Bayesian decision modelling and expected value of information theory, which stresses not only the need for a lack of bias in estimates of treatment effects but also a lack of bias in assessments of uncertainty. The increasing reliance of governmental bodies like the UK National Institute for Clinical Excellence on complex evidence synthesis in decision modelling is discussed.
13.
In a 2 × 2 contingency table, when the sample size is small, a number of cells may contain few or no observations, usually referred to as sparse data. In such cases, a common recommendation in conventional frequentist methods is to add a small constant to every cell of the observed table before estimating the unknown parameters. However, this approach relies on asymptotic properties of the estimates and may work poorly for small samples. An alternative is to use Bayesian methods, which provide better insight into the problem of sparse data from few centers, a setting in which the analysis would otherwise be difficult to carry out. In this article, a hierarchical Bayesian model is applied to multicenter data on the effect of a surgical treatment with standard foot care among leprosy patients with posterior tibial nerve damage, summarized as seven 2 × 2 tables. Markov chain Monte Carlo (MCMC) techniques are applied to estimate the parameters of interest under the sparse data setup.
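The contrast between the continuity-correction fix and a simple Bayesian treatment can be sketched for a single sparse 2 × 2 table (a deliberate simplification: the article's model is hierarchical across seven centers, and the counts below are hypothetical):

```python
import numpy as np

# Hypothetical sparse 2x2 table: rows = treatment/control, columns = event/no event.
a, b = 1, 9      # treated: 1 event, 9 non-events
c, d = 0, 10     # control: 0 events, 10 non-events

# Conventional frequentist fix: add 0.5 to every cell, then form the odds ratio.
or_cc = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# A simple Bayesian alternative: independent Jeffreys Beta(0.5, 0.5) priors on
# the two event probabilities, posterior summarized by Monte Carlo draws.
rng = np.random.default_rng(1)
p1 = rng.beta(a + 0.5, b + 0.5, size=100_000)
p2 = rng.beta(c + 0.5, d + 0.5, size=100_000)
log_or = np.log(p1 / (1 - p1)) - np.log(p2 / (1 - p2))
ci = np.percentile(log_or, [2.5, 97.5])  # posterior interval for the log odds ratio
```

Unlike the point estimate with an added constant, the posterior draws give a full uncertainty interval even when one cell is empty.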
14.
15.
E. C. Tassone, M. L. Miranda & A. E. Gelfand 《Journal of the Royal Statistical Society. Series C, Applied statistics》2010,59(1):175-190
Summary. We consider joint spatial modelling of areal multivariate categorical data assuming a multiway contingency table for the variables, modelled by using a log-linear model, and connected across units by using spatial random effects. With no distinction regarding whether variables are response or explanatory, we do not limit inference to conditional probabilities, as in customary spatial logistic regression. With joint probabilities we can calculate arbitrary marginal and conditional probabilities without having to refit models to investigate different hypotheses. Flexible aggregation allows us to investigate subgroups of interest; flexible conditioning enables not only the study of outcomes given risk factors but also retrospective study of risk factors given outcomes. A benefit of joint spatial modelling is the opportunity to reveal disparities in health in a richer fashion, e.g. across space for any particular group of cells, across groups of cells at a particular location, and, hence, potential space–group interaction. We illustrate with an analysis of birth records for the state of North Carolina and compare with spatial logistic regression.
16.
Modelling daily multivariate pollutant data at multiple sites (cited 6 times)
Gavin Shaddick & Jon Wakefield 《Journal of the Royal Statistical Society. Series C, Applied statistics》2002,51(3):351-372
Summary. This paper considers the spatiotemporal modelling of four pollutants measured daily at eight monitoring sites in London over a 4-year period. Such multiple-pollutant data sets measured over time at multiple sites within a region of interest are typical. Here, the modelling was carried out to provide the exposure for a study investigating the health effects of air pollution. Alternative objectives include the design problem of the positioning of a new monitoring site, or for regulatory purposes to determine whether environmental standards are being met. In general, analyses are hampered by missing data due, for example, to a particular pollutant not being measured at a site, a monitor being inactive by design (e.g. a 6-day monitoring schedule) or because of an unreliable or faulty monitor. Data of this type are modelled here within a dynamic linear modelling framework, in which the dependences across time, space and pollutants are exploited. Throughout, the approach is Bayesian, with implementation via Markov chain Monte Carlo sampling.
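The simplest member of the dynamic linear model family is the univariate local-level model, which also shows how the filtering recursions handle the missing observations mentioned in the abstract. This is only a mechanical sketch with made-up data and variances; the paper's model is multivariate, spatial, and fitted by MCMC rather than by a plain Kalman filter:

```python
import numpy as np

def local_level_filter(y, sigma_obs=1.0, sigma_state=0.5, m0=0.0, c0=10.0):
    """Kalman filter for y_t = theta_t + v_t, theta_t = theta_{t-1} + w_t."""
    m, c = m0, c0
    means = []
    for yt in y:
        # Predict step: the state evolves as a random walk.
        a, r = m, c + sigma_state**2
        if np.isnan(yt):
            # Missing observation (e.g. an inactive or faulty monitor):
            # carry the prediction forward with no update.
            m, c = a, r
        else:
            # Update step: blend prediction and observation via the Kalman gain.
            k = r / (r + sigma_obs**2)
            m = a + k * (yt - a)
            c = (1 - k) * r
        means.append(m)
    return np.array(means)

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(scale=0.5, size=50)) + rng.normal(size=50)
y[10:15] = np.nan                 # a stretch with the monitor offline
filtered = local_level_filter(y)
```

During the missing stretch the filtered mean simply propagates, with its variance growing, which is how a DLM borrows strength across time when data are incomplete.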
17.
The increased emphasis on evidence-based medicine creates a greater need for educating future physicians in the general domain of quantitative reasoning, probability, and statistics. Reflecting this trend, more medical schools now require applicants to have taken an undergraduate course in introductory statistics. Given the breadth of statistical applications, we should cover in that course certain essential topics that may not be covered in the more general introductory statistics course. In selecting and presenting such topics, we should bear in mind that doctors also need to communicate probabilistic concepts of risks and benefits to patients who are increasingly expected to be active participants in their own health care choices despite having no training in medicine or statistics. It is also important that interesting and relevant examples accompany the presentation, because the examples (rather than the details) are what students tend to retain years later. Here, we present a list of topics we cover in the introductory biostatistics course that may not be covered in the general introductory course. We also provide some of our favorite examples for discussing these topics.
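A classic example in the risk-communication style this abstract advocates is the positive predictive value of a screening test, computed with natural frequencies so a patient can follow the arithmetic. The numbers here are illustrative, not taken from the article:

```python
# A screening test with 90% sensitivity and 95% specificity, for a disease
# with 1% prevalence, applied to 10,000 people.
prevalence, sensitivity, specificity = 0.01, 0.90, 0.95

people = 10_000
sick = people * prevalence                        # 100 people have the disease
true_pos = sick * sensitivity                     # 90 of them test positive
false_pos = (people - sick) * (1 - specificity)   # 495 healthy people also test positive
ppv = true_pos / (true_pos + false_pos)           # about 0.154
```

Despite the test's good accuracy, fewer than one in six positives actually has the disease: the base-rate effect that physicians most need to be able to explain.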
18.
Mary Dupuis Sammel, Louise M. Ryan & Julie M. Legler 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1997,59(3):667-678
We propose a latent variable model for mixed discrete and continuous outcomes. The model accommodates any mixture of outcomes from an exponential family and allows for arbitrary covariate effects, as well as direct modelling of covariates on the latent variable. An EM algorithm is proposed for parameter estimation and estimates of the latent variables are produced as a by-product of the analysis. A generalized likelihood ratio test can be used to test the significance of covariates affecting the latent outcomes. This method is applied to birth defects data, where the outcomes of interest are continuous measures of size and binary indicators of minor physical anomalies. Infants who were exposed in utero to anticonvulsant medications are compared with controls.
19.
The purpose of this study is to highlight dangerous motorways by estimating the intensity of accidents and studying its pattern across the UK motorway network. Two methods have been developed to achieve this aim. First, the motorway-specific intensity is estimated by using a homogeneous Poisson process. The heterogeneity across motorways is incorporated using two-level hierarchical models. The data structure is multilevel since each motorway consists of junctions that are joined by grouped segments. In the second method, the segment-specific intensity is estimated. The homogeneous Poisson process is used to model accident data within grouped segments, but heterogeneity across grouped segments is incorporated using three-level hierarchical models. A Bayesian method via Markov chain Monte Carlo is used to estimate the unknown parameters in the models, and the sensitivity to the choice of priors is assessed. The performance of the proposed models is evaluated by a simulation study and an application to traffic accidents in 2016 on the UK motorway network. The deviance information criterion (DIC) and the widely applicable information criterion (WAIC) are employed to choose between models.
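The two building blocks named in the abstract, a homogeneous Poisson intensity estimate and hierarchical shrinkage across segments, can be sketched for a single motorway. The counts, segment lengths, and Gamma prior parameters below are invented for illustration and are not the paper's data or its full three-level model:

```python
import numpy as np

# Hypothetical accident counts and segment lengths (km) on one motorway.
counts = np.array([3, 0, 7, 2, 5])
lengths = np.array([4.0, 2.5, 6.0, 3.0, 5.5])

# Homogeneous Poisson process: the intensity MLE is events per unit exposure.
lam_mle = counts / lengths

# Hierarchical shrinkage: a conjugate Gamma(alpha, beta) prior on each segment's
# intensity gives posterior mean (alpha + count) / (beta + length), pulling
# extreme segments (including zero counts) towards the motorway-wide level.
alpha, beta = 2.0, 2.0   # illustrative hyperparameters only
lam_post = (alpha + counts) / (beta + lengths)
```

Note how the zero-count segment gets an MLE of exactly zero but a strictly positive posterior mean, which is the practical payoff of the hierarchical treatment.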
20.
Mahdi Ebrahimzadeh, Noor Akma Ibrahim, Abdul Aziz Jemain & Adem Kilicman 《Communications in Statistics - Theory and Methods》2013,42(18):3373-3400
In the usual credibility model, observations are made on a risk, or group of risks, selected from a population, and claims are assumed to be independent among different risks. In practice, however, this assumption may be violated. Some credibility models allow for one source of claim dependence only, namely across time for an individual insured risk or a group of homogeneous insured risks. Others are built on a two-level common effects model that allows for two possible sources of dependence: across time for the same individual risk, and between risks. In this paper, we argue for modeling claim dependence with a three-level common effects model that allows for three possible sources of dependence: across portfolios, across individuals, and across time within individuals. We also obtain the corresponding credibility premiums hierarchically using the projection method, and derive the general hierarchical (multi-level) credibility premiums for models with h levels of common effects.
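For readers unfamiliar with credibility premiums, the one-level Bühlmann model (far simpler than the paper's three-level common-effects construction) shows the basic mechanics: each risk's premium is a weighted average of its own claims history and the collective mean. The claims data below are made up for illustration:

```python
import numpy as np

# Claims for two risks over three years (hypothetical).
claims = np.array([[10.0, 12.0,  8.0],    # risk 1
                   [30.0, 25.0, 35.0]])   # risk 2
n = claims.shape[1]                        # years of data per risk

ind_mean = claims.mean(axis=1)             # each risk's own average claim
coll_mean = claims.mean()                  # collective (portfolio) average

# Buhlmann structure parameters estimated from the data:
epv = claims.var(axis=1, ddof=1).mean()    # expected process variance
vhm = ind_mean.var(ddof=1) - epv / n       # variance of hypothetical means

# Credibility factor Z = n / (n + k), k = epv / vhm.
Z = n / (n + epv / vhm)
premium = Z * ind_mean + (1 - Z) * coll_mean
```

Each premium lands between the individual and collective means, shrinking more when the within-risk variability is large relative to the between-risk variability; the paper's contribution is extending this trade-off across portfolio, individual, and time levels simultaneously.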