Found 20 similar articles (search time: 15 ms)
1.
Miki Aoyagi 《Communications in Statistics - Theory and Methods》2013,42(15):2667-2687
The coefficient of the main term of the generalization error in Bayesian estimation is called a Bayesian learning coefficient. In this article, we first introduce Vandermonde matrix type singularities and establish certain orthogonality conditions for them. Recently, it has been recognized that Vandermonde matrix type singularities are related to Bayesian learning coefficients for several hierarchical learning models. By applying these orthogonality conditions, we show that their log canonical threshold also corresponds to the Bayesian learning coefficient for normal mixture models, and we obtain explicit computational results in dimension one.
2.
In the past decade there has been a substantial increase in the number of introductory statistics courses taught at the undergraduate level. Many have argued successfully for the extensive use of writing in such courses in an attempt to highlight the interdisciplinary role of statistics and acknowledge that a good statistician must also be good at summarizing his or her analyses to nonstatisticians. This point was made by Radke-Sharpe, who went on to add that incorporating writing demands time, energy, and creativity, but that it is usually well worth the effort. This article discusses the efforts made by the authors to include writing in their courses, and some of the techniques that made the writing process painless and productive for both students and faculty.
3.
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Claudio J. Verzilli John C. Whittaker Nigel Stallard Daniel Chasman 《Journal of the Royal Statistical Society. Series C, Applied statistics》2005,54(1):191-206
Summary. Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with a BMARS model, a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.
4.
5.
R.T. Sabo 《Australian & New Zealand Journal of Statistics》2016,58(3):319-333
An effective method for improving the communication skills of graduate students in statistics and biostatistics is to provide consultations with non‐statistical researchers. Unfortunately, those experiences can be difficult to arrange or occur too infrequently to be reliable. The current study sought to help students develop both written and oral communication skills within an existing graduate biostatistics course by having students partake in role‐playing consultations. Though the class size was small, the students felt these activities helped improve their oral and written communication skills, and made them more aware of a biostatistician's role in consulting. There was also modest improvement in the students' perceived function as a consulting biostatistician. Simulated consultations can be an effective educational tool for promoting the development of soft skills necessary for developing successful statisticians, can be implemented in existing courses, and do not require reliance upon external collaborators. Embedding these types of exercises within an existing curriculum can also be a cost‐effective alternative for programs that do not have formal consulting training.
6.
In this paper, we introduce a semi-parametric Bayesian methodology based on the proportional hazard model that assumes that the baseline hazard function is constant over segments but, by contrast to what is usually assumed in the literature, with the periods at which the function changes not being specified in advance. The methodology is applied to explore the impact of Vocational Training courses offered by the University of Zaragoza (Spain) on the duration of the initial periods of unemployment experienced by graduate leavers. The framework is very flexible and allows us, in particular, to capture the presence of seasonality in the job insertion of graduates.
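As a minimal sketch of what a baseline hazard that is "constant over segments" implies, the function below computes the survival probability S(t) = exp(-H(t)), where H(t) is the cumulative hazard accumulated segment by segment. It is an illustration only, not the authors' method: the change points are fixed by the caller here, whereas the abstract treats them as unknown, and covariates are omitted. The function name, arguments, and numbers are hypothetical.

```python
import math


def survival_piecewise_constant(t, breakpoints, hazards):
    """Survival probability at time t under a piecewise-constant hazard.

    breakpoints: increasing change points; [6, 12] splits time into
                 [0, 6), [6, 12) and [12, inf).  (Assumed known here.)
    hazards:     one hazard rate per segment, len(breakpoints) + 1 values.
    """
    if len(hazards) != len(breakpoints) + 1:
        raise ValueError("need exactly one hazard rate per segment")
    edges = [0.0] + list(breakpoints) + [math.inf]
    cumulative = 0.0
    for rate, start, end in zip(hazards, edges[:-1], edges[1:]):
        if t <= start:
            break  # t lies before this segment, nothing more to add
        # Time spent inside this segment, capped at the segment's end.
        cumulative += rate * (min(t, end) - start)
    return math.exp(-cumulative)


# Hypothetical rates: 0.1 for months 0-6, 0.2 for months 6-12, 0.05 after.
s9 = survival_piecewise_constant(9, [6, 12], [0.1, 0.2, 0.05])
```

With these illustrative rates the cumulative hazard at t = 9 is 0.1 × 6 + 0.2 × 3 = 1.2, so `s9` equals exp(-1.2).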
7.
Amy Wagaman 《The American statistician》2013,67(4):405-412
Modern students encounter large, messy datasets long before setting foot in our classrooms. Many of these students need to develop skills in exploratory data analysis and multivariate analysis techniques for their jobs after college, but such topics are not covered in traditional introductory statistics courses. This case study describes my experience in designing and teaching an undergraduate course on multivariate data analysis with minimal prerequisites, using real data, active learning, and other interactive activities to help students tackle the material. Multivariate topics covered include clustering and classification (among others) for exploratory data analysis and an introduction to algorithmic modeling. Supplementary materials for this article are available online.
8.
9.
10.
In this article, we aim at assessing hierarchical Bayesian modeling for the analysis of multiple exposures and highly correlated effects in a multilevel setting. We exploit an artificial data set to apply our method and show the gains in the final estimates of the crucial parameters. As a motivating example to simulate data, we consider a real prospective cohort study designed to investigate the association of dietary exposures with the occurrence of colon-rectum cancer in a multilevel framework, where, e.g., individuals have been enrolled from different countries or cities. We rely on the presence of some additional information suitable to mediate the final effects of the exposures and to be arranged in a level-2 regression to model similarities among the parameters of interest (e.g., data on the nutrient compositions for each dietary item).
11.
Annouschka Laenen Ariel Alonso Geert Molenberghs Tony Vangeneugden 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(1):237-253
Summary. The concept of reliability denotes one of the most important psychometric properties of a measurement scale. Reliability refers to the capacity of the scale to discriminate between subjects in a given population. In classical test theory, it is often estimated by using the intraclass correlation coefficient based on two replicate measurements. However, the modelling framework that is used in this theory is often too narrow when applied in practical situations. Generalizability theory has extended reliability theory to a much broader framework but is confronted with some limitations when applied in a longitudinal setting. We explore how the definition of reliability can be generalized to a setting where subjects are measured repeatedly over time. On the basis of four defining properties for the concept of reliability, we propose a family of reliability measures which circumscribes the area in which reliability measures should be sought. It is shown how different members assess different aspects of the problem and that the reliability of the instrument can depend on the way that it is used. The methodology is motivated by and illustrated on data from a clinical study on schizophrenia. On the basis of this study, we estimate and compare the reliabilities of two different rating scales to evaluate the severity of the disorder.
12.
Liliana Garrido L 《Communications in Statistics - Simulation and Computation》2013,42(3):355-375
In this article we propose mixtures of distributions belonging to the biparametric exponential family, considering joint modeling of the mean and variance (or dispersion) parameters. As special cases we consider mixtures of normal and gamma distributions. A novel Bayesian methodology, using Markov Chain Monte Carlo (MCMC) methods, is proposed to obtain the posterior summaries of interest. We include simulations and real data examples to illustrate the performance of the proposal.
13.
Many competitions are of a paired nature, in which respondents compare objects pairwise in a subjective manner. Bayesian statistics, in contrast to classical statistics, provides a generic tool for incorporating new experimental evidence and updating existing information. These and other properties have led statisticians to focus on the Bayesian analysis of various paired comparison models. The present article focuses on the amended Davidson model for paired comparison, in which an amendment accommodates the option of not distinguishing the effects of two treatments when they are compared pairwise. The Bayesian analysis of the amended Davidson model is performed using noninformative priors after a further small modification that incorporates an order-effect parameter. The joint and marginal posterior distributions of the parameters, their posterior estimates, and the predictive and posterior probabilities for comparing the treatment parameters are obtained.
14.
《Scandinavian Journal of Statistics》2018,45(3):729-752
The starting point in uncertainty quantification is a stochastic model, which is fitted to a technical system in a suitable way, and prediction of uncertainty is carried out within this stochastic model. In any application, such a model will not be perfect, so any uncertainty quantification from such a model has to take into account the inadequacy of the model. In this paper, we rigorously show how the observed data of the technical system can be used to build a conservative non‐asymptotic confidence interval on quantiles related to experiments with the technical system. The construction of this confidence interval is based on concentration inequalities and order statistics. An asymptotic bound on the length of this confidence interval is presented. Here we assume that engineers use more and more of their knowledge to build models with order of errors bounded by . The results are illustrated by applying the newly proposed approach to real and simulated data.
15.
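Through an empirical study of the autonomous inquiry learning model, this paper analyzes and summarizes the composition of learners' goal orientations in English learning under this model and examines the relationship between each goal orientation and teaching performance. It finds that the autonomous inquiry model can significantly improve foreign-language teaching performance. Under this model, learners' goal orientation is predominantly performance-goal orientation. Compared with mastery-goal orientation, performance-goal orientation does significantly improve teaching performance.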
16.
This article proposes a Bayesian estimation framework for a typical multi-factor model with time-varying risk exposures to macroeconomic risk factors and corresponding premia to price U.S. publicly traded assets. The model assumes that risk exposures and idiosyncratic volatility follow a break-point latent process, allowing for changes at any point in time but not restricting them to change at all points. The empirical application to 40 years of U.S. data and 23 portfolios shows that the approach yields sensible results compared to previous two-step methods based on naive recursive estimation schemes, as well as a set of alternative model restrictions. A variance decomposition test shows that although most of the predictable variation comes from the market risk premium, a number of additional macroeconomic risks, including real output and inflation shocks, are significantly priced in the cross-section. A Bayes factor analysis massively favors the proposed change-point model. Supplementary materials for this article are available online.
17.
A stochastic volatility in mean model with correlated errors using the symmetrical class of scale mixtures of normal distributions is introduced in this article. The scale mixture of normal distributions is an attractive class of symmetric distributions that includes the normal, Student-t, slash and contaminated normal distributions as special cases, providing a robust alternative to estimation in stochastic volatility in mean models in the absence of normality. Using a Bayesian paradigm, an efficient method based on Markov chain Monte Carlo (MCMC) is developed for parameter estimation. The methods developed are applied to analyze daily stock return data from the São Paulo Stock, Mercantile & Futures Exchange index (IBOVESPA). The Bayesian predictive information criteria (BPIC) and the logarithm of the marginal likelihood are used as model selection criteria. The results reveal that the stochastic volatility in mean model with correlated errors and slash distribution provides a significant improvement in model fit for the IBOVESPA data over the usual normal model.
18.
Steele F 《Lifetime data analysis》2003,9(2):155-174
Event history models typically assume that the entire population is at risk of experiencing the event of interest throughout the observation period. However, there will often be individuals, referred to as long-term survivors, who may be considered a priori to have a zero hazard throughout the study period. In this paper, a discrete-time mixture model is proposed in which the probability of long-term survivorship and the timing of event occurrence are modelled jointly. Another feature of event history data that often needs to be considered is that they may come from a population with a hierarchical structure. For example, individuals may be nested within geographical regions and individuals in the same region may have similar risks of experiencing the event of interest due to unobserved regional characteristics. Thus, the discrete-time mixture model is extended to allow for clustering in the likelihood and timing of an event within regions. The model is further extended to allow for unobserved individual heterogeneity in the hazard of event occurrence. The proposed model is applied in an analysis of contraceptive sterilization in Bangladesh. The results show that a woman's religion and education level affect her probability of choosing sterilization, but not when she gets sterilized. There is also evidence of community-level variation in sterilization timing, but not in the probability of sterilization.
19.
Nicole H. Augustin Stefan Lang Monica Musio Klaus von Wilpert 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(1):29-50
Summary. The data that are analysed are from a monitoring survey which was carried out in 1994 in the forests of Baden-Württemberg, a federal state in the south-western region of Germany. The survey is part of a large monitoring scheme that has been carried out since the 1980s at different spatial and temporal resolutions to observe the increase in forest damage. One indicator for tree vitality is tree defoliation, which is mainly caused by intrinsic factors, age and stand conditions, but also by biotic (e.g. insects) and abiotic stresses (e.g. industrial emissions). In the survey, needle loss of pine-trees and many potential covariates are recorded at about 580 grid points of a 4 km × 4 km grid. The aim is to identify a set of predictors for needle loss and to investigate the relationships between the needle loss and the predictors. The response variable needle loss is recorded as a percentage in 5% steps estimated by eye using binoculars and categorized into healthy trees (10% or less), intermediate trees (10–25%) and damaged trees (25% or more). We use a Bayesian cumulative threshold model with non-linear functions of continuous variables and a random effect for spatial heterogeneity. For both the non-linear functions and the spatial random effect we use Bayesian versions of P-splines as priors. Our method is novel in that it deals with several non-standard data requirements: the ordinal response variable (the categorized version of needle loss), non-linear effects of covariates, spatial heterogeneity and prediction with missing covariates. The model is a special case of models with a geoadditive or more generally structured additive predictor. Inference can be based on Markov chain Monte Carlo techniques or mixed model technology.
20.
In this article, we present a compressive sensing based framework for generalized linear model regression that employs a two-component noise model and convex optimization techniques to simultaneously detect outliers and determine optimally sparse representations of noisy data from arbitrary sets of basis functions. We then extend our model to include model order reduction capabilities that can uncover inherent sparsity in regression coefficients and achieve simple, superior fits. Second, we use the mixed ℓ2/ℓ1 norm to develop another model that can efficiently uncover block-sparsity in regression coefficients. By performing model order reduction over all independent variables and basis functions, our algorithms successfully deemphasize the effect of independent variables that become uncorrelated with dependent variables. This desirable property has various applications in real-time anomaly detection, such as faulty sensor detection and sensor jamming in wireless sensor networks. After developing our framework and inheriting a stable recovery theorem from compressive sensing theory, we present two simulation studies on sparse or block-sparse problems that demonstrate the superior performance of our algorithms with respect to (1) classic outlier-invariant regression techniques like least absolute value and iteratively reweighted least-squares and (2) classic sparse-regularized regression techniques like LASSO.
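For readers unfamiliar with the mixed ℓ2/ℓ1 norm mentioned above, the sketch below uses its standard definition: partition the coefficients into blocks, take the ℓ2 norm within each block, and sum those norms across blocks. Penalizing this quantity tends to zero out whole blocks at once, which is what "block-sparsity" refers to. The abstract does not say how the blocks are formed, so the partition, function name, and numbers here are hypothetical.

```python
import math


def mixed_l2_l1_norm(coefficients, blocks):
    """Sum over blocks of the Euclidean (l2) norm of each block.

    coefficients: flat sequence of regression coefficients.
    blocks:       lists of indices, one list per block (assumed partition).
    """
    total = 0.0
    for block in blocks:
        # l2 norm inside the block ...
        block_norm = math.sqrt(sum(coefficients[i] ** 2 for i in block))
        # ... then an l1-style sum across blocks.
        total += block_norm
    return total


# Hypothetical coefficients in three blocks; the middle block is entirely zero.
value = mixed_l2_l1_norm([3.0, 4.0, 0.0, 0.0, 1.0], [[0, 1], [2, 3], [4]])
```

Here the three block norms are 5, 0, and 1, so `value` is 6.0; the all-zero middle block contributes nothing.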