In this article we describe methods for obtaining the predictive distributions of outcome gains in the framework of a standard latent variable selection model. Although most previous work has focused on estimation of mean treatment parameters as the method for characterizing outcome gains from program participation, we show how the entire distributions associated with these gains can be obtained in certain situations. Although the out-of-sample outcome gain distributions depend on an unidentified parameter, we use the results of Koop and Poirier to show that learning can take place about this parameter through information contained in the identified parameters via a positive definiteness restriction on the covariance matrix. In cases where this type of learning is not highly informative, the spread of the predictive distributions depends more critically on the prior. We show both theoretically and in extensive generated data experiments how learning occurs, and delineate the sensitivity of our results to the prior specifications. We relate our analysis to three treatment parameters widely used in the evaluation literature (the average treatment effect, the effect of treatment on the treated, and the local average treatment effect) and show how one might approach estimation of the predictive distributions associated with these outcome gains rather than simply the estimation of mean effects. We apply these techniques to predict the effect of literacy on the weekly wages of a sample of New Jersey child laborers in 1903.


Data science is one of the newest interdisciplinary areas. It is transforming our lives unexpectedly fast, and this transformation extends to how we learn and practice. We advocate an approach to data science training that uses several types of computational tools, including R, bash, awk, regular expressions, SQL, and XPath, often used in tandem. We discuss ways for undergraduate mentees to learn about data science topics at an early point in their training. We give some intuition for researchers, professors, and practitioners about how to effectively embed real-life examples into data science learning environments. As a result, we have a unified program built on a foundation of team-oriented, data-driven projects.

Missing data in clinical trials are inevitable. We highlight the ICH guidelines and the CPMP points to consider on missing data. Specifically, we outline how missing data issues should be considered when designing, planning and conducting studies in order to minimize their impact. We also go beyond the coverage of these two documents: we review the basic concepts and frequently used terminology of missing data in more detail, give examples of typical missing data mechanisms, and discuss technical details and literature for several frequently used statistical methods and associated software. Finally, we provide a case study where the principles outlined in this paper are applied to one clinical program at the protocol design, data analysis plan and other stages of a clinical trial.

The randomized cluster design is typical in studies where the unit of randomization is a cluster of individuals rather than the individual. Evaluating various intervention strategies across medical care providers at either an institutional level or a physician group practice level fits the randomized cluster model. Clearly, the analytical approach to such studies must take the unit of randomization and the accompanying intraclass correlation into consideration. We review alternative methods to the typical Pearson's chi-square analysis and illustrate these alternatives. We have written and tested a Fortran program that produces the statistics outlined in this paper. The program is available in executable format from the author on request.
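
To illustrate why the intraclass correlation matters in such a design, the sketch below computes the usual design effect, 1 + (m - 1) * rho, for clusters of equal size m and divides a naive Pearson chi-square statistic by it. This is one simple, commonly described adjustment and is only assumed here for illustration; the abstract does not say which alternatives the authors review, and the function names and example numbers are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from cluster randomization with equal cluster sizes.

    cluster_size: number of individuals per cluster (m)
    icc: intraclass correlation coefficient (rho)
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1 + (cluster_size - 1) * icc


def adjusted_chi_square(chi_square, cluster_size, icc):
    """Deflate a naive Pearson chi-square by the design effect.

    Assumes equal cluster sizes and a common ICC; this is an
    illustrative adjustment, not the authors' Fortran program.
    """
    return chi_square / design_effect(cluster_size, icc)


# Hypothetical numbers: 20 patients per practice, ICC of 0.05.
naive = 7.8
print(design_effect(20, 0.05))
print(adjusted_chi_square(naive, 20, 0.05))
```

Even a small intraclass correlation roughly halves the statistic in this example, which is why an analysis that ignores clustering can overstate significance.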

In anticipation of the development and implementation of BIBFRAME, and its implications for continuing resources, the University of California, Los Angeles (UCLA), Continuing Resources Study Group has been learning about linked data principles and tools. This column will focus on how the Study Group conducted its internal training and how it is positioning itself to eventually participate in the broader discussion on BIBFRAME.

We describe estimation, learning, and prediction in a treatment-response model with two outcomes. The introduction of potential outcomes in this model introduces four cross-regime correlation parameters that are not contained in the likelihood for the observed data and thus are not identified. Despite this inescapable identification problem, we build upon the results of Koop and Poirier (1997) to describe how learning takes place about the four nonidentified correlations through the imposed positive definiteness of the covariance matrix. We then derive bivariate distributions associated with commonly estimated “treatment parameters” (including the Average Treatment Effect and effect of Treatment on the Treated), and use the learning that takes place about the nonidentified correlations to calculate these densities. We illustrate our points in several generated data experiments and apply our methods to estimate the joint impact of child labor on achievement scores in language and mathematics.

We show how a simple modification of the splitting method based on the Gibbs sampler can be efficiently used for decision making, in the sense that one can efficiently decide whether or not a given set of integer program constraints has at least one feasible solution. We also show how to incorporate the classic capture-recapture method into the splitting algorithm in order to obtain a low variance estimator for the counting quantity representing, say, the number of feasible solutions on the set of the constraints of an integer program. We finally present numerical results with both the decision making and the capture-recapture estimators and show their superiority over the conventional ones in solving quite general decision making and counting problems, such as satisfiability problems.
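
The classic capture-recapture idea mentioned above can be sketched independently of the splitting algorithm: draw two uniform samples from an unknown set, count the overlap, and estimate the set size as n1 * n2 / m. The sketch below shows the Lincoln-Petersen estimator and Chapman's bias-corrected variant on a toy population. It assumes uniform sampling is available, which is exactly the part the splitting method would have to supply; the function names and the simulation are hypothetical illustrations, not the authors' estimator.

```python
import random


def lincoln_petersen(n1, n2, m):
    """Estimate population size from two samples of sizes n1 and n2
    that share m elements (the 'recaptures')."""
    if m == 0:
        raise ValueError("no recaptures; the estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, which stays finite when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def simulate_estimate(population_size, n1, n2, seed=0):
    """Toy check: sample twice, uniformly without replacement, from a
    population of known size and return the Lincoln-Petersen estimate."""
    rng = random.Random(seed)
    items = range(population_size)
    first = set(rng.sample(items, n1))   # "marked" elements
    second = rng.sample(items, n2)       # second, independent sample
    recaptured = sum(1 for x in second if x in first)
    return lincoln_petersen(n1, n2, recaptured)


print(lincoln_petersen(100, 80, 20))  # 400.0
print(simulate_estimate(1000, 200, 200))
```

In a counting application the "population" would be the set of feasible solutions, and the two samples would be approximately uniform draws from it.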

The North Carolina Serials Conference was very fortunate to have secured Rachel Frick as its keynote speaker for 2013. The conference was a homecoming for Frick, who is a graduate of the University of North Carolina MSLS program and is currently the Director of the Digital Library Federation Program for the Council on Library and Information Resources (CLIR), a think tank and research organization located in Washington, D.C. The Digital Library Federation (DLF) has been in existence since 1995, its target audiences being digital library practitioners and other interested parties who are on the front lines of teaching and learning in this specialty. In her address entitled “Who, What, Where, Why, and How,” Frick discussed some of the major initiatives and issues currently occurring within and around librarianship, exploring the effect that these large scale initiatives can, and should, have at the local level. She can be reached at her Twitter feed, @rlfrick.

This article presents five principles of learning, derived from cognitive theory and supported by empirical results in cognitive psychology. To bridge the gap between theory and practice, each of these principles is transformed into a practical guideline and exemplified in a real teaching context. It is argued that this approach of putting cognitive theory into practice can offer several benefits to statistics education: a means for explaining and understanding why reform efforts work; a set of guidelines that can help instructors make well-informed design decisions when implementing these reforms; and a framework for generating new and effective instructional innovations.

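This paper uses the LASSO algorithm to construct a dynamic learning network among mutual funds based on fund holdings data, extending the undirected networks traditionally used in fund research to a directed network. It tests how two different learning modes (ways of using information), positive learning and reverse learning, affect fund performance, and then discusses the underlying economic implications. The empirical results show that when a fund is the one being learned from (the party whose information is observed), being positively learned from significantly improves its performance, whereas being reversely learned from significantly reduces it; when a fund is the active learner (the party observing the information), neither positive nor reverse learning has a significant effect on its performance. An analysis of funds' learning motives shows that funds engage in learning in order to improve on their own previous-period performance and to prevent performance from regressing, and that reverse learning is relatively more effective. By analyzing how the direction of information transmission and the way information is used affect fund performance, this paper offers a new perspective on applying statistical learning methods to financial problems.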

Many seemingly different problems in machine learning, artificial intelligence, and symbolic processing can be viewed as requiring the discovery of a computer program that produces some desired output for particular inputs. When viewed in this way, the process of solving these problems becomes equivalent to searching a space of possible computer programs for a highly fit individual computer program. The recently developed genetic programming paradigm described herein provides a way to search the space of possible computer programs for a highly fit individual computer program to solve (or approximately solve) a surprising variety of different problems from different fields. In genetic programming, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (sexual recombination) operator appropriate for genetically mating computer programs. Genetic programming is illustrated via an example of machine learning of the Boolean 11-multiplexer function and symbolic regression of the econometric exchange equation from noisy empirical data. Hierarchical automatic function definition enables genetic programming to define potentially useful functions automatically and dynamically during a run, much as a human programmer writing a complex computer program creates subroutines (procedures, functions) to perform groups of steps which must be performed with different instantiations of the dummy variables (formal parameters) in more than one place in the main program. Hierarchical automatic function definition is illustrated via the machine learning of the Boolean 11-parity function.

This paper discusses the role of statistics within an interdisciplinary program on real problem solving in elementary schools (through grade 8). We first describe some general features of the USMES (Unified Science and Mathematics for Elementary Schools) curriculum and some situations where the application of statistical principles and techniques can enter the program. Then we present our ideas concerning the kinds of statistical methods that are appropriate in this environment, and we discuss the use of this material with both elementary school students and elementary

Numerous professional fields have an increasing need for individuals trained in statistics and other quantitative analysis techniques. Today there exists great potential to fulfill this need by providing opportunities through online learning. However, to provide a high-quality education for returning adult professionals seeking advanced degrees in applied statistics online, many challenges need to be overcome. Based on our experience developing Penn State University’s online program in applied statistics, we discuss the evolution of the program’s curriculum, recruitment and development of online faculty, and meeting the requirements of students as important areas that require consideration in the development of an online program. We also highlight program evaluation strategies employed to ensure innovation and improvement in online education as cornerstones to a program’s success.

Survival models assume that fates of individuals are independent, yet the robustness of this assumption has been poorly quantified. We examine how empirically derived estimates of the variance of survival rates are affected by dependency in survival probability among individuals. We used Monte Carlo simulations to generate known amounts of dependency among pairs of individuals and analyzed these data with Kaplan-Meier and Cormack-Jolly-Seber models. Dependency significantly increased these empirical variances as compared to theoretically derived estimates of variance from the same populations. Using resighting data from 168 pairs of black brant (Branta bernicla nigricans), we used a resampling procedure and program RELEASE to estimate empirical and mean theoretical variances. We estimated that the relationship between paired individuals caused the empirical variance of the survival rate to be 155% larger than the empirical variance for unpaired individuals. Monte Carlo simulations and use of this resampling strategy can provide investigators with information on how robust their data are to this common assumption of independent survival probabilities.

Research concerning hospital readmissions has mostly focused on statistical and machine learning models that attempt to predict this unfortunate outcome for individual patients. These models are useful in certain settings, but their performance in many cases is insufficient for implementation in practice, and the dynamics of how readmission risk changes over time are often ignored. Our objective is to develop a model for aggregated readmission risk over time, using a continuous-time Markov chain, beginning at the point of discharge. We derive point and interval estimators for readmission risk, and find the asymptotic distributions for these probabilities. Finally, we validate our derived estimators using simulation, and apply our methods to estimate readmission risk over time using discharge and readmission data for surgical patients.
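
As a minimal sketch of the kind of quantity involved, consider the simplest continuous-time Markov chain one could assume: two states, "discharged" and "readmitted" (absorbing), with a constant transition rate. Under that assumption the readmission risk by time t is 1 - exp(-rate * t), the maximum likelihood estimate of the rate is the number of readmissions divided by the total time at risk, and a standard large-sample interval can be built on the log-rate scale. The abstract does not describe the authors' state space or estimators, so the model, function names and numbers below are illustrative assumptions only.

```python
import math


def rate_mle(events, time_at_risk):
    """MLE of a constant transition rate: events per unit of time at risk."""
    if time_at_risk <= 0:
        raise ValueError("time_at_risk must be positive")
    return events / time_at_risk


def readmission_risk(rate, t):
    """P(readmitted by time t) in a two-state chain with one absorbing state."""
    return 1 - math.exp(-rate * t)


def risk_interval(events, time_at_risk, t, z=1.96):
    """Approximate interval for the risk at time t.

    Uses the large-sample result that log(rate_hat) has standard
    error of about 1 / sqrt(events), then maps both rate limits
    through the monotone risk function.
    """
    if events <= 0:
        raise ValueError("need at least one event for this interval")
    rate = rate_mle(events, time_at_risk)
    half_width = z / math.sqrt(events)
    low = rate * math.exp(-half_width)
    high = rate * math.exp(half_width)
    return readmission_risk(low, t), readmission_risk(high, t)


# Hypothetical data: 20 readmissions over 2000 patient-days at risk.
rate = rate_mle(20, 2000)
print(readmission_risk(rate, 30))
print(risk_interval(20, 2000, 30))
```

A realistic model would likely need more states and would have to handle censoring explicitly; the sketch only shows how a rate estimate turns into a risk curve over time.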

Longitudinal clinical trials with long follow-up periods almost invariably suffer from loss to follow-up and non-compliance with the assigned therapy. An example is protocol 128 of the AIDS Clinical Trials Group, a 5-year equivalency trial comparing reduced dose zidovudine with the standard dose for treatment of paediatric acquired immune deficiency syndrome patients. This study compared responses to treatment by using both clinical and cognitive outcomes. The cognitive outcomes are of particular interest because the effects of human immunodeficiency virus infection of the central nervous system can be more acute in children than in adults. We formulate and apply a Bayesian hierarchical model to estimate both the intent-to-treat effect and the average causal effect of reducing the prescribed dose of zidovudine by 50%. The intent-to-treat effect quantifies the causal effect of assigning the lower dose, whereas the average causal effect represents the causal effect of actually taking the lower dose. We adopt a potential outcomes framework where, for each individual, we assume the existence of a different potential outcomes process at each level of time spent on treatment. The joint distribution of the potential outcomes and the time spent on assigned treatment is formulated using a hierarchical model: the potential outcomes distribution is given at the first level, and dependence between the outcomes and time on treatment is specified at the second level by linking the time on treatment to subject-specific effects that characterize the potential outcomes processes. Several distributional and structural assumptions are used to identify the model from observed data, and these are described in detail. A detailed analysis of AIDS Clinical Trials Group protocol 128 is given; inference about both the intent-to-treat effect and the average causal effect indicates a high probability of dose equivalence with respect to cognitive functioning.

Summary. When modelling multivariate financial data, the problem of structural learning is compounded by the fact that the covariance structure changes with time. Previous work has focused on modelling those changes by using multivariate stochastic volatility models. We present an alternative to these models that focuses instead on the latent graphical structure that is related to the precision matrix. We develop a graphical model for sequences of Gaussian random vectors when changes in the underlying graph occur at random times, and a new block of data is created with the addition or deletion of an edge. We show how a Bayesian hierarchical model incorporates both the uncertainty about that graph and the time variation thereof.

In the literature there are several studies on the performance of Bayesian network structure learning algorithms. The focus of these studies is almost always the heuristics the learning algorithms are based on, i.e., the maximization algorithms (in score-based algorithms) or the techniques for learning the dependencies of each variable (in constraint-based algorithms). In this article, we investigate how the use of permutation tests instead of parametric ones affects the performance of Bayesian network structure learning from discrete data. Shrinkage tests are also covered to provide a broad overview of the techniques developed in the current literature.
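
To make the contrast with parametric tests concrete, the sketch below runs a permutation test of independence between two discrete variables. It computes the Pearson chi-square statistic, then repeatedly shuffles one variable to build the null distribution empirically instead of referring the statistic to a chi-square table. It covers only the marginal (unconditional) case; constraint-based structure learning also needs conditional tests, which are not shown. The function names, the choice of statistic and the number of permutations are assumptions made for illustration.

```python
import random
from collections import Counter


def chi_square_stat(xs, ys):
    """Pearson chi-square statistic for two equal-length discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    stat = 0.0
    for a in count_x:
        for b in count_y:
            expected = count_x[a] * count_y[b] / n
            # Counter returns 0 for (a, b) pairs that never occur.
            stat += (count_xy[(a, b)] - expected) ** 2 / expected
    return stat


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Permutation p-value for independence of xs and ys.

    Shuffling ys breaks any association while keeping both marginal
    distributions fixed. The +1 terms count the observed arrangement
    as one of the permutations, so the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = chi_square_stat(xs, ys)
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(xs, shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: ys copies xs exactly, so the dependence is as strong as possible.
xs = [0, 1] * 50
print(permutation_p_value(xs, xs))
```

The permutation version avoids relying on the asymptotic chi-square approximation, which can be poor when many cells of the contingency table are sparse.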

A late-stage clinical development program typically contains multiple trials. Conventionally, the program's success or failure may not be known until the completion of all trials. Nowadays, interim analyses are often used to allow evaluation for early success and/or futility for each individual study by calculating conditional power, predictive power and other indexes. This presents a good opportunity to estimate the probability of program success (POPS) for the entire clinical development earlier. The sponsor may abandon the program early if the estimated POPS is very low, thereby permitting resource savings and reallocation to other products. We provide a method to calculate probability of success (POS) at an individual study level and also POPS for clinical programs with multiple trials in binary outcomes. Methods for calculating variation and confidence measures of POS and POPS and timing for interim analysis will be discussed and evaluated through simulations. We also illustrate our approaches retrospectively on historical data from a completed clinical program for depression. Copyright © 2015 John Wiley & Sons, Ltd.
