A Note on Dirichlet One-Armed Bandits
Authors: Manas K. Chattopadhyay
Institution: The Gallup Organization, Rockville, Maryland, U.S.A.
Abstract: At each of n (≤ ∞) stages, one of two independent stochastic processes (or 'arms') is selected and observed. Arm 1 yields observations that are identically distributed with unknown probability measure P, which has a Dirichlet process prior, whereas observations from arm 2 have known probability measure Q. Future observations are discounted: at stage m, the payoff is a_m (≥ 0) times the observation Z_m at that stage. The objective is to maximize the total expected payoff. Clayton and Berry (1985) consider this problem when a_m equals 1 for m ≤ n and 0 for m > n (< ∞). In this paper, the Clayton and Berry (1985) results are extended to the case of regular discount sequences of horizon n, which may also be infinite. The results are illustrated with numerical examples. In the case of geometric discounting, the results apply to a bandit with many independent unknown Dirichlet arms.
Keywords: AMS 1980 subject classifications: Primary 62L05; Secondary 62L15