A Note on Dirichlet One-Armed Bandits

Authors:	Manas K Chattopadhyay

Institution:	The Gallup Organization, Rockville, Maryland, U.S.A.

Abstract:	One of two independent stochastic processes (or ‘arms’) is selected and observed sequentially at each of n (≤ ∞) stages. Arm 1 yields identically distributed observations with unknown probability measure P, which has a Dirichlet process prior, whereas observations from arm 2 have known probability measure Q. Future observations are discounted: at stage m, the payoff is a_m (≥ 0) times the observation Z_m at that stage. The objective is to maximize the total expected payoff. Clayton and Berry (1985) consider this problem when a_m equals 1 for m ≤ n and 0 for m > n (n < ∞). In this paper, the Clayton and Berry (1985) results are extended to the case of regular discount sequences of horizon n, which may also be infinite. The results are illustrated with numerical examples. In the case of geometric discounting, the results apply to a bandit with many independent unknown Dirichlet arms.
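
The two quantities the abstract works with, the discounted payoff and the current estimate of the unknown arm, can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the strategy derived in the paper: the function names, the prior parameters, and the sample numbers are all hypothetical. It relies only on the standard Dirichlet process fact that, for a prior with total mass M and base mean mu0, the predictive mean after k observations x_1, ..., x_k is (M * mu0 + x_1 + ... + x_k) / (M + k).

```python
from typing import Sequence


def dp_predictive_mean(prior_mean: float, prior_mass: float,
                       observations: Sequence[float]) -> float:
    """Predictive mean of the next arm-1 observation under a Dirichlet process prior.

    prior_mean (mu0) and prior_mass (M > 0) are hypothetical parameter names.
    The result is a weighted average of the prior mean and the observed data.
    """
    k = len(observations)
    return (prior_mass * prior_mean + sum(observations)) / (prior_mass + k)


def discounted_payoff(discounts: Sequence[float],
                      observations: Sequence[float]) -> float:
    """Total payoff: the sum of a_m * Z_m over the observed stages."""
    return sum(a * z for a, z in zip(discounts, observations))


# Illustrative (made-up) observations Z_1, Z_2, Z_3.
z = [1.0, 0.0, 1.0]

# Uniform discounting with horizon n = 3: a_m = 1 for m <= n (Clayton and Berry case).
uniform = [1.0, 1.0, 1.0]

# Geometric discounting, assuming a_m = beta ** (m - 1) with beta = 0.9.
geometric = [0.9 ** m for m in range(3)]

uniform_total = discounted_payoff(uniform, z)
geometric_total = discounted_payoff(geometric, z)
updated_mean = dp_predictive_mean(prior_mean=0.5, prior_mass=2.0, observations=z)
```

A larger prior mass M keeps the predictive mean closer to mu0, while a smaller one lets the data dominate. Comparing this mean with the mean of Q gives only a myopic rule; the optimal strategies studied in the paper also account for the value of information gained from arm 1.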

Keywords:	AMS 1980 subject classifications: Primary 62L05; secondary 62L15