A Note on Dirichlet One-Armed Bandits |
| |
Authors: | Manas K. Chattopadhyay |
| |
Affiliation: | The Gallup Organization, Rockville, Maryland, U.S.A. |
| |
Abstract: | At each of $n$ ($\le \infty$) stages, one of two independent stochastic processes (or ‘arms’) is selected and observed sequentially. Observations from arm 1 are identically distributed with an unknown probability measure $P$, which has a Dirichlet process prior, whereas observations from arm 2 have a known probability measure $Q$. Future observations are discounted: at stage $m$, the payoff is $a_m Z_m$, where $a_m \ge 0$ and $Z_m$ is the observation at that stage. The objective is to maximize the total expected payoff. Clayton and Berry (1985) consider this problem when $a_m = 1$ for $m \le n$ and $a_m = 0$ for $m > n$, with $n < \infty$. In this paper, the Clayton and Berry (1985) results are extended to regular discount sequences of horizon $n$, where $n$ may be infinite. The results are illustrated with numerical examples. In the case of geometric discounting, the results apply to a bandit with many independent unknown Dirichlet arms. |
| |
Keywords: | AMS 1980 subject classifications: Primary 62L05; secondary 62L15 |
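To make the model in the abstract concrete, here is a minimal Python sketch (not taken from the paper) of the objective $E\left[\sum_{m} a_m Z_m\right]$. It covers the two discount sequences mentioned (the Clayton and Berry finite-horizon case and geometric discounting), the Dirichlet process predictive mean for arm 1, and a Monte Carlo run of one strategy. Several choices here are assumptions made for illustration: the helper names (`uniform_horizon`, `geometric`, `posterior_mean`, `polya_draw`, `simulate_threshold_rule`), the Uniform(0, 1) base measure with total mass 1, and the threshold rule itself. The rule is a hypothetical myopic heuristic, not the optimal strategy derived by Clayton and Berry (1985) or in this note. The predictive mean relies on the standard conjugacy of the Dirichlet process (Ferguson, 1973), and the arm-1 draws use the Blackwell–MacQueen Pólya urn.

```python
import random
from typing import Callable, List


def uniform_horizon(n: int) -> Callable[[int], float]:
    """Discount a_m = 1 for m <= n and 0 for m > n (the Clayton-Berry case)."""
    return lambda m: 1.0 if m <= n else 0.0


def geometric(beta: float) -> Callable[[int], float]:
    """Geometric discount a_m = beta**(m - 1), 0 <= beta < 1 (infinite horizon)."""
    return lambda m: beta ** (m - 1)


def posterior_mean(mass: float, base_mean: float, obs: List[float]) -> float:
    """Predictive mean of the next arm-1 observation under a DP(mass * P0) prior.

    The posterior is DP(mass * P0 + point masses at obs), so the predictive
    mean is (mass * base_mean + sum(obs)) / (mass + len(obs)).
    """
    return (mass * base_mean + sum(obs)) / (mass + len(obs))


def polya_draw(rng: random.Random, mass: float,
               base_sampler: Callable[[random.Random], float],
               obs: List[float]) -> float:
    """Draw the next arm-1 observation via the Blackwell-MacQueen Polya urn:
    a fresh draw from P0 with probability mass / (mass + k), otherwise a
    uniformly chosen earlier observation."""
    k = len(obs)
    if rng.random() < mass / (mass + k):
        return base_sampler(rng)
    return rng.choice(obs)


def simulate_threshold_rule(discount: Callable[[int], float], horizon: int,
                            mass: float, base_mean: float,
                            base_sampler: Callable[[random.Random], float],
                            nu: float, rng: random.Random) -> float:
    """One run of a HYPOTHETICAL myopic rule: pull arm 1 while its predictive
    mean is at least nu (the known mean of Q), then switch to arm 2 for good.

    Arm-2 payoffs are replaced by their known mean nu; because the rule never
    returns to arm 1, this leaves the expected total payoff unchanged.
    Infinite horizons are truncated at `horizon` stages.
    """
    obs: List[float] = []
    total = 0.0
    on_arm1 = True
    for m in range(1, horizon + 1):
        if on_arm1 and posterior_mean(mass, base_mean, obs) >= nu:
            z = polya_draw(rng, mass, base_sampler, obs)
            obs.append(z)
            total += discount(m) * z
        else:
            on_arm1 = False
            total += discount(m) * nu
    return total


if __name__ == "__main__":
    # Monte Carlo estimate under geometric discounting, truncated at 200 stages
    # (0.9**200 is negligible). Base measure P0 = Uniform(0, 1), mean 0.5.
    rng = random.Random(0)
    runs = [
        simulate_threshold_rule(geometric(0.9), 200, 1.0, 0.5,
                                lambda r: r.random(), 0.45, rng)
        for _ in range(1000)
    ]
    print(sum(runs) / len(runs))
```

Swapping `geometric(0.9)` for `uniform_horizon(n)` with `horizon = n` reproduces the finite-horizon setting. Because only the discount function changes, the same simulation can compare the two discounting schemes.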