Abstract: | Repeated games with unknown payoff distributions are analogous to a single decision maker's “multi‐armed bandit” problem. Each state of the world corresponds to a different payoff matrix of a stage game. When monitoring is perfect, information about the state is public, and players are sufficiently patient, the following result holds: For any function that maps each state to a payoff vector that is feasible and individually rational in that state, there is a sequential equilibrium in which players experiment to learn the realized state and achieve a payoff close to the one specified for that state. |