Problem
Give pseudo-code for a complete algorithm for the n-armed bandit problem. Use greedy action selection and incremental computation of action values with α = 1/k step size. Assume a function bandit(a) that takes an action and returns a reward. Use arrays and variables; do not subscript anything by the time index t. Indicate how the action values are initialized and updated after each reward.
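One possible shape for such an algorithm is sketched below in Python rather than pseudo-code. This is a minimal sketch under stated assumptions, not a definitive answer: the names Q, K, steps and initial_value are illustrative choices not given in the problem, ties in the greedy choice are assumed to be broken at random, and bandit is passed in as an argument. Q[a] holds the current value estimate for action a and K[a] counts how often a has been chosen, so the step size 1/k is 1/K[a] and nothing is indexed by t.

```python
import random


def greedy_bandit(bandit, n, steps, initial_value=0.0):
    """Greedy action selection with incremental sample-average updates.

    bandit: function taking an action index in 0..n-1 and returning a reward.
    n: number of arms.
    steps: number of plays (an assumed parameter; the problem leaves it open).
    initial_value: starting estimate for every action (assumed default 0).
    """
    Q = [initial_value] * n  # action-value estimates, initialized uniformly
    K = [0] * n              # number of times each action has been selected

    for _ in range(steps):
        # Greedy selection: choose an action with the highest estimate,
        # breaking ties at random (an assumption of this sketch).
        best = max(Q)
        a = random.choice([i for i in range(n) if Q[i] == best])

        r = bandit(a)  # play the arm and observe the reward

        # Incremental update with step size 1/k, where k = K[a]:
        # Q[a] <- Q[a] + (1/k) * (r - Q[a])
        K[a] += 1
        Q[a] += (r - Q[a]) / K[a]

    return Q, K
```

With step size 1/k, the first update for an action (k = 1) replaces the initial value with the observed reward, so each Q[a] is the plain average of the rewards received for a. A call such as greedy_bandit(my_bandit, n=10, steps=1000), where my_bandit is a hypothetical reward function, returns the final estimates and selection counts.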