Problem
1. How well does the algorithm you proposed in the earlier exercise perform? Design and run an experiment assessing the performance of your method. Discuss the role of parameter value settings in your experiment.
2. The pursuit algorithm described above is suited only for stationary environments because the action probabilities converge, albeit slowly, to certainty. How could you combine the pursuit idea with the ε-greedy idea to obtain a method with performance close to that of the pursuit algorithm, but that always continues to explore to some small degree?
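
One possible way to combine the two ideas, offered only as a sketch and not as the intended answer, is to let the action probabilities pursue an ε-greedy target distribution instead of a deterministic one. The greedy action's probability is then pulled toward 1 - ε + ε/n and every other action's toward ε/n, so no probability ever decays to zero. The function name `pursuit_eps_step`, its parameters (`beta`, `epsilon`, `reward_fn`), and the use of sample-average value estimates are all assumptions made for illustration.

```python
import random


def pursuit_eps_step(q, counts, probs, reward_fn, beta=0.01, epsilon=0.1, rng=random):
    """One step of a pursuit-style bandit whose target is epsilon-greedy.

    q, counts, probs are lists of equal length and are updated in place.
    reward_fn(action) is a hypothetical callable returning a numeric reward.
    """
    n = len(q)

    # Sample an action from the current action probabilities.
    action = rng.choices(range(n), weights=probs)[0]
    reward = reward_fn(action)

    # Sample-average update of the action-value estimate (an assumption;
    # a constant step size may suit nonstationary tasks better).
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]

    # Current greedy action; ties go to the lowest index.
    greedy = max(range(n), key=lambda a: q[a])

    # Move each probability a fraction beta toward the epsilon-greedy target.
    # The targets sum to 1, so the probabilities keep summing to 1.
    for a in range(n):
        target = epsilon / n + (1.0 - epsilon if a == greedy else 0.0)
        probs[a] += beta * (target - probs[a])

    return action, reward
```

Starting from uniform probabilities, every action keeps probability at least ε/n under this update, so exploration never stops. Setting `epsilon=0` gives a plain pursuit update, which makes a convenient baseline for the experiment asked for in the first question.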