Value iteration
A dynamic-programming problem involves a total of N possible states and M admissible actions. Assuming the use of a stationary policy, show that a single iteration of the value iteration algorithm requires on the order of N^2 M operations.
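
As a rough illustration of where the N^2 M count comes from, the following minimal sketch performs one sweep of value iteration and counts the multiply-add operations. It is not a definitive implementation. It assumes a cost-minimization formulation with a discount factor gamma, and the names P (transition probabilities), c (expected immediate costs), J (current cost-to-go estimates) and value_iteration_sweep are hypothetical, chosen only for this example. For each of the N states and each of the M actions, the backup sums over all N successor states, which gives N * M * N operations per sweep.

```python
def value_iteration_sweep(P, c, J, gamma=0.9):
    """Perform one sweep of value iteration and count multiply-adds.

    Assumed (hypothetical) layout, not taken from the problem statement:
      P[a][i][j] : probability of moving from state i to state j under action a
      c[a][i]    : expected immediate cost of taking action a in state i
      J[j]       : current cost-to-go estimate for state j
    """
    M = len(P)  # number of admissible actions
    N = len(J)  # number of states
    ops = 0
    J_new = [0.0] * N
    for i in range(N):              # N states to update
        best = float("inf")
        for a in range(M):          # M actions to compare per state
            q = c[a][i]
            for j in range(N):      # N successor states per (state, action) pair
                q += gamma * P[a][i][j] * J[j]
                ops += 1            # one multiply-add
            best = min(best, q)
        J_new[i] = best
    return J_new, ops


# Small example: uniform transitions, action-dependent costs, zero initial estimates.
N, M = 4, 3
P = [[[1.0 / N] * N for _ in range(N)] for _ in range(M)]
c = [[float(a + 1)] * N for a in range(M)]
J = [0.0] * N

J_new, ops = value_iteration_sweep(P, c, J)
print(ops, N * N * M)
```

The M - 1 comparisons per state needed to take the minimum add only on the order of N M further operations, so the triple loop dominates and the total remains on the order of N^2 M.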