(The Odoni bound) Let k∗ be the optimal stationary policy for a Markov decision problem and let g∗ and π ∗ be the corresponding gain and steady-state probability respectively. Let v∗(n, u) be the optimal dynamic expected reward for starting in state i at stage n with final reward vector u.
(a) Show that mini[v∗(n, u) - v∗(n - 1, u)] ≤ g∗ ≤ maxi[v∗(n, u) - v∗(n - 1, u)] ; n ≥ 1. Hint: Consider premultiplying v∗(n, u) - v∗(n - 1, u) by π ∗ or π where k is the optimal dynamic policy at stage n.
(b) Show that the lower bound is non-decreasing in n and the upper bound is non- increasing in n.
Text Book: Stochastic Processes: Theory for Applications By Robert G. Gallager.