Reconsider Prob. 21.5-7.
(a) Formulate a linear programming model for finding an optimal policy.
(b) Use the simplex method to solve this model. Use the resulting optimal solution to identify an optimal policy.
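As a starting point for part (a), the general linear program for a discounted-cost Markov decision process can be sketched as follows. The notation is an assumption, not given in the problem statement: state i is the chemical produced this month, decision k in {A, B} is the process set up for next month, y_{ik} are the decision variables, and β_j are arbitrary positive constants summing to 1. Under that reading the transition probabilities do not depend on k, so p_{00} = 0.2, p_{01} = 0.8, p_{10} = 0.3, p_{11} = 0.7, and α = 0.5.

```latex
\begin{aligned}
\text{Minimize}\quad & Z = \sum_{i \in \{0,1\}} \sum_{k \in \{A,B\}} C_{ik}\, y_{ik} \\
\text{subject to}\quad
& (y_{0A} + y_{0B}) - 0.5\,\bigl[\,0.2\,(y_{0A} + y_{0B}) + 0.3\,(y_{1A} + y_{1B})\,\bigr] = \beta_0, \\
& (y_{1A} + y_{1B}) - 0.5\,\bigl[\,0.8\,(y_{0A} + y_{0B}) + 0.7\,(y_{1A} + y_{1B})\,\bigr] = \beta_1, \\
& y_{ik} \ge 0 \quad \text{for all } i, k .
\end{aligned}
```

In an optimal basic feasible solution exactly one y_{ik} is positive for each state i, and that k is the decision the optimal policy prescribes in state i.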
Prob. 21.5-7
A chemical company produces two chemicals, denoted by 0 and 1, and only one can be produced at a time. Each month a decision is made as to which chemical to produce that month. Because the demand for each chemical is predictable, it is known that if 1 is produced this month, there is a 70 percent chance that it will also be produced again next month. Similarly, if 0 is produced this month, there is only a 20 percent chance that it will be produced again next month. To combat the emissions of pollutants, the chemical company has two processes, process A, which is efficient in combating the pollution from the production of 1 but not from 0, and process B, which is efficient in combating the pollution from the production of 0 but not from 1. Only one process can be used at a time. The amount of pollution from the production of each chemical under each process is
Unfortunately, there is a time delay in setting up the pollution control processes, so that a decision as to which process to use must be made in the month prior to the production decision. Management wants to determine a policy for when to use each pollution control process that will minimize the expected total discounted amount of all future pollution with a discount factor of α = 0.5.
(a) Formulate this problem as a Markov decision process by identifying the states, the decisions, and the C_ik. Identify all the (stationary deterministic) policies.
(b) Use the policy improvement algorithm to find an optimal policy.
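The policy improvement algorithm for part (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the textbook's worked solution: the state is the chemical produced this month, the decision is the process set up for next month, and C_ik is taken to be the expected pollution next month. The pollution table is not reproduced here, so the values in `pollution` are hypothetical placeholders that must be replaced with the problem's own data. All function and variable names are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def q_values(cost, trans, alpha, v, i):
    """Discounted cost of each decision k in state i, given values v."""
    n = len(v)
    return [cost[i][k] + alpha * sum(trans[k][i][j] * v[j] for j in range(n))
            for k in range(len(cost[i]))]


def policy_improvement(cost, trans, alpha, policy):
    """cost[i][k] and trans[k][i][j]; returns (optimal policy, its values)."""
    n = len(cost)
    policy = list(policy)
    while True:
        # Value determination: V_i = C_{i,d(i)} + alpha * sum_j p_ij(d(i)) V_j
        a = [[(1.0 if i == j else 0.0) - alpha * trans[policy[i]][i][j]
              for j in range(n)] for i in range(n)]
        b = [cost[i][policy[i]] for i in range(n)]
        v = solve_linear(a, b)
        # Policy improvement: choose the cheapest decision in each state,
        # keeping the current decision on ties so the loop terminates.
        new_policy = []
        for i in range(n):
            q = q_values(cost, trans, alpha, v, i)
            best = min(range(len(q)), key=lambda k: q[k])
            if q[policy[i]] <= q[best] + 1e-12:
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Transition probabilities from the problem statement (same for both processes).
p = [[0.2, 0.8],   # chemical 0 this month -> chemical 0 or 1 next month
     [0.3, 0.7]]   # chemical 1 this month -> chemical 0 or 1 next month
trans = [p, p]     # decision 0 = process A, decision 1 = process B

# HYPOTHETICAL pollution[k][j]: process k, chemical j. Replace with the real table.
pollution = [[8.0, 2.0],   # process A (placeholder values)
             [3.0, 6.0]]   # process B (placeholder values)

# Assumed cost model: C_ik = expected pollution next month under process k.
cost = [[sum(p[i][j] * pollution[k][j] for j in range(2)) for k in range(2)]
        for i in range(2)]

alpha = 0.5
policy, values = policy_improvement(cost, trans, alpha, policy=[1, 1])
print("decision per state (0 = A, 1 = B):", policy)
print("expected total discounted pollution per state:", values)
```

Because the transition probabilities here do not depend on the decision, the algorithm reduces to picking the process with the smaller C_ik in each state; the general routine is shown so the same code applies when decisions do affect transitions.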