Consider the following maze environment. Write a Java program to solve the following questions.
The transition model is as follows: the intended outcome occurs with probability 0.8, and with probability 0.1 the agent moves at either right angle to the intended direction. If the move would make the agent walk into a wall, the agent stays in the same place as before. The rewards for the white squares are -0.04, for the green squares are +1, and for the brown squares are -1. Note that there are no terminal states; the agent's state sequence is infinite.
Part 1: Assuming the known transition model and reward function listed above, find the optimal policy and the utilities of all the (non-wall) states using both value iteration and policy iteration. Display the optimal policy and the utilities of all the states, and plot utility estimates as a function of the number of iterations as in Figure 17.5(a) in the above reference book (for value iteration, you should need no more than 50 iterations to get convergence). In this question, use a discount factor of 0.99. Below are some reference utility values (computed with a different discount factor) to help you get an idea if the trend of your answers is correct.
Part 2: Design a more complicated maze environment of your own and re-run the algorithms designed for Part 1 on it. How does the number of states and the complexity of the environment affect convergence? How complex can you make the environment and still be able to learn the right policy?
Using method of value iteration for Part 1
- Descriptions of implemented solutions
- Plot of optimal policy
- Utilities of all states
- Plot of utility estimates as a function of the number of iterations
Using method of policy iteration for Part 1
- Descriptions of implemented solutions
- Plot of optimal policy
- Utilities of all states
- Plot of utility estimates as a function of the number of iterations
Source code for Part 1.
Part 2 bonus questions
- Answers of the questions in the report
- Source code
Attachment:- Assignment Files.rar