Implement a passive learning agent in a simple environment


Question: Implement a passive learning agent in a simple environment, such as the 4 x 3 world. For the case of an initially unknown environment model, compare the learning performance of the direct utility estimation, TD, and ADP algorithms. Do the comparison for the optimal policy and for several random policies. For which do the utility estimates converge faster? What happens when the size of the environment is increased? (Try environments with and without obstacles.)
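As a starting point, here is a minimal Python sketch of two of the three algorithms, direct utility estimation and TD learning, run on trials generated under a fixed policy. Several details are assumptions and do not come from the question: the layout (wall at (2, 2), terminals +1 at (4, 3) and -1 at (4, 2)), the step reward of -0.04, the 0.8/0.1/0.1 motion model, the decaying learning rate 60 / (59 + n), and every function and variable name. ADP is left out; it would additionally estimate the transition model from observed counts and solve the fixed-policy Bellman equations.

```python
import random
from collections import defaultdict

# Assumed 4 x 3 layout: columns 1-4, rows 1-3, one wall, two terminal states.
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = {(x, y) for x in range(1, 5) for y in range(1, 4)} - {WALL}
STEP_REWARD = -0.04  # assumed reward in every non-terminal state
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
PERP = {"U": "LR", "D": "LR", "L": "UD", "R": "UD"}

# Assumed fixed policy (the usual optimal one for this step reward).
POLICY = {(1, 1): "U", (2, 1): "L", (3, 1): "L", (4, 1): "L",
          (1, 2): "U", (3, 2): "U",
          (1, 3): "R", (2, 3): "R", (3, 3): "R"}

def reward(s):
    return TERMINALS.get(s, STEP_REWARD)

def step(s, a, rng):
    """Sample a successor: 0.8 intended direction, 0.1 each perpendicular."""
    r = rng.random()
    actual = a if r < 0.8 else PERP[a][0] if r < 0.9 else PERP[a][1]
    dx, dy = MOVES[actual]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s  # blocked moves leave the agent in place

def run_trial(policy, rng, start=(1, 1), max_steps=1000):
    """Return [(state, reward), ...] for one trial; max_steps guards against
    random policies that never reach a terminal state."""
    s, trial = start, []
    while s not in TERMINALS and len(trial) < max_steps:
        trial.append((s, reward(s)))
        s = step(s, policy[s], rng)
    trial.append((s, reward(s)))
    return trial

def direct_utility(trials, gamma=1.0):
    """Average the observed reward-to-go over every visit to each state."""
    totals, counts = defaultdict(float), defaultdict(int)
    for trial in trials:
        g = 0.0
        for s, r in reversed(trial):
            g = r + gamma * g
            totals[s] += g
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}

def td_learning(trials, gamma=1.0):
    """Passive TD(0): U[s] += alpha * (r + gamma * U[s'] - U[s])."""
    U, N = defaultdict(float), defaultdict(int)
    for trial in trials:
        for (s, r), (s2, r2) in zip(trial, trial[1:]):
            if s2 in TERMINALS:
                U[s2] = r2  # a terminal state's utility is its reward
            N[s] += 1
            alpha = 60.0 / (59.0 + N[s])  # assumed decaying learning rate
            U[s] += alpha * (r + gamma * U[s2] - U[s])
    return dict(U)

if __name__ == "__main__":
    rng = random.Random(0)
    trials = [run_trial(POLICY, rng) for _ in range(5000)]
    du, td = direct_utility(trials), td_learning(trials)
    for s in sorted(du):
        print(s, round(du[s], 3), round(td.get(s, 0.0), 3))
```

To compare convergence, one option is to call both estimators on growing prefixes of `trials` and plot the error against reference utilities. Trying a random policy only requires passing a different dictionary to `run_trial`, and a larger environment only requires changing `STATES`, `WALL`, and `TERMINALS`.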
