Question
1. Optimal Policy (4pt) An agent lives in the 2×3 world shown above. Once it reaches the top right cell, the only action it can take is to exit, receiving a reward of +10. In any other cell, the agent can go east, west, north, or south. If it tries to move outside the borders of the grid, it bounces off the wall and stays put. In all these cases, it receives the reward of the cell that it lands on, as shown in the figure.
We assume a stochastic transition model: 70% of the time the agent goes in the direction it selects; the rest of the time it moves at right angles to the intended direction (15% to the right and 15% to the left). If an intended or unintended action is impossible, it is still attempted but results in remaining in the same state and collecting the reward associated with that cell.
Assuming no discounting (γ = 1), please answer the following questions:
(i) What is the optimal policy for r = 0? Justify your answer by explaining intuitively why this value of r leads to this policy.
(ii) What is the optimal policy for r = +3? Justify your answer by explaining intuitively why this value of r leads to this policy.
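The transition model described in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid is taken to be 2 rows by 3 columns with row 0 at the top, and the names `step` and `transition` are hypothetical helpers, not part of the original problem. The cell rewards and the exit action of the top right cell are left out because the figure is not reproduced here.

```python
from collections import defaultdict

# Assumption: the world has 2 rows and 3 columns; (0, 0) is the top left cell.
ROWS, COLS = 2, 3

# (row, col) offsets for each compass direction.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

# For each intended direction, the two right-angle directions as (left, right).
PERPENDICULAR = {
    "N": ("W", "E"),
    "S": ("E", "W"),
    "E": ("N", "S"),
    "W": ("S", "N"),
}


def step(state, direction):
    """Move one cell in `direction`; bounce off the wall and stay put if blocked."""
    row, col = state
    d_row, d_col = MOVES[direction]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < ROWS and 0 <= new_col < COLS:
        return (new_row, new_col)
    return state


def transition(state, action):
    """Return {next_state: probability} for choosing `action` in `state`.

    70% intended direction, 15% to its left, 15% to its right.
    Probabilities of outcomes that land in the same cell are summed.
    """
    left, right = PERPENDICULAR[action]
    dist = defaultdict(float)
    for direction, prob in ((action, 0.70), (left, 0.15), (right, 0.15)):
        dist[step(state, direction)] += prob
    return dict(dist)
```

For example, `transition((1, 0), "N")` spreads probability over the cell above, the cell to the east, and the current cell (the westward slip bounces off the wall). A value iteration over the grid would combine this distribution with the rewards from the figure, which is where the choice of r changes the optimal policy.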