Question
Markov Decision Process:
You are given the Gridworld shown in the figure below. Assume a known Markov Decision Process (MDP) as follows: In all states, your agent can perform 4 actions: Up, Down, Left, Right, each with a living reward of -1. In squares D and E, your agent can ONLY take the Exit action, with an immediate reward of -10 and +10 respectively. Your agent's actions succeed 90% of the time; 10% of the time the agent moves in one of the two orthogonal (perpendicular) directions, with equal probability (5% each). If the movement is blocked by a wall (the blue squares and the outer edges), the agent stays in place. For example, if your agent is in square B and picks the Right action, 90% of the time it ends up in C, and 10% of the time it stays in the same square, B. Assume a discount factor of γ = 0.9.

[Figure: Gridworld with squares A, B, C, D, E, with the input policy π drawn as arrows.]

Given a policy, π, as specified by the arrows, evaluate it for two iterations using the Policy Evaluation algorithm and fill in the values in the table below. Note that the first iteration has already been done for you.

[Table: one column per state and one row each for V1^π, V2^π, V3^π. The V1^π row is given; its surviving entries read -1, -10, +10, -1. The V2^π and V3^π rows are blank.]
Step by Step Solution
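The policy arrows and the grid layout are not available in the text, so the table cannot be completed here. What can be sketched is the update that Policy Evaluation applies at each iteration: the new value of a state is the living reward plus γ times the expected old value of the next state under the action the policy picks. The Python below is a minimal sketch under stated assumptions, not the site's solution. The names `backup`, `evaluate_step`, `model`, and `exit_rewards` are hypothetical, the exit squares are assumed to keep their exit reward at every iteration, and the V1 values of -1 for B and C are assumed from the living reward. The worked example uses the only transition the question spells out (Right from B); whether the policy actually picks Right in B is not confirmed.

```python
from typing import Dict, List, Tuple

GAMMA = 0.9           # discount factor given in the question
LIVING_REWARD = -1.0  # reward for each non-exit action

# (probability, next_state) pairs for the action the policy picks in a state.
Outcomes = List[Tuple[float, str]]


def backup(outcomes: Outcomes, values: Dict[str, float]) -> float:
    """One Bellman backup for a fixed action: R + gamma * E[V_k(next state)]."""
    expected_next = sum(prob * values[nxt] for prob, nxt in outcomes)
    return LIVING_REWARD + GAMMA * expected_next


def evaluate_step(
    values: Dict[str, float],
    model: Dict[str, Outcomes],
    exit_rewards: Dict[str, float],
) -> Dict[str, float]:
    """One synchronous sweep: every new value is computed from the old values."""
    new_values = {}
    for state in values:
        if state in exit_rewards:
            # Assumption: exiting pays the exit reward and ends the episode.
            new_values[state] = exit_rewards[state]
        else:
            new_values[state] = backup(model[state], values)
    return new_values


# From the question: in B, Right reaches C with 0.9 and stays in B with 0.1.
# Hypothetical: that the policy's arrow in B is Right is NOT confirmed.
v1 = {"B": -1.0, "C": -1.0}  # assumed first-iteration values (living reward)
b_right = [(0.9, "C"), (0.1, "B")]
v2_b = backup(b_right, v1)
print(round(v2_b, 3))
```

Under those assumptions the second-iteration value of B comes out at about -1.9, since both possible next states are worth -1 after the first iteration. To fill in the whole table, `model` would need one outcome list per non-exit square, read off the figure's arrows and walls, and `evaluate_step` would be applied twice starting from the given first row.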