
Question


Markov Decision Process

You are given the Gridworld shown in the figure below. Assume a known Markov Decision Process (MDP) as follows: In all states, your agent can perform 4 actions: Up, Down, Left, Right, with a living reward of -1. In squares D and E, your agent can ONLY take the Exit action, with an immediate reward of -10 and +10 respectively.

Your agent's actions are successful 90% of the time; 10% of the time the agent moves in one of the two orthogonal (perpendicular) directions, with equal probability (5% each). If the movement is blocked by a wall (the blue squares and the outer edges), the agent stays in place. For example, if your agent is in square B and picks the Right action, 90% of the time it ends up in C, and 10% of the time it stays in the same square, B. Assume a discount factor of 0.9 (γ = 0.9).

[Figure: Input Policy π, shown as arrows on the Gridworld with squares A, B, C, D, E]

Given a policy, π, as specified by the arrows, evaluate it for two iterations using the Policy Evaluation algorithm. Fill in the values in the table below; note that the first iteration has already been done for you:

          C      D      E      A
V_1^π    -1    -10    +10     -1
V_2^π
V_3^π
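As a rough illustration of a single Policy Evaluation step, the sketch below applies one backup, V_{k+1}(s) = sum over s' of P(s' | s, π(s)) * (R + γ * V_k(s')), to square B using the transition example stated in the problem (Right from B: 90% to C, 10% stay in B). The function name `backup` and the dictionary layout are hypothetical. The value V_1(B) = -1 is an assumption, since B's entry does not appear in the table as transcribed, and the action Right is taken from the worked example in the text, not from the policy arrows, which are in the figure.

```python
GAMMA = 0.9           # discount factor from the problem statement
LIVING_REWARD = -1.0  # reward for every non-Exit action


def backup(transitions, values, reward=LIVING_REWARD, gamma=GAMMA):
    """One expected-value backup for a single state under a fixed action.

    transitions: list of (probability, next_state) pairs for that action.
    values: dict mapping each state to its value from the previous iteration.
    """
    return sum(p * (reward + gamma * values[s_next]) for p, s_next in transitions)


# From the problem text: Right in B reaches C with 0.9, and the two
# perpendicular slips (5% each) are blocked, so the agent stays in B with 0.1.
b_right = [(0.9, "C"), (0.1, "B")]

# First-iteration values. B = -1 is an ASSUMPTION (not legible in the table).
v1 = {"A": -1.0, "B": -1.0, "C": -1.0, "D": -10.0, "E": 10.0}

# Second-iteration value of B if the policy's action in B were Right.
v2_b = backup(b_right, v1)
print(round(v2_b, 4))
```

Under these assumptions the backup for B works out to 0.9 * (-1 + 0.9 * -1) + 0.1 * (-1 + 0.9 * -1) = -1.9. The same function would apply to the other non-Exit squares once their transition lists are read off the grid and the policy arrows.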

