Question

Utility, Policy, and Their Calculation

Consider the 4 x 3 environment discussed in the lecture. Let $\pi$ be the following policy: Right if you can; else, Up if you can; otherwise, Left. For example, $\pi(1,1) = \text{Right}$, $\pi(1,2) = \text{Up}$, and $\pi(4,1) = \text{Left}$ (???).

Assume that the discount factor $\gamma = 1$ and that the transition is deterministic, i.e. $P(s' \mid s, a)$ is either 0 or 1. E.g., $P((2,1) \mid (1,1), \text{Right}) = 1$, while $P((1,2) \mid (1,1), \text{Right}) = 0$.

[Figure: the 4 x 3 grid environment; image not transcribed.]

Q.3) Value Iteration / 10

Calculate $U^{\pi}(s)$ for every $s$ (excluding $(4,1)$) using the Bellman equation and the reward function discussed in class:

$$U^{\pi}(s) = R(s) + \sum_{s'} P(s' \mid s, \pi(s)) \, U^{\pi}(s')$$

(For example, $U^{\pi}(3,3) = 1$ and $U^{\pi}(3,2) = -1$.)

Q.4) Policy Iteration

What would $\pi(1,1)$ be if, using the $U^{\pi}$ calculated in Q.3), one step of the following policy update rule is applied on $(1,1)$?

$$\pi(s) \leftarrow \arg\max_{a \in A(s)} \Big( R(s, a) + \sum_{s'} P(s' \mid s, a) \, U^{\pi}(s') \Big)$$

where $A(s)$ is the set of actions available in state $s$.
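Because the transitions are deterministic and the discount factor is 1, the Bellman equation in Q.3 reduces to U(s) = R(s) + U(s'), where s' is the single successor of s under the policy. The following is a minimal Python sketch of that calculation, not the expert's solution. It rests on assumptions the question does not state: the grid is taken to be the standard 4 x 3 world with a wall at (2,2) and terminal states (4,3) = +1 and (4,2) = -1, and `step_reward` is a hypothetical stand-in for the non-terminal reward "discussed in class". The names `step`, `policy`, and `evaluate` are illustrative only.

```python
from typing import Dict, Optional, Tuple

State = Tuple[int, int]  # (column, row), 1-indexed as in the question

# Assumed layout (not stated in the question): the standard 4 x 3 grid world
# with a wall at (2, 2) and terminal states (4, 3) = +1 and (4, 2) = -1.
COLS, ROWS = 4, 3
WALL: State = (2, 2)
TERMINALS: Dict[State, float] = {(4, 3): 1.0, (4, 2): -1.0}
MOVES = {"Right": (1, 0), "Up": (0, 1), "Left": (-1, 0)}


def step(s: State, action: str) -> Optional[State]:
    """Return the deterministic successor of s, or None if the move is blocked."""
    dx, dy = MOVES[action]
    nxt = (s[0] + dx, s[1] + dy)
    inside = 1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS
    return nxt if inside and nxt != WALL else None


def policy(s: State) -> str:
    """Right if possible, else Up if possible, otherwise Left."""
    if s == (4, 1):
        return "Left"  # given explicitly (with "???") in the question
    for action in ("Right", "Up"):
        if step(s, action) is not None:
            return action
    return "Left"


def evaluate(step_reward: float = 0.0) -> Dict[State, Optional[float]]:
    """Compute U(s) = R(s) + U(s') by following the policy from each state.

    step_reward is a hypothetical non-terminal reward (an assumption).
    States whose trajectory loops without reaching a terminal map to None.
    """
    utilities: Dict[State, Optional[float]] = {}
    for x in range(1, COLS + 1):
        for y in range(1, ROWS + 1):
            s = (x, y)
            if s == WALL:
                continue
            total, seen, cur = 0.0, set(), s
            # Walk the policy until a terminal state or a repeated state.
            while cur not in TERMINALS and cur not in seen:
                seen.add(cur)
                total += step_reward
                cur = step(cur, policy(cur))
            utilities[s] = total + TERMINALS[cur] if cur in TERMINALS else None
    return utilities
```

With `step_reward=0.0`, this sketch agrees with the two example values given in the question, U(3,3) = 1 and U(3,2) = -1; whether 0 is the reward actually used in class is not confirmed by the source, so substitute the class value before relying on the numbers.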

