Question
Utility, Policy, and Their Calculation

Consider the 4 x 3 environment discussed in the lecture. Let $\pi$ be the following policy: Right if you can; else, Up if you can; otherwise, Left. For example, $\pi(1,1) = \text{Right}$, $\pi(1,2) = \text{Up}$, and $\pi(4,1) = \text{Left}$ (???).

Assume that the discount factor $\gamma = 1$ and the transition is deterministic, i.e. $P(s' \mid s, a)$ is either 0 or 1. E.g., $P((2,1) \mid (1,1), \text{Right}) = 1$, while $P((1,2) \mid (1,1), \text{Right}) = 0$.

[Figure: the 4 x 3 grid environment; only a "-1" cell label and the axis ticks 1, 2, 3 survive from the original image.]

Q.3) Value Iteration
Calculate $U^{\pi}(s)$ for every $s$ (excluding (4,1)) using the Bellman equation and the reward function discussed in class:

$$U^{\pi}(s) = R(s) + \sum_{s'} P(s' \mid s, \pi(s))\, U^{\pi}(s')$$

(For example, $U^{\pi}(3,3) = 1$ and $U^{\pi}(3,2) = -1$.)

Q.4) Policy Iteration
What would $\pi(1,1)$ be if, using the $U^{\pi}$ calculated in Q.3), one step of the following policy update rule is applied on (1,1)?

$$\pi(s) \leftarrow \arg\max_{a \in A(s)} \Big( R(s,a) + \sum_{s'} P(s' \mid s, a)\, U^{\pi}(s') \Big)$$

where $A(s)$ is the set of actions available to the state $s$.
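The two calculations asked for here can be checked mechanically. The following is a minimal Python sketch, not the course's reference solution. It rests on several assumptions that the question does not state: the grid is taken to be the common textbook layout with a wall at (2,2) and terminal states (4,3) = +1 and (4,2) = -1; the reward for non-terminal states is taken to be 0 (a guess that agrees with the two example values in the question); R(s, a) is treated as equal to R(s); and a move counts as "available" when it stays on the grid and does not enter the wall. All function and constant names are hypothetical. Because the question marks the policy at (4,1) as "Left(???)", that choice is left as a parameter.

```python
# Sketch only: layout and rewards are ASSUMPTIONS, not given in the question.
COLS, ROWS = 4, 3
WALL = (2, 2)                               # assumed obstacle cell
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}     # assumed terminal states and rewards
STEP_REWARD = 0.0                           # assumed R(s) for non-terminal states
MOVES = {"Right": (1, 0), "Up": (0, 1), "Left": (-1, 0), "Down": (0, -1)}


def step(state, action):
    """Deterministic successor of `state`, or None if the move is blocked."""
    x, y = state
    dx, dy = MOVES[action]
    nxt = (x + dx, y + dy)
    inside = 1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS
    return nxt if inside and nxt != WALL else None


def available_actions(state):
    """A(s): actions that lead to a legal cell (assumed meaning of 'available')."""
    return [a for a in MOVES if step(state, a) is not None]


def reward(state):
    """R(s); also used as R(s, a), which is an assumption."""
    return TERMINALS.get(state, STEP_REWARD)


def fixed_policy(state, at_4_1="Left"):
    """Right if possible, else Up, else Left; (4,1) is overridable."""
    if state == (4, 1):
        return at_4_1
    for action in ("Right", "Up", "Left"):
        if step(state, action) is not None:
            return action
    raise ValueError(f"no legal action in {state}")


def evaluate(policy, sweeps=50):
    """Repeated Bellman backups for a fixed policy, starting from U = 0."""
    states = [(x, y) for x in range(1, COLS + 1)
              for y in range(1, ROWS + 1) if (x, y) != WALL]
    U = {s: 0.0 for s in states}
    for _ in range(sweeps):
        # Synchronous update: the right-hand side reads the previous U.
        U = {s: reward(s) if s in TERMINALS
             else reward(s) + U[step(s, policy(s))]
             for s in states}
    return U


def improve(state, U):
    """One step of the arg-max policy update at a single state."""
    return max(available_actions(state),
               key=lambda a: reward(state) + U[step(state, a)])


if __name__ == "__main__":
    U = evaluate(fixed_policy)
    for s in sorted(U):
        print(s, U[s])
    print("updated policy at (1,1):", improve((1, 1), U))
```

Under these assumptions the sketch reproduces the two values given in the question, $U^{\pi}(3,3) = 1$ and $U^{\pi}(3,2) = -1$, and the update at (1,1) selects Up whether $\pi(4,1)$ is read as Left or as Up. Note that with $\pi(4,1) = \text{Left}$ the states (3,1) and (4,1) form a loop; the backups settle at 0 there only because the assumed step reward is 0, and a non-zero step reward with $\gamma = 1$ would not converge. To try the other reading, pass `lambda s: fixed_policy(s, at_4_1="Up")` to `evaluate`.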