
Question

Alice is taking CS234 and has just learned about Q-values. She is trying to explore a large finite-horizon MDP with \gamma = 1. The transitions are deterministic and Q_{H+1}(s, a) = 0 for all s, a. To help her with her MDP, you tell her the optimal policy \pi^*(s, t), defined in every state s and timestep t, that Alice should follow to maximize her reward. Denote by Q^*_t(s, a) the Q-value of the optimal policy upon taking action a in state s at timestep t.
A) First Step Error
In the first timestep t = 1, Alice is in state s_1 and chooses action a, which is suboptimal. If she then follows the optimal policy from t = 2 until the end of the episode, what is the value of this policy compared to the optimal one? Express your result only using Q^*_1(s_1, \cdot).
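The answer follows directly from the definitions above; here is a sketch of the reasoning (my own derivation, not the site's hidden expert solution):

```latex
% Alice takes the suboptimal action a at t = 1 and acts optimally
% afterwards, so by the definition of the optimal Q-function her
% policy's value is exactly
V^{\text{Alice}}(s_1) = Q^*_1(s_1, a).
% The optimal value at t = 1 instead chooses the best first action:
V^*_1(s_1) = \max_{a'} Q^*_1(s_1, a').
% Hence Alice's policy falls short of the optimal one by
V^*_1(s_1) - V^{\text{Alice}}(s_1)
  = \max_{a'} Q^*_1(s_1, a') - Q^*_1(s_1, a) \;\ge\; 0.
```

The key observation is that a one-step deviation followed by optimal behavior is, by construction, exactly what Q^*_1(s_1, a) measures.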

Step by Step Solution

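The first-step-error argument can be checked numerically with backward induction. The following is a purely illustrative sketch on a made-up 2-state, 2-action deterministic MDP with horizon H = 3 (the states, rewards, and transition rule are my own assumptions, not part of the original problem):

```python
# Toy deterministic finite-horizon MDP: gamma = 1, Q_{H+1}(s, a) = 0,
# as in the problem statement. All other details are invented for
# illustration only.
H = 3
states = [0, 1]
actions = [0, 1]

def step(s, a):
    """Deterministic transition and reward (arbitrary toy choice)."""
    next_s = (s + a) % 2
    reward = 1.0 if (s == 0 and a == 1) else 0.5 * a
    return next_s, reward

# Backward induction: Q[t][(s, a)] for t = H down to 1, with Q[H+1] = 0.
Q = {H + 1: {(s, a): 0.0 for s in states for a in actions}}
for t in range(H, 0, -1):
    Q[t] = {}
    for s in states:
        for a in actions:
            s2, r = step(s, a)
            Q[t][(s, a)] = r + max(Q[t + 1][(s2, a2)] for a2 in actions)

s1 = 0
a_bad = 0  # a suboptimal first action in state s1 at t = 1

# Optimal value at t = 1 picks the best first action.
optimal_value = max(Q[1][(s1, a)] for a in actions)

# Value of: take a_bad at t = 1, then follow the optimal policy.
# By the definition of Q*, this is exactly Q*_1(s1, a_bad).
deviation_value = Q[1][(s1, a_bad)]
gap = optimal_value - deviation_value
print(optimal_value, deviation_value, gap)
```

Running this prints the optimal value, the one-step-deviation value, and their nonnegative gap, matching the closed-form answer max_{a'} Q^*_1(s_1, a') - Q^*_1(s_1, a).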

