
Question

Consider the infinite MDP with discount factor $\gamma < 1$ illustrated in Figure 1. It consists of 3 states,
and rewards are given upon taking an action from the state. From state $s_0$, action $a_1$ has zero
immediate reward and causes a deterministic transition to state $s_1$, where there is a reward of +1 for
every time step afterwards (regardless of action). From state $s_0$, action $a_2$ causes a deterministic
transition to state $s_2$ with an immediate reward of $\frac{\gamma^2}{1-\gamma}$, but state $s_2$ has zero reward for every
time step afterwards (regardless of action).
Figure 1: infinite 3-state MDP
(a) What is the total discounted return $\sum_{t=0}^{\infty} \gamma^t r_t$ of taking action $a_1$ from state $s_0$ at time step
$t = 0$? [5 pts]
(b) What is the total discounted return $\sum_{t=0}^{\infty} \gamma^t r_t$ of taking action $a_2$ from state $s_0$ at time step
$t = 0$? What is the optimal action? [5 pts]
(c) Assume we initialize the value of each state to zero (i.e., at iteration $n = 0$, $\forall s: V_{n=0}(s) = 0$).
Show that value iteration continues to choose the sub-optimal action until iteration $n^*$, where
$$n^* \ge \frac{\log(1-\gamma)}{\log \gamma} \ge \frac{1}{2} \log\left(\frac{1}{1-\gamma}\right) \frac{1}{1-\gamma}$$
Thus, value iteration has a running time that grows faster than $\frac{1}{1-\gamma}$. (You just need to
show the first inequality.) [10 pts]
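
The quantities asked for in (a) through (c) can be checked numerically. The sketch below is only an illustration, not an official solution. It assumes a discount factor gamma in (0, 1) and an immediate reward of gamma^2 / (1 - gamma) for action a2 from s0. The function names (`return_a1`, `return_a2`, `greedy_switch_iteration`) and the value gamma = 0.9 are hypothetical choices made for this example. It runs value iteration from V_0 = 0 and reports the first iteration n at which a one-step backup from V_n prefers a1 over a2 in s0.

```python
import math


def return_a1(gamma: float) -> float:
    """Closed-form return of a1 from s0: 0 + gamma + gamma^2 + ... = gamma / (1 - gamma)."""
    return gamma / (1.0 - gamma)


def return_a2(gamma: float) -> float:
    """Closed-form return of a2 from s0: the assumed immediate reward, then zeros."""
    return gamma ** 2 / (1.0 - gamma)


def greedy_switch_iteration(gamma: float, max_iters: int = 1_000_000) -> int:
    """Return the first n such that a backup from V_n strictly prefers a1 in s0.

    V_0 is zero for every state. s1 and s2 are absorbing, so their values
    do not depend on the action taken there.
    """
    r_a2 = gamma ** 2 / (1.0 - gamma)  # assumed immediate reward of a2
    v1 = 0.0  # V_n(s1): reward +1 at every step
    v2 = 0.0  # V_n(s2): reward 0 at every step
    for n in range(max_iters):
        q_a1 = 0.0 + gamma * v1   # one-step lookahead for a1 using V_n
        q_a2 = r_a2 + gamma * v2  # one-step lookahead for a2 using V_n
        if q_a1 > q_a2:
            return n
        # Bellman backups for the absorbing states.
        v1 = 1.0 + gamma * v1
        v2 = 0.0 + gamma * v2
    raise RuntimeError("a1 was never preferred within max_iters")


if __name__ == "__main__":
    gamma = 0.9  # illustrative value only
    bound = math.log(1.0 - gamma) / math.log(gamma)
    print("return of a1:", return_a1(gamma))
    print("return of a2:", return_a2(gamma))
    print("lower bound log(1-gamma)/log(gamma):", bound)
    print("first iteration preferring a1:", greedy_switch_iteration(gamma))
```

Under these assumptions, the backup from V_n gives gamma (1 - gamma^n) / (1 - gamma) for a1 and gamma^2 / (1 - gamma) for a2, so a1 is preferred only once gamma^n < 1 - gamma, that is, once n exceeds log(1 - gamma) / log(gamma). With gamma = 0.9 the bound is roughly 21.85 and the switch happens at n = 22.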