
Question

q1. Consider the following MDP, in which all of the transitions are deterministic.
States: s0,s1,s2
Actions: [a0, a1]
Transitions: [(s0,a0,s0),(s0,a1,s1),(s1,a0,s0),(s1,a1,s2),(s2,a0,s2),(s2,a1,s2)]
Rewards: R(s0,a0)=1,R(s0,a1)=-1,R(s1,a0)=2,R(s1,a1)=-1,R(s2,a0)=0,R(s2,a1)=4
We have the following policy that maps states to actions:
π(s0)=a1
π(s1)=a1
π(s2)=a1
The policy will be executed from state s0. Which technique is most appropriate to calculate the reward that will be gained?
Select one:
a. Policy Iteration
b. Value Iteration
c. Policy Evaluation

q2. The value of each state is initially set to 0.
V(s)=0, for all s
Apply a single iteration of the Bellman Backup with discount factor gamma=0.5 to update the estimated value of each state under the policy from question 1.
What is the estimated value of state s0?

q3. Perform a second iteration to improve the value estimates.
What is the new estimated value of state s1?

q4. True or false: If the only difference between two MDPs is the value of the discount factor then they must have the same optimal policy.

q5. True or false: For an infinite-horizon MDP with a finite number of states and actions, and discount factor (0 < gamma < 1), value iteration is guaranteed to converge.
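
The backups asked for in q2 and q3 can be checked with a short script. The following is a minimal sketch, not a reference solution: it assumes synchronous updates (every state is backed up from the previous iteration's values), and all names in it (NEXT_STATE, REWARD, POLICY, backup_under_policy, evaluate_policy, greedy_policy_by_value_iteration) are illustrative choices, not part of the question. The last function runs plain value iteration so that the optimal policy can be compared across discount factors, as q4 and q5 discuss; the stopping tolerance is likewise an assumption.

```python
# Illustrative encoding of the MDP from q1 (names are assumptions, not from the question).
STATES = ["s0", "s1", "s2"]
ACTIONS = ["a0", "a1"]

# Deterministic transitions: (state, action) -> next state.
NEXT_STATE = {
    ("s0", "a0"): "s0", ("s0", "a1"): "s1",
    ("s1", "a0"): "s0", ("s1", "a1"): "s2",
    ("s2", "a0"): "s2", ("s2", "a1"): "s2",
}

# Immediate rewards R(s, a).
REWARD = {
    ("s0", "a0"): 1, ("s0", "a1"): -1,
    ("s1", "a0"): 2, ("s1", "a1"): -1,
    ("s2", "a0"): 0, ("s2", "a1"): 4,
}

# The fixed policy from q1.
POLICY = {"s0": "a1", "s1": "a1", "s2": "a1"}


def backup_under_policy(values, policy, gamma):
    """One synchronous Bellman backup for a fixed policy:
    V_new(s) = R(s, pi(s)) + gamma * V_old(next(s, pi(s)))."""
    return {
        s: REWARD[(s, policy[s])] + gamma * values[NEXT_STATE[(s, policy[s])]]
        for s in STATES
    }


def evaluate_policy(policy, gamma, iterations):
    """Start from V(s) = 0 and return the value table after each backup."""
    values = {s: 0.0 for s in STATES}
    history = []
    for _ in range(iterations):
        values = backup_under_policy(values, policy, gamma)
        history.append(values)
    return history


def greedy_policy_by_value_iteration(gamma, tol=1e-9):
    """Run value iteration until the largest change is below tol (assumed
    stopping rule), then return the greedy policy for the final values."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: max(REWARD[(s, a)] + gamma * values[NEXT_STATE[(s, a)]] for a in ACTIONS)
            for s in STATES
        }
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    return {
        s: max(ACTIONS, key=lambda a: REWARD[(s, a)] + gamma * values[NEXT_STATE[(s, a)]])
        for s in STATES
    }


if __name__ == "__main__":
    for i, table in enumerate(evaluate_policy(POLICY, gamma=0.5, iterations=2), start=1):
        print(f"iteration {i}: {table}")
    for gamma in (0.1, 0.9):
        print(f"gamma={gamma}: {greedy_policy_by_value_iteration(gamma)}")
```

Under these assumptions the first backup gives V(s0) = -1 and the second gives V(s1) = 1. Comparing the greedy policies for gamma=0.1 and gamma=0.9 shows them choosing different actions in s0 and s1, which is one concrete way to test the claim in q4.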