Question
Question 2 (RL) [50 points - each part 12.5 points]: Consider the following grid world with five different states. The actions are move east, west, south, north, and exit if the agent is in a terminal state.
(a) We would like to use model-based learning using the following four observations. What are the estimated transition and reward functions based on these observations?
(b) Implement direct evaluation as a model-free learning method based on those four observations and calculate the value of each state. Assume γ = 0.9.
(c) We would like to use TD learning and Q-learning to find the values of these states. Suppose that we have the following observed transitions (s, a, s', r): (B, East, C, 3), (C, South, E, 3), (C, East, E, 4), (D, West, C, 1), (A, South, C, 3). The initial value of each state is 0. Assume that γ = 0.9 and α = 0.4. What are the learned values from TD learning after all five observations? Show the process of computing these values.
(d) What are the learned Q-values from Q-learning after all five observations? Show the process of computing these values.
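For parts (c) and (d), the updates can be traced with a short script. This is a minimal sketch, not an official solution. It assumes the two constants in the question are the discount factor (0.9) and the learning rate (0.4), and that the standard tabular rules apply: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)) for TD learning, and Q(s,a) <- Q(s,a) + alpha * (r + gamma * max over a' of Q(s',a') - Q(s,a)) for Q-learning. The function names `td_learning` and `q_learning` are illustrative only.

```python
from collections import defaultdict

# Observed transitions (s, a, s', r), copied from part (c) of the question.
TRANSITIONS = [
    ("B", "East", "C", 3),
    ("C", "South", "E", 3),
    ("C", "East", "E", 4),
    ("D", "West", "C", 1),
    ("A", "South", "C", 3),
]
GAMMA = 0.9  # assumed to be the discount factor
ALPHA = 0.4  # assumed to be the learning rate


def td_learning(transitions, alpha, gamma):
    """Tabular TD(0): move V(s) toward the sample r + gamma * V(s')."""
    values = defaultdict(float)  # every state value starts at 0
    for s, _action, s_next, r in transitions:
        sample = r + gamma * values[s_next]
        values[s] += alpha * (sample - values[s])
    return dict(values)


def q_learning(transitions, alpha, gamma):
    """Tabular Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    q = defaultdict(float)  # every Q-value starts at 0
    for s, a, s_next, r in transitions:
        # Actions never tried from s' still have Q = 0, so 0 is always
        # one of the candidates for the maximum.
        seen = [v for (state, _), v in q.items() if state == s_next]
        best_next = max([0.0] + seen)
        sample = r + gamma * best_next
        q[(s, a)] += alpha * (sample - q[(s, a)])
    return dict(q)


if __name__ == "__main__":
    for state, v in sorted(td_learning(TRANSITIONS, ALPHA, GAMMA).items()):
        print(f"V({state}) = {v:.4f}")
    for (state, action), v in sorted(q_learning(TRANSITIONS, ALPHA, GAMMA).items()):
        print(f"Q({state}, {action}) = {v:.4f}")
```

Under these assumptions, the TD pass gives V(B) = 1.2, V(C) = 2.32 (1.2 after the second observation, then 1.2 + 0.4 * (4 - 1.2) after the third), V(D) = 1.2352, V(A) = 2.0352, and V(E) = 0. The Q-learning pass gives Q(B, East) = 1.2, Q(C, South) = 1.2, Q(C, East) = 1.6, Q(D, West) = 0.976, and Q(A, South) = 1.776, where the last two use max Q(C, ·) = 1.6.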