Question
Consider applying the Q-learning algorithm to the same grid world as in Problem 1. Assume that the table of q-values is initialized to 0. Assume the agent begins in state S7 and then travels clockwise around the perimeter of the grid until it reaches the absorbing goal state, completing the first training episode. Assume that γ = 0.8 and that α = 1.
(a) Determine which q(s, a) values are modified as a result of this episode, and give their revised values.
(b) Assume that the agent now performs a second identical episode. Determine which q(s, a) values are modified as a result of this episode, and give their revised values.
(c) Assume that the agent now performs a third identical episode. Determine which q(s, a) values are modified as a result of this episode, and give their revised values.
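The three parts repeat the same tabular update over an identical path, so a short sketch of that update may help when working them by hand. It assumes the standard Q-learning rule q(s, a) ← (1 − α)·q(s, a) + α·(r + γ·max over a' of q(s', a')), and it reads the two constants as a discount factor of 0.8 and a learning rate of 1. The grid and rewards from Problem 1 are not reproduced here, so the states `A`, `B`, `G`, the action `right`, and the reward of 10 are purely hypothetical placeholders, and `q_learning_episode` is an illustrative name rather than anything from the original problem.

```python
from collections import defaultdict


def q_learning_episode(q, episode, gamma=0.8, alpha=1.0):
    """Apply the tabular Q-learning update along one fixed episode.

    episode is a list of (state, action, reward, next_state, next_actions).
    Returns a dict of the (state, action) entries whose value changed.
    gamma and alpha are the assumed discount factor and learning rate.
    """
    changed = {}
    for state, action, reward, next_state, next_actions in episode:
        # An absorbing state has no actions, so its best value is taken as 0.
        best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + gamma * best_next
        new_value = (1 - alpha) * q[(state, action)] + alpha * target
        if new_value != q[(state, action)]:
            changed[(state, action)] = new_value
        q[(state, action)] = new_value
    return changed


# Hypothetical three-state chain A -> B -> G, NOT the grid from Problem 1.
# Entering the absorbing goal G is assumed to pay a reward of 10.
episode = [
    ("A", "right", 0, "B", ["right"]),
    ("B", "right", 10, "G", []),
]

q = defaultdict(float)  # every q(s, a) starts at 0
first = q_learning_episode(q, episode)
second = q_learning_episode(q, episode)
third = q_learning_episode(q, episode)
```

In this placeholder chain, the first episode changes only the entry for the step into the goal, the second changes only the entry one step earlier, and the third changes nothing because the chain has just two steps. The same one-step-per-episode backward propagation is what parts (a) to (c) ask you to trace on the actual grid.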