
Question



Q-Learning
Let's simulate the Q-learning algorithm! Assume there are states 0, 1, 2, 3 and actions ('b', 'c'), and discount factor $\gamma = 0.9$. Furthermore, assume that all the Q values are initialized to 0 and that the learning rate $\alpha = 0.5$.
Each row, $t$, in the table represents a record of experience at time $t$: $(s_t, a_t, r_t)$.
In each row $t$, indicate what update $Q(s_t, a_t) \leftarrow q$ will be made by the Q-learning algorithm based on $(s_t, a_t, r_t, s_{t+1})$. Note that $s_{t+1}$ is on the next row (you might need to look ahead to the next part of the problem to see that next state value). You will want to keep track of the overall table $Q(s_t, a_t)$ as these updates take place, spanning the multiple parts of this question.
As a reminder, the Q-learning update formula is the following:
$$Q(s,a) = (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a')\right)$$
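A single application of this update can be sketched in Python as follows. This is a minimal illustration, not the problem's reference solution: the function name `q_update` and the two transitions are hypothetical, chosen only to show the arithmetic with a discount factor of 0.9 and a learning rate of 0.5.

```python
from collections import defaultdict

GAMMA = 0.9           # discount factor from the problem statement
ALPHA = 0.5           # learning rate from the problem statement
ACTIONS = ('b', 'c')  # the two actions from the problem statement


def q_update(Q, s, a, r, s_next):
    """Apply one Q-learning update to Q in place and return the new Q(s, a)."""
    # Value of the best action available in the next state.
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    # Blend the old estimate with the one-step target r + gamma * best_next.
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)
    return Q[(s, a)]


Q = defaultdict(float)  # every Q value starts at 0

# Hypothetical transition: in state 0 take 'b', receive reward 1, land in state 1.
first = q_update(Q, 0, 'b', 1, 1)   # 0.5 * 0 + 0.5 * (1 + 0.9 * 0) = 0.5

# Hypothetical transition: in state 3 take 'c', receive reward 0, land in state 0.
second = q_update(Q, 3, 'c', 0, 0)  # 0.5 * 0 + 0.5 * (0 + 0.9 * 0.5) = 0.225
```

The second update is nonzero even though its reward is 0, because the target bootstraps from the value already learned for state 0.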
You are welcome to do this problem by hand, though writing a small program to solve may be a good idea. To help with that, here is a variable with the history of experience:
[Image not transcribed: the variable holding the history of experience.]
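Since the actual history is not available here, the following is only a sketch of the kind of small program the problem suggests. The function name `run_q_learning` and the `example_history` list are hypothetical and are NOT the data from the original problem. Each row is assumed to be a tuple (s_t, a_t, r_t), with the next state read from the following row.

```python
def run_q_learning(history, actions=('b', 'c'), gamma=0.9, alpha=0.5):
    """Replay a list of (state, action, reward) rows and record each update.

    Returns the final Q table (a dict keyed by (state, action)) and the list
    of updates as ((state, action), new_value) pairs, one per usable row.
    """
    Q = {}         # missing entries are treated as 0, the initial Q value
    updates = []
    # Pair each row with the row after it to obtain the next state s_{t+1}.
    for (s, a, r), (s_next, _, _) in zip(history, history[1:]):
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        q = (1 - alpha) * Q.get((s, a), 0.0) + alpha * (r + gamma * best_next)
        Q[(s, a)] = q
        updates.append(((s, a), q))
    return Q, updates


# Hypothetical history for illustration only, not the one from the problem.
example_history = [(0, 'b', 0), (1, 'c', 1), (0, 'b', 0), (1, 'b', 0)]
Q, updates = run_q_learning(example_history)

for (s, a), q in updates:
    print(f"Q({s}, {a!r}) <- {q}")
```

The last row produces no update in this sketch because its next state is not in the list; in the original problem that state comes from the next part of the question.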
