Question

4- Assuming that all Q-values are initialized to 0, what are the Q-values for the following state-action pairs after running [tabular] Q-learning for the first episode? [Skip/disregard episodes 2 and 3.] Use discount factor γ = 0.8 and learning rate α = 0.6.
Q(A, Down)
Q(B, Up)
Hint: Use the following equations and update the Q-values after each transition until the end of episode 1.
Consider your new sample estimate:
$\text{target} = R(s,a,s') + \gamma \max_{a'} \hat{Q}(s',a')$
Incorporate the new estimate into a running average:
$\hat{Q}(s,a) \leftarrow (1-\alpha)\,\hat{Q}(s,a) + \alpha\,[\text{target}]$
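The two-step update above can be sketched in Python as follows. This is a minimal illustration, assuming each transition is recorded as an `(s, a, s_next, r)` tuple and that terminal states have no actions, so their max term is 0. The function name `q_learning_episode`, the action sets, and the four-step episode at the bottom are hypothetical: the actual experience sequence for this exercise is not reproduced in this text, so the printed numbers are not the answer to part 4.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor given in the problem
ALPHA = 0.6  # learning rate given in the problem


def q_learning_episode(transitions, actions, gamma=GAMMA, alpha=ALPHA, q=None):
    """Apply tabular Q-learning updates, in order, for one episode.

    transitions: list of (s, a, s_next, r) tuples in the order they occurred.
    actions: dict mapping each state to its available actions; terminal
             states map to an empty list (hypothetical representation).
    """
    if q is None:
        q = defaultdict(float)  # every Q(s, a) starts at 0
    for s, a, s_next, r in transitions:
        # max over a' of Q(s', a'); defaults to 0 when s' has no actions
        best_next = max((q[(s_next, b)] for b in actions.get(s_next, [])), default=0.0)
        target = r + gamma * best_next                       # sample estimate
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target  # running average
    return q


# Hypothetical episode, NOT the exercise's data: (s, a, s', r)
episode = [("A", "Down", "B", -1), ("B", "Up", "A", -1),
           ("A", "Down", "B", -1), ("B", "Exit", "T", 10)]
actions = {"A": ["Down", "Right"], "B": ["Up", "Exit"], "T": []}
q = q_learning_episode(episode, actions)
print(round(q[("A", "Down")], 4), round(q[("B", "Up")], 4))  # prints: -0.84 -0.6
```

The max in the target ranges over every action available in s', including actions never tried in the episode (here the hypothetical Q(A, Right) = 0). That is why Q(B, Up) bootstraps from 0 rather than from Q(A, Down) = -0.6, and it is what makes Q-learning off-policy.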
5- Repeat part 4, but run SARSA (temporal difference) on the above experience sequence instead. Again assume that all Q-values are initialized to 0 and use only episode 1. Use discount factor γ = 0.8 and learning rate α = 0.6.
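Hint: Use the following equations and update the Q-values after each transition until the end of episode 1.
Sample of $\hat{Q}(s,a)$: $\text{target} = R(s,a,s') + \gamma\,\hat{Q}(s',a')$, where $a'$ is the action actually taken in $s'$ during the episode.
Update $\hat{Q}(s,a)$: $\hat{Q}(s,a) \leftarrow (1-\alpha)\,\hat{Q}(s,a) + \alpha\,[\text{target}]$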
[Image: experience sequence (not transcribed)]
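The SARSA update can be sketched the same way. This minimal version assumes transitions are `(s, a, s_next, r)` tuples listed in the order they happened, reads the action a' for each target from the next transition, and assumes the last transition ends in a terminal state, so its target is just r. The name `sarsa_episode` and the episode data are hypothetical illustrations, not the exercise's actual experience sequence, so the printed values are not the answer to part 5.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor given in the problem
ALPHA = 0.6  # learning rate given in the problem


def sarsa_episode(transitions, gamma=GAMMA, alpha=ALPHA, q=None):
    """Apply on-policy SARSA updates, in order, for one episode.

    transitions: list of (s, a, s_next, r) tuples in the order they occurred.
    Assumes the final transition ends the episode (s_next is terminal).
    """
    if q is None:
        q = defaultdict(float)  # every Q(s, a) starts at 0
    for i, (s, a, s_next, r) in enumerate(transitions):
        if i + 1 < len(transitions):
            next_s, next_a = transitions[i + 1][0], transitions[i + 1][1]
            assert next_s == s_next, "transitions must be consecutive"
            # a' is the action actually taken in s', not the greedy one
            next_q = q[(next_s, next_a)]
        else:
            next_q = 0.0  # terminal s': no future value
        target = r + gamma * next_q                          # sample estimate
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target  # running average
    return q


# Hypothetical episode, NOT the exercise's data: (s, a, s', r)
episode = [("A", "Down", "B", -1), ("B", "Up", "A", -1),
           ("A", "Down", "B", -1), ("B", "Exit", "T", 10)]
q = sarsa_episode(episode)
print(round(q[("A", "Down")], 4), round(q[("B", "Up")], 4))  # prints: -0.84 -0.888
```

On this made-up episode, the update for Q(B, Up) bootstraps from Q(A, Down) = -0.6, the value of the action the agent actually took next, giving a target of -1 + 0.8 × (-0.6) = -1.48 and an updated value of 0.6 × (-1.48) = -0.888.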
