
Question



Math: An alternative learning algorithm [10 points] Consider a learning algorithm which attempts
to learn a Q-function, but instead of using the usual Q-learning target $R + \gamma \max_a Q(s',a)$,
it uses as its target the mixture
$$R + \gamma\Big((1-\beta)\max_a Q(s',a) + \beta \sum_a \pi(a \mid s')\,Q(s',a)\Big)$$
where $\beta \in (0,1)$ is a hyper-parameter.
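The following is a minimal Python sketch of the mixture target for a single transition $(s, a, R, s')$. It assumes a tabular Q whose row for $s'$ is passed as a list `q_next`, with the policy probabilities $\pi(a \mid s')$ passed as `pi_next`. The function and parameter names are hypothetical. The question restricts $\beta$ to $(0,1)$, but the sketch also accepts the endpoints so the two limiting cases can be inspected.

```python
from typing import Sequence


def mixture_target(
    reward: float,
    q_next: Sequence[float],
    pi_next: Sequence[float],
    beta: float,
    gamma: float = 1.0,
    terminal: bool = False,
) -> float:
    """R + gamma * ((1 - beta) * max_a Q(s',a) + beta * sum_a pi(a|s') Q(s',a)).

    q_next[a] holds Q(s', a) and pi_next[a] holds pi(a | s').
    A terminal s' contributes no bootstrap term (hypothetical convention).
    """
    if terminal:
        return reward
    greedy_value = max(q_next)                                     # Q-learning part
    expected_value = sum(p * q for p, q in zip(pi_next, q_next))   # expectation under pi
    return reward + gamma * ((1.0 - beta) * greedy_value + beta * expected_value)
```

With `q_next = [1.0, 3.0, 2.0]` and `pi_next = [0.1, 0.8, 0.1]`, $R = -1$ and $\gamma = 1$, the target is 2.0 at $\beta = 0$ (the Q-learning target) and about 1.7 at $\beta = 1$ (an Expected-SARSA-style target). Values of $\beta$ in between interpolate linearly.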
Assume that $\pi$ is an $\epsilon$-greedy policy derived from $Q$, and that the episodes used for training are
collected using $\pi$ only.
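Below is a minimal sketch of the behaviour policy $\pi$. It assumes the common convention that an $\epsilon$-greedy policy spreads $\epsilon$ uniformly over all actions and gives the remaining $1-\epsilon$ to a greedy action. The function names and the lowest-index tie-breaking are hypothetical choices. The same probabilities serve as the weights $\pi(a \mid s')$ in the expectation term of the target.

```python
import random
from typing import List, Sequence


def epsilon_greedy_probs(q_values: Sequence[float], epsilon: float) -> List[float]:
    """Action probabilities of an epsilon-greedy policy derived from Q at one state.

    Assumed convention: epsilon is spread uniformly over all actions and the
    remaining 1 - epsilon goes to the greedy action. max() returns the first
    maximal index, so ties go to the lowest index (hypothetical choice).
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Draw one behaviour action from the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

For `q_values = [1.0, 3.0, 2.0]` and $\epsilon = 0.3$, this gives probabilities of about `[0.1, 0.8, 0.1]`.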
(a)[5 points] Recall that an on-policy control algorithm estimates $q_\pi(s,a)$ for the current behaviour
policy $\pi$ and for all states $s$ and actions $a$. Is this algorithm on-policy or off-policy?
Justify your answer.
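A worked identity is relevant to (a). It assumes the same $\epsilon$-greedy convention: $\epsilon$ is spread uniformly over all $|\mathcal{A}|$ actions and the remaining $1-\epsilon$ goes to a greedy action $a^*$ at $s'$, so that $\max_a Q(s',a) = Q(s',a^*)$ and $\pi(a \mid s') = \tfrac{\epsilon}{|\mathcal{A}|} + (1-\epsilon)\,\mathbb{1}[a=a^*]$. Under that assumption, the bracketed part of the target can be written as a single expectation:

```latex
\begin{aligned}
(1-\beta)\max_a Q(s',a) + \beta \sum_a \pi(a \mid s')\,Q(s',a)
  &= \sum_a \Big[(1-\beta)\,\mathbb{1}[a=a^*] + \beta\,\pi(a \mid s')\Big]\,Q(s',a) \\
  &= \sum_a \Big[\tfrac{\beta\epsilon}{|\mathcal{A}|} + (1-\beta\epsilon)\,\mathbb{1}[a=a^*]\Big]\,Q(s',a).
\end{aligned}
```

The weights in the last line are those of a $\beta\epsilon$-greedy policy with respect to $Q$, whereas the training episodes are generated by the $\epsilon$-greedy policy $\pi$. With $\beta \in (0,1)$, the two policies coincide only in the degenerate case $\epsilon = 0$.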
(b)[5 points] For different values of $\beta$, how would you expect this algorithm to perform compared
to Q-learning and SARSA? Include bias, variance, and maximization bias in your discussion.
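A toy Monte Carlo sketch can isolate the maximization-bias part of (b). It assumes a hypothetical setting that is not from the question: a single next state where every true action value is 0, with estimates perturbed by Gaussian noise. Any nonzero average of the bootstrap term is therefore bias. The function name and default parameters are illustrative only.

```python
import random
from typing import Dict, Sequence


def bootstrap_bias(
    betas: Sequence[float],
    epsilon: float = 0.1,
    n_actions: int = 10,
    noise_std: float = 1.0,
    n_trials: int = 10_000,
    seed: int = 0,
) -> Dict[float, float]:
    """Average bootstrap term per beta when every true Q(s', a) is 0.

    Each trial draws noisy estimates Q_hat(s', a) = 0 + noise, then blends the
    greedy term max_a Q_hat and the expectation under an epsilon-greedy policy
    derived from Q_hat. Since the true values are 0, the average is pure bias.
    """
    rng = random.Random(seed)
    totals = {b: 0.0 for b in betas}
    for _ in range(n_trials):
        q_hat = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        best = max(range(n_actions), key=lambda a: q_hat[a])
        # epsilon-greedy probabilities derived from the noisy estimates
        pi = [epsilon / n_actions] * n_actions
        pi[best] += 1.0 - epsilon
        greedy_term = q_hat[best]
        expected_term = sum(p * q for p, q in zip(pi, q_hat))
        for b in betas:
            totals[b] += (1.0 - b) * greedy_term + b * expected_term
    return {b: totals[b] / n_trials for b in betas}
```

Per sample, the blended term equals $\max - \beta\epsilon(\max - \text{mean})$, so in this toy setting the average bias shrinks as $\beta$ grows. It stays positive, because even the expectation under $\pi$ puts weight $1-\epsilon+\epsilon/|\mathcal{A}|$ on the maximizing estimate. The sketch measures bias only; it says nothing about variance or learning speed.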
(c)[5 points] Bonus question: try this algorithm on the Taxi Problem in Question 1, and compare
it to the other algorithms. Are the results consistent with your hypothesis?
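For (c), here is a minimal tabular sketch of the algorithm as described: the behaviour policy is $\epsilon$-greedy in $Q$, and every update uses the mixture target. It assumes an environment following the Gymnasium-style API, where `reset()` returns `(state, info)` and `step(a)` returns `(state, reward, terminated, truncated, info)` with integer states. The function name, parameters and default hyperparameters are hypothetical and untuned.

```python
import random
from typing import List


def train_mixture_q(env, n_states: int, n_actions: int, episodes: int,
                    alpha: float = 0.1, gamma: float = 1.0, epsilon: float = 0.1,
                    beta: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Tabular learner with the mixture target; behaviour is epsilon-greedy in Q."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]

    def probs(s: int) -> List[float]:
        # epsilon-greedy probabilities derived from the current Q (ties -> lowest index)
        best = max(range(n_actions), key=lambda a: q[s][a])
        p = [epsilon / n_actions] * n_actions
        p[best] += 1.0 - epsilon
        return p

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            a = rng.choices(range(n_actions), weights=probs(s), k=1)[0]
            s2, r, terminated, truncated, _ = env.step(a)
            if terminated:
                target = r  # no bootstrap from a terminal state
            else:
                expected = sum(p * v for p, v in zip(probs(s2), q[s2]))
                target = r + gamma * ((1 - beta) * max(q[s2]) + beta * expected)
            q[s][a] += alpha * (target - q[s][a])
            s, done = s2, terminated or truncated  # truncation ends the episode but still bootstraps
    return q
```

With Gymnasium installed, a run might look like `env = gymnasium.make("Taxi-v3")` followed by `train_mixture_q(env, env.observation_space.n, env.action_space.n, episodes=2000)`. In the same loop, `beta=0.0` gives a Q-learning-style update and `beta=1.0` an Expected-SARSA-style update, which allows a like-for-like comparison. Plain SARSA, which bootstraps on the sampled next action, is not one of these endpoints.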
