Question
Math: An alternative learning algorithm [10 points]
Consider a learning algorithm which attempts to learn a Q-function, but instead of using the usual Q-learning target, $R_{t+1} + \gamma \max_a Q(S_{t+1}, a)$, it uses as target a mixture of [...], where [...] is a hyperparameter.
Assume that $\pi$ is an $\epsilon$-greedy policy derived from $Q$ and that the episodes used for training are collected using $\pi$ only.
(a) [... points] Recall that an on-policy control algorithm estimates $q_\pi$ for the current behaviour policy $\pi$ and for all states $s$ and actions $a$. Is this algorithm on-policy or off-policy? Justify your answer.
(b) [... points] For different values of [...], how would you expect this algorithm to perform compared to Q-learning and SARSA? Include bias, variance, and maximization bias in your discussion.
(c) [... points] Bonus question: try this algorithm on the Taxi Problem in Question [...] and compare it to the other algorithms. Are the results consistent with your hypothesis?
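As a starting point for the experiment, the update rule might be sketched as follows. The exact mixture did not survive in the question text, so this sketch assumes it interpolates between the Q-learning target (max over next actions) and the SARSA target (value of the next action actually sampled from the epsilon-greedy policy). The weight name `sigma`, the direction of the weighting, and the function names `epsilon_greedy`, `mixed_target`, and `update` are all hypothetical and are not taken from the assignment.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a uniform random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def mixed_target(q_next_row, next_action, reward, gamma, sigma, done):
    """ASSUMED target: sigma * Q-learning target + (1 - sigma) * SARSA target.

    sigma = 1 recovers the Q-learning target, sigma = 0 the SARSA target.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    greedy_value = max(q_next_row)            # Q-learning bootstrap term
    sampled_value = q_next_row[next_action]   # SARSA bootstrap term
    mixed_value = sigma * greedy_value + (1.0 - sigma) * sampled_value
    return reward + gamma * mixed_value


def update(q, state, action, reward, next_state, next_action,
           alpha, gamma, sigma, done):
    """One tabular update of q[state][action] towards the assumed mixed target."""
    target = mixed_target(q[next_state], next_action, reward, gamma, sigma, done)
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [1.0, 3.0]]  # toy table: 2 states, 2 actions
    a_next = epsilon_greedy(q[1], epsilon=0.1, rng=rng)
    update(q, state=0, action=0, reward=1.0, next_state=1, next_action=a_next,
           alpha=0.5, gamma=0.9, sigma=0.5, done=False)
    print(q[0])
```

Under this assumption, sweeping `sigma` between 0 and 1 while keeping the same epsilon-greedy behaviour policy gives a family of learners to compare against plain SARSA and Q-learning in part (c).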