Question:
• Off-policy learning, such as Q-learning, learns the value of the optimal policy. On-policy learning, such as SARSA, learns the value of the policy the agent is actually carrying out (which includes the exploration).
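The contrast can be seen in the two tabular update rules. The sketch below is a minimal illustration under stated assumptions, not the book's own code. The function names, the dictionary-based Q-table, and the epsilon-greedy exploration policy are hypothetical choices made for illustration. Q-learning bootstraps from the best next action, max over a' of Q[s', a'], whatever the agent does next. SARSA bootstraps from Q[s', a'] for the next action a' that the agent actually selects, which may be an exploratory one.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Hypothetical exploring policy: a random action with probability
    epsilon, otherwise an action with the highest current Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy: the target uses the greedy (best) next action,
    independent of what the exploring agent will actually do."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: the target uses the next action the agent actually chose,
    so exploratory choices affect the learned values."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# One transition: in s0 the agent took "safe", got reward 0, reached s1,
# and exploration then picked the "risky" action in s1.
actions = ["safe", "risky"]
Q_off = defaultdict(float)
Q_on = defaultdict(float)
for Q in (Q_off, Q_on):
    Q[("s1", "safe")] = 10.0
    Q[("s1", "risky")] = -100.0

q_learning_update(Q_off, "s0", "safe", 0.0, "s1", actions, alpha=0.5, gamma=0.9)
sarsa_update(Q_on, "s0", "safe", 0.0, "s1", "risky", alpha=0.5, gamma=0.9)

print(Q_off[("s0", "safe")])  # 4.5   = 0.5 * (0 + 0.9 * 10)
print(Q_on[("s0", "safe")])   # -45.0 = 0.5 * (0 + 0.9 * -100)
```

Both methods see the same experience. Q-learning credits s0 with the value of acting optimally from s1. SARSA charges s0 for the exploratory "risky" move the agent actually made, so its estimates reflect the exploring policy that is being carried out.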
Related book:
Artificial Intelligence Foundations Of Computational Agents
ISBN: 9781107195394
2nd Edition
Authors: David L. Poole, Alan K. Mackworth