Question
Question 7 [15 pt]: Consider a system with two states and two actions. You perform actions and observe the rewards and transitions listed below.

Step 1: Start = S1, Action = a1, Reward = -10, End = [missing]
Step 2: Start = S1, Action = a2, Reward = -10, End = S2
Step 3: Start = S2, Action = a1, Reward = +20, End = S1
Step 4: Start = S1, Action = a2, Reward = -10, End = S2

1. Perform Q-learning. The discount factor is γ = 0.5 and the learning rate is α = 0.5. Assume that all Q values are initialized to 0.
2. What is the policy that Q-learning has learned at this point?
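The four updates can be checked with a short script. This is a minimal sketch, not the original expert solution: it assumes the standard tabular Q-learning update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], and the names run_q_learning, greedy_policy, and transitions_with are hypothetical helpers introduced here. The end state of Step 1 is missing from the question, so it is passed in as a parameter; because every Q value is still 0 when Step 1 is processed, either choice gives the same table.

```python
# Tabular Q-learning replay of the four observed transitions.
ALPHA = 0.5  # learning rate from the question
GAMMA = 0.5  # discount factor from the question
STATES = ("S1", "S2")
ACTIONS = ("a1", "a2")


def run_q_learning(transitions):
    """Apply one Q-learning update per (state, action, reward, next_state)."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # all Q values start at 0
    for state, action, reward, next_state in transitions:
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        td_target = reward + GAMMA * best_next
        q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q


def greedy_policy(q):
    """Pick the action with the highest Q value in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


def transitions_with(step1_end):
    """Build the observed steps; step1_end is an ASSUMPTION (missing in the source)."""
    return [
        ("S1", "a1", -10.0, step1_end),
        ("S1", "a2", -10.0, "S2"),
        ("S2", "a1", 20.0, "S1"),
        ("S1", "a2", -10.0, "S2"),
    ]


q_table = run_q_learning(transitions_with("S1"))
policy = greedy_policy(q_table)

for key in sorted(q_table):
    print(key, q_table[key])
print(policy)
```

Under these assumptions the table ends at Q(S1, a1) = -5, Q(S1, a2) = -5.3125, Q(S2, a1) = 8.75, and Q(S2, a2) = 0, so the greedy policy is a1 in both states. Note that Q(S2, a2) was never updated, so the preference in S2 rests on the only action that was actually tried there.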