Question

Question 7 [15 pt]: Consider a system with two states and two actions. You perform actions and observe the rewards and transitions listed below:

Step 1: Start = S1, Action = a1, Reward = -10, End = (not given)
Step 2: Start = S1, Action = a2, Reward = -10, End = S2
Step 3: Start = S2, Action = a1, Reward = +20, End = S1
Step 4: Start = S1, Action = a2, Reward = -10, End = S2

1. Perform Q-learning. The discount factor is γ = 0.5 and the learning rate is α = 0.5. Assume that all Q values are initialized to 0.
2. What is the policy that Q-learning has learned at this point?
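
A worked sketch of the four updates, assuming the standard one-step tabular Q-learning rule (the question does not spell out the update form). Because every Q value is still 0 when step 1 is processed, the missing end state of step 1 contributes γ · 0 = 0 whichever state it is.

```latex
\begin{aligned}
Q(s,a) &\leftarrow Q(s,a) + \alpha\bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr], \qquad \alpha = \gamma = 0.5 \\[4pt]
\text{Step 1: } Q(S_1,a_1) &= 0 + 0.5\,(-10 + 0.5 \cdot 0 - 0) = -5 \\
\text{Step 2: } Q(S_1,a_2) &= 0 + 0.5\,\bigl(-10 + 0.5\max\{Q(S_2,a_1),\,Q(S_2,a_2)\} - 0\bigr) = 0.5\,(-10 + 0) = -5 \\
\text{Step 3: } Q(S_2,a_1) &= 0 + 0.5\,\bigl(20 + 0.5\max\{-5,\,-5\} - 0\bigr) = 0.5\,(20 - 2.5) = 8.75 \\
\text{Step 4: } Q(S_1,a_2) &= -5 + 0.5\,\bigl(-10 + 0.5\max\{8.75,\,0\} - (-5)\bigr) = -5 + 0.5\,(-0.625) = -5.3125 \\[4pt]
\pi(S_1) &= \arg\max_a Q(S_1,a) = a_1 \quad (-5 > -5.3125) \\
\pi(S_2) &= \arg\max_a Q(S_2,a) = a_1 \quad (8.75 > 0)
\end{aligned}
```

Under this reading, the greedy policy after the four steps picks a1 in both S1 and S2.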

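A minimal Python sketch that replays the four observed transitions once through the standard tabular Q-learning update and then reads off the greedy policy. The names `q_learning`, `greedy_policy`, and `TRANSITIONS` are hypothetical, and step 1's end state (not given in the question) is filled with the placeholder "S1"; since all Q values are still 0 at that point, the choice does not change any result.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (alpha in the question)
GAMMA = 0.5  # discount factor (gamma in the question)
ACTIONS = ("a1", "a2")

# (start, action, reward, end) for the four observed steps.
# Hypothetical placeholder: step 1's end state is not given, so "S1" is used.
# Every Q value is still 0 at step 1, so any end state gives the same update.
TRANSITIONS = [
    ("S1", "a1", -10.0, "S1"),
    ("S1", "a2", -10.0, "S2"),
    ("S2", "a1", 20.0, "S1"),
    ("S1", "a2", -10.0, "S2"),
]


def q_learning(transitions, alpha=ALPHA, gamma=GAMMA):
    """Apply the one-step Q-learning update to each transition in order."""
    q = defaultdict(float)  # unseen (state, action) pairs read as 0.0
    for s, a, r, s_next in transitions:
        # Off-policy target: bootstrap from the best action in the next state.
        best_next = max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_policy(q, states):
    """Pick the action with the highest Q value in each state."""
    return {s: max(ACTIONS, key=lambda b: q[(s, b)]) for s in states}


q = q_learning(TRANSITIONS)
policy = greedy_policy(q, ("S1", "S2"))

for s in ("S1", "S2"):
    for a in ACTIONS:
        print(f"Q({s}, {a}) = {q[(s, a)]:.4f}")
print(policy)
```

Running it prints Q(S1, a1) = -5.0000, Q(S1, a2) = -5.3125, Q(S2, a1) = 8.7500, Q(S2, a2) = 0.0000 and the policy {'S1': 'a1', 'S2': 'a1'}. If two actions ever tied, `greedy_policy` would return the first one listed in `ACTIONS`, because `max` returns the first maximal element it encounters.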
