Question

For the environment to the right, the agent tried 6 episodes from the start state A to one of the terminal states (C, D, and E), which are listed below:

Episode #1:
  state = A, action = R, new state = C, reward = +10
Episode #2:
  state = A, action = L, new state = B, reward = 0
  state = B, action = R, new state = E, reward = 1000
Episode #3:
  state = A, action = L, new state = B, reward = 0
  state = B, action = L, new state = D, reward = +200
Episode #4:
  state = A, action = L, new state = B, reward = 0
  state = B, action = R, new state = E, reward = 100
Episode #5:
  state = A, action = R, new state = C, reward = +25
Episode #6:
  state = A, action = L, new state = B, reward = 0
  state = B, action = L, new state = D, reward = +400

Your task is to build the Q-table from these results. The Q-table has two non-terminal states (A and B) and two actions per state (L and R). Use learning rate = 0.5 and discount factor = 1. All entries of the Q-table are zero initially.
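
One way to work this out is to replay the episodes in code. The sketch below is not the site's solution; it assumes the standard tabular Q-learning update, Q(s, a) <- Q(s, a) + learning_rate * (reward + discount * max over a' of Q(s', a') - Q(s, a)), with transitions applied in the order listed and terminal states (C, D, E) contributing a future value of 0. Rewards are taken exactly as listed, including 1000 in Episode #2 and 100 in Episode #4. The names EPISODES and build_q_table are illustrative, not from the question.

```python
ALPHA = 0.5   # learning rate from the question
GAMMA = 1.0   # discount factor from the question
STATES = ("A", "B")          # non-terminal states
ACTIONS = ("L", "R")
TERMINAL = {"C", "D", "E"}   # assumed to have future value 0

# Each transition is (state, action, new_state, reward), as listed.
EPISODES = [
    [("A", "R", "C", 10)],
    [("A", "L", "B", 0), ("B", "R", "E", 1000)],
    [("A", "L", "B", 0), ("B", "L", "D", 200)],
    [("A", "L", "B", 0), ("B", "R", "E", 100)],
    [("A", "R", "C", 25)],
    [("A", "L", "B", 0), ("B", "L", "D", 400)],
]


def build_q_table(episodes, alpha=ALPHA, gamma=GAMMA):
    """Replay the transitions in order with the Q-learning update."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # all zeros initially
    for episode in episodes:
        for state, action, new_state, reward in episode:
            if new_state in TERMINAL:
                best_next = 0.0
            else:
                best_next = max(q[(new_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


q_table = build_q_table(EPISODES)
for (state, action), value in sorted(q_table.items()):
    print(f"Q({state}, {action}) = {value}")
```

Under these assumptions the replay ends with Q(A, L) = 337.5, Q(A, R) = 15, Q(B, L) = 250, and Q(B, R) = 300. A different update rule or processing order would give different numbers.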
