Question
For the environment to the right, the agent tried 6 episodes from the start state A to one of the terminal states (C, D, and E), which are listed below:
Episode #1:
  state = A, action = R, new state = C, reward = +10
Episode #2:
  state = A, action = L, new state = B, reward = 0
  state = B, action = R, new state = E, reward = 1000
Episode #3:
  state = A, action = L, new state = B, reward = 0
  state = B, action = L, new state = D, reward = +200
Episode #4:
  state = A, action = L, new state = B, reward = 0
  state = B, action = R, new state = E, reward = 100
Episode #5:
  state = A, action = R, new state = C, reward = +25
Episode #6:
  state = A, action = L, new state = B, reward = 0
  state = B, action = L, new state = D, reward = +400
Your task is to build the Q-table from these results. The Q-table has two states (the non-terminal states A and B) and two actions (L and R) per state. Use a learning rate of 0.5 and a discount factor of 1. All entries of the Q-table are initially zero.
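A minimal sketch of how the table could be filled in, assuming the standard Q-learning update Q(s, a) ← Q(s, a) + α [r + γ · max_a' Q(s', a') − Q(s, a)], applied to each transition in the order the episodes are listed, and assuming the terminal states C, D and E contribute zero future value. The names `EPISODES` and `build_q_table` are illustrative only; they do not come from the question.

```python
ALPHA = 0.5  # learning rate given in the question
GAMMA = 1.0  # discount factor given in the question
TERMINALS = {"C", "D", "E"}  # assumed to have zero future value
ACTIONS = ("L", "R")

# Hypothetical encoding of the six episodes: (state, action, new_state, reward)
EPISODES = [
    [("A", "R", "C", 10)],
    [("A", "L", "B", 0), ("B", "R", "E", 1000)],
    [("A", "L", "B", 0), ("B", "L", "D", 200)],
    [("A", "L", "B", 0), ("B", "R", "E", 100)],
    [("A", "R", "C", 25)],
    [("A", "L", "B", 0), ("B", "L", "D", 400)],
]


def build_q_table(episodes, alpha=ALPHA, gamma=GAMMA):
    """Apply the Q-learning update to every transition, in order."""
    # Two non-terminal states x two actions, all starting at zero
    q = {(s, a): 0.0 for s in ("A", "B") for a in ACTIONS}
    for episode in episodes:
        for s, a, s_next, r in episode:
            # Best estimated future value from the next state (0 if terminal)
            if s_next in TERMINALS:
                future = 0.0
            else:
                future = max(q[(s_next, a2)] for a2 in ACTIONS)
            target = r + gamma * future
            # Move the current estimate halfway (alpha = 0.5) toward the target
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


if __name__ == "__main__":
    q_table = build_q_table(EPISODES)
    for (s, a), v in sorted(q_table.items()):
        print(f"Q({s}, {a}) = {v}")
```

Under these assumptions the final table is Q(A, L) = 337.5, Q(A, R) = 15, Q(B, L) = 250 and Q(B, R) = 300. Note that the updates to Q(A, L) depend on the current values at B, so the order in which the episodes are processed matters.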