Question
Please solve and explain. Check: this is a COE Topics HW.

Question 4 (6 points): Consider a 2×3 game world that has 6 states {A, B, C, D, E, F} and four actions (up, down, left, right), as shown below. Every new episode starts from a randomly chosen state and ends at the terminal state F. When state F is reached, the player receives a reward of +10 and the game ends. For every action that does not lead to state F, the reward is -1. Assume that the greedy policy is used after training. Also, assume that α = 1 and γ = 0.9. Assume that the Q-learning algorithm is applied and that the initial Q function Q(s, a) is given by the table below, where s is a state and a is an action.

State \ action: up, down, left, right
(The grid figure and the initial Q-values were given as an image and are not reproduced here.)

A. Using the initial Q function, perform one action (B, up) and update the Q function. [2 pts]
B. Using the initial Q function, perform one episode starting from state A and update the Q table. Note that an episode is a full game from a given state until the game ends. [4 pts]

Step by Step Solution
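Because the grid figure and the initial Q-table did not survive extraction, everything below rests on flagged assumptions: the 2×3 grid has A, B, C on the top row and D, E, F on the bottom row (so F is the bottom-right corner), the initial Q-table is all zeros, and a move that would leave the grid keeps the agent in place. A minimal Python sketch of the world under those assumptions:

```python
# Assumed 2x3 grid layout (not confirmed by the missing figure):
#   A B C
#   D E F   (F is terminal: reward +10; every other move costs -1)

GRID = {"A": (0, 0), "B": (0, 1), "C": (0, 2),
        "D": (1, 0), "E": (1, 1), "F": (1, 2)}
POS_TO_STATE = {pos: s for s, pos in GRID.items()}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Return (next_state, reward, done) for one move."""
    r, c = GRID[state]
    dr, dc = MOVES[action]
    # Bumping into a wall leaves the agent where it was (assumption).
    next_state = POS_TO_STATE.get((r + dr, c + dc), state)
    if next_state == "F":
        return next_state, 10, True
    return next_state, -1, False

# Assumed initial Q-table: all zeros (the original values were not shown).
Q = {s: {a: 0.0 for a in MOVES} for s in GRID}
```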
There are 3 steps involved.
Step 1: The Q-learning update rule.

Q-learning updates Q(s, a) ← Q(s, a) + α[r + γ · max_a' Q(s', a') − Q(s, a)], where s' is the state reached and r is the reward received. With α = 1 the old value cancels, so the rule simplifies to Q(s, a) ← r + γ · max_a' Q(s', a'); when s' is the terminal state F there is no bootstrap term, so Q(s, a) ← r.
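A direct transcription of that simplified rule, reusing the step function and Q-table from the sketch above:

```python
GAMMA = 0.9  # discount factor given in the problem

def q_update(Q, state, action):
    """One Q-learning update with alpha = 1:
    Q(s, a) <- r + gamma * max_a' Q(s', a')."""
    next_state, reward, done = step(state, action)
    # No bootstrap term when the next state is terminal (F).
    Q[state][action] = reward if done else reward + GAMMA * max(Q[next_state].values())
    return next_state, done
```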
Step 2: Part A, a single update for (B, up).

Under the assumed layout, B is on the top row, so moving up bumps into the wall and the agent stays in B. The move does not reach F, so the reward is -1. With an all-zero initial table, Q(B, up) ← r + γ · max_a Q(B, a) = -1 + 0.9 · 0 = -1; every other entry is unchanged. If the actual initial table from the missing figure is not all zeros, substitute its max_a Q(B, a) into the same formula.
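The same computation with the assumed model above:

```python
# Part A (under the stated assumptions): one update for (B, up).
q_update(Q, "B", "up")
print(Q["B"])  # {'up': -1.0, 'down': 0.0, 'left': 0.0, 'right': 0.0}
```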
Step 3: Part B, one full episode starting from state A.

Part B again starts from the initial Q-table. With an all-zero table, every greedy choice is a tie, so the path depends on the tie-breaking rule. One run that breaks ties toward F is A → B → C → F, giving three updates: Q(A, right) ← -1 + 0.9 · 0 = -1, Q(B, right) ← -1 + 0.9 · 0 = -1, and Q(C, down) ← +10 (F is terminal, so there is no discounted term). Reaching F ends the episode. A different tie-breaking rule, or different initial values from the missing table, would change both the path and the updated entries.
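A sketch of the episode loop, again under the assumptions stated at the top; tie-breaking is randomized here, so the visited path may differ from the A → B → C → F run traced above:

```python
import random

def run_episode(Q, start="A", seed=0):
    """Run one greedy episode (random tie-breaking), updating Q in place."""
    rng = random.Random(seed)
    state, done, path = start, False, [start]
    while not done:
        best = max(Q[state].values())
        # Greedy action; ties (e.g. an all-zero table) broken at random.
        action = rng.choice([a for a, v in Q[state].items() if v == best])
        state, done = q_update(Q, state, action)
        path.append(state)
    return path

# Part B: restart from the assumed all-zero table, then run from A.
Q = {s: {a: 0.0 for a in MOVES} for s in GRID}
print(run_episode(Q))  # path varies with the tie-break order, e.g. ['A', 'B', 'C', 'F']
```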