Question
The Pacman agent aims to reach the "Exit" by the shortest path possible while eating as many green food pellets and as few red food pellets as possible (though Pacman may eat both types of pellet while exploring). Pacman has four actions, moveNorth, moveSouth, moveWest, and moveEast, and it has no "stay" action. Each action incurs a cost of -2 for the agent. In addition, if the agent reaches a cell containing a pellet, it automatically consumes it; eating a red pellet gives a reward of -5 and eating a green pellet gives a reward of +3. In the fixed board configuration below, of dimension N x M, there are 3 red pellets and 6 green pellets.
[Board figure not reproduced; rows and columns are labeled 1 to 5.]
a. Construct a partially filled Q-table, reward table, and transition table.
b. Apply reinforcement learning with the Q-table initialized to the value 1, learning rate = 0.8, and discount factor = 0.5 for the sequence of actions listed below. The update of the Q-table must be shown at the end of every iteration.
Iteration 1: moveEast, moveWest (iteration 2 starts again from the same position, (row, column) = (2,3))
Iteration 2: moveEast
c. If the discount factor is set to 0 and the learning rate to 1, how is the agent's expected behavior learned? Explain using the use case above.
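Parts b and c can be explored with a short sketch. It assumes the intended method is tabular Q-learning, that rows grow southward and columns grow eastward, and that the cells visited by the listed moves hold no pellets. The board figure is not available, so the empty `pellets` dictionary is a hypothetical stand-in, not the real layout, and the helper names (`step`, `q_update`, `run_iteration`) are illustrative only.

```python
from collections import defaultdict

ACTIONS = ("moveNorth", "moveSouth", "moveWest", "moveEast")
# Assumption: cells are (row, column); rows grow southward, columns eastward.
MOVES = {"moveNorth": (-1, 0), "moveSouth": (1, 0),
         "moveWest": (0, -1), "moveEast": (0, 1)}

STEP_COST = -2                      # from the question: every action costs -2
PELLET = {"red": -5, "green": 3}    # from the question: pellet rewards


def step(state, action, pellets):
    """Move one cell; return (next_state, reward). No wall or bounds check."""
    dr, dc = MOVES[action]
    nxt = (state[0] + dr, state[1] + dc)
    kind = pellets.pop(nxt, None)   # a pellet is consumed on arrival
    return nxt, STEP_COST + PELLET.get(kind, 0)


def q_update(q, state, action, r, nxt, alpha, gamma):
    """One tabular Q-learning update; returns the new Q(state, action)."""
    best_next = max(q[(nxt, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def run_iteration(q, start, actions, pellets, alpha, gamma):
    """Apply a fixed action sequence from `start`, updating q after each move."""
    state = start
    for a in actions:
        nxt, r = step(state, a, pellets)
        q_update(q, state, a, r, nxt, alpha, gamma)
        state = nxt
    return state


q = defaultdict(lambda: 1.0)        # Q-table initialized to 1 everywhere
pellets = {}                        # HYPOTHETICAL: real pellet cells unknown
run_iteration(q, (2, 3), ["moveEast", "moveWest"], pellets, 0.8, 0.5)
run_iteration(q, (2, 3), ["moveEast"], pellets, 0.8, 0.5)
```

Under the empty-cell assumption, iteration 1 moves Q((2,3), moveEast) and Q((2,4), moveWest) from 1 to about -1.0 each, and iteration 2 moves Q((2,3), moveEast) to about -1.4. If a visited cell actually holds a pellet, the reward term changes and so do these values. For part c, calling the same update with learning rate 1 and discount factor 0 reduces it to Q(s, a) = r, so each entry simply stores the immediate reward (-2 for an empty cell, +1 for green, -7 for red) and the agent learns nothing about longer-term value such as the path to the Exit.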