Question

Pacman Agent aims to reach the "Exit" by the shortest possible path while eating as many green food pellets and as few red food pellets as possible (though Pacman may eat both types of food pellets while exploring). Pacman has four actions, moveNorth, moveSouth, moveWest, and moveEast; it does not have a "stay" action. Each action incurs a cost of -2 for the agent. In addition, if the agent reaches a cell with a pellet, it automatically consumes it; eating a red pellet gives a reward of -5 and eating a green pellet gives a reward of +3. In the fixed board configuration below, of dimension N × M, there are 3 red pellets and 6 green pellets.

[Board figure not reproduced; only the grid labels 1 to 5 along each axis survive.]

a. Construct a partially filled Q-table, reward table, and transition table.

b. Apply reinforcement learning with the Q-table initialized to the value 1, learning rate = 0.8, and discount factor = 0.5 for the sequence of actions listed below. It is mandatory to show the updated Q-table at the end of every iteration.

Iteration 1: moveEast, moveWest (iteration 2 then starts again from the same position, (row, column) = (2, 3))
Iteration 2: moveEast

c. If the discount factor is set to 0 and the learning rate to 1, what behavior is the agent expected to learn? Explain using the use case above.
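
For part b, the update can be sketched in code. This is a minimal illustration and not a worked solution. It assumes that "reinforcement learning" here means the standard tabular Q-learning rule, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The board image is missing, so the pellet positions are unknown: the `pellets` argument is a hypothetical placeholder that is left empty and has to be filled in from the figure. The names `q_update` and `run_episode`, the convention that row numbers increase southward, and the choice to reset pellets at the start of each iteration are all assumptions made for this sketch.

```python
from collections import defaultdict

# (row, column) offsets; assumes row numbers increase toward the south.
ACTIONS = {
    "moveNorth": (-1, 0),
    "moveSouth": (1, 0),
    "moveWest": (0, -1),
    "moveEast": (0, 1),
}
STEP_COST = -2                            # cost of every action (from the question)
PELLET_REWARD = {"red": -5, "green": 3}   # pellet rewards (from the question)


def q_update(q, state, action, reward, next_state, alpha=0.8, gamma=0.5):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    best_next = max(q[next_state][a] for a in ACTIONS)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def run_episode(q, start, moves, pellets, alpha=0.8, gamma=0.5):
    """Replay a fixed list of moves from `start`, updating q after each move.

    `pellets` maps (row, column) -> "red" or "green". It is copied, so every
    episode starts from the same board (an assumption of this sketch).
    No wall or boundary check is done; the moves are assumed to stay on the board.
    """
    remaining = dict(pellets)
    state = start
    for action in moves:
        d_row, d_col = ACTIONS[action]
        next_state = (state[0] + d_row, state[1] + d_col)
        colour = remaining.pop(next_state, None)   # pellet is eaten on arrival
        reward = STEP_COST + PELLET_REWARD.get(colour, 0)
        q_update(q, state, action, reward, next_state, alpha, gamma)
        state = next_state
    return q


# Every Q-value starts at 1, as the question specifies.
q = defaultdict(lambda: {a: 1.0 for a in ACTIONS})

# Hypothetical placeholder: the real pellet layout comes from the missing figure.
PLACEHOLDER_PELLETS = {}

run_episode(q, (2, 3), ["moveEast", "moveWest"], PLACEHOLDER_PELLETS)  # iteration 1
run_episode(q, (2, 3), ["moveEast"], PLACEHOLDER_PELLETS)              # iteration 2
print(q[(2, 3)], q[(2, 4)])
```

With the empty placeholder map, every step earns only the -2 action cost, so the numbers printed are not the answer to the exercise; they change once the real pellet cells are entered. For part c, the same rule with learning rate 1 and discount factor 0 reduces to Q(s,a) <- r, so each entry simply stores the most recent immediate reward and the agent ignores anything beyond the next step.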
