Pacman Agent aims to efficiently reach the Exit


Question


Posted on Sep 14, 2024


Pacman Agent aims to reach the "Exit" efficiently, by the shortest path possible, while eating the maximum number of Green food pellets and the minimum number of Red food pellets (though Pacman may eat both types of pellet while exploring). Pacman has four actions: moveNorth, moveSouth, moveWest and moveEast. It does not have a "stay" action. Each action incurs a reward of -2 (a step cost). In addition, if the agent reaches a cell containing a pellet, it automatically consumes it: eating a red pellet gives a reward of -5 and eating a green pellet gives a reward of +3. The fixed board configuration below, of dimension N×M, contains 3 red pellets and 6 green pellets.

[Board figure not available: grid with columns 1 to 5 and rows 1 to 5]

a. Construct a partially filled Q-table, reward table and transition table.

b. Apply reinforcement learning with the Q-table initialized to 1, learning rate = 0.8 and discount factor = 0.5 for the sequence of actions listed below. Show the updated Q-table at the end of every iteration.
Iteration 1: moveEast, moveWest (iteration 2 starts again from the same position (row, column) = (2,3))
Iteration 2: moveEast

c. If the discount factor is set to 0 and the learning rate to 1, what behaviour does the agent learn? Explain using the use case above.
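A minimal sketch of parts (b) and (c), assuming standard tabular Q-learning with the update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)], where r is the -2 step cost plus any pellet reward. The board image is missing, so everything about the layout here is a hypothetical assumption: the 5×5 size (read off the 1 to 5 axis labels), a single green pellet at (2,4), no red pellet on this path, walls that leave the agent in place, and pellets being restored at the start of each iteration. The helper names `transition`, `reward`, `q_update` and `run` are illustrative only; substitute the real pellet positions to reproduce the actual tables.

```python
from collections import defaultdict

# HYPOTHETICAL board: the original figure is unavailable, so these pellet
# positions are placeholders. Cells are (row, column), 1-indexed.
N_ROWS, N_COLS = 5, 5             # assumed from the 1..5 axis labels
GREEN = {(2, 4)}                  # hypothetical green pellet location
RED = set()                       # hypothetical: no red pellet on this path

STEP_REWARD = -2                  # applied to every move
GREEN_REWARD, RED_REWARD = 3, -5  # added when the agent enters a pellet cell

ACTIONS = {
    "moveNorth": (-1, 0),
    "moveSouth": (1, 0),
    "moveWest": (0, -1),
    "moveEast": (0, 1),
}


def transition(state, action):
    """Deterministic transition; assumes a move into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 1 <= r <= N_ROWS and 1 <= c <= N_COLS:
        return (r, c)
    return state


def reward(next_state, eaten):
    """Step cost plus the pellet reward, if the entered cell still holds a pellet."""
    total = STEP_REWARD
    if next_state in GREEN and next_state not in eaten:
        total += GREEN_REWARD
        eaten.add(next_state)
    elif next_state in RED and next_state not in eaten:
        total += RED_REWARD
        eaten.add(next_state)
    return total


def q_update(q, s, a, r, s_next, alpha, gamma):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


def run(iterations, start, alpha, gamma, q_init=1.0):
    q = defaultdict(lambda: q_init)   # every unseen (state, action) starts at q_init
    for i, actions in enumerate(iterations, 1):
        s, eaten = start, set()       # assumption: pellets are restored each iteration
        for a in actions:
            s_next = transition(s, a)
            r = reward(s_next, eaten)
            value = q_update(q, s, a, r, s_next, alpha, gamma)
            print(f"iter {i}: Q({s}, {a}) = {value:.2f}  (r = {r})")
            s = s_next
    return q


SEQUENCE = [["moveEast", "moveWest"], ["moveEast"]]

# Part (b): alpha = 0.8, gamma = 0.5, both iterations start at (2, 3).
q_b = run(SEQUENCE, (2, 3), alpha=0.8, gamma=0.5)
# Part (c): alpha = 1, gamma = 0, so each update stores only the immediate reward.
q_c = run(SEQUENCE, (2, 3), alpha=1.0, gamma=0.0)
```

Under these assumptions, part (b) gives Q((2,3), moveEast) = 1.40 after the first step, Q((2,4), moveWest) = -0.84 after the second, and Q((2,3), moveEast) = 1.48 after iteration 2. With α = 1 and γ = 0 the update collapses to Q(s,a) = r, so the agent only remembers the immediate reward of its last move: it learns to prefer a step onto a green pellet and avoid a step onto a red one, but gives no credit to moves that lead toward the Exit or toward pellets further away.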



