Question

Posted on Sep 26, 2024

10 marks. Question 2) Reinforcement Learning

Consider the following PacMan environment:

[Figure: PacMan grid environment, corners labelled (0,0) and (6,4)]

For this environment, design a Reinforcement Learning agent (Pacman). The objective of the agent is to figure out the best action it can take in any given state. The rules of the game are as follows:

- Every move has a reward of -1.
- Consuming a food pellet has a reward of +10.
- If Pacman collides with a ghost, the reward is -500.
- If Pacman has eaten all the food pellets without colliding with a ghost, the reward is +500.
- Assume a discount factor of 0.8.
- The action noise is 0.3 (the consequences are the same as in the grid world example).
- The environment is static, i.e. the ghosts do not move.
- The actions for Pacman are Up, Down, Left and Right.
- You can cross the walls.

Use Q-Learning to figure out the best action in every state. Show your working for every iteration of Q-Learning.
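A minimal Python sketch of tabular Q-learning for this setup, under stated assumptions. The figure is not available, so the 7 × 5 grid (read from the corner labels (0,0) and (6,4)) and the pellet and ghost cells in the hypothetical constants `PELLETS` and `GHOSTS` are placeholders. Further assumptions not given in the question: the fourth action is Left; the 0.3 noise is split as in the usual grid world (0.7 intended move, 0.15 for each perpendicular move); moving off the grid edge leaves Pacman in place; the -1 move cost is added on top of the pellet, ghost and clear rewards; and the learning rate α = 0.1, exploration rate ε = 0.2, 2000 episodes and zero-initialized Q-table are illustrative choices. The state includes the set of remaining pellets, because the +10 and +500 rewards depend on which pellets are still on the board. Every transition applies the backup Q(s,a) ← Q(s,a) + α[r + 0.8 · max_a' Q(s',a') − Q(s,a)].

```python
import random
from collections import defaultdict

# Grid size taken from the figure's corner labels (0,0) and (6,4); (0,0) assumed bottom-left.
WIDTH, HEIGHT = 7, 5
# HYPOTHETICAL layout: the original figure is not available, so these cells are placeholders.
PELLETS = frozenset({(2, 2), (5, 3)})
GHOSTS = frozenset({(3, 1)})

GAMMA = 0.8  # discount factor given in the question
NOISE = 0.3  # action noise given in the question
MOVE_REWARD, PELLET_REWARD = -1, 10
GHOST_REWARD, CLEAR_REWARD = -500, 500

ACTIONS = ("Up", "Down", "Left", "Right")
DELTAS = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
# Assumed grid-world noise model: a slip picks one of the two perpendicular moves.
PERPENDICULAR = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
                 "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(pos, action):
    """Apply an action; interior walls are ignored (they can be crossed),
    and moving off the grid edge leaves Pacman where it is (assumption)."""
    dx, dy = DELTAS[action]
    nx, ny = pos[0] + dx, pos[1] + dy
    return (nx, ny) if 0 <= nx < WIDTH and 0 <= ny < HEIGHT else pos


def noisy_action(action, rng, noise=NOISE):
    """Intended action with prob 1 - noise, otherwise each perpendicular one with noise / 2."""
    r = rng.random()
    if r < 1 - noise:
        return action
    first, second = PERPENDICULAR[action]
    return first if r < 1 - noise / 2 else second


def step(state, action, rng, noise=NOISE):
    """Return (next_state, reward, done); a state is (position, remaining pellets)."""
    pos, remaining = state
    new_pos = move(pos, noisy_action(action, rng, noise))
    reward = MOVE_REWARD  # every move costs -1; other rewards are added on top (assumption)
    if new_pos in GHOSTS:
        return (new_pos, remaining), reward + GHOST_REWARD, True
    if new_pos in remaining:
        remaining = remaining - {new_pos}
        reward += PELLET_REWARD
        if not remaining:
            return (new_pos, remaining), reward + CLEAR_REWARD, True
    return (new_pos, remaining), reward, False


def q_update(q, state, action, reward, next_state, done, alpha, gamma=GAMMA):
    """One Q-learning backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
    return q[(state, action)]


def train(episodes=2000, alpha=0.1, epsilon=0.2, max_steps=200, seed=0):
    """Epsilon-greedy Q-learning from random start cells with all pellets present."""
    rng = random.Random(seed)
    q = defaultdict(float)  # every (state, action) pair starts at Q = 0
    starts = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)
              if (x, y) not in GHOSTS and (x, y) not in PELLETS]
    for _ in range(episodes):
        state = (rng.choice(starts), PELLETS)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action, rng)
            q_update(q, state, action, reward, next_state, done, alpha)
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action for every (non-terminal) state stored in the Q-table."""
    states = {s for (s, _) in q}
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}


if __name__ == "__main__":
    policy = greedy_policy(train())
    # One letter per cell (U/D/L/R) for the states where every pellet is still present.
    for y in reversed(range(HEIGHT)):
        row = []
        for x in range(WIDTH):
            if (x, y) in GHOSTS:
                row.append("G")
            elif (x, y) in PELLETS:
                row.append("o")
            else:
                row.append(policy.get(((x, y), PELLETS), "?")[0])
        print(" ".join(row))
```

Running the script prints one letter (U/D/L/R) per cell for the states in which every pellet is still present, with G marking the ghost and o the pellets. To show the working for an individual iteration by hand, apply the same backup: with a zero-initialized table and α = 0.5, a -1 move into a non-terminal state gives Q = 0 + 0.5 · (−1 + 0.8 · 0 − 0) = −0.5, and a move that eats the last pellet (reward −1 + 10 + 500 = 509, terminal, so no future term) gives Q = 0 + 0.5 · (509 − 0) = 254.5.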

Students also viewed these Databases questions

Suppose that r chips are drawn with replacement from an urn containing n chips, numbered 1 through n. Let V denote the sum of the numbers drawn. Find E(V).

Piekarski Corporation had the following transactions. 1. Issued $200,000 of bonds payable. 2. Paid utilities expense. 3. Issued 500 shares of preferred stock for $45,000. 4. Sold land and a building...

2. Do the employees know how to perform effectively? Perhaps they received little or no previous training or the training was ineffective. (This problem is a characteristic of the person.)

Nex Company uses both special journals and a general journal as described in this chapter. On June 30, after all monthly postings had been completed, the Accounts Receivable control account in the...

/** * Merge into this sorted sequence elements from an array sorted with the same comparator. * This code should not use any additional space arrays, other than ensuring sufficient space. * It...

Your graduating class has decided to endow a chair at Stern for a worthy young assistant professor of finance. The University suggests an endowment that generates $100,000 a year forever. The...

You plan to borrow $2,000 to take a vacation and want to repay the loan in a year. The banker offers you a simple interest rate of 12 percent with repayments in two equal installments, 6 months and...

Consider a commercial transport with the following parameters: wing span (b) 61 m, reference wing area (Sref) 325 m², mass 200,000 kg, Mach number 0.82, zero-lift (parasitic) drag coefficient (Cpp) 0.01...

Substantive or Procedural; State or Federal; Civil or Criminal. a) In State v. Kenney (May 16, 2014), the Kansas Supreme Court ruled that when (1) an attorney incorrectly explains the appeal rights the...

A content strategy is a tactical plan for how you will achieve your goals with your socials. What are the three key components of a content strategy?

What are the Variable columns settings available in the Mining Models Tab?

What does the Mining Content Viewer in Visual Studio show in terms of Probabilities?

How are continuous variables normally handled in Decision Tree Algorithms?
