Question
We would like to use a Q-learning agent for Pacman, but the size of the state space for a large grid is too massive to hold in memory. To solve this, we will switch to a feature-based representation of Pacman's state.
We will have two features, F_g and F_p, defined as follows:
F_g(s, a) = A(s) + B(s, a) + C(s, a)
F_p(s, a) = D(s) + E(s, a)
where
A(s) = number of ghosts within 1 step of state s
B(s, a) = number of ghosts Pacman touches after taking action a from state s
C(s, a) = number of ghosts within 1 step of the state Pacman ends up in after taking action a
D(s) = number of food pellets within 1 step of state s
E(s, a) = number of food pellets eaten after taking action a from state s
For this Pacman board, the ghosts will always be stationary, and the action space is {left, right, up, down, stay}. Calculate the features for the actions in {left, right, up, stay} from the current state.
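As a minimal sketch (Python; the count arguments are hypothetical placeholders, since the board figure is not reproduced here) of how the two features are assembled from the counts A through E defined above, assuming the additive forms given:

def compute_features(A_s, B_sa, C_sa, D_s, E_sa):
    # Ghost feature: ghosts nearby now, ghosts touched, ghosts nearby after moving.
    F_g = A_s + B_sa + C_sa
    # Food feature: pellets nearby now plus pellets eaten by the action.
    F_p = D_s + E_sa
    return F_g, F_p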
After a few episodes of Q-learning, the weights are w_g and w_p. Calculate the Q-value for each action in {left, right, up, stay} from the current state.
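With a feature-based (linear) representation, each Q-value is the weighted sum of the features. A minimal Python sketch, assuming the weights and feature values are supplied by the caller:

def q_value(w_g, w_p, F_g, F_p):
    # Approximate Q(s, a) = w_g * F_g(s, a) + w_p * F_p(s, a)
    return w_g * F_g + w_p * F_p

Evaluating this once per action in {left, right, up, stay} gives the four requested Q-values.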
We observe a transition that starts from the state above, s, takes action up, ends in state s' (the state with the food pellet above), and receives a reward R(s, a, s'). The available actions from state s' are down and stay. Assuming a discount of γ, calculate the new estimate of the Q-value for s based on this episode.
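In approximate Q-learning the new estimate (the sample) is the observed reward plus the discounted value of the best action available from the successor state. A sketch, assuming the successor Q-values are passed in as a dict keyed by action:

def q_sample(reward, gamma, successor_q_values):
    # sample = R(s, a, s') + gamma * max over a' of Q(s', a')
    return reward + gamma * max(successor_q_values.values())

# e.g. q_sample(R, gamma, {"down": q_down, "stay": q_stay})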
With this new estimate and a learning rate α, update the weights for each feature.
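Standard approximate Q-learning adjusts each weight by the error (sample minus current estimate) scaled by that weight's feature value. A sketch, assuming the weights and features are stored in dicts with matching keys:

def update_weights(weights, features, sample, q_current, alpha):
    # w_i <- w_i + alpha * (sample - Q(s, a)) * F_i(s, a)
    error = sample - q_current
    return {k: weights[k] + alpha * error * features[k] for k in weights}

# e.g. update_weights({"g": w_g, "p": w_p}, {"g": F_g, "p": F_p}, sample, q_sa, alpha)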