Question

Suppose we are learning $Q^*(s,a)$ for Pacman's world.
Pacman can take the following actions:
$\{N, S, E, W\}$
Currently, Pacman's estimate is $Q(s,a)$ such that, for all $s$,
$Q(s,N) = 10$, $Q(s,S) = -10$, $Q(s,E) = 5$, $Q(s,W) = 2$
Suppose Pacman's scheme for exploration is to
take a random action with probability $\epsilon = 0.2$
act according to the current policy $\pi(s) = \arg\max_a Q(s,a)$ with probability $1 - \epsilon = 0.8$
What is the probability of Pacman moving north, i.e., taking action N?
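
One way to reason about this is to compute the full action distribution that the exploration scheme induces. The sketch below assumes that "a random action" means a uniform draw over all four actions, including the greedy one, and that the greedy action is unique; the question does not spell out either point, and the function name `epsilon_greedy_probs` is only illustrative.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Probability of each action under epsilon-greedy exploration.

    Assumption: the random branch draws uniformly from ALL actions,
    so the greedy action can also be chosen by the random branch.
    """
    n_actions = len(q_values)
    # Greedy action: the one with the highest current Q estimate.
    greedy = max(q_values, key=q_values.get)
    # Every action gets an equal share of the exploration probability.
    probs = {action: epsilon / n_actions for action in q_values}
    # The greedy action additionally gets the exploitation probability.
    probs[greedy] += 1.0 - epsilon
    return probs


q = {"N": 10, "S": -10, "E": 5, "W": 2}
probs = epsilon_greedy_probs(q, epsilon=0.2)
print(probs["N"])  # 0.8 from the greedy branch plus 0.2 / 4 from the random branch
```

Under these assumptions the probability of moving north is $0.8 + 0.2 \cdot \tfrac{1}{4} = 0.85$. If the random branch were instead meant to exclude the greedy action, the answer would be 0.8.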
Suppose Pacman updates the $Q(s,a)$ estimate using a running average with parameter $\alpha = 0.1$.
If Pacman moves south, i.e., takes action S, and receives a reward of 100, what is the new estimate of $Q(s,a)$?
Q(s,N)=
Q(s,S)=
Q(s,E)=
Q(s,W)=
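
The running-average update can be sketched as $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha \cdot \text{sample}$, applied only to the action that was actually taken. The question gives no discount factor or next-state information, so the sketch below exposes them as hypothetical parameters `gamma` and `next_max` and defaults to treating the sample as the immediate reward alone; that default is an assumption, not something stated in the question.

```python
def q_update(q_values, action, reward, alpha, gamma=0.0, next_max=0.0):
    """Return a copy of q_values with a running-average update for one action.

    Assumption: sample = reward + gamma * next_max. With the default
    gamma=0.0 the sample is just the immediate reward.
    """
    updated = dict(q_values)  # entries for actions not taken stay unchanged
    sample = reward + gamma * next_max
    updated[action] = (1.0 - alpha) * q_values[action] + alpha * sample
    return updated


q = {"N": 10, "S": -10, "E": 5, "W": 2}
new_q = q_update(q, action="S", reward=100, alpha=0.1)
print(new_q)
```

With the sample taken as the reward alone, $Q(s,S)$ becomes $0.9 \cdot (-10) + 0.1 \cdot 100 = 1$, while $Q(s,N) = 10$, $Q(s,E) = 5$, and $Q(s,W) = 2$ are unchanged. If the intended sample includes a discounted future term, for example $\gamma = 1$ with $\max_{a'} Q(s',a') = 10$, the same formula gives $Q(s,S) = 2$ instead.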