
Question


This question is from my intro to artificial intelligence class, and I need help working through the problem in terms of Q-learning. Basically, you as the character are trying to reach the exit without being killed by the monster along the way. He told us to think about it in terms of cells: one cell holds the monster, one the character, and one the exit.

Exercise 1: Approximate Q-Learning (30 points)

To evaluate your bomberman, you decided to employ a Q-function to approximate the Q-values. Your function considers three features:

f_e is the Manhattan distance to the exit, normalized between 0 and 1; when f_e = 0 your agent has reached the exit.
f_m is the Manhattan distance to the (single) monster present in the environment; again, f_m = 0 means you and the monster occupy the same cell, and your agent is killed by the monster.
f_x is the Manhattan distance to the closest bomb explosion, again normalized between 0 and 1 as with the two previous definitions.

You decide to employ a linear function to approximate Q(s, a):

Q(s, a) = w_e f_e + w_m f_m + w_x f_x

where w_e, w_m, and w_x are real-valued weights. At first, you don't know how to initialize the weights, so you pick random values:

w_e = 2, w_m = -1, w_x = -3

You also decide that the learning rate is α = 0.25. Your bomberman is currently in a cell where it evaluates

f_e = 0.5, f_m = 0.1, f_x = 0.3

It decides to move east and it dies, getting a reward r = -1000.

1. What is the initial value of Q(s, east) (i.e., the value before moving)?
2. What is the new value of the weights (i.e., the value after moving)?
3. What is the new value of Q(s, east) (i.e., the value after moving)?
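One way to check the arithmetic is to run the standard approximate Q-learning update, w_i <- w_i + α * difference * f_i with difference = r + γ * max Q(s', a') - Q(s, a), in a few lines of Python. This is only a sketch under assumptions the question does not state: the garbled third weight is read as w_x = -3, and dying is treated as a terminal state, so the max Q(s', a') term is 0 and the discount factor γ (never given in the exercise) drops out. The names `q_value` and `update_weights` are made up for this illustration.

```python
# Sketch of one approximate Q-learning update for the exercise.
# Assumptions (not stated in the question): w_x is read as -3, and the
# death state is terminal, so the bootstrapped term max_a' Q(s', a') is 0.


def q_value(weights, features):
    """Linear Q-value: sum of w_i * f_i over the feature names."""
    return sum(weights[name] * features[name] for name in features)


def update_weights(weights, features, reward, alpha, next_max_q=0.0, gamma=1.0):
    """Return new weights after w_i <- w_i + alpha * difference * f_i.

    gamma is a placeholder: with next_max_q = 0 it has no effect.
    """
    difference = reward + gamma * next_max_q - q_value(weights, features)
    return {
        name: weights[name] + alpha * difference * features[name]
        for name in weights
    }


weights = {"e": 2.0, "m": -1.0, "x": -3.0}
features = {"e": 0.5, "m": 0.1, "x": 0.3}

q_before = q_value(weights, features)                       # question 1
new_weights = update_weights(weights, features,
                             reward=-1000.0, alpha=0.25)    # question 2
q_after = q_value(new_weights, features)                    # question 3

# Round before printing to hide floating-point noise.
print("Q before:", round(q_before, 6))
print("new weights:", {k: round(v, 6) for k, v in new_weights.items()})
print("Q after:", round(q_after, 6))
```

Under those assumptions, Q(s, east) is 0 before the move (1 - 0.1 - 0.9), the difference is -1000, the weights become w_e = -123, w_m = -26, w_x = -78, and the new Q(s, east) is -87.5. If the third weight was actually meant to be +3, the same code gives different numbers, so it is worth confirming that value against the original handout.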
