Question

1 Approved Answer

Posted on Sep 24, 2024



This question is from my intro to artificial intelligence class, and I need help figuring out the problem in terms of Q-learning. Basically, you, as the character, are trying to make it to the exit without being killed by the monster along the way. He told us to think about it in terms of cells, with the monster, the character, and the exit each occupying a cell.

Exercise 1: Approximate Q-Learning (30 points)

To evaluate your bomberman, you decided to employ a Q-function to approximate the Q-values. Your function considers three features:

f_e is the Manhattan distance to the exit, normalized between 0 and 1; when f_e = 0 your agent has reached the exit.
f_m is the Manhattan distance to the (single) monster present in the environment; again, f_m = 0 means you and the monster occupy the same cell, and your agent is killed by the monster.
f_x is the Manhattan distance to the closest bomb explosion, again normalized between 0 and 1 as with the two previous definitions.

You decide to employ a linear function to approximate Q(s, a):

Q(s, a) = w_e f_e + w_m f_m + w_x f_x

where w_e, w_m, and w_x are real-valued weights. At first, you don't know how to initialize the weights, so you pick random values:

w_e = 2, w_m = -1, w_x = -3

You also decide that the learning rate is α = 0.25. Your bomberman is currently in a cell where it evaluates

f_e = 0.5, f_m = 0.1, f_x = 0.3

It decides to move east and it dies, getting a reward r = -1000.

1. What is the initial value of Q(s, east) (i.e., the value before moving)?
2. What are the new values of the weights (i.e., the values after moving)?
3. What is the new value of Q(s, east) (i.e., the value after moving)?
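For anyone working through the three parts, here is a minimal sketch of one approximate Q-learning update in Python. It relies on a few assumptions that the transcribed exercise does not state outright. The third weight is read as w_x = -3, since the transcription dropped a symbol there. The third feature and weight are written f_x and w_x throughout. Dying is treated as a terminal transition, so the bootstrap term max Q(s', a') is taken as 0 and the discount factor does not matter. The update rule used is the usual one for linear approximation, w_i <- w_i + α * [r + γ max Q(s', a') - Q(s, a)] * f_i. The function names and the `gamma` and `next_max_q` parameters are hypothetical, chosen only for illustration.

```python
# Sketch of one approximate Q-learning update with a linear Q-function.
# Assumptions (not stated in the exercise): w_x = -3, and the next state is
# terminal, so the bootstrap term gamma * max_a' Q(s', a') is taken as 0.

def q_value(weights, features):
    """Linear approximation: Q(s, a) = sum_i w_i * f_i(s, a)."""
    return sum(weights[k] * features[k] for k in weights)


def update_weights(weights, features, reward, alpha, next_max_q=0.0, gamma=1.0):
    """Return new weights after one update; the input dict is not mutated."""
    # Difference between the observed target and the current estimate.
    difference = (reward + gamma * next_max_q) - q_value(weights, features)
    # Each weight moves in proportion to its own feature value.
    return {k: weights[k] + alpha * difference * features[k] for k in weights}


weights = {"e": 2.0, "m": -1.0, "x": -3.0}   # w_x = -3 is an assumed reading
features = {"e": 0.5, "m": 0.1, "x": 0.3}
alpha = 0.25
reward = -1000.0

q_before = q_value(weights, features)                           # part 1
new_weights = update_weights(weights, features, reward, alpha)  # part 2
q_after = q_value(new_weights, features)                        # part 3

print(f"Q(s, east) before: {q_before:.4f}")
for name, value in new_weights.items():
    print(f"w_{name} after: {value:.4f}")
print(f"Q(s, east) after: {q_after:.4f}")
```

Under those assumptions the initial Q(s, east) comes out to about 0, the updated weights to about w_e = -123, w_m = -26, w_x = -78, and the new Q(s, east) to about -87.5. If your course uses a different sign for w_x or a non-terminal target, the numbers change accordingly.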