Question
This question is from my intro to artificial intelligence class, and I need help figuring out the problem in terms of Q-learning. Basically, you, as the character, are trying to make it to the exit without being killed by the monster along the way. Our instructor told us to think about it in terms of cells: a cell can contain the monster, the character, or the exit.
Exercise 1: Approximate Q-Learning (30 points)

To evaluate your bomberman, you decided to employ a Q-function to approximate the Q-values. Your function considers three features:

$f_e$ is the Manhattan distance to the exit, normalized between 0 and 1; when $f_e = 0$, your agent has reached the exit.
$f_m$ is the Manhattan distance to the (single) monster present in the environment; again, $f_m = 0$ means you and the monster occupy the same cell, and your agent is killed by the monster.
$f_x$ is the Manhattan distance to the closest bomb explosion, again normalized between 0 and 1 as with the two previous definitions.

You decide to employ a linear function to approximate Q(s, a):

$Q(s, a) = w_e f_e + w_m f_m + w_x f_x$

where $w_e$, $w_m$, and $w_x$ are real-valued weights. At first, you don't know how to initialize the weights, so you pick random values:

$w_e = 2$, $w_m = -1$, $w_x = -3$

You also decide that the learning rate is $\alpha = 0.25$. Your bomberman is currently in a cell where it evaluates

$f_e = 0.5$, $f_m = 0.1$, $f_x = 0.3$

It decides to move east and it dies, getting a reward $r = -1000$.

1. What is the initial value of Q(s, east) (i.e., the value before moving)?
2. What is the new value of the weights (i.e., the value after moving)?
3. What is the new value of Q(s, east) (i.e., the value after moving)?

Step by Step Solution
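The three questions can be checked numerically with a minimal Python sketch of the standard approximate Q-learning update, w_i ← w_i + α · (target − Q(s, a)) · f_i. It rests on assumptions the problem statement does not spell out: the garbled "Wc -3" is read as w_x = -3, dying is treated as a terminal transition so the target is just the reward r (no discounted future value, and no discount factor is given), and the "new" Q(s, east) is evaluated on the same feature values as before. The names `q_value` and `update_weights` are illustrative, not from the course material.

```python
def q_value(weights, features):
    """Linear approximation: Q(s, a) = sum_i w_i * f_i."""
    return sum(weights[k] * features[k] for k in features)


def update_weights(weights, features, reward, alpha, next_value=0.0, gamma=1.0):
    """One approximate Q-learning step.

    next_value is max_a' Q(s', a'); it is 0 here on the assumption that
    dying ends the episode (terminal state), so gamma has no effect.
    """
    difference = (reward + gamma * next_value) - q_value(weights, features)
    return {k: weights[k] + alpha * difference * features[k] for k in weights}


# Values from the exercise (w_x = -3 is an assumed reading of the garbled text).
weights = {"e": 2.0, "m": -1.0, "x": -3.0}
features = {"e": 0.5, "m": 0.1, "x": 0.3}

q_before = q_value(weights, features)                       # question 1
new_weights = update_weights(weights, features,
                             reward=-1000.0, alpha=0.25)    # question 2
q_after = q_value(new_weights, features)                    # question 3

print("Q(s, east) before:", round(q_before, 6))
print("new weights:", {k: round(v, 6) for k, v in new_weights.items()})
print("Q(s, east) after:", round(q_after, 6))
```

Under those assumptions, the initial value is Q(s, east) = 2(0.5) − 1(0.1) − 3(0.3) = 0, so the difference is −1000 − 0 = −1000. The updated weights are w_e = 2 + 0.25(−1000)(0.5) = −123, w_m = −1 + 0.25(−1000)(0.1) = −26, and w_x = −3 + 0.25(−1000)(0.3) = −78, giving a new Q(s, east) = −123(0.5) − 26(0.1) − 78(0.3) = −87.5. If your course treats the transition differently (for example, a non-terminal next state), the target term changes accordingly.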