Question
(a) [3 pts] Q1.1 Let T be the set of all possible game trees with alternating levels of maximizer and expectation nodes. Consider each of the following conditions independently. For each condition, select the condition if and only if there exists a tree in T such that knowing that condition and no others allows us to prune.
☐ When all the leaf node values are bounded by some lower bound
☐ When all the leaf node values are bounded by some upper bound
☐ When all the leaf node values are negative
☐ It is possible to prune a tree in T, but the necessary condition is not in the above list
☐ We can never prune any tree in T

(b) [3 pts] Q1.2 Jason is on Memorial Glade, and he is searching for Soda Hall. He knows that Soda Hall exists and is reachable in some finite distance. For each action, he can only move one step in the direction North, East, South, or West. Each action incurs a cost of 1. Jason can step off campus, so he represents his search problem as a graph with infinitely many nodes. Which of the following algorithms will allow him to eventually reach Soda?
☐ Breadth First Graph Search
☐ Depth First Graph Search
☐ Uniform Cost Search
☐ A* tree search with an admissible heuristic
☐ None of the above

(c) Consider the following grid. Here, D, E, and F are exit states. Pacman starts in state A. The reward for entering each state is reflected in the grid. Assume that the discount factor γ = 1.

(i) [3 pts] Write the optimal values V*(s) for s = A and s = C, and the optimal policy π*(s) for s = A.
Q1.3 V*(A) =
Q1.4 V*(C) =
Q1.5 π*(A) = ○ Up ○ Down ○ Left ○ Right

(ii) [4 pts] Now, instead of Pacman, Pacbaby is travelling in this grid. Pacbaby has a more limited set of actions than Pacman and can never go left. Hence, Pacbaby has to choose between the actions Up, Down, and Right. Pacman is rational, but Pacbaby is indecisive. If Pacbaby enters state C, Pacbaby finds the two best actions and randomly, with equal probability, chooses between the two. Let π*(s) represent the optimal policy for Pacman. Let V(s) be the values under the policy where Pacbaby acts according to π*(s) for all s ≠ C, and follows the indecisive policy when at state C. What are the values V(s) for s = A and s = C?
Q1.6 V(A) =
Q1.7 V(C) =

(iii) [3 pts] Now Pacman knows that Pacbaby is going to be indecisive when at state C, and he decides to recompute the optimal policy for Pacbaby at all other states, anticipating this indecisiveness at C. What is Pacbaby's new policy π_new(s) and new value V(s) for s = A?
Q1.8 V(A) =
Q1.9 π_new(A) = ○ Up ○ Down ○ Left ○ Right
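For part (a), the relevant idea is that a known bound on leaf values lets an expectation node bound its own average before all children have been evaluated. Below is a minimal sketch of that kind of pruning, not taken from the exam or its solution: the `node` interface (`is_leaf()`, `value`, `children()`), the uniform child weights, and the constant `UPPER_BOUND` are all assumptions made purely for illustration.

```python
UPPER_BOUND = 10.0  # assumed known upper bound on every leaf value (hypothetical)

def max_value(node):
    """Maximizer node: best child value; passes its running best down for pruning."""
    if node.is_leaf():
        return node.value
    best = float("-inf")
    for child in node.children():           # children are expectation nodes
        best = max(best, exp_value(child, best))
    return best

def exp_value(node, best_so_far):
    """Expectation node over uniformly weighted children, with pruning.

    Prune once even the most optimistic completion (all remaining children
    at UPPER_BOUND) cannot beat the value the parent maximizer already has.
    """
    if node.is_leaf():
        return node.value
    children = node.children()
    n = len(children)
    total = 0.0
    for i, child in enumerate(children):
        total += max_value(child)           # child values must be computed exactly
        optimistic = (total + (n - i - 1) * UPPER_BOUND) / n
        if optimistic <= best_so_far:
            return optimistic               # parent max will never pick this node
    return total / n
```

The root call would be `max_value(root)`. Note that only the immediate max parent's running best is used for the cut, since an expectation node needs exact values from its children to compute a correct average.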
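For part (c)(ii)-(iii), the values come from evaluating a fixed policy that is stochastic only at C. Since the exam's grid is not reproduced in the question text above, the sketch below uses a made-up layout, made-up rewards, and an assumed pair of "two best" actions at C, purely to show the mechanics of policy evaluation with γ = 1 and a 50/50 indecisive choice at one state.

```python
GAMMA = 1.0  # discount factor from the problem statement

# Hypothetical layout and rewards (the exam's actual grid is not shown here).
transitions = {            # transitions[state][action] = next state (deterministic)
    "A": {"Right": "B"},
    "B": {"Right": "C"},
    "C": {"Up": "D", "Down": "E", "Right": "F"},
}
reward = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 8, "F": 10}   # reward on entering

# Fixed policy: deterministic everywhere except C, where the agent is
# "indecisive" and mixes its two best actions (assumed Down and Right) 50/50.
policy = {
    "A": [("Right", 1.0)],
    "B": [("Right", 1.0)],
    "C": [("Down", 0.5), ("Right", 0.5)],
}

def evaluate(policy, iters=50):
    """Iterative policy evaluation; exit states D, E, F keep value 0."""
    V = {s: 0.0 for s in reward}
    for _ in range(iters):
        new_V = dict(V)
        for s, action_dist in policy.items():
            new_V[s] = sum(
                prob * (reward[transitions[s][a]] + GAMMA * V[transitions[s][a]])
                for a, prob in action_dist
            )
        V = new_V
    return V

values = evaluate(policy)
print(values["C"])   # 0.5 * 8 + 0.5 * 10 = 9.0 under these made-up rewards
print(values["A"])   # equals V(C) here, since entering B and C carries zero reward
```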