Question: Consider the 101 × 3 world shown in Figure 17.14(b). In the start state the agent has a choice of two deterministic actions, Up or Down, but in the other states the agent has one deterministic action, Right. Assuming a discounted reward function, for what values of the discount γ should the agent choose Up and for which Down? Compute the utility of each action as a function of γ. (Note that this simple example actually reflects many real-world situations in which one must weigh the value of an immediate action versus the potential continual long-term consequences, such as choosing to dump pollutants into a lake.)
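Figure 17.14(b): the 101 × 3 world. [Figure not reproduced. The Start square sits in the middle row of the first column. The top row reads +50 followed by -1 in every later column; the bottom row reads -50 followed by +1 in every later column.]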
Step by Step Solution
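Taking the reward of the start state as 0 and discounting the reward received at step t by γ^t, the utility of Up is $U_{\mathrm{Up}}(\gamma) = 50\gamma - \sum_{t=2}^{101} \gamma^t = 50\gamma - \gamma^2 \frac{1 - \gamma^{100}}{1 - \gamma}$, while the utility of Down is $U_{\mathrm{Down}}(\gamma) = -50\gamma + \sum_{t=2}^{101} \gamma^t = -50\gamma + \gamma^2 \frac{1 - \gamma^{100}}{1 - \gamma}$. Solving n...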
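The comparison between the two actions can be checked numerically. The following is a minimal sketch, not the site's own solution: it assumes the start state has reward 0, that the +50 or -50 square is reached at step 1, and that the 100 remaining squares of the chosen row are visited at steps 2 through 101. The function names and the bisection bracket are illustrative choices.

```python
def tail_sum(gamma, first=2, last=101):
    """Sum of gamma**t over the 100 squares visited after the first move (assumed steps 2..101)."""
    return sum(gamma ** t for t in range(first, last + 1))


def utility_up(gamma):
    """Discounted return for Up: +50 at step 1, then -1 at every later step."""
    return 50 * gamma - tail_sum(gamma)


def utility_down(gamma):
    """Discounted return for Down: -50 at step 1, then +1 at every later step."""
    return -50 * gamma + tail_sum(gamma)


def indifference_point(lo=0.5, hi=1.0, tol=1e-10):
    """Bisect on utility_up - utility_down, which is positive at lo and negative at hi."""
    def diff(gamma):
        return utility_up(gamma) - utility_down(gamma)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            lo = mid   # Up is still better, so the crossover lies above mid
        else:
            hi = mid   # Down is better, so the crossover lies at or below mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print("indifference point:", indifference_point())
    for g in (0.5, 0.9, 0.99):
        print(g, utility_up(g), utility_down(g))
```

Under these assumptions the crossover lands at approximately γ = 0.984. Below it the immediate +50 outweighs the stream of -1 penalties, so Up is preferred; above it the long-run +1 rewards dominate, so Down is preferred.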
