Question
I have a question about one of the textbook answers here on Chegg. In the solution to Exercise 17.9 of Artificial Intelligence: A Modern Approach, the way $\gamma$ is found is unclear to me. How do we go from $\sum_{i=1}^{100} \gamma^i = 50$ to $\gamma = 0.9844$? Secondly, is R(Up) = R(Down) because both paths eventually reach the same reward, since the +50 is reduced by the -1 steps and the -50 is increased by the +1 steps, which leads to an equivalent reward by the end state?
Below is the question followed by what I am referring to:
Chapter 17, Problem 9E

Consider the 101 × 3 world shown in Figure 17.14(b). In the start state the agent has a choice of two deterministic actions, Up or Down, but in the other states the agent has one deterministic action, Right. Assuming a discounted reward function, for what values of the discount $\gamma$ should the agent choose Up and for which Down? Compute the utility of each action as a function of $\gamma$. (Note that this simple example actually reflects many real-world situations in which one must weigh the value of an immediate action versus the potential continual long-term consequences, such as choosing to dump pollutants into a lake.)

[Figure 17.14(b): top row +50, -1, -1, ..., -1; middle row Start; bottom row -50, +1, +1, ..., +1]

Step-by-step solution (Step 1 of 1)

Deterministic action

The utilities of the Up and Down actions for the 101 × 3 world, using discount $\gamma$, are computed as shown below.

The Up utility is summed over the remaining 100 columns, since the first column contains the agent's starting point:

$$R(\text{Up}) = 50 + (-1)\gamma + (-1)\gamma^2 + (-1)\gamma^3 + \cdots + (-1)\gamma^{100} = 50 - (\gamma + \gamma^2 + \gamma^3 + \cdots + \gamma^{100}) = 50 - \sum_{i=1}^{100} \gamma^i$$

The Down utility is summed over the same 100 columns:

$$R(\text{Down}) = (-1)50 + \gamma + \gamma^2 + \gamma^3 + \cdots + \gamma^{100} = -50 + \sum_{i=1}^{100} \gamma^i$$

The discount at which the two actions are equally good is found by equating the two utilities:

$$50 - \sum_{i=1}^{100} \gamma^i = -50 + \sum_{i=1}^{100} \gamma^i$$

Balancing the equation gives

$$50 + 50 = \sum_{i=1}^{100} \gamma^i + \sum_{i=1}^{100} \gamma^i$$

$$2 \times 50 = 2 \times \sum_{i=1}^{100} \gamma^i$$

$$\sum_{i=1}^{100} \gamma^i = 50$$

$$\gamma \approx 0.9844$$

For $\gamma = 0.9844$ the two actions have the same utility. Hence, for $\gamma > 0.9844$ the agent should take the Down direction, and for $\gamma < 0.9844$ the agent should take the Up direction.
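On the step from $\sum_{i=1}^{100} \gamma^i = 50$ to $\gamma \approx 0.9844$: for $\gamma \neq 1$ the sum equals $\gamma(1 - \gamma^{100})/(1 - \gamma)$, and setting that to 50 gives a degree-100 polynomial equation with no convenient closed-form root, so the value has to be found numerically. The Python sketch below is one way to do that. Bisection is an assumption here, not necessarily the method the textbook solution used, and the function names are hypothetical.

```python
def geometric_sum(gamma: float, n: int = 100) -> float:
    """Return gamma + gamma**2 + ... + gamma**n."""
    return sum(gamma ** i for i in range(1, n + 1))


def utility_up(gamma: float, n: int = 100) -> float:
    """Utility of Up: +50 now, then a discounted -1 on each of n steps."""
    return 50 - geometric_sum(gamma, n)


def utility_down(gamma: float, n: int = 100) -> float:
    """Utility of Down: -50 now, then a discounted +1 on each of n steps."""
    return -50 + geometric_sum(gamma, n)


def break_even_gamma(target: float = 50.0, n: int = 100, tol: float = 1e-10) -> float:
    """Bisection (illustrative choice) for the gamma where the sum hits target.

    The sum increases with gamma, from 0 at gamma = 0 to n at gamma = 1,
    so a target strictly between 0 and n has exactly one root in (0, 1).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if geometric_sum(mid, n) < target:
            lo = mid  # sum too small: the root lies above mid
        else:
            hi = mid  # sum too large (or equal): the root lies at or below mid
    return (lo + hi) / 2


gamma_star = break_even_gamma()
print(round(gamma_star, 4))  # prints 0.9844
```

Note that equating R(Up) and R(Down) does not claim the two paths always earn the same reward. It only locates the break-even discount: evaluating `utility_up` and `utility_down` at, say, 0.9 and 0.99 shows Up ahead below the threshold and Down ahead above it.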