
Question


Semi-Gradient Update
This problem presents a brief glimpse of the problems that can arise in off-policy learning with function approximation, through the concepts that have been introduced so far. If
you would like a more detailed discussion on these issues, you may refer to Chapter 11. Let us now apply semi-gradient TD learning from Chapter 9 with batch updates (Section
6.3) to the following value-function approximation problem, based on a problem known as Baird's Counterexample:
In the diagram accompanying the problem (not reproduced here), each circle is a state, and the arrows represent some possible transitions between states.
The formula shown in each state gives its value in terms of parameters wi, i = 0, 1, ..., 6, which comprise the value-function approximator that we wish to learn.
Consider that we see each of the transitions in the diagram exactly once in a batch of data.
The reward for each transition is 0, and the discount factor is γ = 0.95.
We update the weights in the function approximation using semi-gradient TD(0).
Before the update, the weights are (w0, w1, ..., w6) = (1, 1, 1, 1, 1, 1, 5), i.e., w6 = 5.
With a learning rate of α = 0.1, what will be the weights after the update?
Consider the updates as happening in a batch: the individual updates are summed up and applied to the weights only after the update has been computed for all six transitions. Enter your
answers to 3 decimal places.
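To make the batch procedure concrete, here is a minimal sketch (in Python; not part of the original question) of a batch semi-gradient TD(0) update for a linear value function. The feature vectors for the six transitions come from the diagram and are not all listed here; the single transition used in the example below is the bottom "loop" transition, whose feature vector (2, 0, 0, 0, 0, 0, 1) is inferred from the per-transition update quoted in the feedback further down.

import numpy as np

def batch_semi_gradient_td0(w, transitions, alpha=0.1, gamma=0.95):
    # One batch semi-gradient TD(0) update for a linear value function
    # v_hat(s, w) = x(s) . w.  `transitions` is a list of (x_s, reward, x_s_next)
    # triples; the per-transition updates are summed and applied once at the end.
    total = np.zeros_like(w)
    for x_s, r, x_next in transitions:
        delta = r + gamma * (x_next @ w) - (x_s @ w)  # TD error
        total += alpha * delta * x_s                  # semi-gradient: grad of v_hat(S, w) is x_s
    return w + total

# Illustration with only the bottom "loop" transition (feature vector inferred,
# not stated in the text); the full answer requires all six transitions.
w = np.array([1, 1, 1, 1, 1, 1, 5], dtype=float)
x_loop = np.array([2, 0, 0, 0, 0, 0, 1], dtype=float)
print(batch_semi_gradient_td0(w, [(x_loop, 0.0, x_loop)]))
# [0.93  1.  1.  1.  1.  1.  4.965]  -> this transition contributes (-0.070, 0, ..., 0, -0.035)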
Feedback
General Feedback
Using the update rule given on page 203 of the textbook (the per-step update rule used inside the algorithm, not the algorithm itself, since this is a batch update), one can compute the contribution to the update from the leftmost transition, and likewise the contributions from the other transitions from left to right (those vectors are not reproduced in this transcript). For the bottom "loop" transition, the contribution is
\[ [-0.070,\, 0,\, 0,\, 0,\, 0,\, 0,\, -0.035]^{T}. \]
Adding the above updates together and adding the total update to the initial weights gives the updated weights.
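As a sanity check on the quoted vector, here is the per-transition calculation for the bottom loop transition, assuming (consistently with that vector) that the looping state's value is 2w0 + w6:
\[ \Delta\mathbf{w} = \alpha\,\bigl[R + \gamma\,\hat v(S',\mathbf{w}) - \hat v(S,\mathbf{w})\bigr]\,\nabla_{\mathbf{w}}\hat v(S,\mathbf{w}) = 0.1\,\bigl[0 + 0.95(7) - 7\bigr]\,[2,0,0,0,0,0,1]^{T} = [-0.070,\,0,\,0,\,0,\,0,\,0,\,-0.035]^{T}, \]
where \(\hat v(S,\mathbf{w}) = \hat v(S',\mathbf{w}) = 2w_0 + w_6 = 2(1) + 5 = 7\) under the pre-update weights.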
Question 12
After this update, what will be the new value of the state whose value is calculated as [formula not reproduced]?
Enter your answer to 3 decimal places.
Question 13
And what will be the value of the state whose value is calculated as [formula not reproduced]?
Enter your answer to 3 decimal places.
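For either state, once the batch update has been applied, the new value is obtained by substituting the updated weights back into that state's formula; for a linear approximator this is just
\[ \hat v(S, \mathbf{w}') = \mathbf{x}(S)^{T}\,\mathbf{w}', \]
writing \(\mathbf{w}'\) for the updated weight vector and \(\mathbf{x}(S)\) for the coefficient (feature) vector read off from the state's formula in the diagram.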