Question
Semi-Gradient Update
This problem presents a brief glimpse of the problems that can arise in off-policy learning with function approximation, through the concepts that have been introduced so far. If
you would like a more detailed discussion of these issues, you may refer to the relevant chapter of the textbook. Let us now apply semi-gradient TD learning with batch updates
to the following value-function approximation problem, based on a problem known as Baird's Counterexample:
In the above diagram, each circle is a state. The arrows represent some possible transitions between states.
The formulas shown in each state give that state's value in terms of the parameters (weights) that make up the value-function approximator we wish to learn.
Consider that we see each of the transitions shown above exactly once in a batch of data.
The reward for each transition is and the discount factor
We update the weights in the function approximation using semi-gradient TD learning.
Before the update, the weights are: with
With a learning rate what will be the weights after the update?
Consider the updates happening in a batch, with the individual updates summed up and applied to the weights after computing the update for all the six transitions. Enter your
answers to decimal places.
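For reference, a batch semi-gradient TD update of the kind described here can be written as below. This is a sketch in common textbook notation, not the exact formula from the problem: the symbols $\mathbf{w}$ (weights), $\alpha$ (learning rate), $\gamma$ (discount factor), $\hat{v}$ (approximate value function) and $\mathcal{B}$ (the batch of observed transitions) are assumed names. Every term is evaluated at the weights from before the update, and the summed change is applied once at the end.

```latex
\Delta \mathbf{w}
  = \alpha \sum_{(s,\, r,\, s') \in \mathcal{B}}
    \bigl[\, r + \gamma\, \hat{v}(s', \mathbf{w}) - \hat{v}(s, \mathbf{w}) \,\bigr]\,
    \nabla_{\mathbf{w}} \hat{v}(s, \mathbf{w}),
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \Delta \mathbf{w}
```

For a linear approximator, $\hat{v}(s, \mathbf{w}) = \mathbf{x}(s)^{\top} \mathbf{w}$, the gradient is simply the feature vector $\mathbf{x}(s)$ of the state being updated.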
Feedback
General Feedback
Using the update rule from the textbook's algorithm (only the rule, not the algorithm itself, since this is a batch update), the contribution to the update from the leftmost transition is
Similarly, the contributions from the other transitions from left to right are:
And for the bottom "loop" transition, it is
Adding the above updates together and adding the total update to the current weights gives the updated weights.
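The batch procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the solution to the problem: the function name `batch_semi_gradient_td`, the two feature vectors, the weights, the learning rate and the discount factor are all hypothetical values chosen for the example, and the value function is assumed to be linear in the weights.

```python
import numpy as np

def batch_semi_gradient_td(w, transitions, alpha, gamma):
    """One batch semi-gradient TD(0) step for a linear value function v(s) = x(s) . w.

    `transitions` is a list of (x_s, reward, x_next) tuples, where x_s and
    x_next are the feature vectors of the current and next state.
    """
    total_update = np.zeros_like(w, dtype=float)
    for x_s, reward, x_next in transitions:
        # TD error uses the weights from BEFORE the update for every transition.
        td_error = reward + gamma * (x_next @ w) - (x_s @ w)
        # Semi-gradient: differentiate only v(s), whose gradient is x_s.
        total_update += alpha * td_error * x_s
    # Apply the summed update once, after all transitions are processed.
    return w + total_update

# Hypothetical two-state example (NOT the states or numbers from the problem).
x_a = np.array([2.0, 0.0])   # v(a) = 2*w1
x_b = np.array([1.0, 2.0])   # v(b) = w1 + 2*w2
w = np.array([1.0, 1.0])

transitions = [
    (x_a, 0.0, x_b),  # a -> b, reward 0
    (x_b, 0.0, x_b),  # b -> b ("loop"), reward 0
]

new_w = batch_semi_gradient_td(w, transitions, alpha=0.1, gamma=0.9)
print(new_w)          # approximately [1.11, 0.94]
print(x_a @ new_w)    # new value of state a, approximately 2.22
```

In this toy case the two per-transition contributions are `[0.14, 0.0]` and `[-0.03, -0.06]`, which sum to `[0.11, -0.06]`. The value of a state after the update is obtained by taking the dot product of its feature vector with the new weights.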
After this update, what will be the new value of the state whose value is calculated as
Enter your answer to decimal places.
And what will be the value of the state whose value is calculated as
Enter your answer to decimal places.