Question 3 (3 points)
Suppose that a Q-learning agent selects action a=0 from state s=42, transitions to state s'=17, and earns a reward of r=5.2. Suppose that the
agent's current estimates of the state value function for the two states involved are V(17)=24.5 and V(42)=16.3.
The target value used in the Q-learning update formula is defined as follows: Q_T = r + γ·V(s')
Using a discount rate of γ=0.8, calculate the Q-learning target value for this transition. Provide an exact answer.
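
The target as defined above depends only on the reward, the discount rate, and the value of the next state, so V(42) is not needed: Q_T = 5.2 + 0.8 × 24.5 = 5.2 + 19.6 = 24.8. The following is a minimal sketch of that arithmetic, not a full Q-learning implementation. The function name `q_learning_target` and its parameter names are hypothetical, chosen only for illustration. It assumes the target formula exactly as stated in the question, and it uses Python's `decimal` module so the result is exact rather than subject to binary floating-point rounding.

```python
from decimal import Decimal


def q_learning_target(reward, gamma, next_state_value):
    """Return r + gamma * V(s'), the target as defined in the question.

    The function name and signature are illustrative assumptions.
    """
    return reward + gamma * next_state_value


# Values taken from the question, passed as decimal strings to keep them exact.
r = Decimal("5.2")        # reward earned on the transition
gamma = Decimal("0.8")    # discount rate
v_next = Decimal("24.5")  # V(17), the value estimate of the next state s'

# V(42) = 16.3 is the value of the starting state and does not enter the target.
target = q_learning_target(r, gamma, v_next)
print(target)
```

The printed value is numerically equal to 24.8, matching the hand calculation.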
