Question:

Exercise 11.10 Suppose your friend presented you with the following example where SARSA(λ) seems to give unintuitive results. There are two states, A and B. There is a reward of 10 coming into state A and no other rewards or penalties.

There are two actions: left and right. These actions only make a difference in state B.

Going left in state B goes directly to state A, but going right has a low probability of going into state A. In particular:

• P(A|B, left) = 1; reward is 10.

• P(A|B, right) = 0.01; reward is 10. P(B|B, right) = 0.99; reward is 0.

• P(A|A, left) = P(A|A, right) = 0.999 and P(B|A, left) = P(B|A, right) = 0.001. The probability of leaving A is small enough that the eligibility traces will be close to zero by the time state B is entered.

• γ and λ are 0.9 and α is 0.4.
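The transition model above can be written down as a small sampler, which may help when experimenting with the example. This is only a sketch: the names `P_A` and `step` are hypothetical, and it assumes that the reward of 10 applies to every transition whose next state is A (including A to A), which the exercise does not spell out.

```python
import random

# Parameters stated in the exercise.
GAMMA, LAMBDA, ALPHA = 0.9, 0.9, 0.4

# P_A[(state, action)] = probability that the next state is A;
# the remaining probability mass goes to state B.
P_A = {
    ("A", "left"): 0.999,
    ("A", "right"): 0.999,
    ("B", "left"): 1.0,
    ("B", "right"): 0.01,
}

def step(state, action, rng=random):
    """Sample one transition and return (next_state, reward).

    Assumption: reward is 10 whenever the next state is A, and 0 otherwise.
    """
    next_state = "A" if rng.random() < P_A[(state, action)] else "B"
    reward = 10 if next_state == "A" else 0
    return next_state, reward
```

Calling `step("B", "left")` always moves to A, while `step("B", "right")` usually stays in B with reward 0.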

Suppose that your friend claimed that Q(λ) does not work in this example, because the eligibility trace for the action right in state B ends up being bigger than the eligibility trace for the action left in state B, while the rewards and all of the parameters are the same. In particular, the eligibility trace for action right will be about 5 when the agent finally enters state A, but it will be 1 for action left. Therefore, your friend concludes, the best action will be to go right in state B, which is not correct.
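The figures quoted by the friend can be reproduced with a short calculation, assuming accumulating traces (each visit decays the trace by γλ and then adds 1); the exercise itself does not say which trace variant is used, and the function name `trace_after_repeats` is hypothetical. The sketch only shows where "about 5" and "1" come from; it does not decide whether the friend's conclusion follows.

```python
GAMMA, LAMBDA = 0.9, 0.9  # values given in the exercise

def trace_after_repeats(n, decay=GAMMA * LAMBDA):
    """Accumulating trace of one state-action pair taken n times in a row."""
    e = 0.0
    for _ in range(n):
        e = decay * e + 1.0  # decay the old trace, then add 1 for this visit
    return e

# Limit of the geometric series 1 + 0.81 + 0.81^2 + ... (roughly 5.26).
limit = 1.0 / (1.0 - GAMMA * LAMBDA)

# Going left once from B gives a trace of 1; repeating right many times
# (it stays in B with probability 0.99) pushes the trace toward the limit.
left_trace = trace_after_repeats(1)
right_trace = trace_after_repeats(50)
```

Under these assumptions `left_trace` is 1 and `right_trace` is within a small fraction of `limit`, matching the numbers in the friend's argument.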

What is wrong with your friend’s argument? What does this example show?
