Question:

20.4 The environments used in the chapter all assume that training sequences are finite. In environments with no clear termination point, the unlimited accumulation of rewards can lead to problems with infinite utilities. To avoid this, a discount factor γ is often used, where γ < 1.

A reward k steps in the future is discounted by a factor of γ^k. For each constraint and update equation in the chapter, explain how to incorporate the discount factor.
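
As a small illustration of the discounting rule described in the question (not an answer to the exercise itself), the sketch below sums a reward sequence with the reward k steps ahead weighted by the k-th power of the discount factor. The function name `discounted_return` and the parameter name `gamma` are hypothetical, chosen only for this example.

```python
def discounted_return(rewards, gamma):
    """Return the sum of rewards[k] * gamma**k over all steps k.

    `rewards` is a finite list of per-step rewards and `gamma` is the
    discount factor, assumed here to satisfy 0 <= gamma < 1.
    """
    total = 0.0
    # Work backwards so each step needs only one multiply and one add:
    # the return from step k is rewards[k] + gamma * (return from step k+1).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1, 1, 1], 0.5))
    # A long run of unit rewards stays below 1 / (1 - gamma) = 10
    print(discounted_return([1] * 1000, 0.9))
```

With a constant reward r per step and a discount factor below 1, the discounted sum can never exceed r / (1 - γ), which is why discounting keeps utilities finite even when a sequence never terminates.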
