Question:

Consider the car domain above (without knowing T or R) and the following experiences, each written as (state, action, next state, reward):
Episode 1:
cool, fast, warm, +2
warm, fast, overheated, -10
Episode 2:
cool, slow, cool, +1
cool, slow, cool, +1
cool, fast, cool, +2
cool, fast, cool, +2
cool, fast, warm, +2
warm, fast, overheated, -10
Episode 3:
cool, fast, warm, +2
warm, slow, cool, +1
cool, slow, cool, +1
cool, fast, cool, +2
cool, fast, warm, +2
warm, fast, overheated, -10
a. (3 pt) Estimate the parameters of T and R for model-based reinforcement learning.
b. (3 pt) Use the Monte Carlo (MC) reinforcement learning method (direct evaluation) to estimate the Q function, assuming γ = 1.0. Count all occurrences of a state in each episode (every-visit MC).
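A minimal Python sketch of both parts follows. It assumes each experience line reads (state, action, next state, reward), as in the standard car-racing example. For part (a) it uses maximum-likelihood counts, T̂(s, a, s') = N(s, a, s') / N(s, a), and takes R̂(s, a, s') as the average reward observed for that transition. For part (b) it uses every-visit Monte Carlo: every occurrence of (s, a) contributes the return from that step to the end of its episode, and Q(s, a) is the mean of those returns. The names `EPISODES`, `estimate_model` and `mc_q_every_visit` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Experience tuples (state, action, next_state, reward), copied from the question.
EPISODES = [
    [("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
    [("cool", "slow", "cool", 1), ("cool", "slow", "cool", 1),
     ("cool", "fast", "cool", 2), ("cool", "fast", "cool", 2),
     ("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
    [("cool", "fast", "warm", 2), ("warm", "slow", "cool", 1),
     ("cool", "slow", "cool", 1), ("cool", "fast", "cool", 2),
     ("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
]


def estimate_model(episodes):
    """Part (a): T(s,a,s') = N(s,a,s') / N(s,a); R(s,a,s') = mean observed reward."""
    sa_counts = defaultdict(int)
    sas_counts = defaultdict(int)
    reward_sums = defaultdict(int)
    for episode in episodes:
        for s, a, s_next, r in episode:
            sa_counts[(s, a)] += 1
            sas_counts[(s, a, s_next)] += 1
            reward_sums[(s, a, s_next)] += r
    # key[:2] drops s' so each transition count is divided by its N(s, a).
    T = {key: Fraction(n, sa_counts[key[:2]]) for key, n in sas_counts.items()}
    R = {key: Fraction(reward_sums[key], n) for key, n in sas_counts.items()}
    return T, R


def mc_q_every_visit(episodes, gamma=1):
    """Part (b): every-visit Monte Carlo (direct evaluation) estimate of Q(s, a).

    The question uses gamma = 1.0; the integer 1 keeps the arithmetic exact.
    """
    return_sums = defaultdict(Fraction)  # Fraction() starts at 0
    visits = defaultdict(int)
    for episode in episodes:
        G = 0
        # Walk the episode backwards so G is the return from each step onward.
        for s, a, _s_next, r in reversed(episode):
            G = r + gamma * G
            return_sums[(s, a)] += G  # every occurrence counts
            visits[(s, a)] += 1
    return {sa: return_sums[sa] / visits[sa] for sa in visits}


if __name__ == "__main__":
    T, R = estimate_model(EPISODES)
    for key in sorted(T):
        print(key, "T =", T[key], "R =", R[key])
    Q = mc_q_every_visit(EPISODES)
    for key in sorted(Q):
        print(key, "Q =", Q[key])
```

On the three episodes this sketch gives T̂(cool, fast, warm) = 4/7 and T̂(cool, fast, cool) = 3/7, both with R̂ = +2; T̂(cool, slow, cool) = 1 with R̂ = +1; T̂(warm, fast, overheated) = 1 with R̂ = -10; and T̂(warm, slow, cool) = 1 with R̂ = +1. The every-visit MC estimates are Q(cool, fast) = -6, Q(cool, slow) = -10/3, Q(warm, fast) = -10 and Q(warm, slow) = -4. `Fraction` is used so the averages stay exact instead of being rounded.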
