Question: Consider the following 3-state MDP (Markov decision process) for a robot trying to walk. The three states are Fallen, Standing, and Moving, as shown in the following figure.

Use the MDP formulation to code the following problem and find the optimal values using the value iteration algorithm. Then use the policy iteration method to find the optimal policy for a discount factor of 0.1. Try this method with a different discount factor, for example a much larger one like 0.9 or 0.99, or a much smaller one like 0.01. Does the optimal policy change? Comment on it.

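[Figure: state-transition diagram with the states Standing, Moving, and Fallen. Edges are labeled "probability, reward": 1, +1; 1, +1; 0.6, +2; 0.4, +1; 0.4, -1; 0.2, -1; 0.8, +2; 0.6, -1. Legend: slow action (black), fast action (green).]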
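The task can be sketched in Python as follows. The figure's arrows did not survive in the text, so the assignment of each "probability, reward" label to a source state, action, and destination state in the `MDP` table below is an assumption: it is one consistent reading of the eight labels, not a confirmed copy of the diagram. The function names (`q_value`, `value_iteration`, `evaluate_policy`, `policy_iteration`) and the tolerance values are likewise illustrative choices.

```python
# ASSUMED model: state -> action -> list of (probability, next_state, reward).
# The edge-to-state mapping is a guess reconstructed from the figure labels.
MDP = {
    "Fallen":   {"slow": [(0.4, "Standing", 1.0), (0.6, "Fallen", -1.0)]},
    "Standing": {"slow": [(1.0, "Moving", 1.0)],
                 "fast": [(0.6, "Moving", 2.0), (0.4, "Fallen", -1.0)]},
    "Moving":   {"slow": [(1.0, "Moving", 1.0)],
                 "fast": [(0.8, "Moving", 2.0), (0.2, "Fallen", -1.0)]},
}


def q_value(values, state, action, gamma):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in MDP[state][action])


def value_iteration(gamma, tol=1e-10, max_iters=100_000):
    """Repeat the Bellman optimality backup until the largest change is below tol."""
    values = {s: 0.0 for s in MDP}
    for _ in range(max_iters):
        new_values = {s: max(q_value(values, s, a, gamma) for a in MDP[s])
                      for s in MDP}
        delta = max(abs(new_values[s] - values[s]) for s in MDP)
        values = new_values
        if delta < tol:
            break
    return values


def evaluate_policy(policy, gamma, tol=1e-10, max_iters=100_000):
    """Iterative policy evaluation for a fixed deterministic policy."""
    values = {s: 0.0 for s in MDP}
    for _ in range(max_iters):
        new_values = {s: q_value(values, s, policy[s], gamma) for s in MDP}
        delta = max(abs(new_values[s] - values[s]) for s in MDP)
        values = new_values
        if delta < tol:
            break
    return values


def policy_iteration(gamma):
    """Alternate policy evaluation and greedy improvement until stable."""
    policy = {s: next(iter(MDP[s])) for s in MDP}  # start with the first listed action
    while True:
        values = evaluate_policy(policy, gamma)
        stable = True
        for s in MDP:
            best = max(MDP[s], key=lambda a: q_value(values, s, a, gamma))
            # Switch only on a clear improvement so near-ties cannot cause cycling.
            if q_value(values, s, best, gamma) > q_value(values, s, policy[s], gamma) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


if __name__ == "__main__":
    for gamma in (0.01, 0.1, 0.9, 0.99):
        policy, _ = policy_iteration(gamma)
        optimal_values = value_iteration(gamma)
        print(f"gamma={gamma}: policy={policy}")
        print("  values:", {s: round(v, 4) for s, v in optimal_values.items()})
```

Under this assumed model, the sketch selects the fast action in Moving for discount factors 0.01 and 0.1 and the slow action in every state for 0.9 and 0.99. A plausible reading: with a small discount factor the higher expected immediate reward of moving fast dominates, while with a large discount factor the risk of falling costs more in lost future reward than the fast action gains. If the real diagram assigns the edges differently, the table should be edited and the comparison rerun.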
