Question: Consider an undiscounted MDP having three states, (1, 2, 3), with rewards -1, -2, 0, respectively. State 3 is a terminal state. In states 1 and 2 there are two possible actions: a and b. The transition model is as follows:

• In state 1, action a moves the agent to state 2 with probability 0.8 and makes the agent stay put with probability 0.2.

• In state 2, action a moves the agent to state 1 with probability 0.8 and makes the agent stay put with probability 0.2.

• In either state 1 or state 2, action b moves the agent to state 3 with probability 0.1 and makes the agent stay put with probability 0.9.

Answer the following questions:

a. What can be determined qualitatively about the optimal policy in states 1 and 2?

b. Apply policy iteration, showing each step in full, to determine the optimal policy and the values of states 1 and 2. Assume that the initial policy has action b in both states.

c. What happens to policy iteration if the initial policy has action a in both states? Does discounting help? Does the optimal policy depend on the discount factor?
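For reference, the two alternating steps that part (b) calls for are the standard policy-iteration updates, written here with R(s) for the per-state rewards and discount factor \gamma (\gamma = 1 in the undiscounted case):

V^{\pi}(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s)) \, V^{\pi}(s') \qquad \text{(policy evaluation)}

\pi'(s) = \operatorname{argmax}_{a} \left[ R(s) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{\pi}(s') \right] \qquad \text{(policy improvement)}

Evaluation solves a linear system for the current policy's values; improvement makes the policy greedy with respect to those values; the loop stops when the policy no longer changes.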

Step by Step Solution


a. Intuitively, the agent wants to get to state 3 as soon as possible, because it pays a cost for each time step it spends in states 1 and 2. However, the only action that can reach state 3 is b, and b succeeds only with probability 0.1, so the agent will typically wait many steps before escaping. Waiting costs 1 per step in state 1 but 2 per step in state 2, so the agent should certainly try b in state 1; in state 2 it may be better to use a to move to the cheaper state 1 first rather than aim directly for state 3. Settling that trade-off requires the numerical computation of part (b).
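As a cross-check on parts (b) and (c), here is a minimal Python sketch of policy iteration for this MDP. The encoding (indices 0-2 for the problem's states 1-3, action 0 for a and 1 for b), the helper names evaluate and policy_iteration, and the use of NumPy are illustrative choices, not part of the original solution:

```python
import numpy as np

# Indices 0, 1, 2 stand for the problem's states 1, 2, 3 (index 2 is terminal).
R = np.array([-1.0, -2.0, 0.0])          # per-step reward received in each state

# P[action][s, s'] = transition probability; action 0 is "a", action 1 is "b".
P = {
    0: np.array([[0.2, 0.8, 0.0],        # a: state 1 -> state 2 w.p. 0.8, else stay
                 [0.8, 0.2, 0.0],        # a: state 2 -> state 1 w.p. 0.8, else stay
                 [0.0, 0.0, 1.0]]),      # terminal state self-loops
    1: np.array([[0.9, 0.0, 0.1],        # b: reach state 3 w.p. 0.1, else stay
                 [0.0, 0.9, 0.1],
                 [0.0, 0.0, 1.0]]),
}

def evaluate(policy, gamma):
    """Policy evaluation: solve V = R + gamma * P_pi @ V as a linear system."""
    P_pi = np.array([P[policy[s]][s] for s in range(3)])
    A = np.eye(3) - gamma * P_pi
    A[2] = [0.0, 0.0, 1.0]               # pin the terminal value to 0, even at gamma = 1
    b = R.copy()
    b[2] = 0.0
    return np.linalg.solve(A, b)

def policy_iteration(policy, gamma=1.0):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    while True:
        V = evaluate(policy, gamma)
        improved = [max((0, 1), key=lambda a: R[s] + gamma * P[a][s] @ V)
                    for s in range(3)]
        if improved == policy:
            return policy, V
        policy = improved

pi, V = policy_iteration([1, 1, 0])      # part (b): start with action b in both states
print(pi, V)                             # -> [1, 0, 0]: b in state 1, a in state 2,
                                         #    with V = [-10, -12.5, 0]
# policy_iteration([0, 0, 0]) raises LinAlgError at gamma = 1: the all-a policy
# never reaches the terminal state, so the undiscounted system is singular (part c).
# policy_iteration([0, 0, 0], gamma=0.9) solves fine and returns the same policy.
```

Run as written, the sketch changes the initial policy (b, b) once and stops, returning b in state 1 and a in state 2 with V(1) = -10 and V(2) = -12.5. Starting from (a, a) with gamma = 1, evaluation fails because that policy never reaches the terminal state, so the undiscounted linear system is singular; that failure, and the fact that a discount such as gamma = 0.9 restores a solvable system yielding the same policy, is the behavior part (c) asks about.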

