
Question


Problem 2 (Policy Iteration Using Action Value Function) (40 pts): Following the notation given in the lecture notes, or alternatively in Chapter 4 of the book by Sutton and Barto, answer the following questions.

(a) Assume now that the policy is deterministic, i.e., at each state $s$ exactly one action is assigned probability 1; denote that action by $\pi(s)$. Equivalently:

$$\pi(a \mid s) = \begin{cases} 1, & \text{if } a = \pi(s) \\ 0, & \text{otherwise.} \end{cases}$$

Note that, with slight abuse of notation, $\pi(a \mid s)$ denotes the probability and $\pi(s)$ denotes the action, given policy $\pi$. Using the above expression, what does the Bellman equation for $q_\pi$ derived in HW7, Problem 2, part (c) become?

HW7, Problem 2, part (c) reads as follows: "Derive the Bellman equation for $q_\pi$. That is, express $q_\pi(s, a)$ using $q_\pi(s', a')$. Rearrange the expression so that the summations are next to each other, as in the Bellman equation for $v_\pi$. (Hint: Use the results from part (a) and part (b). Make sure to use the notation $a'$ for the action taken in the next state $s'$.)"

(b) Given the current action value function $q_\pi$, what is the greedy policy with respect to $q_\pi$? (Write down the mathematical expression for $\pi(s)$.)

(c) Using the results from part (a) and part (b), write the policy iteration pseudocode (refer to the lecture notes or Chapter 4.3 in the book) using the action value function $Q(s, a)$ instead of the value function $V(s)$. (Hint: The policy evaluation process can be rewritten using part (a), and the policy improvement process can be rewritten using part (b).)
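For orientation on part (c), here is a minimal Python sketch of policy iteration that stores Q(s, a) rather than V(s). It is not the lecture's pseudocode. It assumes the evaluation step uses the deterministic-policy Bellman backup in Sutton and Barto's notation, Q(s, a) <- sum over (s', r) of p(s', r | s, a) [r + gamma * Q(s', pi(s'))], and that the improvement step takes the argmax over actions. The function name `policy_iteration_q`, the transition layout `P[s][a]`, and the two-state toy MDP are all hypothetical choices made for illustration.

```python
import numpy as np

def policy_iteration_q(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Tabular policy iteration on Q(s, a).

    P[s][a] is a list of (prob, next_state, reward) triples
    (an assumed layout, not taken from the problem statement).
    """
    Q = np.zeros((n_states, n_actions))
    pi = np.zeros(n_states, dtype=int)  # arbitrary initial deterministic policy

    while True:
        # Policy evaluation: sweep all (s, a) pairs until the largest change is below theta.
        while True:
            delta = 0.0
            for s in range(n_states):
                for a in range(n_actions):
                    old = Q[s, a]
                    # Backup under a deterministic policy: the next action is pi[s2].
                    Q[s, a] = sum(p * (r + gamma * Q[s2, pi[s2]])
                                  for p, s2, r in P[s][a])
                    delta = max(delta, abs(old - Q[s, a]))
            if delta < theta:
                break

        # Policy improvement: act greedily with respect to the current Q.
        stable = True
        for s in range(n_states):
            best = int(np.argmax(Q[s]))
            # Switch only on a strict improvement, so ties cannot cause endless flipping.
            if Q[s, best] > Q[s, pi[s]] + 1e-9:
                pi[s] = best
                stable = False
        if stable:
            return Q, pi

# Hypothetical two-state, two-action MDP with deterministic transitions.
P_toy = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
Q_toy, pi_toy = policy_iteration_q(P_toy, n_states=2, n_actions=2, gamma=0.5)
print(pi_toy)
print(Q_toy)
```

The policy is changed only when another action is strictly better than the current one. This is one way to avoid the loop switching forever between equally good policies, an issue Sutton and Barto raise for the V-based version.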
