Question:
Consider the following deterministic MDP with 1-dimensional continuous states and actions and a finite task horizon:
State Space S: R
Action Space A: R
Reward Function: R(s, a, s') = −qs² − ra², where r > 0 and q ≥ 0
Deterministic Dynamics/Transition Function: s' = cs + da (i.e., the next state s' is a deterministic function of the current state s and the action a)
Task Horizon: T ∈ N
Discount Factor: γ = 1 (no discount factor)
Hence, we would like to maximize a quadratic reward function that favors small actions and states close to the origin. In this problem, we will design an optimal agent π*_t and also solve for the optimal agent's value function V*_t for all time steps.
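As a purely illustrative reading of this setup, a minimal Python sketch of the reward and the deterministic transition might look as follows; the numeric constants are placeholders, not values given in the problem.

```python
def reward(s, a, q=0.5, r=1.0):
    """R(s, a, s') = -q*s^2 - r*a^2 (independent of s'); assumes q >= 0, r > 0."""
    return -q * s**2 - r * a**2

def transition(s, a, c=1.1, d=0.9):
    """Deterministic dynamics: s' = c*s + d*a."""
    return c * s + d * a
```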
By induction, we will show that V*_t is quadratic in s. Observe that the base case t = 0 holds trivially because V*_0(s) = 0. For all parts below, assume that V*_t(s) = −p_t s² (inductive hypothesis).
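For reference (not part of the original problem statement), the recursion driving this induction is the standard finite-horizon Bellman optimality backup with γ = 1 and deterministic dynamics:

V*_{t+1}(s) = max_a [ R(s, a, s') + V*_t(s') ],   where s' = cs + da.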
a. (i) Write the equation for V*_{t+1}(s) as a function of s, q, r, a, c, d, and p_t. If your expression contains a max, you do not need to simplify it.
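As a sketch of what this equation looks like after substituting the reward, the dynamics s' = cs + da, and the inductive hypothesis V*_t(s') = −p_t s'²:

V*_{t+1}(s) = max_a [ −qs² − ra² − p_t (cs + da)² ].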
(ii) Now solve for π*_{t+1}(s). Recall that you can find local maxima of a function by computing its first derivative and setting it to 0.
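A sketch of that calculation, assuming p_t ≥ 0 so that the objective above is concave in a (its a² coefficient is −(r + p_t d²) < 0): setting the derivative with respect to a to zero,

−2ra − 2 p_t d (cs + da) = 0,

and solving for a gives

π*_{t+1}(s) = −( c d p_t / (r + d² p_t) ) s.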
b. Assume π*_{t+1}(s) = k_{t+1} s for some k_{t+1} ∈ R. Solve for p_{t+1} in V*_{t+1}(s) = −p_{t+1} s².
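Sketch, under the same assumptions: substituting a = k_{t+1} s into the backup above makes every term proportional to s², so

V*_{t+1}(s) = −[ q + r k_{t+1}² + p_t (c + d k_{t+1})² ] s²,   i.e.,   p_{t+1} = q + r k_{t+1}² + p_t (c + d k_{t+1})².

With k_{t+1} = −c d p_t / (r + d² p_t), this simplifies to p_{t+1} = q + c² r p_t / (r + d² p_t).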
Step by Step Answer:
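The textbook's worked answer is not reproduced here. Purely as an illustrative sanity check of the recursion sketched above, the following Python snippet compares the closed-form backup p_{t+1} = q + r k² + p_t (c + dk)², with k = −c d p_t / (r + d² p_t), against a brute-force maximization over a fine grid of actions; the constants q, r, c, d are arbitrary placeholders.

```python
import numpy as np

# Placeholder constants (not from the problem statement; chosen only for illustration).
q, r, c, d = 0.5, 1.0, 1.1, 0.9

def riccati_step(p_t):
    """One backup of the scalar recursion sketched above."""
    k = -c * d * p_t / (r + d**2 * p_t)
    p_next = q + r * k**2 + p_t * (c + d * k)**2
    return k, p_next

def brute_force_value(s, p_t, actions=np.linspace(-10, 10, 200001)):
    """Numerically maximize -q*s^2 - r*a^2 - p_t*(c*s + d*a)^2 over a grid of actions."""
    vals = -q * s**2 - r * actions**2 - p_t * (c * s + d * actions)**2
    return vals.max()

p = 0.0  # base case: V*_0(s) = 0, i.e., p_0 = 0
for t in range(3):
    k, p_next = riccati_step(p)
    for s in (0.3, -1.7, 2.5):
        # The grid maximum should match the quadratic value -p_{t+1} * s^2.
        assert np.isclose(brute_force_value(s, p), -p_next * s**2, atol=1e-6)
    p = p_next
print("Recursion matches brute-force backups; final p =", p)
```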
Source: Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 4th Edition, ISBN 9780134610993.