Question
In an infinite-horizon discounted MDP, there are three states x, y1, y2 and only one action a. At state x, the state transitions to y1 with probability 1. At state y1 we have P(y1|y1) = p and P(y2|y1) = 1 - p. Finally, y2 is an absorbing state, so P(y2|y2) = 1. The immediate reward is 1 for any transition out of y1 and 0 elsewhere: R(y1,a,y1) = 1, R(y1,a,y2) = 1, and R(s,a,s') = 0 otherwise. The discount factor is gamma, with 0 < gamma < 1. Define V*(y1) as the optimal value function at state y1. Compute V*(y1) via Bellman's equation in terms of gamma and p, and find Q*(x,a) in terms of gamma and p.
Step by Step Solution
There are 3 steps involved in it.

Step 1: The absorbing state y2 yields no reward, so its Bellman equation is V*(y2) = 0 + gamma * V*(y2), which gives V*(y2) = 0.

Step 2: At y1 the immediate reward is 1 for either successor state, so Bellman's equation reads V*(y1) = 1 + gamma * (p * V*(y1) + (1 - p) * V*(y2)) = 1 + gamma * p * V*(y1). Solving for V*(y1) gives V*(y1) = 1 / (1 - gamma * p).

Step 3: From x, the transition to y1 is deterministic with reward 0, so Q*(x,a) = 0 + gamma * V*(y1) = gamma / (1 - gamma * p).
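As a numerical sanity check on the closed forms above, here is a short value-iteration sketch in Python. The specific parameter values gamma = 0.9 and p = 0.5 are illustrative assumptions, not part of the problem.

```python
# Value-iteration sanity check for the 3-state MDP above.
# States: x, y1, y2; single action a.
# Assumed example parameters (not from the problem): gamma = 0.9, p = 0.5.
gamma, p = 0.9, 0.5

V = {"x": 0.0, "y1": 0.0, "y2": 0.0}
for _ in range(10_000):
    V = {
        # From x the chain moves to y1 with probability 1 and reward 0.
        "x": 0.0 + gamma * V["y1"],
        # From y1 the reward is 1 regardless of the successor state.
        "y1": 1.0 + gamma * (p * V["y1"] + (1 - p) * V["y2"]),
        # y2 is absorbing with reward 0.
        "y2": 0.0 + gamma * V["y2"],
    }

# Closed forms derived above:
#   V*(y1) = 1 / (1 - gamma * p),  Q*(x,a) = gamma / (1 - gamma * p)
v_y1_closed = 1.0 / (1.0 - gamma * p)
q_x_closed = gamma * v_y1_closed
print(V["y1"], v_y1_closed)  # both ~ 1.8182
print(V["x"], q_x_closed)    # both ~ 1.6364
```

Note that Q*(x,a) appears as V*(x) here, since with a single action the optimal value of x equals the Q-value of its only action.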