Question

At state $x$, the state transitions to $y_1$ with probability 1, i.e., $P(y_1 \mid x) = 1$. At state $y_1$, we have $P(y_1 \mid y_1) = p$ and $P(y_2 \mid y_1) = 1 - p$: with probability $p$ we stay in $y_1$, and with probability $1 - p$ the state transitions to $y_2$. Finally, $y_2$ is an absorbing state, so $P(y_2 \mid y_2) = 1$. The instant reward is 1 for starting in state $y_1$ and 0 elsewhere: $R(y_1, a, y_1) = 1$, $R(y_1, a, y_2) = 1$, and $R(s, a, s') = 0$ otherwise. The discount factor is denoted by $\gamma$ ($0 < \gamma < 1$).

My problem is defining this with $p$ and $1 - p$; it confuses me. I know how to do Bellman equations when they involve the usual $T$, $R$ and $V^*$.

This is the question: Define $V^*(y_1)$ as the optimal value function of the state $y_1$. Compute $V^*(y_1)$ via Bellman's equation. (The answer is a formula in terms of $\gamma$ and $p$.) Then find $Q^*(x, a)$.
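One way to make the roles of $p$ and $1 - p$ concrete is to write the chain down as plain data and evaluate it numerically. The sketch below is only an illustration: the function names `value_iteration` and `q_x`, the dictionary layout, and the tolerance are assumptions, not part of the question. It treats $p$ and $1 - p$ as the usual transition probabilities $T(y_1, a, y_1)$ and $T(y_1, a, y_2)$, and relies on the fact that the transitions do not depend on the action.

```python
def value_iteration(p, gamma, tol=1e-12, max_iters=100_000):
    """Iterate the Bellman backup on the three-state chain x -> y1 -> y2."""
    states = ["x", "y1", "y2"]
    # transitions[s] maps next state -> probability (identical for every action)
    transitions = {
        "x": {"y1": 1.0},
        "y1": {"y1": p, "y2": 1.0 - p},
        "y2": {"y2": 1.0},  # absorbing state
    }

    def reward(s, s_next):
        # Reward 1 for any transition that starts in y1, 0 otherwise.
        return 1.0 if s == "y1" else 0.0

    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_values = {
            s: sum(
                prob * (reward(s, s_next) + gamma * values[s_next])
                for s_next, prob in transitions[s].items()
            )
            for s in states
        }
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    return values


def q_x(p, gamma):
    """Q(x, a): reward 0 for leaving x, then the discounted value of y1."""
    values = value_iteration(p, gamma)
    return 0.0 + gamma * values["y1"]


values = value_iteration(p=0.5, gamma=0.9)
print(values["y1"], q_x(p=0.5, gamma=0.9))
```

Trying a few values of `p` and `gamma` and comparing the printed numbers with a hand-derived formula is a quick sanity check on the algebra.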

Step by Step Solution
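
The following is a sketch of one way to work the problem, not the site's hidden solution. It assumes the standard Bellman optimality form $V^*(s) = \max_a \sum_{s'} P(s' \mid s, a)\,[R(s, a, s') + \gamma V^*(s')]$, and that the transitions are the same for every action, so the maximum over $a$ drops out.

Since $y_2$ is absorbing and every transition out of it has reward 0, $V^*(y_2) = 0$. Expanding the sum at $y_1$ over its two successors gives

$$
\begin{aligned}
V^*(y_1) &= p\,[\,1 + \gamma V^*(y_1)\,] + (1 - p)\,[\,1 + \gamma V^*(y_2)\,] \\
         &= 1 + \gamma p\, V^*(y_1),
\end{aligned}
$$

so that

$$
V^*(y_1) = \frac{1}{1 - \gamma p}.
$$

From $x$ the only successor is $y_1$, and the reward $R(x, a, y_1)$ is 0, hence

$$
Q^*(x, a) = P(y_1 \mid x)\,[\,R(x, a, y_1) + \gamma V^*(y_1)\,] = \frac{\gamma}{1 - \gamma p}.
$$

The denominator is positive because $0 < \gamma < 1$ and $0 \le p \le 1$ together give $\gamma p < 1$.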

