Question
[25pts] [Non-programming problem] When we described Markov Decision Processes, we derived a formula to calculate the state value function for a policy $\pi$. Please derive the corresponding formula for the action value function for a policy $\pi$. The following specifies the return $G_t$, the definition of the state value function (Eq 3.12), and the definition of the action value function (Eq 3.13).

$$G_t \doteq \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k$$

$$v_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s] = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right], \quad \text{for all } s \in \mathcal{S} \qquad (3.12)$$

$$q_\pi(s,a) \doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a\right] \qquad (3.13)$$

We derived the following formula for $v_\pi(s)$:

$$v_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s] = \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s]$$

$$= \sum_a \pi(a \mid s) \sum_{s'} \sum_r p(s', r \mid s, a)\Big[r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s']\Big]$$

$$= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma\, v_\pi(s')\big], \quad \text{for all } s \in \mathcal{S} \qquad (3.14)$$

Please derive a formula, similar to Eq 3.14, but for the action value function.
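One way the requested derivation could go is sketched below. It follows the same steps as the state value derivation, but conditions on both $S_t = s$ and $A_t = a$, so the outer sum over actions weighted by $\pi(a \mid s)$ drops out. The last line relies on the identity $v_\pi(s') = \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')$, which is not stated in the question and is assumed here. The symbol $a'$ for the next action and the set $\mathcal{A}(s)$ of actions available in $s$ are notation introduced for this sketch, and the block assumes the `amsmath` and `amssymb` packages.

```latex
\begin{align*}
q_\pi(s,a) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] \\
  % split the return into the next reward plus the discounted future return
  &= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a] \\
  % the action is fixed, so only the environment dynamics p are averaged over
  &= \sum_{s'} \sum_{r} p(s', r \mid s, a)
     \Big[ r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s'] \Big] \\
  % the inner expectation is the state value of the successor state
  &= \sum_{s', r} p(s', r \mid s, a) \big[ r + \gamma\, v_\pi(s') \big] \\
  % assumed identity: v_pi(s') = sum_{a'} pi(a'|s') q_pi(s',a')
  &= \sum_{s', r} p(s', r \mid s, a)
     \Big[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \Big],
  \quad \text{for all } s \in \mathcal{S},\ a \in \mathcal{A}(s).
\end{align*}
```

The second-to-last line expresses $q_\pi$ through $v_\pi$, while the last line is the fully recursive form in $q_\pi$ alone, which is the closest analogue of Eq 3.14.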