Question

Consider a Markov chain with three states $\{1, 2, 3\}$. In each state, we can choose one of two possible actions $\{1, 2\}$. The transition probability matrices under the two actions are given below:
$$P(1) = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.1 & 0.4 & 0.5 \\ 0.3 & 0.3 & 0.4 \end{pmatrix} \quad \text{and} \quad P(2) = \begin{pmatrix} 0.3 & 0.3 & 0.4 \\ 0.5 & 0.1 & 0.4 \\ 0.2 & 0.5 & 0.3 \end{pmatrix}.$$
The cost for a given (state, action) pair is a Bernoulli random variable. The mean costs are given below, with rows indexed by state and columns by action:
$$C = \begin{pmatrix} 0.1 & 0.9 \\ 0.8 & 0.1 \\ 0 & 0 \end{pmatrix}$$
We are interested in solving the following discounted cost problem:
$$\min_{\pi} \lim_{N \to \infty} E\left[ \sum_{k=0}^{N} 0.9^k \, c(x_k, u_k) \,\middle|\, x_0 = 1, u_0 = 1 \right]$$
where $x_k$ is the state at time $k$, $u_k$ is the action at time $k$, and $\pi$ denotes a policy.
Assume we do not know the model but are given the following trace $(x_k, u_k, c(x_k, u_k))$ instead:
$$(1, 1, 1),\ (2, 1, 0),\ (3, 2, 1),\ (2, 2, 0).$$
Consider the Q-learning algorithm with
$$Q_0 = \begin{pmatrix} 0 & 0.5 \\ 0.3 & 0 \\ 0.2 & 0.1 \end{pmatrix}$$
and step size $\epsilon = 0.1$. Please calculate the sequence of Q-values under Q-learning with the trace given above.
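One way to check a hand calculation is to replay the trace in code. The sketch below is not taken from the page's hidden solution; it assumes the standard cost-minimizing Q-learning update, Q(x,u) <- Q(x,u) + step * (c + 0.9 * min_v Q(x',v) - Q(x,u)), where x' is the state in the next tuple of the trace. Under that assumption the last tuple (2,2,0) has no observed successor, so it triggers no update. The names `q_learning_trace`, `HISTORY`, `GAMMA`, and `STEP` are hypothetical.

```python
GAMMA = 0.9  # discount factor from the problem statement
STEP = 0.1   # constant step size from the problem statement


def q_learning_trace(q0, trace, gamma=GAMMA, step=STEP):
    """Return [Q0, Q1, ...], one Q-table per observed transition.

    States and actions are 1-indexed in the trace, so they are shifted
    by one when indexing the Python lists.
    """
    q = [row[:] for row in q0]
    history = [[row[:] for row in q]]
    # Pair each tuple with its successor; the final tuple has no
    # successor state and is therefore not used for an update
    # (an assumption of this sketch).
    for (x, u, c), (x_next, _, _) in zip(trace, trace[1:]):
        # Cost problem: bootstrap with the minimum over next actions.
        target = c + gamma * min(q[x_next - 1])
        q[x - 1][u - 1] += step * (target - q[x - 1][u - 1])
        history.append([row[:] for row in q])
    return history


Q0 = [[0.0, 0.5], [0.3, 0.0], [0.2, 0.1]]
TRACE = [(1, 1, 1), (2, 1, 0), (3, 2, 1), (2, 2, 0)]
HISTORY = q_learning_trace(Q0, TRACE)

for k, table in enumerate(HISTORY):
    print(k, [[round(v, 4) for v in row] for row in table])
```

Under these assumptions the three updates change one entry each: Q(1,1) goes from 0 to 0.1, Q(2,1) from 0.3 to 0.279, and Q(3,2) from 0.1 to 0.19, while all other entries keep their values from Q0.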