Question

Posted on Sep 22, 2024

Consider a Markov chain with three states $\{1, 2, 3\}$. In each state, we can choose one of two possible actions $\{1, 2\}$. The transition probability matrices under the two actions are given below:

$$P(1) = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.1 & 0.4 & 0.5 \\ 0.3 & 0.3 & 0.4 \end{pmatrix} \quad \text{and} \quad P(2) = \begin{pmatrix} 0.3 & 0.3 & 0.4 \\ 0.5 & 0.1 & 0.4 \\ 0.2 & 0.5 & 0.3 \end{pmatrix}.$$

The cost for a given (state, action) pair is a Bernoulli random variable. The mean costs are given below:

$$C = \begin{pmatrix} 0.1 & 0.9 \\ 0.8 & 0.1 \\ 0 & 0 \end{pmatrix}$$

We are interested in solving the following discounted cost problem:

$$\min_{\pi} \lim_{N \to \infty} E\left[\sum_{k=0}^{N} 0.9^{k} \, c(x_{k}, u_{k}) \,\middle|\, x_{0} = 1, u_{0} = 1\right]$$

where $x_{k}$ is the state at time $k$, $u_{k}$ is the action at time $k$, and $\pi$ denotes a policy.
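For orientation, when the model is known, a discounted cost problem of this form can be solved numerically by value iteration on Q-factors. The sketch below is only an illustration under that assumption, which the question itself drops in the next paragraph. The names `bellman_backup` and `q_value_iteration`, the zero initialisation, and the tolerance are illustrative choices, not part of the question, and the code assumes the rows of C index states and the columns index actions.

```python
# Value iteration on Q-factors for the known-model case (illustrative sketch).
GAMMA = 0.9  # discount factor from the objective

# Transition matrices per action: P[u][x][y] = Pr(next = y+1 | state = x+1, action = u)
P = {
    1: [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5], [0.3, 0.3, 0.4]],
    2: [[0.3, 0.3, 0.4], [0.5, 0.1, 0.4], [0.2, 0.5, 0.3]],
}

# Mean costs, assumed layout: C[x][u-1] is the mean cost of action u in state x+1
C = [[0.1, 0.9], [0.8, 0.1], [0.0, 0.0]]


def bellman_backup(q):
    """Apply one cost-minimising Bellman backup to a 3x2 table of Q-factors."""
    v = [min(row) for row in q]  # best (lowest) cost-to-go in each state
    return [
        [
            C[x][u] + GAMMA * sum(P[u + 1][x][y] * v[y] for y in range(3))
            for u in range(2)
        ]
        for x in range(3)
    ]


def q_value_iteration(tol=1e-10, max_iters=10_000):
    """Iterate the backup from all zeros until the largest change is below tol."""
    q = [[0.0, 0.0] for _ in range(3)]
    for _ in range(max_iters):
        new_q = bellman_backup(q)
        diff = max(abs(new_q[x][u] - q[x][u]) for x in range(3) for u in range(2))
        q = new_q
        if diff < tol:
            break
    return q


if __name__ == "__main__":
    for state, row in enumerate(q_value_iteration(), start=1):
        print(state, [round(value, 4) for value in row])
```

Because the backup is a contraction with factor 0.9, the iteration converges from any starting table; the Q-learning procedure asked for in the question approximates the same fixed point from sampled transitions instead of from P and C.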

Assume we do not know the model but are instead given the following trace of $(x_{k}, u_{k}, c(x_{k}, u_{k}))$ tuples:

(1, 1, 1), (2, 1, 0), (3, 2, 1), (2, 2, 0).

Consider the Q-learning algorithm with

$$Q_{0} = \begin{pmatrix} 0 & 0.5 \\ 0.3 & 0 \\ 0.2 & 0.1 \end{pmatrix}$$

and step size $\epsilon = 0.1$. Please calculate the sequence of Q-values under Q-learning with the trace given above.
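One way to work through the requested sequence is sketched below. It assumes the standard cost-minimising Q-learning update with discount 0.9 and step size 0.1,

$$Q_{k+1}(x_{k}, u_{k}) = Q_{k}(x_{k}, u_{k}) + 0.1 \left[ c(x_{k}, u_{k}) + 0.9 \min_{v} Q_{k}(x_{k+1}, v) - Q_{k}(x_{k}, u_{k}) \right],$$

with all other entries left unchanged. Each consecutive pair of trace entries then gives one update. The last tuple (2, 2, 0) has no observed successor state, so under this reading the trace yields three updates. The function name `q_learning_trace` is hypothetical, and the rows of the table are assumed to index states and the columns actions.

```python
# Q-learning along a fixed trace (illustrative sketch, cost-minimising form).
GAMMA = 0.9  # discount factor from the problem statement
STEP = 0.1   # step size from the problem statement

Q0 = [[0.0, 0.5], [0.3, 0.0], [0.2, 0.1]]  # Q0[x-1][u-1], states and actions are 1-based
TRACE = [(1, 1, 1), (2, 1, 0), (3, 2, 1), (2, 2, 0)]  # (state, action, observed cost)


def q_learning_trace(q0, trace, gamma=GAMMA, step=STEP):
    """Return the list [Q_0, Q_1, ...] produced by one pass over the trace."""
    q = [row[:] for row in q0]
    history = [[row[:] for row in q]]
    # Pair each tuple with its successor; the final tuple has no next state
    # and therefore produces no update.
    for (x, u, c), (x_next, _, _) in zip(trace, trace[1:]):
        target = c + gamma * min(q[x_next - 1])  # min because costs are minimised
        q[x - 1][u - 1] += step * (target - q[x - 1][u - 1])
        history.append([row[:] for row in q])
    return history


if __name__ == "__main__":
    for k, table in enumerate(q_learning_trace(Q0, TRACE)):
        print(k, [[round(value, 4) for value in row] for row in table])
```

Under these assumptions exactly one entry changes per step: $Q_{1}(1, 1) = 0 + 0.1(1 + 0.9 \cdot 0 - 0) = 0.1$, then $Q_{2}(2, 1) = 0.3 + 0.1(0 + 0.9 \cdot 0.1 - 0.3) = 0.279$, then $Q_{3}(3, 2) = 0.1 + 0.1(1 + 0.9 \cdot 0 - 0.1) = 0.19$. Every other entry keeps its value from $Q_{0}$.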


