Question
Problem 1. (50 pt) Given a Markov stationary policy $\pi$, consider the policy evaluation problem to compute $v_\pi$. For example, we can apply the temporal difference (TD) learning algorithm given by
$$v(s_t) \leftarrow v(s_t) + \alpha\big(r_{t+1} + \gamma v(s_{t+1}) - v(s_t)\big),$$
where $\delta_t := r_{t+1} + \gamma v(s_{t+1}) - v(s_t)$ is known as the TD error.
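To make the update concrete, here is a minimal tabular TD(0) policy-evaluation sketch in Python. It is an illustration, not part of the original problem: the `env.reset()`/`env.step(a)` interface, `env.num_states`, the `policy` callable, and all hyperparameter values are assumptions for this example.

```python
import numpy as np

def td0_policy_evaluation(env, policy, num_episodes=500, alpha=0.1, gamma=0.99):
    """Tabular TD(0) policy evaluation (sketch).

    Assumes a discrete environment exposing `reset() -> s` and
    `step(a) -> (s_next, r, done)`, plus `policy(s) -> a` (hypothetical interface).
    """
    v = np.zeros(env.num_states)          # value estimate v(s)
    for _ in range(num_episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            # TD error: delta = r + gamma * v(s') - v(s); bootstrap target is 0 at terminal states
            delta = r + (0.0 if done else gamma * v[s_next]) - v[s]
            v[s] += alpha * delta         # TD(0) updates only the current state
            s = s_next
    return v
```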
Alternatively, we can apply the $n$-step TD learning algorithm given by
$$v(s_t) \leftarrow v(s_t) + \alpha\big(G_{t:t+n} - v(s_t)\big),$$
where $G_{t:t+n} := r_{t+1} + \gamma r_{t+2} + \dots + \gamma^{n-1} r_{t+n} + \gamma^n v(s_{t+n})$ for $n = 1, 2, \dots$. Note that $G_{t:t+1} = r_{t+1} + \gamma v(s_{t+1})$, so the $n = 1$ case recovers the TD algorithm above.
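Likewise, a minimal sketch of the $n$-step update, computed episode-by-episode for simplicity. The same hypothetical environment/policy interface as above is assumed, and the return is truncated at the terminal time to handle episodes shorter than $n$ steps from $t$.

```python
import numpy as np

def n_step_td(env, policy, n=4, num_episodes=500, alpha=0.1, gamma=0.99):
    """n-step TD policy evaluation on complete episodes (illustrative sketch)."""
    v = np.zeros(env.num_states)
    for _ in range(num_episodes):
        states, rewards = [env.reset()], []
        done = False
        while not done:
            s_next, r, done = env.step(policy(states[-1]))
            states.append(s_next)
            rewards.append(r)
        T = len(rewards)
        for t in range(T):
            # G_{t:t+n}: discounted rewards plus a bootstrapped tail, truncated at T
            horizon = min(t + n, T)
            G = sum(gamma ** (k - t) * rewards[k] for k in range(t, horizon))
            if horizon < T:                   # bootstrap only if the episode continues
                G += gamma ** n * v[states[horizon]]
            v[states[t]] += alpha * (G - v[states[t]])
    return v
```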
The $n$-step TD algorithms for finite $n$ use bootstrapping; therefore, they use a biased estimate of $v_\pi$. On the other hand, as $n \to \infty$, the $n$-step TD algorithm becomes a Monte Carlo method, where we use an unbiased estimate of $v_\pi$. However, these approaches delay the update for $n$ stages, and we update the value function estimate only for the current state.
As an intermediate step to address these challenges, we first introduce the $\lambda$-return algorithm given by
$$v(s_t) \leftarrow v(s_t) + \alpha\big(G_t^\lambda - v(s_t)\big),$$
where, given $\lambda \in (0,1)$, we define $G_t^\lambda := (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} G_{t:t+n}$, taking a weighted average of the $G_{t:t+n}$'s.
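Added for clarity: the weights form a geometric series summing to one, so $G_t^\lambda$ is a proper convex combination of the $n$-step returns,
$$(1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1} = \frac{1-\lambda}{1-\lambda} = 1,$$
with $\lambda \to 0$ recovering the one-step TD target $G_{t:t+1}$ and $\lambda \to 1$ recovering the Monte Carlo return.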
a) By the definition of $G_{t:t+n}$, we can show that $G_{t:t+n} = r_{t+1} + \gamma G_{t+1:t+n}$. Derive an analogous recursive relationship for $G_t^\lambda$ and $G_{t+1}^\lambda$.
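For reference, and hedged: this is one standard route to such a recursion, not necessarily the intended solution path. Splitting off the $n = 1$ term and applying $G_{t:t+n} = r_{t+1} + \gamma G_{t+1:t+n}$ gives
$$
\begin{aligned}
G_t^\lambda &= (1-\lambda)\,G_{t:t+1} + (1-\lambda)\sum_{n=2}^{\infty}\lambda^{n-1}\big(r_{t+1} + \gamma\,G_{t+1:t+n}\big)\\
&= r_{t+1} + \gamma\Big[(1-\lambda)\,v(s_{t+1}) + \lambda\,G_{t+1}^\lambda\Big],
\end{aligned}
$$
using $(1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1} = 1$ and re-indexing the tail sum with $m = n - 1$.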
b) Show that the term $G_t^\lambda - v(s_t)$ in the $\lambda$-return update can be written as a sum of TD errors.
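For reference, the identity this exercise points toward, sketched under the definitions above and assuming the estimate $v$ is held fixed over the episode: unrolling the recursion from part (a) yields
$$G_t^\lambda - v(s_t) = \sum_{k=t}^{\infty} (\gamma\lambda)^{k-t}\,\delta_k, \qquad \delta_k := r_{k+1} + \gamma v(s_{k+1}) - v(s_k).$$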
The $n$-step TD algorithm, Monte Carlo method, and $\lambda$-return algorithm look forward to approximate $v_\pi$. Alternatively, we can look backward via the eligibility trace method. The TD($\lambda$)
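The statement breaks off here, but the backward view it refers to is the standard eligibility-trace form of TD($\lambda$). A minimal sketch with accumulating traces, assuming the same hypothetical environment/policy interface as the earlier examples:

```python
import numpy as np

def td_lambda(env, policy, lam=0.9, num_episodes=500, alpha=0.1, gamma=0.99):
    """Backward-view TD(lambda) with accumulating eligibility traces (sketch)."""
    v = np.zeros(env.num_states)
    for _ in range(num_episodes):
        z = np.zeros_like(v)              # eligibility trace, reset each episode
        s = env.reset()
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            delta = r + (0.0 if done else gamma * v[s_next]) - v[s]
            z *= gamma * lam              # decay all traces
            z[s] += 1.0                   # accumulate trace for the visited state
            v += alpha * delta * z        # update every state in proportion to its trace
            s = s_next
    return v
```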