Reinforcement Learning Homework 3: Model-Free Monte Carlo Prediction


Posted on Jul 31, 2024


Objective

In this homework assignment, you will apply the Monte Carlo prediction method to estimate the state values for a four-state problem. You will be provided with four episodes. Your task is to calculate the state values using the Monte Carlo method with a specified discount factor (γ) and initial values for the states.

Problem Setup

- States (S): Four states, labeled S1, S2, S3, and S4.
- Rewards (R): Provided within each episode, including a final reward.
- Discount Factor (γ): 0.9
- Initial State Values (V):
  - V(S1) = 0
  - V(S2) = 0
  - V(S3) = 0
  - V(S4) = 0

Episodes

- S1, 0, S2, 1, S3, 0, S4, 10
- S1, 0, S2, 0, S2, 0, S3, 0, S4, -5
- S1, 0, S1, 1, S2, 0, S3, 0, S4, 8
- S1, 0, S2, 0, S2, 0, S3, 1, S4, 12
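The problem does not spell out how the numbers in an episode pair up, so the following is a sketch under one assumption: each state is followed by the reward collected on leaving it, and the last number is the final reward. Writing R_t for the reward listed after the state visited at step t (notation introduced here, not in the assignment), the discounted return can be built backward from the end of the episode. Applied to the first episode, with γ = 0.9:

```latex
\begin{align*}
G_T &= R_T, \qquad G_t = R_t + \gamma\, G_{t+1} \\
G(S_4) &= 10 \\
G(S_3) &= 0 + 0.9 \times 10 = 9 \\
G(S_2) &= 1 + 0.9 \times 9 = 9.1 \\
G(S_1) &= 0 + 0.9 \times 9.1 = 8.19
\end{align*}
```

A different reward convention (for example, reward received on entering a state) would shift these numbers, so it is worth checking against the course's definition.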

Tasks

1. Calculate the returns (G) for each state in each episode.
2. Use the Every-Visit Monte Carlo method to update the state values (V) based on the returns and the discount factor (γ).
3. Calculate the updated values for each state after processing all four episodes.
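The three tasks can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution, and it rests on two assumptions the assignment does not state: each state in an episode is followed by the reward collected on leaving it, and the update uses a running sample average (step size 1/N for the N-th visit). The names `EPISODES`, `episode_returns`, and `every_visit_mc` are made up for this example.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor from the problem setup

# Each episode is a list of (state, reward) pairs.
# Assumption: the number after a state is the reward collected on leaving it.
EPISODES = [
    [("S1", 0), ("S2", 1), ("S3", 0), ("S4", 10)],
    [("S1", 0), ("S2", 0), ("S2", 0), ("S3", 0), ("S4", -5)],
    [("S1", 0), ("S1", 1), ("S2", 0), ("S3", 0), ("S4", 8)],
    [("S1", 0), ("S2", 0), ("S2", 0), ("S3", 1), ("S4", 12)],
]


def episode_returns(episode, gamma=GAMMA):
    """Task 1: return (state, G) for every visit, in visit order."""
    g = 0.0
    visits = []
    # Walk backward so that G_t = R_t + gamma * G_{t+1}.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        visits.append((state, g))
    visits.reverse()
    return visits


def every_visit_mc(episodes, gamma=GAMMA):
    """Tasks 2 and 3: every-visit Monte Carlo with a running sample average."""
    values = defaultdict(float)  # V(s) starts at 0 for every state
    counts = defaultdict(int)    # number of visits to each state so far
    for episode in episodes:
        for state, g in episode_returns(episode, gamma):
            counts[state] += 1
            # Incremental mean: V <- V + (G - V) / N
            values[state] += (g - values[state]) / counts[state]
    return dict(values)


values = every_visit_mc(EPISODES)
for state in sorted(values):
    print(f"V({state}) = {values[state]:.4f}")
```

Because every visit counts, S2 contributes six returns and S1 five, while S3 and S4 contribute four each. With a sample-average update the zero initial values do not affect the final estimates; for instance, V(S4) is simply the average of the four final rewards, (10 - 5 + 8 + 12) / 4 = 6.25. A constant step size would give different numbers.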
