
Question


Please write the code for part one in Python. No pseudocode please.

Objectives:
- To implement reinforcement learning based on task-based rewards
- To take a continuous environment and discretize it so that it is suitable for a reinforcement learning task
This is the CartPole task. The idea here is to balance this pole using a one-dimensional robot (it
can only move left and right). The robot's state has 4 components:
- x: the location of the robot (0 is the center, -2.4 is the leftmost part of the board, 2.4 is the rightmost part of the board)
- xdot: the velocity of the robot
- theta: the angle of the pole
- thetadot: the angular velocity of the pole
OpenAI Gym:
You do not have to implement the problem domain yourself; there is a resource called OpenAI Gym which has a set of common training examples. Gym can be installed with the following command. After running the provided command, you may also be asked to install some additional packages for the video encoding. You'll see an error message with instructions to follow.
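To make the environment concrete, here is a minimal sketch of creating and stepping CartPole with Gym. It assumes the classic gym API (reset() returns only the observation and step() returns four values); newer gym/gymnasium releases return (obs, info) from reset() and five values from step().

import gym

# Create the CartPole environment (classic gym API assumed; see note above).
env = gym.make('CartPole-v0')

obs = env.reset()                         # obs = [x, xdot, theta, thetadot]
for _ in range(200):
    action = env.action_space.sample()    # random action: 0 = left, 1 = right
    obs, reward, done, info = env.step(action)
    if done:                              # pole fell or episode limit reached
        obs = env.reset()
env.close()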
State Discretization:
We will discretize the space in order to simplify the reinforcement learning algorithm. One
example can be as follows:
- x: one bucket for values below -0.08, one for -0.08 to 0.08, one for values above 0.08
- xdot: one bucket for values below -0.5, one for -0.5 to 0.5, one for values above 0.5
- theta: buckets covering the range from -50 degrees to 50 degrees
The combined discrete state then serves as the index into the Q-table.

Exploration vs. Exploitation:
p = random()
If p < epsilon:
    Choose random action
Else:
    Choose action that gives max Q value
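Here is a minimal sketch of part 1 along these lines. It assumes CartPole-v0 through the classic gym API and uses np.digitize for the bucketing; the bucket boundaries, alpha, gamma, epsilon, the episode count, and the output file name 'cart.npy' are all illustrative placeholders, not values specified in the assignment.

import random

import gym
import numpy as np

# Illustrative bucket boundaries -- tune these for your own discretization.
X_BINS        = [-0.08, 0.08]            # cart position
XDOT_BINS     = [-0.5, 0.5]              # cart velocity
THETA_BINS    = [-0.05, 0.0, 0.05]       # pole angle (radians)
THETADOT_BINS = [-0.5, 0.5]              # pole angular velocity

def discretize(obs):
    """Map a continuous observation to a tuple of bucket indices."""
    x, xdot, theta, thetadot = obs
    return (np.digitize(x, X_BINS),
            np.digitize(xdot, XDOT_BINS),
            np.digitize(theta, THETA_BINS),
            np.digitize(thetadot, THETADOT_BINS))

env = gym.make('CartPole-v0')
n_actions = env.action_space.n                      # 2 actions: left, right
Q = np.zeros((len(X_BINS) + 1, len(XDOT_BINS) + 1,
              len(THETA_BINS) + 1, len(THETADOT_BINS) + 1, n_actions))

alpha, gamma, epsilon = 0.1, 0.99, 0.1              # placeholder hyperparameters

for episode in range(5000):                         # placeholder episode count
    state = discretize(env.reset())
    done = False
    while not done:
        # Exploration vs. exploitation (epsilon-greedy, as in the pseudocode above)
        if random.random() < epsilon:
            action = env.action_space.sample()      # explore: random action
        else:
            action = int(np.argmax(Q[state]))       # exploit: best known action
        obs, reward, done, _ = env.step(action)
        next_state = discretize(obs)
        # Q-learning update rule
        Q[state + (action,)] += alpha * (reward + gamma * np.max(Q[next_state])
                                         - Q[state + (action,)])
        state = next_state

np.save('cart.npy', Q)    # file name assumed; the part 1 name is not given above
env.close()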
Your Task (part 2):
Now that you've implemented q-learning for one task, you will move to the mountain car task.
Instead of 2 actions (left, right), this task has three (left, null, right). The task also has different state variables (only 2):
- x: the location of the robot (-1.2 is the left, -0.45 is approximately the valley, 0.6 is the rightmost part of the board, 0.5 is the location of the flag)
- xdot: the velocity of the robot (this can go from -0.07 to 0.07)
This will require you to change the number of bins for state discretization as well as the alpha and gamma values. Additionally, you need to implement the exploration vs. exploitation part for this problem as well.
Once your model is trained, it will save the Q-table as a 'car.npy' file. Make sure that you don't change this file name.
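A sketch of the part 2 changes, under the same classic-gym assumption and with MountainCar-v0 as the environment; the bin counts and the alpha/gamma values below are placeholders to re-tune, and only the 'car.npy' file name comes from the assignment.

import gym
import numpy as np

env = gym.make('MountainCar-v0')                   # 3 actions: left, null, right

# Placeholder discretization: 20 x 20 bins over the given state ranges.
N_X, N_XDOT = 20, 20
x_bins    = np.linspace(-1.2, 0.6, N_X - 1)        # cart position range
xdot_bins = np.linspace(-0.07, 0.07, N_XDOT - 1)   # velocity range

def discretize(obs):
    x, xdot = obs
    return (np.digitize(x, x_bins), np.digitize(xdot, xdot_bins))

Q = np.zeros((N_X, N_XDOT, env.action_space.n))
alpha, gamma = 0.2, 0.99                           # placeholder values to re-tune

# ... train with the same epsilon-greedy Q-learning loop as in part 1 ...

np.save('car.npy', Q)    # keep this exact file name, as the assignment requires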
