Question

Posted on Sep 21, 2024

Reinforcement Learning: The Q-learning Algorithm

Please write code in Python to produce the same outputs as in the pictures, but on a bigger grid such as 6 x 6 or 10 x 10.

Please use Python and DO NOT use OpenAI's gym package!

The taxi driving problem:

There are four designated locations in the grid world, indicated by R(ed), G(reen), Y(ellow), and B(lue).

When the episode starts, the taxi starts off at a random square and the passenger is at a random location (R, G, Y or B).

The taxi drives to the passenger's location, picks up the passenger, drives to the passenger's destination (another one of the four specified locations), and then drops off the passenger. While doing so, our taxi driver needs to drive carefully to avoid hitting any wall, marked as |.

Once the passenger is dropped off, the episode ends.

What are the actions the agent can choose from at each step?

0: drive down
1: drive up
2: drive right
3: drive left
4: pick up a passenger
5: drop off a passenger
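The six actions can be written down as a small enumeration. This is only a sketch: the names `Action`, `MOVES` and `move` are hypothetical, and it assumes row 0 is the top row of the grid and that only the outer boundary blocks movement (the interior wall layout for a bigger grid is not given in the question).

```python
from enum import IntEnum


class Action(IntEnum):
    """Action numbers as listed in the question."""
    DOWN = 0
    UP = 1
    RIGHT = 2
    LEFT = 3
    PICKUP = 4
    DROPOFF = 5


# (row, col) offsets for the four driving actions.
# Assumption: row 0 is the top row, so driving down increases the row index.
MOVES = {
    Action.DOWN: (1, 0),
    Action.UP: (-1, 0),
    Action.RIGHT: (0, 1),
    Action.LEFT: (0, -1),
}


def move(row, col, action, size):
    """Return the taxi's new (row, col) on a size x size grid.

    The taxi stays where it is if the move would leave the grid.
    Pickup and dropoff do not change the position.
    """
    d_row, d_col = MOVES.get(Action(action), (0, 0))
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < size and 0 <= new_col < size:
        return new_row, new_col
    return row, col
```

Because `size` is a parameter, the same function works for 5 x 5, 6 x 6 or 10 x 10 grids.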

And the states?

25 possible taxi positions, because the world is a 5 x 5 grid.
5 possible locations of the passenger, which are R, G, Y, B, plus the case when the passenger is in the taxi.
4 destination locations.

Which gives us 25 x 5 x 4 = 500 states.
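The same count generalizes to an N x N grid as N x N x 5 x 4, which is what a Q-table for a bigger grid would need as its number of rows. Below is a hedged sketch of one way to map a (row, column, passenger, destination) tuple to a single integer and back. The function names and the ordering of the factors are assumptions, not something the question specifies.

```python
def num_states(size, n_locations=4):
    """Taxi positions x passenger locations (incl. 'in taxi') x destinations."""
    return size * size * (n_locations + 1) * n_locations


def encode(row, col, passenger, destination, size, n_locations=4):
    """Pack a state tuple into one integer in range(num_states(size))."""
    state = row
    state = state * size + col
    state = state * (n_locations + 1) + passenger
    state = state * n_locations + destination
    return state


def decode(state, size, n_locations=4):
    """Inverse of encode: recover (row, col, passenger, destination)."""
    state, destination = divmod(state, n_locations)
    state, passenger = divmod(state, n_locations + 1)
    row, col = divmod(state, size)
    return row, col, passenger, destination


for size in (5, 6, 10):
    print(size, num_states(size))
# prints:
# 5 500
# 6 720
# 10 2000
```

So a 6 x 6 grid has 720 states and a 10 x 10 grid has 2000, both still small enough for a tabular method.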

What about rewards?

-1 default per-step reward. Why -1, and not simply 0? Because we want to encourage the agent to spend the shortest time, by penalizing each extra step. This is what you expect from a taxi driver, don't you?
+20 reward for delivering the passenger to the correct destination.
-10 reward for executing a pickup or dropoff at the wrong location.
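These three rules can be sketched as a single function. Two points are assumptions rather than something the question states: the +20 replaces the -1 on the delivering step instead of being added to it, and any dropoff that is not a correct delivery counts as "wrong location". The names `reward`, `IN_TAXI` and `locations` are hypothetical.

```python
PICKUP, DROPOFF = 4, 5   # action numbers from the question
IN_TAXI = 4              # assumed index meaning "passenger is in the taxi"

STEP_REWARD = -1
DELIVERY_REWARD = 20
ILLEGAL_REWARD = -10


def reward(action, taxi_pos, passenger, destination, locations):
    """Reward for taking `action` in the given situation.

    taxi_pos is a (row, col) tuple, passenger is 0..3 or IN_TAXI,
    destination is 0..3, and locations[i] is the (row, col) of landmark i.
    """
    at_passenger = passenger != IN_TAXI and taxi_pos == locations[passenger]
    at_destination = passenger == IN_TAXI and taxi_pos == locations[destination]
    if action == PICKUP:
        return STEP_REWARD if at_passenger else ILLEGAL_REWARD
    if action == DROPOFF:
        return DELIVERY_REWARD if at_destination else ILLEGAL_REWARD
    return STEP_REWARD  # any driving action
```

Under these assumptions, an episode of 12 actions with no illegal pickup or dropoff, ending in a correct delivery, has a total return of 11 x (-1) + 20 = 9.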

Random agent baseline

Before you start implementing any complex algorithm, you should always build a baseline model.
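A random-agent baseline without the gym package could look roughly like the sketch below. Everything here is an assumption made for illustration: the class name `TaxiGrid`, the placement of R, G, Y, B in the four corners, the absence of interior walls (the wall layout for a bigger grid is not given), and the cap of 10,000 steps per episode. It is not the code behind the pictures in the question.

```python
import random

DOWN, UP, RIGHT, LEFT, PICKUP, DROPOFF = range(6)
MOVES = {DOWN: (1, 0), UP: (-1, 0), RIGHT: (0, 1), LEFT: (0, -1)}
IN_TAXI = 4  # assumed passenger index meaning "in the taxi"


class TaxiGrid:
    """Hypothetical minimal taxi world on a size x size grid, no interior walls."""

    def __init__(self, size=6, seed=None):
        self.size = size
        self.rng = random.Random(seed)
        last = size - 1
        # Assumption: R, G, Y, B sit in the four corners of the grid.
        self.locations = [(0, 0), (0, last), (last, 0), (last, last)]

    def reset(self):
        """Random taxi square, random passenger location, different destination."""
        self.taxi = (self.rng.randrange(self.size), self.rng.randrange(self.size))
        self.passenger, self.destination = self.rng.sample(range(4), 2)
        return self.taxi, self.passenger, self.destination

    def step(self, action):
        """Apply one action and return (state, reward, done)."""
        reward, done = -1, False
        if action in MOVES:
            d_row, d_col = MOVES[action]
            row, col = self.taxi[0] + d_row, self.taxi[1] + d_col
            if 0 <= row < self.size and 0 <= col < self.size:
                self.taxi = (row, col)  # otherwise the taxi hits the boundary
        elif action == PICKUP:
            if self.passenger != IN_TAXI and self.taxi == self.locations[self.passenger]:
                self.passenger = IN_TAXI
            else:
                reward = -10
        elif action == DROPOFF:
            if self.passenger == IN_TAXI and self.taxi == self.locations[self.destination]:
                reward, done = 20, True
            else:
                reward = -10
        return (self.taxi, self.passenger, self.destination), reward, done


def run_random_episode(env, rng, max_steps=10_000):
    """Play one episode with uniformly random actions."""
    env.reset()
    total_reward, done = 0, False
    for steps in range(1, max_steps + 1):
        _, reward, done = env.step(rng.randrange(6))
        total_reward += reward
        if done:
            break
    return steps, total_reward, done


env = TaxiGrid(size=6, seed=0)
agent_rng = random.Random(1)
results = [run_random_episode(env, agent_rng) for _ in range(20)]
print("mean steps: ", sum(r[0] for r in results) / len(results))
print("mean return:", sum(r[1] for r in results) / len(results))
```

The mean steps and mean return printed here give the numbers a learned policy, such as one trained with Q-learning, should beat on the same grid size.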


