Question

Reinforcement Learning: The Q-learning Algorithm
Please write Python code that produces the same outputs as in the pictures, but on a bigger grid such as 6x6 or 10x10. Please use Python and do NOT use OpenAI's Gym package!
The taxi driving problem:
There are four designated locations in the grid world indicated by R(ed), G(reen), Y(ellow), and B(lue).
When the episode starts, the taxi starts off at a random square and the passenger is at a random location (R, G, Y or B).
The taxi drives to the passenger's location, picks up the passenger, drives to the passenger's destination (another one of the four designated locations), and then drops off the passenger. While doing so, the taxi driver needs to drive carefully to avoid hitting any wall, marked as |. Once the passenger is dropped off, the episode ends.
What are the actions the agent can choose from at each step?
0 drive down
1 drive up
2 drive right
3 drive left
4 pick up a passenger
5 drop off a passenger
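As a rough sketch of how these action codes could translate into movement on an n x n grid: the orientation (row 0 at the top, so "down" increases the row index) and the `walls` representation are assumptions, since the actual wall layout is only shown in the pictures. The names `move`, `DELTAS`, and `walls` are made up for illustration.

```python
# Action codes taken from the list above.
DOWN, UP, RIGHT, LEFT, PICKUP, DROPOFF = range(6)

# Assumption: row 0 is the top row, so "down" increases the row index.
DELTAS = {DOWN: (1, 0), UP: (-1, 0), RIGHT: (0, 1), LEFT: (0, -1)}


def move(pos, action, n, walls=frozenset()):
    """Return the taxi position after one action on an n x n grid.

    walls is a hypothetical set of (row, col) cells that have a wall "|"
    on their east side. A blocked move leaves the taxi where it is.
    """
    if action not in DELTAS:
        return pos  # pickup and dropoff do not move the taxi
    row, col = pos
    d_row, d_col = DELTAS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < n and 0 <= new_col < n):
        return pos  # outer boundary of the grid
    if action == RIGHT and (row, col) in walls:
        return pos  # wall on the east side of the current cell
    if action == LEFT and (row, col - 1) in walls:
        return pos  # wall on the east side of the cell to the left
    return (new_row, new_col)
```

For example, `move((2, 2), RIGHT, 6, walls={(2, 2)})` leaves the taxi at `(2, 2)`, while the same call without walls returns `(2, 3)`.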
And the states?
25 possible taxi positions, because the world is a 5x5 grid.
5 possible locations of the passenger, which are R, G, Y, B, plus the case when the passenger is in the taxi.
4 destination locations
This gives 25 x 5 x 4 = 500 states.
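The same counting carries over to a bigger grid: an n x n world has n x n x 5 x 4 states. One possible way to pack the three components into a single integer index (useful as a row index in a Q-table) is sketched below. The function names and the ordering of the components are assumptions, not something the question prescribes.

```python
N_PASSENGER_LOCS = 5  # R, G, Y, B, or inside the taxi
N_DESTINATIONS = 4    # R, G, Y, B


def n_states(n):
    """Number of states for an n x n grid."""
    return n * n * N_PASSENGER_LOCS * N_DESTINATIONS


def encode(row, col, passenger, dest, n):
    """Pack (taxi row, taxi col, passenger location, destination) into one int."""
    return ((row * n + col) * N_PASSENGER_LOCS + passenger) * N_DESTINATIONS + dest


def decode(state, n):
    """Inverse of encode: recover (row, col, passenger, dest)."""
    state, dest = divmod(state, N_DESTINATIONS)
    state, passenger = divmod(state, N_PASSENGER_LOCS)
    row, col = divmod(state, n)
    return row, col, passenger, dest


for n in (5, 6, 10):
    print(n, n_states(n))  # prints 5 500, then 6 720, then 10 2000
```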
What about rewards?
-1 default per-step reward. Why -1, and not simply 0? Because we want to encourage the agent to finish in as few steps as possible, by penalizing each extra step. This is what you expect from a taxi driver, isn't it?
+20 reward for delivering the passenger to the correct destination.
-10 reward for executing a pickup or dropoff at the wrong location.
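The three reward rules could be written as a small function like the one below. This is only a sketch: the name `reward_for`, the `locs` list, and the use of index 4 for "passenger in the taxi" are assumptions. It also interprets "wrong location" strictly, so any dropoff that is not a delivery at the destination is penalized with -10.

```python
STEP_REWARD = -1       # default per-step reward
DELIVERY_REWARD = 20   # passenger dropped off at the destination
ILLEGAL_REWARD = -10   # pickup or dropoff at the wrong location

PICKUP, DROPOFF = 4, 5
IN_TAXI = 4  # assumed code for "passenger is in the taxi"


def reward_for(action, taxi_pos, passenger, dest, locs):
    """Reward for one action.

    passenger is an index 0..3 into locs, or IN_TAXI; dest is an index 0..3.
    locs is a hypothetical list of the (row, col) cells of R, G, Y, B.
    """
    if action == PICKUP:
        legal = passenger != IN_TAXI and taxi_pos == locs[passenger]
        return STEP_REWARD if legal else ILLEGAL_REWARD
    if action == DROPOFF:
        if passenger == IN_TAXI and taxi_pos == locs[dest]:
            return DELIVERY_REWARD
        return ILLEGAL_REWARD
    return STEP_REWARD  # any movement action
```

Under these rules, an episode with no illegal actions that ends after k steps collects -(k - 1) + 20 in total, so a 12-step episode scores 9.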
Random agent baseline
Before you start implementing any complex algorithm, you should always build a baseline model.
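A minimal sketch of such a baseline, together with a tabular Q-learning agent to compare it against, might look like the following. It does not use Gym. The `TaxiGrid` class, the corner placement of R, G, Y, B, the absence of inner walls, and all hyperparameters are assumptions made for illustration; the original pictures may use a different layout.

```python
import random

DOWN, UP, RIGHT, LEFT, PICKUP, DROPOFF = range(6)
DELTAS = {DOWN: (1, 0), UP: (-1, 0), RIGHT: (0, 1), LEFT: (0, -1)}
IN_TAXI = 4


class TaxiGrid:
    """Hypothetical n x n taxi world; only the outer boundary acts as a wall."""

    def __init__(self, n=6, seed=None):
        self.n, self.rng = n, random.Random(seed)
        # Assumed placement of R, G, Y, B: the four corners.
        self.locs = [(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)]
        self.n_states, self.n_actions = n * n * 5 * 4, 6

    def _state(self):
        row, col = self.taxi
        return ((row * self.n + col) * 5 + self.passenger) * 4 + self.dest

    def reset(self):
        self.taxi = (self.rng.randrange(self.n), self.rng.randrange(self.n))
        self.passenger, self.dest = self.rng.sample(range(4), 2)
        return self._state()

    def step(self, action):
        reward, done = -1, False
        if action in DELTAS:
            row = self.taxi[0] + DELTAS[action][0]
            col = self.taxi[1] + DELTAS[action][1]
            if 0 <= row < self.n and 0 <= col < self.n:
                self.taxi = (row, col)
        elif action == PICKUP:
            if self.passenger != IN_TAXI and self.taxi == self.locs[self.passenger]:
                self.passenger = IN_TAXI
            else:
                reward = -10
        else:  # DROPOFF
            if self.passenger == IN_TAXI and self.taxi == self.locs[self.dest]:
                reward, done = 20, True
            else:
                reward = -10
        return self._state(), reward, done


def run_episode(env, policy, max_steps=500):
    """Run one episode; return (steps taken, total reward, delivered?)."""
    state, total, done = env.reset(), 0, False
    for t in range(1, max_steps + 1):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return t, total, done


def q_learning(env, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1,
               max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=q[state].__getitem__)
            nxt, reward, done = env.step(action)
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


if __name__ == "__main__":
    env = TaxiGrid(n=6, seed=0)
    agent_rng = random.Random(0)
    random_policy = lambda s: agent_rng.randrange(env.n_actions)
    q = q_learning(env)
    greedy_policy = lambda s: max(range(env.n_actions), key=q[s].__getitem__)
    for name, policy in (("random", random_policy), ("q-learning", greedy_policy)):
        results = [run_episode(env, policy) for _ in range(100)]
        print(name, "avg steps:", sum(r[0] for r in results) / 100,
              "avg reward:", sum(r[1] for r in results) / 100)
```

Changing `n=6` to `n=10` gives the 10x10 variant; the Q-table then has 2000 rows and will likely need more training episodes. The `max_steps` cap keeps a random or undertrained policy from looping forever.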