Question

Task 2: Reinforcement Learning
Q-Learning with Smart Taxi (Self-Driving Cab). In the lab, you have been asked to develop a Smart Taxi using Q-Learning algorithm in the following environment: a 5x5 grid:
In this task, you are asked to extend this environment into a bigger grid (so that you do not use Open AIs gym package). There are still four (4) locations that we can pick up and drop off a passenger: R, G, Y,B at the coordinates you set.
The actions and rewards are still the same. The actions are: north, south, east, west, pickup, dropoff.
All the movement actions (north, south, east, west) have a -1 reward, and the pickup/dropoff actions have a -10 reward in a state with no passenger to pick up or drop off. If the taxi has a passenger and is at the correct destination, the dropoff action yields a reward of 20.
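The state transitions and rewards described above can be sketched as a minimal environment. Note that the 7x7 grid size and the corner coordinates chosen for R, G, Y, B below are illustrative placeholders only, not part of the task; a successful pickup is assumed to cost -1 like a movement step.

```python
# Minimal sketch of the extended taxi environment. The 7x7 grid and the
# corner coordinates for R, G, Y, B are illustrative placeholders only.
GRID = 7
LOCS = {"R": (0, 0), "G": (0, GRID - 1), "Y": (GRID - 1, 0), "B": (GRID - 1, GRID - 1)}
ACTIONS = ["north", "south", "east", "west", "pickup", "dropoff"]
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}

def step(state, action):
    """Apply one action; return (new_state, reward, done).

    state = (row, col, passenger, dest); passenger is a location key,
    or "taxi" once the passenger is on board.
    """
    row, col, passenger, dest = state
    if action in MOVES:                          # every movement costs -1
        dr, dc = MOVES[action]
        row = min(max(row + dr, 0), GRID - 1)    # clip at the grid border
        col = min(max(col + dc, 0), GRID - 1)
        return (row, col, passenger, dest), -1, False
    if action == "pickup":
        if passenger != "taxi" and (row, col) == LOCS[passenger]:
            return (row, col, "taxi", dest), -1, False
        return state, -10, False                 # illegal pickup
    if passenger == "taxi" and (row, col) == LOCS[dest]:
        return state, 20, True                   # successful dropoff ends the episode
    return state, -10, False                     # illegal dropoff
```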
(a) Implement the Q-Learning algorithm and solve the Smart Taxi Problem in a language of your choice.
(1) Initialize the Q-table:
(2) Set the hyperparameters: Choose the learning rate (α), the discount factor (γ), and the exploration rate (ε).
(3) Start training the agent by iterating through episodes:
Initialize the environment: Place the taxi at a coordinate, randomly select a passenger location (R, G, Y, B), and a destination different from the passenger's location.
Loop until the passenger is dropped off at the right destination:
Choose an action: Either explore (choose a random action) with probability ε, or exploit (choose the action with the highest Q-value for the current state) with probability (1 − ε).
Perform the action and observe the reward and new state.
Update the Q-table using the formula:
Q_new(state, action) ← Q(state, action) + α [reward + γ max_a Q(new state, a) − Q(state, action)]
Update the current state to the new state.
Decay the exploration rate (ε) over time to reduce random exploration and focus on exploiting the learned Q-values.
(4) After enough episodes, the Q-table should converge, and the agent will have learned the optimal policy to solve the taxi problem.
(5) Find the best sequence of actions for any given state by using the learned Q-table and choosing the action with the highest Q-value for that state.
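Steps (1)-(5) above can be sketched as follows. The 7x7 grid, the corner coordinates for R, G, Y, B, the hyperparameter values, and the per-episode step cap are all illustrative assumptions, not prescribed by the task.

```python
import random
from collections import defaultdict

# Sketch of steps (1)-(5). Grid size, coordinates, hyperparameter values,
# and the per-episode step cap are illustrative assumptions.
GRID = 7
LOCS = {"R": (0, 0), "G": (0, GRID - 1), "Y": (GRID - 1, 0), "B": (GRID - 1, GRID - 1)}
ACTIONS = ["north", "south", "east", "west", "pickup", "dropoff"]
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}

def step(state, action):
    row, col, passenger, dest = state
    if action in MOVES:
        dr, dc = MOVES[action]
        return (min(max(row + dr, 0), GRID - 1),
                min(max(col + dc, 0), GRID - 1), passenger, dest), -1, False
    if action == "pickup":
        if passenger != "taxi" and (row, col) == LOCS[passenger]:
            return (row, col, "taxi", dest), -1, False
        return state, -10, False
    if passenger == "taxi" and (row, col) == LOCS[dest]:
        return state, 20, True
    return state, -10, False

def train(episodes=3000, alpha=0.1, gamma=0.9,
          eps=1.0, eps_min=0.05, eps_decay=0.999, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)                              # (1) Q-table, all zeros
    for _ in range(episodes):                           # (3) iterate episodes
        passenger, dest = rng.sample(sorted(LOCS), 2)   # distinct locations
        state = (rng.randrange(GRID), rng.randrange(GRID), passenger, dest)
        for _ in range(300):       # practical cap so early random episodes end
            if rng.random() < eps:                      # explore
                action = rng.choice(ACTIONS)
            else:                                       # exploit
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            new_state, reward, done = step(state, action)
            best_next = max(Q[(new_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = new_state
            if done:
                break
        eps = max(eps_min, eps * eps_decay)             # decay exploration rate
    return Q

def best_action(Q, state):
    """(5) Greedy action for a given state from the learned Q-table."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])
```

Applying `best_action` repeatedly from any start state then reads off the learned policy, step (5).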
(b) Compare the performance of your Q-Learning agent with a random agent.
(c) Experiment with different values of the learning rate (α), the discount factor (γ), and the exploration rate (ε).
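For part (b), the baseline can be sketched as an agent that picks uniformly among the six actions and records its average reward and episode length. The 7x7 grid, corner coordinates, step cap, and episode count below are illustrative assumptions.

```python
import random

# Part (b) baseline: a uniformly random agent. Grid size, coordinates,
# step cap, and episode count are illustrative assumptions.
GRID = 7
LOCS = {"R": (0, 0), "G": (0, GRID - 1), "Y": (GRID - 1, 0), "B": (GRID - 1, GRID - 1)}
ACTIONS = ["north", "south", "east", "west", "pickup", "dropoff"]
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}

def step(state, action):
    row, col, passenger, dest = state
    if action in MOVES:
        dr, dc = MOVES[action]
        return (min(max(row + dr, 0), GRID - 1),
                min(max(col + dc, 0), GRID - 1), passenger, dest), -1, False
    if action == "pickup":
        if passenger != "taxi" and (row, col) == LOCS[passenger]:
            return (row, col, "taxi", dest), -1, False
        return state, -10, False
    if passenger == "taxi" and (row, col) == LOCS[dest]:
        return state, 20, True
    return state, -10, False

def random_episode(rng, max_steps=500):
    """Run one episode with uniformly random actions; return (reward, steps)."""
    passenger, dest = rng.sample(sorted(LOCS), 2)
    state = (rng.randrange(GRID), rng.randrange(GRID), passenger, dest)
    total = 0
    for steps in range(1, max_steps + 1):
        state, reward, done = step(state, rng.choice(ACTIONS))
        total += reward
        if done:
            break
    return total, steps

def random_baseline(episodes=100, seed=0):
    """Average reward and episode length of the random agent."""
    rng = random.Random(seed)
    results = [random_episode(rng) for _ in range(episodes)]
    return (sum(r for r, _ in results) / episodes,
            sum(s for _, s in results) / episodes)
```

Comparing these averages against greedy rollouts from the learned Q-table should show the trained agent finishing in far fewer steps and with a much higher (positive) reward.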
You need to submit the code and a report on your program design and the experimental results.
The marking will be based on the clarity and rationality of your report and the correctness of your code.