Question

Task 2: Reinforcement Learning
Q-Learning with Smart Taxi (Self-Driving Cab). In the lab, you have been asked to develop a Smart Taxi using the Q-Learning algorithm in the following environment: a 5x5 grid.
In this task, you are asked to extend this environment to a bigger grid (so that you do not use OpenAI's Gym package). There are still four (4) locations where we can pick up and drop off a passenger: R, G, Y, B, at the coordinates you set.
The actions and rewards are still the same. The actions are: north, south, east, west, pickup, dropoff.
All the movement actions (north, south, east, west) have a -1 reward, and the pickup/dropoff actions have a -10 reward when there is no passenger to pick up or drop off at the taxi's location. If the taxi has a passenger on board and is at the correct destination, the dropoff action yields a reward of 20.
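
A minimal environment sketch in Python is shown below. The class name SmartTaxiEnv, the 10x10 default grid size, and the corner coordinates for R, G, Y, B are all illustrative placeholders you would set yourself; a legal pickup is treated as an ordinary -1 step, an assumption borrowed from the standard Taxi environment's conventions.

```python
import random

class SmartTaxiEnv:
    """Minimal Smart Taxi environment on an n x n grid (no Gym dependency)."""

    ACTIONS = ["north", "south", "east", "west", "pickup", "dropoff"]

    def __init__(self, size=10):
        self.size = size
        # Illustrative corner coordinates for R, G, Y, B; set your own.
        self.locations = {"R": (0, 0), "G": (0, size - 1),
                          "Y": (size - 1, 0), "B": (size - 1, size - 1)}

    def reset(self):
        # Random taxi cell, random passenger location, different destination.
        self.taxi = (random.randrange(self.size), random.randrange(self.size))
        self.passenger = random.choice(list(self.locations))
        self.dest = random.choice([l for l in self.locations
                                   if l != self.passenger])
        self.in_taxi = False
        return self.state()

    def state(self):
        # A state is (taxi cell, passenger location or "taxi", destination).
        return (self.taxi, "taxi" if self.in_taxi else self.passenger, self.dest)

    def step(self, action):
        """Apply an action; return (new state, reward, done)."""
        row, col = self.taxi
        if action == "north":
            self.taxi = (max(row - 1, 0), col)
        elif action == "south":
            self.taxi = (min(row + 1, self.size - 1), col)
        elif action == "west":
            self.taxi = (row, max(col - 1, 0))
        elif action == "east":
            self.taxi = (row, min(col + 1, self.size - 1))
        elif action == "pickup":
            if self.in_taxi or self.taxi != self.locations[self.passenger]:
                return self.state(), -10, False   # no passenger here to pick up
            self.in_taxi = True                   # legal pickup: ordinary -1 step
        else:  # dropoff
            if self.in_taxi and self.taxi == self.locations[self.dest]:
                return self.state(), 20, True     # correct dropoff ends the episode
            return self.state(), -10, False       # invalid dropoff
        return self.state(), -1, False
```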
(a) Implement the Q-Learning algorithm and solve the Smart Taxi problem in a language of your choice; a minimal sketch follows the numbered steps below.
(1) Initialize the Q-table.
(2) Set the hyperparameters: choose the learning rate (α), the discount factor (γ), and the exploration rate (ε).
(3) Start training the agent by iterating through episodes:
Initialize the environment: place the taxi at a coordinate, randomly select a passenger location (R, G, Y, B), and a destination different from the passenger's location.
Loop until the passenger is dropped off at the right destination:
Choose an action: either explore (choose a random action) with probability ε, or exploit (choose the action with the highest Q-value for the current state) with probability (1 − ε).
Perform the action and observe the reward and new state.
Update the Q-table using the formula:
Q_new(state, action) ← Q(state, action) + α [reward + γ · max_a Q(new state, a) − Q(state, action)]
Update the current state to the new state.
Decay the exploration rate (ε) over time to reduce random exploration and focus on exploiting the learned Q-values.
(4) After enough episodes, the Q-table should converge, and the agent will have learned the optimal policy to solve the taxi problem.
(5) Find the best sequence of actions for any given state by using the learned Q-table and choosing the action with the highest Q-value for that state.
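
The sketch below follows steps (1)–(5), assuming the SmartTaxiEnv class from the earlier snippet. The hyperparameter values, episode count, and epsilon-decay schedule are illustrative choices, not values prescribed by the task.

```python
import random
from collections import defaultdict

def train(env, episodes=20000, alpha=0.1, gamma=0.9,
          epsilon=1.0, eps_min=0.05, eps_decay=0.9995):
    # (1) Q-table: a defaultdict avoids enumerating every state up front.
    Q = defaultdict(lambda: {a: 0.0 for a in env.ACTIONS})

    for _ in range(episodes):
        state = env.reset()          # (3) initialize the environment
        done = False
        while not done:              # loop until the passenger is delivered
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if random.random() < epsilon:
                action = random.choice(env.ACTIONS)
            else:
                action = max(Q[state], key=Q[state].get)

            new_state, reward, done = env.step(action)

            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
            # with no bootstrap on the terminal transition.
            best_next = 0.0 if done else max(Q[new_state].values())
            Q[state][action] += alpha * (reward + gamma * best_next
                                         - Q[state][action])
            state = new_state

        # Decay the exploration rate after each episode.
        epsilon = max(eps_min, epsilon * eps_decay)
    return Q

def greedy_policy(Q, state):
    # (5) Best action for a given state: the one with the highest Q-value.
    return max(Q[state], key=Q[state].get)
```

Keying the Q-table by state tuples rather than flattening states into array indices keeps the code unchanged when the grid is enlarged; it trades a little speed for simplicity.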
(b) Compare the performance of your Q-Learning agent with a random agent; see the evaluation sketch after item (c).
(c) Experiment with different learning rates (α), discount factors (γ), and exploration rates (ε).
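
One way (b) and (c) might be evaluated, assuming the SmartTaxiEnv, train, and greedy_policy pieces above; the 100-episode evaluation, the 500-step cap, and the swept hyperparameter values are all illustrative choices:

```python
import random

def run_episode(env, policy, max_steps=500):
    """Run one episode under a policy; return (total reward, steps taken)."""
    state, done, total, steps = env.reset(), False, 0, 0
    while not done and steps < max_steps:
        state, reward, done = env.step(policy(state))
        total += reward
        steps += 1
    return total, steps

env = SmartTaxiEnv(size=10)
Q = train(env)

# (b) Trained greedy agent vs. a uniformly random agent over 100 episodes.
trained = [run_episode(env, lambda s: greedy_policy(Q, s)) for _ in range(100)]
random_ = [run_episode(env, lambda s: random.choice(env.ACTIONS)) for _ in range(100)]
print("trained agent: mean reward", sum(r for r, _ in trained) / len(trained))
print("random agent:  mean reward", sum(r for r, _ in random_) / len(random_))

# (c) A small sweep over alpha and gamma (extend with epsilon schedules as needed).
for alpha in (0.05, 0.1, 0.5):
    for gamma in (0.8, 0.9, 0.99):
        Q = train(SmartTaxiEnv(size=10), alpha=alpha, gamma=gamma)
        scores = [run_episode(env, lambda s: greedy_policy(Q, s))
                  for _ in range(100)]
        print(f"alpha={alpha}, gamma={gamma}: mean reward",
              sum(r for r, _ in scores) / len(scores))
```

The step cap in run_episode keeps the random agent's episodes from running indefinitely, which would otherwise skew the comparison.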
You need to submit the code and a report on your program design and the experimental results.
The marking will be based on the clarity and rationality of your report and the correctness of your code.

