Question

Posted on Aug 12, 2024

The goal of this assignment is to implement the Q-learning method on the Taxi-v3 environment in the OpenAI Gym framework.

Your task in this environment is to pick up the passenger at one location and drop him off at another, chosen from 4 possible locations (labeled by different letters). In the example given below, you are expected to pick him up at Y and drop him off at G. You receive +20 points for a successful dropoff and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.

Note that the dynamics of the model are assumed to be unknown.
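To make the reward scheme concrete, here is a small sketch that totals the rewards of one episode. The helper name `episode_return` and the sample episode are hypothetical, and the sketch assumes the +20 replaces the step penalty on the final dropoff step, which is how the Gymnasium Taxi documentation describes it.

```python
def episode_return(rewards, gamma=1.0):
    """Discounted sum of a reward sequence: r0 + gamma*r1 + gamma^2*r2 + ..."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: 9 ordinary moves, 1 illegal pick-up, then the dropoff.
rewards = [-1] * 9 + [-10] + [20]
undiscounted = episode_return(rewards)        # -9 - 10 + 20
discounted = episode_return(rewards, 0.9)     # same episode with gamma = 0.9
```

A shorter route with no illegal actions gives a higher total, which is the behaviour the learned Q-table is meant to encode.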

Below is the original code; implement the Q-learning method accordingly.

import gymnasium as gym
import time
import numpy as np
import os
import random

# Note: several identifiers were lost from the posted code. The names n_states,
# n_actions, state, next_state and next_action are reconstructed placeholders.

def qLearning(env):
    n_states = env.observation_space.n
    n_actions = env.action_space.n
    # int32 as posted; fractional updates are truncated when stored
    Q = np.zeros([n_states, n_actions], dtype=np.int32)
    alpha = 0.8
    gamma = 0.9
    epsilon = 1
    num_iter = 10000
    for i in range(num_iter):
        state, info = env.reset()
        for step in range(100):
            # purely random action; the greedy choice is commented out
            action = env.action_space.sample()
            #action = np.argmax(Q[state])
            # gymnasium's step() returns five values
            next_state, reward, done, truncated, info = env.step(action)
            Q[state, action] = Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]) - Q[state, action])
            state = next_state
        if i % 1000 == 0:
            print(f"Episode {i}")
    return Q

def SARSA(env):
    n_states = env.observation_space.n
    n_actions = env.action_space.n
    Q = np.zeros([n_states, n_actions], dtype=np.int32)
    alpha = 0.8
    gamma = 0.9
    epsilon = 1
    num_iter = 1000
    for i in range(num_iter):
        state, info = env.reset()
        action = env.action_space.sample()
        for step in range(100):
            next_state, reward, done, truncated, info = env.step(action)
            next_action = np.argmax(Q[next_state])
            # on-policy update: bootstrap from the action actually taken next
            Q[state, action] = Q[state, action] + alpha * (reward + gamma * Q[next_state, next_action] - Q[state, action])
            state = next_state
            action = next_action
        if i % 1000 == 0:
            print(f"Episode {i}")
    return Q

env = gym.make('Taxi-v3', render_mode="human")
observation, info = env.reset()
Q = SARSA(env)

# reset() returns (observation, info) in gymnasium
observation, info = env.reset()
done = False
sumreward = 0
while not done:
    os.system('cls')  # clears the console on Windows
    env.render()
    action = np.argmax(Q[observation])
    observation, reward, done, truncated, info = env.step(action)
    sumreward += reward
    time.sleep(0.5)
    if done:
        observation, info = env.reset()
        print('done with reward: ', reward)
env.close()
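One way the Q-learning part could be filled in is sketched below. It is a minimal sketch, not the assignment's reference solution. It swaps the always-random action for an epsilon-greedy choice with a decaying epsilon, and it stores Q as floats so that updates with alpha = 0.8 are not truncated. The function name `q_learning` and the parameters `epsilon_min`, `epsilon_decay`, `max_steps` and `seed` are assumptions introduced here. The environment is only assumed to follow the Gymnasium interface: `reset()` returns `(observation, info)` and `step()` returns five values.

```python
import numpy as np


def q_learning(env, alpha=0.8, gamma=0.9, epsilon=1.0, epsilon_min=0.05,
               epsilon_decay=0.999, num_episodes=10000, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_states = env.observation_space.n
    n_actions = env.action_space.n
    # Float table: an integer dtype would truncate fractional updates.
    Q = np.zeros((n_states, n_actions), dtype=np.float64)

    for episode in range(num_episodes):
        state, info = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(Q[state]))

            next_state, reward, terminated, truncated, info = env.step(action)

            # Off-policy target: best value in the next state,
            # with no bootstrapping from a terminal state.
            if terminated:
                target = reward
            else:
                target = reward + gamma * np.max(Q[next_state])
            Q[state, action] += alpha * (target - Q[state, action])

            state = next_state
            if terminated or truncated:
                break

        # Assumed schedule: shrink epsilon a little after every episode.
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return Q
```

With Gymnasium installed, the call for the environment in the question would be along the lines of Q = q_learning(gym.make('Taxi-v3')). Training without render_mode="human" is much faster, and a separate rendered environment can then be created for the final demonstration run.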
