
1 Approved Answer

Posted on Sep 25, 2024


import numpy as np
import random

# Define the grid world
GRID_SIZE = (4, 5)
START_STATE = (0, 0)
GOAL_STATE = (3, 4)
OBSTACLES = [(1, 1), (2, 2), (1, 3)]

# Q-learning parameters
LEARNING_RATE = 0.1
DISCOUNT_FACTOR = 0.9
EPISODES = 500

# Initialize Q-table: one value per (row, column, action)
Q_table = np.zeros((GRID_SIZE[0], GRID_SIZE[1], 4))  # 4 actions: up, down, left, right

# Define actions (list position matches the action index used below)
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

# Function to choose an action using epsilon-greedy strategy
def choose_action(state, epsilon):
    if random.uniform(0, 1) < epsilon:
        return random.choice(range(4))  # choose a random action
    else:
        return np.argmax(Q_table[state[0], state[1]])  # choose the best known action

# Function to perform Q-learning
def q_learning():
    for episode in range(EPISODES):
        state = START_STATE
        while state != GOAL_STATE:
            action = choose_action(state, epsilon=0.1)
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            update_Q_table(state, action, reward, next_state)
            state = next_state

# Function to take an action and return the next state
# (moves that would leave the grid are clamped to the boundary)
def take_action(state, action):
    if action == 0:  # UP
        return (max(0, state[0] - 1), state[1])
    elif action == 1:  # DOWN
        return (min(GRID_SIZE[0] - 1, state[0] + 1), state[1])
    elif action == 2:  # LEFT
        return (state[0], max(0, state[1] - 1))
    elif action == 3:  # RIGHT
        return (state[0], min(GRID_SIZE[1] - 1, state[1] + 1))

# Function to calculate the reward for a given state
# (obstacles are penalized, not blocked: the agent can still enter them)
def calculate_reward(state):
    if state == GOAL_STATE:
        return 1
    elif state in OBSTACLES:
        return -1
    else:
        return 0

# Function to update the Q-table based on the Q-learning update rule
def update_Q_table(state, action, reward, next_state):
    best_future_value = np.max(Q_table[next_state[0], next_state[1]])
    current_value = Q_table[state[0], state[1], action]
    new_value = (1 - LEARNING_RATE) * current_value + LEARNING_RATE * (reward + DISCOUNT_FACTOR * best_future_value)
    Q_table[state[0], state[1], action] = new_value

# Run Q-learning algorithm
q_learning()

# Print the learned Q-table
print("Learned Q-table:")
print(Q_table)
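The printed Q-table is a 4 x 5 x 4 array of numbers, which is hard to read directly. One way to inspect what was learned is to take the highest-valued action in each cell and show it as an arrow. The sketch below is not part of the original answer: the helper name `greedy_policy`, the arrow symbols, and the small hand-filled `demo_q` table are assumptions made for illustration. It assumes the same action index order as above (0 = up, 1 = down, 2 = left, 3 = right).

```python
import numpy as np

# Assumed display symbols, in the same index order as the actions:
# 0 = UP, 1 = DOWN, 2 = LEFT, 3 = RIGHT
ACTION_SYMBOLS = ["^", "v", "<", ">"]

def greedy_policy(q_table, goal, obstacles):
    """Return one string per grid row showing the greedy action in each cell.

    q_table has shape (rows, cols, 4). The goal cell is shown as 'G'
    and obstacle cells as '#'.
    """
    rows, cols, _ = q_table.shape
    lines = []
    for r in range(rows):
        cells = []
        for c in range(cols):
            if (r, c) == goal:
                cells.append("G")
            elif (r, c) in obstacles:
                cells.append("#")
            else:
                best_action = int(np.argmax(q_table[r, c]))
                cells.append(ACTION_SYMBOLS[best_action])
        lines.append(" ".join(cells))
    return lines

# Hypothetical hand-filled table for a 2 x 2 grid, used only to show the output format
demo_q = np.zeros((2, 2, 4))
demo_q[0, 0, 3] = 0.5  # from (0, 0), RIGHT has the highest value
demo_q[0, 1, 1] = 0.9  # from (0, 1), DOWN has the highest value
demo_q[1, 0, 3] = 0.9  # from (1, 0), RIGHT has the highest value

policy_lines = greedy_policy(demo_q, goal=(1, 1), obstacles=[])
print("\n".join(policy_lines))
```

With the script above, the same helper could be called as `greedy_policy(Q_table, GOAL_STATE, OBSTACLES)` after training. Cells whose four values are all still zero show `^`, because `np.argmax` returns the first index on ties, so an up arrow in such a cell does not mean the agent learned anything there.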


