
Question


For the code below, show the goal-searching process: plot a steps-to-go curve (steps per episode), and report the sum of squared error and/or a theoretical value table, supported by diagrams, graphs, and tables.
import numpy as np
import random

# Define the grid world
GRID_SIZE = (4, 5)
START_STATE = (0, 0)
GOAL_STATE = (3, 4)
OBSTACLES = [(1, 1), (2, 2), (1, 3)]

# Q-learning parameters
LEARNING_RATE = 0.1
DISCOUNT_FACTOR = 0.9
EPISODES = 500

# Initialize Q-table
q_table = np.zeros((GRID_SIZE[0], GRID_SIZE[1], 4))  # 4 actions: up, down, left, right

# Define actions
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

# Function to choose an action using epsilon-greedy strategy
def choose_action(state, epsilon):
    if random.uniform(0, 1) < epsilon:
        return random.choice(range(4))  # explore: choose a random action
    else:
        return np.argmax(q_table[state[0], state[1]])  # exploit: best known action

# Function to perform Q-learning
def q_learning():
    for episode in range(EPISODES):
        state = START_STATE
        while state != GOAL_STATE:
            action = choose_action(state, epsilon=0.1)
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            update_q_table(state, action, reward, next_state)
            state = next_state

# Function to take an action and return the next state (moves are clipped at the grid edges)
def take_action(state, action):
    if action == 0:  # UP
        return (max(0, state[0] - 1), state[1])
    elif action == 1:  # DOWN
        return (min(GRID_SIZE[0] - 1, state[0] + 1), state[1])
    elif action == 2:  # LEFT
        return (state[0], max(0, state[1] - 1))
    elif action == 3:  # RIGHT
        return (state[0], min(GRID_SIZE[1] - 1, state[1] + 1))

# Function to calculate the reward for a given state
def calculate_reward(state):
    if state == GOAL_STATE:
        return 1
    elif state in OBSTACLES:
        return -1
    else:
        return 0

# Function to update the Q-table based on the Q-learning update rule
def update_q_table(state, action, reward, next_state):
    best_future_value = np.max(q_table[next_state[0], next_state[1]])
    current_value = q_table[state[0], state[1], action]
    new_value = (1 - LEARNING_RATE) * current_value + LEARNING_RATE * (reward + DISCOUNT_FACTOR * best_future_value)
    q_table[state[0], state[1], action] = new_value

# Run Q-learning algorithm
q_learning()

# Print the learned Q-table
print("Learned Q-table:")
print(q_table)
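
The update implemented in update_q_table is the standard tabular Q-learning rule with learning rate α = 0.1 and discount factor γ = 0.9:

Q(s, a) <- (1 - α) · Q(s, a) + α · (r + γ · max_a' Q(s', a'))

Since each episode ends as soon as the agent reaches GOAL_STATE, the goal's Q-row is never updated and stays at 0, which makes the goal behave as a terminal state.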

Step by Step Solution

There are 3 steps involved, one for each deliverable the question asks for: the steps-to-go curve, the sum of squared error, and the theoretical value table.

Step 1: Steps-to-go curve
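
One way to produce the steps-to-go curve is to rerun training while recording how many steps each episode takes to reach the goal, then plot steps against episode number. The sketch below reuses the functions defined above; q_learning_with_stats is a hypothetical variant of q_learning, and matplotlib is assumed to be available.

import matplotlib.pyplot as plt

# Hypothetical variant of q_learning() that also records steps per episode.
def q_learning_with_stats():
    steps_per_episode = []
    for episode in range(EPISODES):
        state = START_STATE
        steps = 0
        while state != GOAL_STATE:
            action = choose_action(state, epsilon=0.1)
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            update_q_table(state, action, reward, next_state)
            state = next_state
            steps += 1
        steps_per_episode.append(steps)
    return steps_per_episode

q_table[:] = 0  # reset the table so the curve reflects learning from scratch
steps_per_episode = q_learning_with_stats()

# Steps-to-go curve: early episodes wander, later episodes approach the
# shortest path (7 moves from (0,0) to (3,4) on this 4x5 grid).
plt.plot(steps_per_episode)
plt.xlabel("Episode")
plt.ylabel("Steps to reach goal")
plt.title("Steps-to-go curve")
plt.show()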

Step 2: Sum of squared error against theoretical Q-values
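
The question does not fix a reference for the error, so this sketch takes the theoretical optimal Q-values of the same deterministic grid world, computed by value iteration, and compares them with the learned table. value_iteration is a hypothetical helper, not part of the original code; it uses the same take_action and calculate_reward dynamics (including the fact that obstacles only penalize, they do not block movement).

# Hypothetical helper: theoretical Q-values via value iteration on the
# same MDP the code implements (the goal is terminal, so its Q-row stays 0).
def value_iteration(tol=1e-8):
    q_star = np.zeros_like(q_table)
    while True:
        q_new = np.zeros_like(q_star)
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                if (r, c) == GOAL_STATE:
                    continue  # terminal state: value fixed at 0
                for a in range(4):
                    nxt = take_action((r, c), a)
                    q_new[r, c, a] = (calculate_reward(nxt)
                                      + DISCOUNT_FACTOR * np.max(q_star[nxt[0], nxt[1]]))
        if np.max(np.abs(q_new - q_star)) < tol:
            return q_new
        q_star = q_new

q_star = value_iteration()

# Sum of squared error between learned and theoretical Q-values; it shrinks
# toward 0 for state-action pairs the agent visits often.
sse = np.sum((q_table - q_star) ** 2)
print(f"Sum of squared error: {sse:.4f}")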

Step 3: Theoretical value table and greedy path
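
For the theoretical value table, one option is to tabulate V*(s) = max_a Q*(s, a) from the value-iteration result and, as a diagram of the goal-searching process, print the greedy path the learned policy takes from start to goal. This sketch builds on q_star from Step 2; the 50-step cap is an arbitrary safeguard in case the greedy policy loops.

# Theoretical state-value table V*(s) = max_a Q*(s, a), printed row by row
# (row 0 is the top of the grid; the terminal goal cell (3, 4) has value 0).
v_star = np.max(q_star, axis=2)
print("Theoretical state-value table:")
for r in range(GRID_SIZE[0]):
    print("  ".join(f"{v_star[r, c]:6.3f}" for c in range(GRID_SIZE[1])))

# Greedy path under the learned Q-table, from START_STATE to GOAL_STATE.
state, path = START_STATE, [START_STATE]
while state != GOAL_STATE and len(path) < 50:  # cap in case the policy loops
    state = take_action(state, int(np.argmax(q_table[state[0], state[1]])))
    path.append(state)
print("Greedy path:", " -> ".join(map(str, path)))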
