Show your goal-searching process with a steps-to-go curve, the sum of squared error, and/or a theoretical value table, using diagrams, graphs, and tables, for the following code:

    import numpy as np
    import random

    # Define the grid world
    GRID_SIZE = (4, 5)
    START_STATE = (0, 0)
    GOAL_STATE = (3, 4)
    OBSTACLES = [(1, 1), (2, 2), (1, 3)]

    # Q-learning parameters
    LEARNING_RATE = 0.1
    DISCOUNT_FACTOR = 0.9
    EPISODES = 500

    # Initialize Q-table
    q_table = np.zeros((GRID_SIZE[0], GRID_SIZE[1], 4))  # 4 actions: up, down, left, right

    # Define actions
    ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT']

    # Function to choose an action using epsilon-greedy strategy
    def choose_action(state, epsilon):
        if random.uniform(0, 1) < epsilon:
            return random.choice(range(4))  # choose a random action
        else:
            return np.argmax(q_table[state[0], state[1]])

    # Function to perform Q-learning
    def q_learning():
        for episode in range(EPISODES):
            state = START_STATE
            while state != GOAL_STATE:
                action = choose_action(state, epsilon=0.1)
                next_state = take_action(state, action)
                reward = calculate_reward(next_state)
                update_q_table(state, action, reward, next_state)
                state = next_state

    # Function to take an action and return the next state
    def take_action(state, action):
        if action == 0:  # UP
            return (max(0, state[0] - 1), state[1])
        elif action == 1:  # DOWN
            return (min(GRID_SIZE[0] - 1, state[0] + 1), state[1])
        elif action == 2:  # LEFT
            return (state[0], max(0, state[1] - 1))
        elif action == 3:  # RIGHT
            return (state[0], min(GRID_SIZE[1] - 1, state[1] + 1))

    # Function to calculate the reward for a given state
    def calculate_reward(state):
        if state == GOAL_STATE:
            return 1
        elif state in OBSTACLES:
            return -1
        else:
            return 0

    # Function to update the Q-table based on the Q-learning update rule
    def update_q_table(state, action, reward, next_state):
        best_future_value = np.max(q_table[next_state[0], next_state[1]])
        current_value = q_table[state[0], state[1], action]
        new_value = (1 - LEARNING_RATE) * current_value + LEARNING_RATE * (reward + DISCOUNT_FACTOR * best_future_value)
        q_table[state[0], state[1], action] = new_value

    # Run Q-learning algorithm
    q_learning()

    # Print the learned Q-table
    print("Learned Q-table:")
    print(q_table)
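One possible way to produce the requested curves and table is sketched below. It is not the accepted answer, only an illustration under stated assumptions. It keeps the grid, rewards, and update rule from the question, records the number of steps taken in each episode (the steps-to-go curve), computes a theoretical Q-table by value iteration on the known deterministic model, and records the sum of squared error between the learned and theoretical tables after every episode. The names `MAX_STEPS`, `theoretical_q_table`, `train`, and `plot_curves`, the fixed random seed, and the use of matplotlib for plotting are assumptions that do not appear in the question.

```python
import random
import numpy as np

GRID_SIZE, START_STATE, GOAL_STATE = (4, 5), (0, 0), (3, 4)
OBSTACLES = [(1, 1), (2, 2), (1, 3)]
LEARNING_RATE, DISCOUNT_FACTOR, EPISODES = 0.1, 0.9, 500
EPSILON = 0.1
MAX_STEPS = 10_000  # assumption: safety cap per episode, not in the question's code


def take_action(state, action):
    """Deterministic move; the grid borders clamp the agent inside the world."""
    row, col = state
    if action == 0:  # UP
        return (max(0, row - 1), col)
    if action == 1:  # DOWN
        return (min(GRID_SIZE[0] - 1, row + 1), col)
    if action == 2:  # LEFT
        return (row, max(0, col - 1))
    return (row, min(GRID_SIZE[1] - 1, col + 1))  # RIGHT


def calculate_reward(state):
    if state == GOAL_STATE:
        return 1
    return -1 if state in OBSTACLES else 0


def theoretical_q_table(tol=1e-12):
    """Q-value iteration on the known model; the goal is terminal (value 0)."""
    q_star = np.zeros((GRID_SIZE[0], GRID_SIZE[1], 4))
    while True:
        new_q = np.zeros_like(q_star)
        for row in range(GRID_SIZE[0]):
            for col in range(GRID_SIZE[1]):
                if (row, col) == GOAL_STATE:
                    continue  # no actions are taken from the goal
                for action in range(4):
                    nxt = take_action((row, col), action)
                    new_q[row, col, action] = (
                        calculate_reward(nxt) + DISCOUNT_FACTOR * q_star[nxt].max()
                    )
        if np.abs(new_q - q_star).max() < tol:
            return new_q
        q_star = new_q


def train(seed=0):
    """Q-learning as in the question, plus per-episode step counts and SSE."""
    random.seed(seed)  # assumption: fixed seed so the curves are repeatable
    q_table = np.zeros((GRID_SIZE[0], GRID_SIZE[1], 4))
    q_star = theoretical_q_table()
    steps_per_episode, sse_per_episode = [], []
    for _ in range(EPISODES):
        state, steps = START_STATE, 0
        while state != GOAL_STATE and steps < MAX_STEPS:
            if random.uniform(0, 1) < EPSILON:
                action = random.choice(range(4))
            else:
                action = int(np.argmax(q_table[state[0], state[1]]))
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            best_future = np.max(q_table[next_state[0], next_state[1]])
            current = q_table[state[0], state[1], action]
            q_table[state[0], state[1], action] = (
                (1 - LEARNING_RATE) * current
                + LEARNING_RATE * (reward + DISCOUNT_FACTOR * best_future)
            )
            state, steps = next_state, steps + 1
        steps_per_episode.append(steps)
        sse_per_episode.append(float(np.sum((q_table - q_star) ** 2)))
    return q_table, q_star, steps_per_episode, sse_per_episode


def plot_curves(steps, sse):
    """Optional plots; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt
    fig, (ax_steps, ax_sse) = plt.subplots(1, 2, figsize=(10, 4))
    ax_steps.plot(steps)
    ax_steps.set_xlabel("Episode")
    ax_steps.set_ylabel("Steps to goal")
    ax_sse.plot(sse)
    ax_sse.set_xlabel("Episode")
    ax_sse.set_ylabel("Sum of squared error vs. theoretical Q")
    fig.tight_layout()
    plt.show()


q_table, q_star, steps_per_episode, sse_per_episode = train()
print("Theoretical state values (max over actions):")
print(np.round(q_star.max(axis=2), 3))
print("Learned state values (max over actions):")
print(np.round(q_table.max(axis=2), 3))
```

Calling `plot_curves(steps_per_episode, sse_per_episode)` draws both curves side by side. In the theoretical table the start state has value 0.9^6 ≈ 0.531, because the shortest obstacle-free path to the goal takes 7 moves and only the last one is rewarded; the goal cell itself shows 0 because it is terminal. Since `np.argmax` returns the first maximal index, ties in an all-zero Q-table always resolve to UP, so early episodes can be very long, which is why the sketch caps episode length.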

