
Question



Show the goal-searching process for the code below using a steps-to-go curve, the sum of squared error, and/or a theoretical value table, supported by diagrams, graphs, and tables.
import numpy as np
import random

# Define the grid world
GRID_SIZE = (4, 5)
START_STATE = (0, 0)
GOAL_STATE = (3, 4)
OBSTACLES = [(1, 1), (2, 2), (1, 3)]

# Q-learning parameters
LEARNING_RATE = 0.1
DISCOUNT_FACTOR = 0.9
EPISODES = 500

# Initialize Q-table
q_table = np.zeros((GRID_SIZE[0], GRID_SIZE[1], 4))  # 4 actions: up, down, left, right

# Define actions
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

# Function to choose an action using the epsilon-greedy strategy
def choose_action(state, epsilon):
    if random.uniform(0, 1) < epsilon:
        return random.choice(range(4))  # choose a random action
    else:
        return np.argmax(q_table[state[0], state[1]])

# Function to perform Q-learning
def q_learning():
    for episode in range(EPISODES):
        state = START_STATE
        while state != GOAL_STATE:
            action = choose_action(state, epsilon=0.1)
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            update_q_table(state, action, reward, next_state)
            state = next_state

# Function to take an action and return the next state
def take_action(state, action):
    if action == 0:  # UP
        return (max(0, state[0] - 1), state[1])
    elif action == 1:  # DOWN
        return (min(GRID_SIZE[0] - 1, state[0] + 1), state[1])
    elif action == 2:  # LEFT
        return (state[0], max(0, state[1] - 1))
    elif action == 3:  # RIGHT
        return (state[0], min(GRID_SIZE[1] - 1, state[1] + 1))

# Function to calculate the reward for a given state
def calculate_reward(state):
    if state == GOAL_STATE:
        return 1
    elif state in OBSTACLES:
        return -1
    else:
        return 0

# Function to update the Q-table based on the Q-learning update rule:
# Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))
def update_q_table(state, action, reward, next_state):
    best_future_value = np.max(q_table[next_state[0], next_state[1]])
    current_value = q_table[state[0], state[1], action]
    new_value = (1 - LEARNING_RATE) * current_value + LEARNING_RATE * (reward + DISCOUNT_FACTOR * best_future_value)
    q_table[state[0], state[1], action] = new_value

# Run the Q-learning algorithm
q_learning()

# Print the learned Q-table
print("Learned Q-table:")
print(q_table)

Step by Step Solution

The solution involves three steps.

Step: 1
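One way to obtain the data behind a steps-to-go curve is to instrument the training loop so that it records how many steps each episode needs to reach the goal, together with the per-episode sum of squared change in the Q-table as a simple convergence measure. The sketch below is a minimal, illustrative variant of the q_learning() loop above; the names q_learning_with_logging and MAX_STEPS are assumptions added for this sketch and are not part of the original code.

MAX_STEPS = 200  # safety cap so an unlucky early episode cannot loop forever

def q_learning_with_logging():
    q_table[:] = 0.0          # start from an untrained table so the curve shows learning from scratch
    steps_per_episode = []    # steps-to-go data: steps needed to reach the goal in each episode
    sse_per_episode = []      # sum of squared Q-table change per episode
    for episode in range(EPISODES):
        q_before = q_table.copy()
        state = START_STATE
        steps = 0
        while state != GOAL_STATE and steps < MAX_STEPS:
            action = choose_action(state, epsilon=0.1)
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            update_q_table(state, action, reward, next_state)
            state = next_state
            steps += 1
        steps_per_episode.append(steps)
        sse_per_episode.append(np.sum((q_table - q_before) ** 2))
    return steps_per_episode, sse_per_episode

steps_per_episode, sse_per_episode = q_learning_with_logging()

The MAX_STEPS cap only keeps early, mostly random episodes from running indefinitely; a well-trained policy reaches the goal in far fewer steps.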


Step: 2
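For the theoretical reference, one reasonable choice (an assumption about what "theoretical value table" refers to here) is the optimal state-value table computed by value iteration under exactly the same rewards, transitions, and discount factor as the code above. The learned values max_a Q(s, a) can then be compared against it with a sum of squared error. The helper name value_iteration and the threshold theta are illustrative additions.

def value_iteration(theta=1e-6):
    V = np.zeros(GRID_SIZE)
    while True:
        delta = 0.0
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                if (r, c) == GOAL_STATE:
                    continue  # the goal is terminal, so its value stays 0, matching the Q-learning loop
                best = -np.inf
                for action in range(4):
                    next_state = take_action((r, c), action)
                    best = max(best, calculate_reward(next_state) + DISCOUNT_FACTOR * V[next_state])
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < theta:
            return V

V_theory = value_iteration()
V_learned = q_table.max(axis=2)  # greedy state values implied by the learned Q-table
print("Theoretical value table:\n", np.round(V_theory, 3))
print("Learned value table:\n", np.round(V_learned, 3))
print("Sum of squared error vs. theory:", np.sum((V_learned - V_theory) ** 2))

Printing both tables side by side gives the value table requested in the question; the single SSE number summarizes how far the learned table still is from the theoretical one.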


Step: 3
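The recorded series can then be turned into the requested graphs. The sketch below assumes matplotlib is installed and that steps_per_episode, sse_per_episode, and q_table from the previous steps are available; the value tables from Step 2 can be shown as a heat map in the same way.

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.plot(steps_per_episode)
ax1.set_xlabel("Episode")
ax1.set_ylabel("Steps to reach the goal")
ax1.set_title("Steps-to-go curve")

ax2.plot(sse_per_episode)
ax2.set_xlabel("Episode")
ax2.set_ylabel("Sum of squared Q-value change")
ax2.set_title("Per-episode SSE")

plt.tight_layout()
plt.show()

# Optional: the learned value table as a heat map of the grid
plt.figure(figsize=(5, 4))
plt.imshow(q_table.max(axis=2), cmap="viridis")
plt.colorbar(label="max_a Q(s, a)")
plt.title("Learned state values")
plt.show()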


