Question

Posted on Sep 23, 2024

import numpy as np
import random

# Define the grid world
GRID_SIZE = (4, 5)
START_STATE = (0, 0)
GOAL_STATE = (3, 4)
OBSTACLES = [(1, 1), (2, 2), (1, 3)]

# Q-learning parameters
LEARNING_RATE = 0.1
DISCOUNT_FACTOR = 0.9
EPISODES = 500

# Initialize Q-table: one value per (row, column, action); 4 actions: up, down, left, right
q_table = np.zeros((GRID_SIZE[0], GRID_SIZE[1], 4))

# Define actions (the list index is the action number used below)
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

# Function to choose an action using an epsilon-greedy strategy
def choose_action(state, epsilon):
    if random.uniform(0, 1) < epsilon:
        return random.choice(range(4))  # explore: choose a random action
    else:
        return int(np.argmax(q_table[state[0], state[1]]))  # exploit: best known action

# Function to perform Q-learning
def q_learning():
    for episode in range(EPISODES):
        state = START_STATE
        while state != GOAL_STATE:
            action = choose_action(state, epsilon=0.1)
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            update_q_table(state, action, reward, next_state)
            state = next_state

# Function to take an action and return the next state
# (moves are clamped to the grid; obstacles are not blocked, entering one only yields a -1 reward)
def take_action(state, action):
    if action == 0:  # UP
        return (max(0, state[0] - 1), state[1])
    elif action == 1:  # DOWN
        return (min(GRID_SIZE[0] - 1, state[0] + 1), state[1])
    elif action == 2:  # LEFT
        return (state[0], max(0, state[1] - 1))
    elif action == 3:  # RIGHT
        return (state[0], min(GRID_SIZE[1] - 1, state[1] + 1))

# Function to calculate the reward for a given state
def calculate_reward(state):
    if state == GOAL_STATE:
        return 1
    elif state in OBSTACLES:
        return -1
    else:
        return 0

# Function to update the Q-table based on the Q-learning update rule
def update_q_table(state, action, reward, next_state):
    best_future_value = np.max(q_table[next_state[0], next_state[1]])
    current_value = q_table[state[0], state[1], action]
    new_value = (1 - LEARNING_RATE) * current_value + LEARNING_RATE * (
        reward + DISCOUNT_FACTOR * best_future_value
    )
    q_table[state[0], state[1], action] = new_value

# Run Q-learning algorithm
q_learning()

# Print the learned Q-table
print("Learned Q-table:")
print(q_table)
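
One behaviour of the script above is worth knowing about. `np.argmax` returns the first index when several entries tie, so while a cell's Q-values are still all zero the greedy branch of `choose_action` always picks action 0 (UP). At `START_STATE` (0, 0) that move just bumps into the top wall, so early episodes rely almost entirely on the 10% random moves to make progress. The following is a minimal sketch, not the poster's code, of the same grid world with three assumed changes: ties are broken uniformly at random, a seeded `numpy.random.Generator` makes runs reproducible, and a per-episode cap `MAX_STEPS` guarantees termination. The names `train`, `greedy_action`, `greedy_path`, `MOVES` and `MAX_STEPS` are hypothetical. The update `Q += α(target − Q)` is algebraically the same as `(1 − α)·Q + α·target` in `update_q_table`.

```python
import numpy as np

# Same grid world and hyperparameters as the question's script.
GRID_SIZE = (4, 5)
START_STATE = (0, 0)
GOAL_STATE = (3, 4)
OBSTACLES = {(1, 1), (2, 2), (1, 3)}
LEARNING_RATE = 0.1
DISCOUNT_FACTOR = 0.9
EPSILON = 0.1
EPISODES = 500
MAX_STEPS = 1_000  # assumption: per-episode step cap, not in the original script

# (row, col) offsets for UP, DOWN, LEFT, RIGHT, i.e. action indices 0..3.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def take_action(state, action):
    """Move one cell, clamped to the grid; obstacles are entered, not blocked."""
    dr, dc = MOVES[action]
    row = min(max(state[0] + dr, 0), GRID_SIZE[0] - 1)
    col = min(max(state[1] + dc, 0), GRID_SIZE[1] - 1)
    return (row, col)


def calculate_reward(state):
    if state == GOAL_STATE:
        return 1.0
    if state in OBSTACLES:
        return -1.0
    return 0.0


def greedy_action(q_table, state, rng):
    """Hypothetical helper: argmax over actions with ties broken at random."""
    values = q_table[state]  # the 4 Q-values of this cell
    best = np.flatnonzero(values == values.max())
    return int(rng.choice(best))


def train(seed=0):
    """Hypothetical helper: epsilon-greedy tabular Q-learning with a seeded RNG."""
    rng = np.random.default_rng(seed)
    q_table = np.zeros((*GRID_SIZE, len(MOVES)))
    for _ in range(EPISODES):
        state = START_STATE
        for _ in range(MAX_STEPS):
            if state == GOAL_STATE:
                break
            if rng.random() < EPSILON:
                action = int(rng.integers(len(MOVES)))  # explore
            else:
                action = greedy_action(q_table, state, rng)  # exploit
            next_state = take_action(state, action)
            reward = calculate_reward(next_state)
            target = reward + DISCOUNT_FACTOR * q_table[next_state].max()
            row, col = state
            q_table[row, col, action] += LEARNING_RATE * (target - q_table[row, col, action])
            state = next_state
    return q_table


def greedy_path(q_table, max_len=50):
    """Hypothetical helper: follow argmax Q from START_STATE for at most max_len moves."""
    path = [START_STATE]
    state = START_STATE
    while state != GOAL_STATE and len(path) <= max_len:
        state = take_action(state, int(np.argmax(q_table[state])))
        path.append(state)
    return path


if __name__ == "__main__":
    q_table = train(seed=0)
    print("Greedy path from start:", greedy_path(q_table))
```

`greedy_path` simply follows the highest Q-value from the start cell. If the returned path does not end at `GOAL_STATE`, the greedy policy has not settled yet from that cell, and more episodes may help.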

Students also viewed these Databases questions

Explain the difference between the effects of liabilities of an S corporation on a shareholder's stock basis and the effect of liabilities of a partnership on a partner's partnership interest basis.

What are the determinants of poverty?

Give an example of one of the best displays of interpersonal skills you have seen in a PowerPoint presentation by one of your professors.

H&R Block's tax filing service allows customers to obtain faster tax refunds in two ways: (1) the customer can pay $25 for H&R Block to file the return with the Internal Revenue Service...

3. Let's define the gross stock return $R$ as $R = \frac{P_{t+1} + D_{t+1}}{P_t}$ ... Dividends $D_t$ grow at the growth rate $G$, i.e., $D_{t+1} = G \times D_t$. Please derive the Gordon constant growth model (which we teach in the...

Determine the amplitude and period of each function.

23. Consider the following frequency distribution of weights of 150 bolts (weight in grams: frequency): 5.00 and less than 5.01: 4; 5.01 and less than 5.02: 18; 5.02 and less than 5.03: 25; 5.03 and less than...

Evan Corporation provided consulting services for Kensington Company in year 1. Evan incurred costs of $60,000 associated with the consulting and billed Kensington $90,000. Evan paid $40,000 of its...

The price per gallon of gas data set has a mean of $2.98 and a standard deviation of $1.07. The high school SAT scores data set has a mean of 1015 and a standard deviation of 165. Calculate the...

7. Refer to Step 3.3. In the "Unconstrained" or "Short Selling" version of the optimal risky portfolio, what is the portfolio mean? Write your answer as a percentage, with no percentage...

(Appendices) AGING RECEIVABLES AND UNCOLLECTIBLE ACCOUNT EXPENSE. Perkinson Corporation sells paper products to a large number of retailers. Perkinson's accountant has prepared the following aging...

(Appendices) INTERNAL CONTROL FOR SALES. Yancy's Hardware has three stores. Each store manager is paid a salary plus a bonus on the sales made by his or her store. On January 5, 19x6, Bill Slick,...

(Appendices) What is a sales discount? How can sales discounts be recorded? LO9
