Question: This assignment uses autograding. We rely on the random number generator producing the same random sequence, as in Assignment 2.

This time, we will be using the random.choices() function as follows. This will give the following output. Note that we obtain 80% intended actions and 10% for each of the unintended actions here. Make sure that you understand the output and that you can reproduce it on your machine before proceeding. Note that we use Anaconda Python 3.9 to obtain the above result.
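A minimal sketch of such a weighted draw, assuming a seed of 1 and a hypothetical list of three candidate actions (the intended one first, then the two unintended ones). The seed, the action list and its ordering are illustrative assumptions, not the assignment's actual values:

```python
import random
from collections import Counter

# Hypothetical setup: the intended action first, then the two unintended ones.
actions = ["N", "E", "W"]
weights = [0.8, 0.1, 0.1]  # 80% intended, 10% for each unintended action

random.seed(1)  # assumed seed; a fixed seed makes the sequence repeatable
samples = [random.choices(population=actions, weights=weights)[0] for _ in range(1000)]

# Tally how often each action was drawn.
counts = Counter(samples)
print(counts)
```

Re-seeding with the same value and repeating the draws should give the same sequence on the same Python version, which is the property the autograder relies on.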

Problem 1: An MDP Episode (25 points)

In this part of the assignment, we are going to play an episode in an MDP by following a given policy. Consider the first test case of problem 1 (available in the file test_cases/1/1.prob).

The first part of this file specifies an MDP. S is the start state with four available actions (N, E, S, W), _ is an ordinary state with the same four available actions, and 1, -1 are states where the only available action is exit and the rewards are 1 and -1 respectively. The reward for an action in any other state is -0.05. # is a wall.

Actions are not deterministic in this environment. In this case, with noise = 0.1, the intended action succeeds 80% of the time; the remaining 20% of the time the agent moves perpendicular to the intended direction with equal probability, i.e. 10% for each unintended direction. If the agent attempts to move into a wall, it stays in the same position. Note that this MDP is identical to the example that we covered extensively in our class.
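The noise model described above can be sketched roughly as follows. The helper names `sample_action` and `move`, the direction tables, and the ordering of the candidate actions are all assumptions made for illustration; reproducing the autograder's exact random sequence may require a different ordering. Treating a move off the edge of the grid like a move into a wall is also an assumption.

```python
import random

# Perpendicular directions for each compass action (assumed encoding).
PERPENDICULAR = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}
# Row/column offsets, with row 0 at the top of the grid.
DELTAS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def sample_action(intended, noise, rng=random):
    """Return the action actually taken, given the intended one."""
    side_a, side_b = PERPENDICULAR[intended]
    # With noise = 0.1 the weights are 0.8, 0.1 and 0.1.
    return rng.choices(
        population=[intended, side_a, side_b],
        weights=[1 - 2 * noise, noise, noise],
    )[0]


def move(grid, position, action):
    """Apply an action; stay put if the target is a wall or off the grid."""
    row, col = position
    d_row, d_col = DELTAS[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(grid) and 0 <= new_col < len(grid[0])
    if not inside or grid[new_row][new_col] == "#":
        return position
    return (new_row, new_col)
```

Passing a `random.Random` instance as `rng` keeps the sketch testable without touching the global generator.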

The second part of this file specifies the policy to be executed.

As usual, your first task is to implement the parsing of this grid MDP in the function read_grid_mdp_problem_1(file_path) of the file parse.py. You may use any appropriate data structure.

Next, you should implement running the episode in the function play_episode(problem) in the file for Problem 1.
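One possible shape for the episode loop is sketched below. It assumes `problem` is a dict with the hypothetical keys `grid`, `policy`, `noise` and `living_reward`, that the exit action is not subject to noise, and that leaving the grid counts as hitting a wall; none of these details are confirmed by the assignment text. The example grid and policy are illustrative values modeled on the description, not the contents of a test case.

```python
import random

PERPENDICULAR = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}
DELTAS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def play_episode(problem, rng=random):
    """Follow problem['policy'] from the start state until the agent exits.

    Returns a list of (action, intended, reward, cumulative_reward) tuples.
    """
    grid, policy = problem["grid"], problem["policy"]
    noise, living_reward = problem["noise"], problem["living_reward"]
    # Locate the start state, marked 'S' in the grid.
    row, col = next(
        (r, c) for r, line in enumerate(grid) for c, cell in enumerate(line) if cell == "S"
    )
    total, history = 0.0, []
    while True:
        intended = policy[row][col]
        if intended == "exit":
            # Terminal cell: its label is the exit reward (assumed noise-free).
            reward = float(grid[row][col])
            total = round(total + reward, 10)
            history.append(("exit", "exit", reward, total))
            return history
        side_a, side_b = PERPENDICULAR[intended]
        action = rng.choices([intended, side_a, side_b], [1 - 2 * noise, noise, noise])[0]
        d_row, d_col = DELTAS[action]
        new_row, new_col = row + d_row, col + d_col
        inside = 0 <= new_row < len(grid) and 0 <= new_col < len(grid[0])
        if inside and grid[new_row][new_col] != "#":
            row, col = new_row, new_col
        # No discount factor: rewards are simply added up.
        total = round(total + living_reward, 10)
        history.append((action, intended, living_reward, total))


# Hypothetical problem with noise switched off, so the run is deterministic.
example_problem = {
    "grid": [["_", "_", "_", "1"], ["_", "#", "_", "-1"], ["S", "_", "_", "_"]],
    "policy": [["E", "E", "E", "exit"], ["N", "#", "N", "exit"], ["N", "W", "N", "W"]],
    "noise": 0.0,
    "living_reward": -0.05,
}
history = play_episode(example_problem, rng=random.Random(0))
```

The running total is rounded after each step so that repeated additions of -0.05 do not accumulate floating-point noise in the printed values.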

Below is the expected output. Note that we always use exactly 5 characters for the output of a single grid cell and that the last line does not contain a new line.
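The fixed-width layout can be sketched as follows. The helper name `format_grid`, right-alignment within the 5 characters, and drawing the agent as `P` are assumptions made for illustration:

```python
def format_grid(grid, agent=None):
    """Render a grid with every cell padded to 5 characters.

    `agent` is an optional (row, col) position drawn as 'P' (assumed marker).
    """
    lines = []
    for r, row in enumerate(grid):
        cells = ["P" if (r, c) == agent else cell for c, cell in enumerate(row)]
        # Right-align each cell in a field of width 5 (assumed alignment).
        lines.append("".join(f"{cell:>5}" for cell in cells))
    return "\n".join(lines)  # join adds no newline after the last row


# Hypothetical grid modeled on the description above.
example_grid = [["_", "_", "_", "1"], ["_", "#", "_", "-1"], ["S", "_", "_", "_"]]
text = format_grid(example_grid, agent=(2, 0))
print(text, end="")  # end="" avoids a trailing newline on the last line
```

Building the whole output as one string and printing it with `end=""` is one way to make sure the final line carries no newline.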

Taking action: W (intended: N)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.1

Taking action: N (intended: N)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.15

Taking action: N (intended: N)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.2

Taking action: S (intended: E)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.25

--------------------------------------------

Taking action: N (intended: N)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.3

Taking action: E (intended: E)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.35

Taking action: E (intended: E)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.4

Taking action: E (intended: E)
Reward received: -0.05
New state:
[grid not recoverable]
Cumulative reward sum: -0.45

Taking action: exit (intended: exit)
Reward received: 1.0
New state:
[grid not recoverable]
Cumulative reward sum: 0.55

As you can see, this question does not use a discount factor; we will introduce one in the next question. You can also try some of the other test cases, such as test_cases/1/8.prob. With a correct implementation, you should be able to pass all test cases.

parse.py

def read_grid_mdp_problem_1(file_path):
    # Your p1 code here
    problem = ''
    return problem

def read_grid_mdp_problem_2(file_path):
    # Your p2 code here
    problem = ''
    return problem

def read_grid_mdp_problem_3(file_path):
    # Your p3 code here
    problem = ''
    return problem
