Question

The parameters of the model capture the probability distributions of the state transitions and rewards. In this section, you will compute the optimal policy for Problem 3 assuming that these parameters are known.
Deliverables:
1. (15 marks) Implement value iteration to solve the Bellman optimality equation for Problem 3. You
MUST implement this code in the function value_iteration() in the Python script planning.py that
is provided to you. The function MUST return the optimal value function and the optimal policy.
You MUST NOT include anything in report.docx for this part. (A generic sketch of the value-iteration loop appears after this list.)
2. (15 marks) Implement policy iteration to solve the Bellman optimality equation for Problem 3. You
MUST implement this code in the function policy_iteration() in the Python script planning.py that
is provided to you. The function MUST return the optimal value function and the optimal policy.
You MUST NOT include anything in report.docx for this part. (A generic sketch of policy iteration appears after the notes below.)
3. Using the code for value and policy iteration, answer the following questions. For all these
questions, your answers MUST be in report.docx ONLY. Any code that you write to answer these
questions MUST NOT appear in planning.py or report.docx.
a. (2 marks) Compute the average computation time of value iteration and policy iteration over
5 runs. For each run, you must use a different set of system parameters. You can import the
time library and use the time.time() function for this question. Which approach is faster?
(A sketch of such a timing loop appears after this list.)
b. (2 marks) It seems intuitive that the agent should save more non-renewable energy in the
battery when the non-renewable energy production is high compared to when it is low.
Document the necessary analysis (maybe plots) in report.docx that either validates this
intuition or negates it. Use your imagination to decide what analysis you want to do.
c. (3 marks) In Problem 3, the power plant has an additional constraint compared to Problem
2. Hence, it seems intuitive that the discounted cost for Problem 3 should be higher than for
Problem 2. Document the necessary analysis (maybe plots) in report.docx that either validates
this intuition or negates it. Use your imagination to decide what analysis you want to do.
d. (3 marks) It seems intuitive that the battery becomes more important when the renewable
energy is highly fluctuating (higher standard deviation). This is because the job of the
battery is to smooth out the fluctuations in renewable energy production (as mentioned
in the introduction section). Hence, the difference between the discounted costs of
Problems 1 and 2 should increase as the standard deviation of the renewable energy production
increases. Document the necessary analysis (maybe plots) in report.docx that either validates this intuition or negates
it. Use your imagination to decide what analysis you want to do.
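
Since the skeleton in planning.py is not reproduced here, the snippet below is only a minimal, generic value-iteration sketch under my own assumptions: a transition tensor P of shape (S, A, S) with P[s, a, s'] = Pr(s' | s, a), a one-step cost table C of shape (S, A), a discount factor gamma, and every action feasible in every state. The actual value_iteration() must use the arguments of the provided skeleton and respect the state-dependent action spaces described in the notes further down.

```python
import numpy as np

def value_iteration_sketch(P, C, gamma, tol=1e-8, max_iter=10_000):
    """Generic value iteration for a finite cost-minimising MDP (illustrative only)."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[s, a] = C[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s'];
        # the matrix product below computes this for all (s, a) pairs at once.
        Q = C + gamma * (P @ V)
        V_new = Q.min(axis=1)                  # Bellman optimality backup (costs -> min)
        if np.max(np.abs(V_new - V)) < tol:    # sup-norm stopping rule
            V = V_new
            break
        V = V_new
    # Greedy policy extracted from the converged value function.
    Q = C + gamma * (P @ V)
    return V, Q.argmin(axis=1)
```

Convergence is declared when the sup-norm change in V falls below tol, after which the greedy policy is read off the final Q-table.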
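For question 3a, a minimal timing pattern with time.time() might look like the following. The random MDP generator here is purely illustrative and stands in for however you draw a fresh set of system parameters for each of the 5 runs; it reuses value_iteration_sketch from the block above, and the same pattern applies to policy iteration.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 10, 4, 0.95                      # illustrative sizes only

def random_mdp():
    """Draws a random transition tensor and cost table (stand-in for real parameters)."""
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)          # normalise each row into a distribution
    C = rng.random((S, A))
    return P, C

vi_times = []
for run in range(5):
    P, C = random_mdp()
    t0 = time.time()
    value_iteration_sketch(P, C, gamma)        # defined in the sketch above
    vi_times.append(time.time() - t0)
print("mean value-iteration time over 5 runs: %.4f s" % np.mean(vi_times))
```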
Read these points before attempting the deliverables:
1. For all the above deliverables, use the default parameter values of 10, 2, and 0.95 unless mentioned
otherwise. The standard deviation should be varied between 0.25 and 4.
2. In the folder titled data, you have eight transition probability matrices that you can use to test
your code. Instructions for loading these matrices are given in planning.py.
3. The function prob_vector_generator() can be used to obtain a probability distribution with
a pre-specified mean and standard deviation. Instructions for using this function are given in
planning.py.
4. For a given set of system parameters, the optimal value functions obtained using value and policy
iteration should be almost the same. Otherwise, there is something wrong with your code.
5. planning.py is just skeleton code. You have to add the necessary arguments to both
value_iteration() and policy_iteration(). You may also add extra functions to planning.py if that
helps improve the code.
6. While coding the Q-function, you may want to use NumPy's broadcasting to speed up
your computation. On my laptop, one run of policy iteration took no more than 5 minutes.
You can use this information to benchmark your code.
7. To answer 3c and 3d, you do not have to code value iteration and policy iteration for Problems 1
and 2 separately. By setting one parameter to 0 and another equal to a second parameter, Problem 3 can be converted to Problem 1.
Similarly, by setting the corresponding pair of parameters equal, Problem 3 can be converted to Problem 2.
8. You must take into consideration that different states may have different action spaces. This means
a few things. First, while implementing value/policy iteration, the maximization/minimization should be taken over
the action space of the corresponding state. Second, for policy iteration, the policy should be
initialized such that the action assigned to each state lies in that state's action space.
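
As a companion to the notes above (especially 6 and 8), here is a minimal policy-iteration sketch that builds the full Q-table with NumPy broadcasting and masks out infeasible actions. The feasible argument, as well as the shapes of P and C, are my own assumptions about how the data might be represented; the actual policy_iteration() must follow the skeleton provided in planning.py.

```python
import numpy as np

def policy_iteration_sketch(P, C, gamma, feasible, max_iter=1000):
    """Generic policy iteration with state-dependent action spaces (illustrative only)."""
    S, A, _ = P.shape

    # Infeasible (s, a) pairs get +inf cost so they can never be selected (note 8).
    mask = np.full((S, A), np.inf)
    for s in range(S):
        mask[s, feasible[s]] = 0.0

    # Initialise the policy with a feasible action for every state (note 8).
    policy = np.array([feasible[s][0] for s in range(S)])

    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) V = C_pi exactly.
        P_pi = P[np.arange(S), policy]         # (S, S) transition matrix under the policy
        C_pi = C[np.arange(S), policy]         # (S,) cost vector under the policy
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, C_pi)

        # Policy improvement: broadcast the Q-table in one shot (note 6),
        # restricting the minimisation to each state's feasible actions.
        Q = C + mask + gamma * (P @ V)
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):
            break                              # policy is stable
        policy = new_policy
    return V, policy
```

Policy evaluation here solves the linear system exactly; an iterative evaluation would also work and may be preferable for larger state spaces.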
