Question
Python help question.
Part 3: Policy Iteration vs Value Iteration
In Part 3, you will compare the convergence time of policy iteration and value iteration. Both techniques are guaranteed to converge to the optimal policy in a finite number of steps, but we will see that the number of steps required can be considerably different.
A. Create Environment
Create a … x … instance of the FrozenPlatform environment with sp_range=…, a start position of … (which is the default), no holes, and random_state=…. You do not need to display the environment.
B. Policy Iteration
Create an instance of the DPAgent class for the environment created in Step A. Set gamma=… and random_state=…. Run policy iteration with the default parameters.
C. Value Iteration
Create another instance of the DPAgent class for the environment created in Step A. Set gamma=… and random_state=…. Run value iteration with the default parameters.
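Steps B and C rely on the course's DPAgent class, whose internals are not shown here. As a rough illustration of what the two algorithms do, here is a minimal pure-Python sketch on a hypothetical toy problem: a corridor of N cells with a slip probability. The MDP, the names `transitions`, `q_value`, `greedy`, and all constants are assumptions made for this example; none of them come from FrozenPlatform or DPAgent. Each function also counts how many full sweeps over the state space it performs.

```python
# Toy MDP (an assumption for illustration, NOT the course's FrozenPlatform):
# a corridor of N cells. Action 0 moves left, action 1 moves right, and each
# move slips in the opposite direction with probability SLIP.
# Entering the last cell pays a reward of 1 and ends the episode.
N, SLIP, GAMMA, TOL = 6, 0.1, 0.9, 1e-8
GOAL = N - 1
ACTIONS = (0, 1)


def transitions(s, a):
    """Return [(probability, next_state, reward), ...] for state s, action a."""
    if s == GOAL:  # terminal state: absorbs with no further reward
        return [(1.0, s, 0.0)]
    step = 1 if a == 1 else -1
    outcomes = []
    for prob, move in ((1 - SLIP, step), (SLIP, -step)):
        nxt = min(max(s + move, 0), GOAL)  # walls keep the agent in the corridor
        outcomes.append((prob, nxt, 1.0 if nxt == GOAL else 0.0))
    return outcomes


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a))


def greedy(V):
    """Policy that picks the best action in every state with respect to V."""
    return [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in range(N)]


def policy_iteration():
    policy, sweeps = [0] * N, 0  # start from "always move left"
    while True:
        V = [0.0] * N
        while True:  # policy evaluation: sweep until values stop changing
            sweeps += 1
            delta = 0.0
            for s in range(N):
                v_new = q_value(V, s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < TOL:
                break
        improved = greedy(V)  # policy improvement
        if improved == policy:  # stable policy: stop
            return policy, V, sweeps
        policy = improved


def value_iteration():
    V, sweeps = [0.0] * N, 0
    while True:  # repeated Bellman optimality backups
        sweeps += 1
        delta = 0.0
        for s in range(N):
            v_new = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < TOL:
            return greedy(V), V, sweeps


pi_policy, pi_V, pi_sweeps = policy_iteration()
vi_policy, vi_V, vi_sweeps = value_iteration()
print("policy iteration:", pi_policy, "sweeps:", pi_sweeps)
print("value iteration: ", vi_policy, "sweeps:", vi_sweeps)
```

Both functions should arrive at the same policy. The sweep counts give a rough, implementation-dependent sense of how much work each one did: policy iteration runs a full evaluation loop for every candidate policy, while value iteration runs a single loop.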
D. Algorithm Comparison
In the previous steps, you should have noticed that policy iteration had a considerably longer runtime than value iteration. You will now explore how these runtimes depend on environment size.
Starter code has been provided to you for this step. The code is intended to use a loop to create FrozenPlatform environments of size … x …, … x …, … x …, and so on up to … x …. Both policy iteration and value iteration will be applied to each environment. The time function from the time module will be used to calculate the runtime for each algorithm, storing the results in two different lists.
After the loop is complete, the cell should output the following two messages, with the blanks filled in with the appropriate values, rounded to … decimal places.
Average time for policy iteration: ____
Average time for value iteration: ____
%%time
# D
rng = range(…, …)
pol_iter_times = []
val_iter_times = []
#np.random.seed(…)
for i in tqdm(rng):
    temp_fp = FrozenPlatform(
        rows=…, cols=…, sp_range=…, holes=…, random_state=i)
    t = time.time()
    temp_dp = DPAgent(env=temp_fp, gamma=…, random_state=i)
    temp_dp.policy_iteration(report=False)
    delta_t = time.time() - t
    ____.append(delta_t)
    t = time.time()
    temp_dp = DPAgent(env=temp_fp, gamma=…, random_state=i)
    temp_dp.value_iteration(report=False)
    delta_t = time.time() - t
    ____.append(delta_t)
print(f"Average time for policy iteration: {np.mean(____):.…f}")
print(f"Average time for value iteration: {np.mean(____):.…f}")
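The timing pattern in the starter code (record `time.time()` before a call, subtract it afterwards, append the difference to a list, then average the list) can be tried out in isolation. The sketch below is only an illustration under stated assumptions: `slow_solver` and `fast_solver` are hypothetical stand-ins for the two algorithms, the size range is arbitrary, `statistics.mean` is swapped in for `np.mean` so that the example needs only the standard library, and the four-decimal format is a choice made for the example.

```python
import time
from statistics import mean


def slow_solver(n):
    """Hypothetical stand-in workload whose cost grows roughly like n**3."""
    return sum(i * j for i in range(n) for j in range(n) for _ in range(n))


def fast_solver(n):
    """Hypothetical stand-in workload whose cost grows roughly like n**2."""
    return sum(i * j for i in range(n) for j in range(n))


def timed(func, *args):
    """Run func(*args) and return the elapsed wall-clock time in seconds."""
    start = time.time()
    func(*args)
    return time.time() - start


sizes = range(2, 12)  # arbitrary problem sizes for the illustration
slow_times, fast_times = [], []

for n in sizes:
    # Time each stand-in on the same problem size, one list per algorithm.
    slow_times.append(timed(slow_solver, n))
    fast_times.append(timed(fast_solver, n))

print(f"Average time for slow_solver: {mean(slow_times):.4f}")
print(f"Average time for fast_solver: {mean(fast_times):.4f}")
```

For very short intervals, `time.perf_counter()` generally gives a higher-resolution clock than `time.time()`; the sketch keeps `time.time()` only to match the starter code.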
Visualizing Results
Use Matplotlib to create a figure with two line plots on a single axis. The y-values for each line plot should come from the runtime lists created in the previous cell. The x-values should be the associated environment sizes, … through …. Create the figure according to the following specifications.
Set the figsize to […, …].
The title should read "Runtime Comparison".
The x and y axes should be labeled "Environment Size" and "Runtime in seconds", respectively.
Add a legend with labels "Policy Iteration" and "Value Iteration" to explain which line corresponds to which algorithm.
Add a grid to your plot.
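The plotting specifications above can be sketched with Matplotlib's object-oriented interface. This is a hedged example, not the assignment's solution: the sizes and runtimes below are made-up placeholder numbers rather than measurements, and the figsize of (8, 4) is an assumption because the required value is not given here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values for illustration only; in the assignment these would be
# the environment sizes and the two runtime lists from the previous cell.
sizes = [2, 3, 4, 5, 6]
pol_times = [0.02, 0.05, 0.11, 0.24, 0.50]
val_times = [0.01, 0.02, 0.04, 0.07, 0.12]

fig, ax = plt.subplots(figsize=(8, 4))  # figsize is an assumed value
ax.plot(sizes, pol_times, label="Policy Iteration")
ax.plot(sizes, val_times, label="Value Iteration")
ax.set_title("Runtime Comparison")
ax.set_xlabel("Environment Size")
ax.set_ylabel("Runtime in seconds")
ax.legend()    # uses the labels passed to plot()
ax.grid(True)  # draw grid lines behind the two curves
```

In a notebook, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called at the end to display the figure.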