Question
Question 2 - Value Iteration [35 points]

In this question, you will use an applet to improve your understanding of value iteration. You can find the applet at https://artint.info/demos/mdp/vi.html

Note: modern browsers don't seem to like Java. There are workarounds, but a vastly less painful way to access the applet is to make sure you have the Java appletviewer installed (which you should if you have the JDK installed), and then run the following from your command line:

appletviewer https://artint.info/demos/mdp/vi.html

(You may need to first navigate to the directory where the appletviewer program is located.)

There are some questions listed on that website; for this assignment, please disregard those questions and answer only the following ones.

In this assignment, we use a discount factor of 0.9, initial values of U^(0)(s) = 0 for all s, and the "absorbing states" option (explained in detail on the website with the applet). We refer to states as (x,y), meaning the state in the x-th column and the y-th row: e.g. (1,1) for the state at the top left, and (10,1) for the state at the top right.

(a) (10 points) The figure below shows the values U^(1)(s) in each state, that is, the values after one step of value iteration. We will focus on the entry in a single state, namely state (10,8), the state to the right of the absorbing state with reward 10 (which is located at (9,8)). Show in detail how U^(2)((10,8)) is computed using the values U^(1)(s).

[Figure: Value Iteration applet screenshot showing the state values after one step of value iteration]
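As a rough illustration of the kind of update part (a) asks about, here is a minimal Python sketch of a single value-iteration backup for one state. It is not the applet's code and does not reproduce the applet's grid: the function name `bellman_backup`, the `outcomes` callback, and the two-state toy model are all hypothetical, and the transition probabilities and rewards are made up for the example. Only the discount factor of 0.9 comes from the question. The applet's actual transition model is described on its web page and should be used for the real answer.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

State = Hashable
# One possible result of taking an action:
# (probability, next_state, immediate_reward)
Outcome = Tuple[float, State, float]


def bellman_backup(
    state: State,
    values: Dict[State, float],
    actions: Iterable[str],
    outcomes: Callable[[State, str], List[Outcome]],
    gamma: float = 0.9,
) -> float:
    """Return the updated value of `state` from the previous values.

    For each action, compute the expected immediate reward plus the
    discounted previous value of the next state, then take the maximum
    over actions.
    """
    return max(
        sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes(state, a))
        for a in actions
    )


# Hypothetical toy model (NOT the applet's grid): from state "s", action
# "go" reaches "goal" with probability 0.7 and stays put otherwise;
# action "stay" always stays put. All immediate rewards are 0 here.
def toy_outcomes(state: State, action: str) -> List[Outcome]:
    if action == "go":
        return [(0.7, "goal", 0.0), (0.3, state, 0.0)]
    return [(1.0, state, 0.0)]


# Assumed previous-step values for the toy model.
previous_values = {"s": 0.0, "goal": 10.0}

example_value = bellman_backup(
    "s", previous_values, ["go", "stay"], toy_outcomes, gamma=0.9
)
```

In this toy model, `example_value` is 0.7 * 0.9 * 10 + 0.3 * 0.9 * 0, which is approximately 6.3, because "go" is the better of the two actions. For the assignment, the same pattern would apply with the applet's four movement actions, its stated transition probabilities and rewards, and the previous-step values read from the figure.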