Question:
Question 2 - Value Iteration [35 points]

In this question, you will be using an applet to improve your understanding of value iteration. You can find the applet at https://artint.info/demos/mdp/vi.html

Note: modern browsers don't seem to like Java. There are workarounds, but a far less painful way to access the applet is to make sure you have the Java appletviewer installed (which you should if you have the JDK installed), and then run the following from your command line:

    appletviewer https://artint.info/demos/mdp/vi.html

(You may need to first navigate to the directory where the appletviewer program is located.)

There are some questions listed on that website; for this assignment, please disregard those questions and answer only the following ones. In this assignment, we use a discount factor of 0.9, initial values $U^{(0)}(s) = 0$ for all $s$, and the "absorbing states" option (explained in detail on the website with the applet). We will refer to states as (x,y), meaning the state in the x-th column and the y-th row: e.g., (1,1) for the state at the top left and (10,1) for the state at the top right.

(a) (10 points) The figure below shows the values $U^{(1)}(s)$ in each state, that is, the values after one step of value iteration. We will focus on the entry in a single state, namely state (10,8), the state to the right of the absorbing state with reward 10 (which is located at (9,8)). Show in detail how $U^{(2)}((10,8))$ is computed using the values $U^{(1)}(s)$.

[Figure: applet screenshot showing the values $U^{(1)}(s)$ after one step of value iteration.]
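Part (a) asks for a single Bellman backup. In one common form it reads $U^{(2)}(s) = \max_a \sum_{s'} P(s' \mid s, a)\,[R(s, a, s') + \gamma\, U^{(1)}(s')]$ with $\gamma = 0.9$. The Python sketch below performs that one update for one state. The transition model is an assumption, not read from the applet: the intended move succeeds with probability 0.7, each of the other three moves happens with probability 0.1, bumping into the outer wall keeps the agent in place with reward -1, and any rewards for entering cells are passed in explicitly. Absorbing-state behaviour is not modelled; the caller supplies the $U^{(1)}$ values, including that of the absorbing neighbour. The names `MOVES`, `grid_outcomes`, and `bellman_backup` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]  # (x, y): x-th column, y-th row, (1, 1) is the top-left cell
Outcomes = List[Tuple[float, State, float]]  # (probability, next state, reward)

GAMMA = 0.9  # discount factor used in the assignment

# Moves as (dx, dy); y grows downward because (1, 1) is the top-left cell.
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}


def grid_outcomes(s: State, action: str, width: int = 10, height: int = 10,
                  p_intended: float = 0.7, wall_penalty: float = -1.0,
                  rewards: Optional[Dict[State, float]] = None) -> Outcomes:
    """ASSUMED dynamics (not taken from the applet): the intended move happens
    with p_intended, each other move with (1 - p_intended) / 3. Leaving the grid
    keeps the agent in s with wall_penalty; entering a cell in `rewards` pays it."""
    rewards = rewards or {}
    p_other = (1.0 - p_intended) / 3
    outcomes: Outcomes = []
    for name, (dx, dy) in MOVES.items():
        p = p_intended if name == action else p_other
        nx, ny = s[0] + dx, s[1] + dy
        if 1 <= nx <= width and 1 <= ny <= height:
            nxt = (nx, ny)
            outcomes.append((p, nxt, rewards.get(nxt, 0.0)))
        else:
            outcomes.append((p, s, wall_penalty))  # bumped into the outer wall
    return outcomes


def bellman_backup(s: State, U_prev: Dict[State, float],
                   outcomes: Callable[[State, str], Outcomes],
                   gamma: float = GAMMA) -> Tuple[float, str]:
    """One value-iteration update for state s:
    max over actions of sum_{s'} p * (r + gamma * U_prev[s']).
    States missing from U_prev are treated as having value 0."""
    best_value, best_action = float("-inf"), ""
    for action in MOVES:
        q = sum(p * (r + gamma * U_prev.get(nxt, 0.0))
                for p, nxt, r in outcomes(s, action))
        if q > best_value:  # strict '>' keeps the first action on ties
            best_value, best_action = q, action
    return best_value, best_action
```

As a usage example with hypothetical values (not the ones in the figure), take $U^{(1)}((9,8)) = 10$ and 0 everywhere else. Then `bellman_backup((10, 8), {(9, 8): 10.0}, grid_outcomes)` picks `left`, with value 0.7 · 0.9 · 10 + 0.1 · (-1) = 6.2, where the -0.1 term comes from the 0.1 chance of hitting the right wall. Substituting the actual $U^{(1)}$ values and the applet's real dynamics should reproduce the hand computation the question asks for, provided the model matches the applet.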
Introduction to Data Mining
ISBN: 978-0321321367
1st edition
Authors: Pang Ning Tan, Michael Steinbach, Vipin Kumar