Question

4 Markov Decision Processes

Consider the following game. In each turn you have a choice of rolling a special die or stopping the game. The die is biased: every time you roll, it produces 1, 3, 5 or 6 with equal probability. No other values are possible. (It's a tetrahedral die.) At any point in time, you can either roll or stop if the total "score" (obtained by adding the values on the die from every roll) is less than 7. If the "score" reaches or exceeds 7, you "go bust" and go to the final state, accruing zero reward. When in any state other than the final state, you are allowed to take the stop action. When you stop, you reach the final state and your reward is the total "score" if it is less than 7.

Note: there is no direct reward from rolling the die (or we could say that there is a reward but it's always 0). The only non-zero reward comes from explicitly taking the stop action. Discounting or not should not matter in the MDP for this game, but for the record, we assume no discounting (i.e., γ = 1).

Figure 1: The value of a tetrahedral die like this, after a roll, is at the top, here 5, which should show equally well on any of the three faces that touch the top vertex.

(a) (6 points) Write down the states (in any order) and actions for this MDP. (Hint: there are 8 states in total and each should correspond to a numeric value except the initial and final states.)

(b) (10 points) Give the full transition function T(s, a, s'). Here s is a current state, a is an action, and s' is a possible next state when a is performed in s. Assuming your states are s0, s1, s2, s3, etc., and actions are a0, a1, etc., some examples of how you should write the function are as follows:
T(s, a, s') = (value); s = s0, s' ∈ {s1, s2, s3, ...}
T(s0, a1, s1) = (value)

(c) (2 points) Give the full reward function R(s, a, s').

(d) (2 points) What is the optimal policy? There is no need to perform value iteration or use any fancy math; just write your answer in words.
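One way to check a hand-written answer to parts (a) to (c) is to encode the game rules and query them. The following is a minimal Python sketch, not an official solution. The state labels "start" and "final", the action names "roll" and "stop", and the function names are all assumptions made for illustration. It also assumes that stopping in the initial state is allowed and pays 0, and that no transitions leave the final state.

```python
from fractions import Fraction

FACES = (1, 3, 5, 6)              # die faces, each with probability 1/4
BUST = 7                          # reaching 7 or more ends the game with zero reward
START, FINAL = "start", "final"   # hypothetical labels for the non-numeric states
ACTIONS = ("roll", "stop")        # hypothetical action names

# Numeric states: every score below 7 that can be reached from 0 by rolling.
reachable = set()
frontier = [0]
while frontier:
    score = frontier.pop()
    for face in FACES:
        total = score + face
        if total < BUST and total not in reachable:
            reachable.add(total)
            frontier.append(total)

STATES = [START] + sorted(reachable) + [FINAL]


def score_of(state):
    """Score carried by a non-final state; the initial state has score 0."""
    return 0 if state == START else state


def transition(state, action, next_state):
    """T(s, a, s'): probability of reaching next_state from state via action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if state == FINAL:
        return Fraction(0)        # assumption: nothing leaves the final state
    if action == "stop":
        return Fraction(int(next_state == FINAL))
    prob = Fraction(0)
    for face in FACES:
        total = score_of(state) + face
        landed = total if total < BUST else FINAL   # going bust leads to FINAL
        if landed == next_state:
            prob += Fraction(1, len(FACES))
    return prob


def reward(state, action, next_state):
    """R(s, a, s'): the score when stopping, and 0 for everything else."""
    if state != FINAL and action == "stop" and next_state == FINAL:
        return score_of(state)
    return 0


if __name__ == "__main__":
    print(STATES)
    for s in STATES[:-1]:
        row = {t: str(transition(s, "roll", t)) for t in STATES
               if transition(s, "roll", t) > 0}
        print(s, "roll ->", row)
```

Under these assumptions the enumeration produces eight states, which agrees with the hint in part (a). Exact fractions are used instead of floats so that each row of T can be checked to sum to exactly 1.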

