Question
4 Markov Decision Processes

Consider the following game. In each turn you have a choice of rolling a special die or stopping the game. The die is biased: every time you roll, it produces 1, 3, 5 or 6 with equal probability. No other values are possible (it's a tetrahedral die). At any point in time, you can either roll or stop, provided the total "score" (obtained by adding the die values from every roll) is less than 7. If the "score" reaches or exceeds 7, you "go bust" and go to the final state, accruing zero reward. When in any state other than the final state, you are allowed to take the stop action. When you stop, you reach the final state and your reward is the total "score" if it is less than 7.

Note: there is no direct reward from rolling the die (or we could say that there is a reward but it is always 0). The only non-zero reward comes from explicitly taking the stop action. Discounting or not should not matter in the MDP for this game, but for the record, we assume no discounting (i.e., γ = 1).

Figure 1: The value of a tetrahedral die like this, after a roll, is at the top, here 5, which should show equally well on any of the three faces that touch the top vertex.

(a) (6 points) Write down the states (in any order) and actions for this MDP. (Hint: there are 8 states in total, and each should correspond to a numeric value except the initial and final states.)

(b) (10 points) Give the full transition function T(s, a, s'). Here s is a current state, a is an action, and s' is a possible next state when a is performed in s. Assuming your states are s0, s1, s2, s3, etc., and your actions are a0, a1, etc., some examples of how you should write the function are as follows:
T(s, a, s') = (value); s = s0, s' ∈ {s1, s2, s3, ...}
T(s0, a1, s1) = (value)

(c) (2 points) Give the full reward function R(s, a, s').

(d) (2 points) What is the optimal policy? There is no need to perform value iteration or use any fancy math; just write your answer in words.
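The rules of the game can be written down as a small program, which may help when checking a hand-written T and R. The following is only a sketch under stated assumptions, not a model answer: it assumes the initial state can be treated as a running total of 0, and the names FINAL, transition, reward and reachable_scores are hypothetical helpers that do not appear in the question.

```python
from fractions import Fraction

FACES = (1, 3, 5, 6)  # die faces, equally likely (from the problem statement)
BUST = 7              # a total of 7 or more ends the game with zero reward
FINAL = "final"       # hypothetical label for the terminal state


def transition(score, action):
    """Return {next_state: probability} for a non-final state.

    Assumption: the initial state is modelled as a running total of 0.
    """
    if action == "stop":
        return {FINAL: Fraction(1)}
    dist = {}
    for face in FACES:
        total = score + face
        # Reaching or exceeding 7 sends the player to the final state.
        nxt = total if total < BUST else FINAL
        dist[nxt] = dist.get(nxt, Fraction(0)) + Fraction(1, len(FACES))
    return dist


def reward(score, action, next_state):
    """Only the stop action pays out, and it pays the current total."""
    return score if action == "stop" else 0


def reachable_scores():
    """Totals below 7 that can be reached from 0 by rolling."""
    seen, frontier = {0}, [0]
    while frontier:
        s = frontier.pop()
        for nxt in transition(s, "roll"):
            if nxt != FINAL and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)
```

Under these assumptions, calling reachable_scores() lists the non-final totals, and transition(s, "roll") gives one row of the transition function for a given total s. Using Fraction keeps the probabilities exact, so each row can be checked to sum to 1.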