
Question


Problem Statement
Develop a reinforcement learning agent using dynamic programming methods to solve the Dice game optimally. The agent will learn the optimal policy by iteratively evaluating and improving its strategy based on the state-value function and the Bellman equations.
Scenario:
A player rolls a 6-sided die with the objective of reaching a score of exactly 100. On each turn, the player can choose to stop and keep their current score or continue rolling the die. If the player rolls a 1, they lose all points accumulated in that turn and the turn ends. If the player rolls any other number (2-6), that number is added to their score for that turn. The game ends when the player decides to stop and keep their score OR when the player's score reaches 100. The player wins if they reach a score of exactly 100, and loses if they roll a 1 when their score is below 100.
Environment Details
The environment consists of a player who can choose to either roll a 6-sided die or stop at any point. The player starts with an initial score (e.g., 0) and aims to reach a score of exactly 100. If the player rolls a 1, they lose all points accumulated in that turn and the turn ends. If they roll any other number (2-6), that number is added to their score for that turn. The goal is to accumulate a total of exactly 100 points to win, or to stop the game before reaching 100 points.
States
State s: Represents the current score of the player, ranging from 0 to 100.
Terminal States:
State s = 100: Represents the player winning the game by reaching the goal of 100 points.
State s = 0: Represents the player losing all points accumulated in the turn due to rolling a 1.
Actions
Action a: Represents the decision to either "roll" the die or "stop" the game at the current score.
The possible actions in any state s are either "roll" or "stop".
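In code, the state and action spaces above could be represented as follows (the names are placeholders, not part of the assignment):

```python
GOAL = 100
STATES = list(range(GOAL + 1))     # current score: 0..100
ACTIONS = ["roll", "stop"]
TERMINAL_STATES = {0, GOAL}        # 0 = bust/loss, 100 = win
```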
Outcomes:
1. Use dynamic programming methods (value iteration, policy evaluation, and policy improvement) to find the optimal policy for the Dice Game.
2. Implement an epsilon-greedy policy for action selection during training to balance exploration and exploitation.
3. Evaluate the agent's performance in terms of the probability of reaching exactly 100 points after learning the optimal policy.
4. Use the agent's policy as the best strategy for different betting scenarios within the problem.
5. Following is the comment given for Value-Iteration Function -
# Iterate over all states except terminal states until convergence.
# Calculate expected returns V(s) for the current policy by considering all possible actions.
# If action is 'stop':
#   Calculate the reward for stopping and append it to rewards.
# If action is 'roll':
#   For each possible roll outcome (1 to 6), determine next_s based on the roll.
#   Update V(s) using the Bellman equation.
# Determine max_reward from rewards.
# With probability epsilon, randomly choose a reward from rewards instead.
# Check convergence: stop when delta is less than a small threshold.
#-----write your code below this line---------
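One possible way to fill in these comments is sketched below. The reward scheme is an assumption (stopping keeps the current score, rolling a 1 moves to the losing terminal state 0, and rolls past 100 are capped at 100), and all names (`GOAL`, `THETA`, `value_iteration`) are placeholders rather than given specifications:

```python
import random

GOAL = 100          # winning terminal state
GAMMA = 1.0         # undiscounted episodic task (assumption)
THETA = 1e-6        # convergence threshold

def value_iteration(epsilon=0.0):
    V = [0.0] * (GOAL + 1)
    V[GOAL] = float(GOAL)                  # value of the winning state
    while True:
        delta = 0.0
        # Iterate over all states except terminal states until convergence
        for s in range(1, GOAL):
            rewards = []
            # If action is 'stop': the reward is keeping the current score
            rewards.append(float(s))
            # If action is 'roll': expectation over the 6 roll outcomes
            expected = 0.0
            for roll in range(1, 7):
                # Determine next_s based on roll (a 1 busts to state 0)
                next_s = 0 if roll == 1 else min(s + roll, GOAL)
                expected += (1.0 / 6.0) * GAMMA * V[next_s]
            rewards.append(expected)
            # With probability epsilon, randomly choose a reward (per the
            # comments); epsilon=0 gives standard value iteration
            if epsilon > 0 and random.random() < epsilon:
                new_v = random.choice(rewards)
            else:
                new_v = max(rewards)       # determine max_reward from rewards
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v                   # Bellman update
        # Check convergence against the threshold
        if delta < THETA:
            return V
```

Because rolling always either increases the score or busts to a terminal state, the values propagate backward from 100 and the sweep converges in a bounded number of iterations.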
6. Following is the comment given for Policy Iteration -
# For each state, store old_policy of state s.
# Determine best_action based on the maximum expected reward; update policy[s] to best_action.
# Return stable when old_policy == policy[s] for every state.
#-----write your code below this line---------
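A minimal sketch of policy iteration matching these comments, under the same assumed reward scheme as above (stopping keeps the score, rolling a 1 busts to state 0, rolls are capped at 100); `q_value` and the constants are placeholder names:

```python
GOAL = 100
THETA = 1e-6

def q_value(s, a, V):
    """Expected return of action a in state s (assumed reward scheme)."""
    if a == "stop":
        return float(s)
    # 'roll': average over the six equally likely outcomes
    expected = 0.0
    for roll in range(1, 7):
        next_s = 0 if roll == 1 else min(s + roll, GOAL)
        expected += V[next_s] / 6.0
    return expected

def policy_iteration():
    # Terminal values are fixed: losing state 0, winning state 100
    V = [0.0] * (GOAL + 1)
    V[GOAL] = float(GOAL)
    policy = {s: "stop" for s in range(1, GOAL)}
    stable = False
    while not stable:
        # Policy evaluation: iterate until the value function settles
        while True:
            delta = 0.0
            for s in range(1, GOAL):
                v = q_value(s, policy[s], V)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < THETA:
                break
        # Policy improvement: store old_policy, update to best_action
        stable = True
        for s in range(1, GOAL):
            old_policy = policy[s]
            best_action = max(("roll", "stop"),
                              key=lambda a: q_value(s, a, V))
            policy[s] = best_action
            if old_policy != policy[s]:
                stable = False           # keep iterating until stable
    return policy, V
```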
7. Comment given for Execute Policy Iteration & Value Iteration :
# Simulate the game for 100 episodes, using the learned policy to select actions.
# When the action is 'roll', randomly generate a number to determine the reward.
# When the action is 'stop', collect the corresponding reward.
# Determine the total cumulative reward.
#-----write your code below this line---------
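These comments could be filled in roughly as follows. The policy passed in maps a score to "roll" or "stop"; the threshold policy at the bottom is purely hypothetical, used only to illustrate the call, and the bust-loses-everything rule is the same assumption as in the earlier sketches:

```python
import random

GOAL = 100

def simulate(policy, episodes=100, seed=0):
    """Run episodes under a state -> action policy; return the average
    reward and the fraction of episodes that reach exactly 100 (a win)."""
    rng = random.Random(seed)
    total = 0.0
    wins = 0
    for _ in range(episodes):
        s = 0
        while True:
            if s >= GOAL:                  # reached the goal: win
                total += GOAL
                wins += 1
                break
            # Use the learned policy to get the action
            if policy.get(s, "stop") == "stop":
                total += s                 # keep the current score
                break
            roll = rng.randint(1, 6)       # 'roll': randomly generate a number
            if roll == 1:                  # bust: lose the accumulated points
                break
            s = min(s + roll, GOAL)
    return total / episodes, wins / episodes

# Hypothetical threshold policy for illustration only:
# roll while the score is below 80, then stop.
threshold_policy = {s: ("roll" if s < 80 else "stop") for s in range(GOAL + 1)}
avg_reward, win_rate = simulate(threshold_policy)
```

A policy learned by value or policy iteration can be plugged in the same way to estimate its probability of reaching exactly 100.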
8. Need to design a DiceGame Environment:
# Code for dataset loading; print dataset statistics along with the reward function
#-----write your code below this line---------
class DiceGameEnvironment:
9. Reward Function -
# Calculate the reward for the 'stop' and 'roll' actions
#-----write your code below this line---------
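A sketch of the environment class covering both items 8 and 9. Note the dice game has no external dataset to load, so the "statistics" printed here describe the state and action spaces instead; the reward scheme and all method names are assumptions:

```python
class DiceGameEnvironment:
    """Dice-game environment sketch. There is no external dataset for
    this game, so statistics summarize the state/action space."""
    GOAL = 100

    def __init__(self):
        self.states = list(range(self.GOAL + 1))   # scores 0..100
        self.actions = ["roll", "stop"]

    def reward(self, state, action, roll=None):
        """Assumed reward scheme for the 'stop' and 'roll' actions."""
        # 'stop': keep the current score
        if action == "stop":
            return state
        # 'roll' showing a 1: forfeit the accumulated points
        if roll == 1:
            return -state
        # 'roll' reaching exactly 100: award the remaining points
        if min(state + roll, self.GOAL) == self.GOAL:
            return self.GOAL - state
        # ordinary roll: the rolled number is added to the score
        return roll

    def print_statistics(self):
        print(f"States: {len(self.states)} (scores 0..{self.GOAL})")
        print(f"Actions: {self.actions}")
        print(f"Terminal states: 0 (loss), {self.GOAL} (win)")
```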


