Question

Problem Statement
Develop a reinforcement learning agent using dynamic programming methods to solve the Dice game optimally. The agent will learn the optimal policy by iteratively evaluating and improving its strategy based on the state-value function and the Bellman equations.
Scenario:
A player rolls a 6-sided die with the objective of reaching a score of exactly 100. On each turn, the player can choose to stop and keep their current score or continue rolling the die. If the player rolls a 1, they lose all points accumulated in that turn and the turn ends. If the player rolls any other number (2-6), that number is added to their score for that turn. The game ends when the player decides to stop and keep their score OR when the player's score reaches 100. The player wins if they reach a score of exactly 100, and loses if they roll a 1 when their score is below 100.
Environment Details
The environment consists of a player who can choose to either roll a 6-sided die or stop at any point.
The player starts with an initial score (e.g., 0) and aims to reach a score of exactly 100.
If the player rolls a 1, they lose all points accumulated in that turn and the turn ends. If they roll any other number (2-6), that number is added to their score for that turn.
The goal is to accumulate exactly 100 points to win; alternatively, the player may stop the game before reaching 100 points.
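A minimal Python sketch of the environment, using a simple `reset`/`step` interface. The class name `DiceGame`, its methods, and the reward values (+1 for landing exactly on 100, 0 otherwise) are illustrative assumptions, not part of the problem statement, and the interface is a simplified hypothetical one rather than any library's API. The statement does not say what happens when a roll pushes the score past 100, so the sketch assumes that is a bust (loss). It also treats the whole game as a single turn, so a roll of 1 ends the game in the losing terminal state 0.

```python
import random


class DiceGame:
    """Hypothetical single-turn dice game environment.

    Assumptions (not in the problem statement): overshooting the goal is a bust,
    landing exactly on the goal pays +1, and every other outcome pays 0.
    """

    ACTIONS = ("roll", "stop")

    def __init__(self, goal=100, seed=None):
        self.goal = goal
        self.rng = random.Random(seed)
        self.score = 0

    def reset(self, start=0):
        """Start a new game at the given score and return the initial state."""
        self.score = start
        return self.score

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        if action == "stop":
            return self.score, 0.0, True  # keep the current score; game over
        roll = self.rng.randint(1, 6)
        if roll == 1:
            self.score = 0  # turn points lost: terminal losing state 0
            return self.score, 0.0, True
        self.score += roll
        if self.score == self.goal:
            return self.score, 1.0, True  # exactly the goal: win
        if self.score > self.goal:
            return self.score, 0.0, True  # assumed bust on overshoot
        return self.score, 0.0, False


if __name__ == "__main__":
    env = DiceGame(seed=42)
    state, done = env.reset(), False
    while not done:
        # Illustrative fixed rule: keep rolling until the score reaches 90, then stop.
        state, reward, done = env.step("roll" if state < 90 else "stop")
    print(f"final score {state}, reward {reward}")
```

Because every non-terminal roll adds at least 2 points, each episode is guaranteed to end after at most 50 rolls.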
States
State s: Represents the current score of the player, ranging from 0 to 100.
Terminal States:
State s = 100: Represents the player winning the game by reaching the goal of 100 points.
State s = 0: Represents the player losing all points accumulated in the turn due to rolling a 1.
Actions
Action a: Represents the decision to either "roll" the die or "stop" the game at the current score.
The possible actions in any state s are either "roll" or "stop".
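With these states and actions, the Bellman optimality equation can be written out explicitly. This is a sketch under assumed rewards that the problem does not fix: +1 only for landing exactly on 100, 0 for stopping, 0 for rolling a 1, 0 for an (assumed) overshoot bust, and no discounting. Under these assumptions, the value V*(s) of a non-terminal score s is the best achievable probability of finishing on exactly 100:

```latex
\begin{aligned}
Q^*(s,\text{stop}) &= 0, \\
Q^*(s,\text{roll}) &= \frac{1}{6}\cdot 0 + \frac{1}{6}\sum_{d=2}^{6} g(s+d),
\qquad
g(x) = \begin{cases} 1, & x = 100, \\ V^*(x), & x < 100, \\ 0, & x > 100, \end{cases} \\
V^*(s) &= \max\bigl\{ Q^*(s,\text{stop}),\; Q^*(s,\text{roll}) \bigr\},
\qquad s \in \{0, 1, \dots, 99\}.
\end{aligned}
```

Policy evaluation uses the same backup with the max replaced by the policy's action, V^π(s) = Q^π(s, π(s)). As a check, from s = 98 only d = 2 lands on 100, so V*(98) = 1/6, while from s = 99 every roll of 2 to 6 overshoots, so V*(99) = 0.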
Expected Outcomes:
Use the dynamic programming methods of value iteration, policy evaluation, and policy improvement to find the optimal policy for the Dice Game.
Implement an epsilon-greedy policy for action selection during training to balance exploration and exploitation.
Evaluate the agent's performance in terms of the probability of reaching exactly 100 points after learning the optimal policy.
Use the agent's policy as the best strategy for different betting scenarios within the problem.
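The epsilon-greedy selection and the performance measure above can be sketched in Python as follows. The function names `epsilon_greedy`, `run_episode`, and `estimate_win_probability` are hypothetical, and the game model assumes, as the problem does not specify it, that overshooting 100 counts as a loss. The win probability is estimated by Monte Carlo simulation of complete games under a fixed policy.

```python
import random

GOAL = 100
ACTIONS = ("roll", "stop")


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action; otherwise the best-valued one.

    q_values maps each action to its estimated value in the current state.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_values[a])


def run_episode(policy, rng, start=0):
    """Play one game with policy(score) -> action; return True iff it ends exactly on GOAL."""
    score = start
    while True:
        if policy(score) == "stop":
            return False
        roll = rng.randint(1, 6)
        if roll == 1:
            return False  # turn points lost
        score += roll
        if score >= GOAL:
            return score == GOAL  # overshooting is assumed to be a loss


def estimate_win_probability(policy, episodes=50_000, seed=0, start=0):
    """Monte Carlo estimate of P(reaching exactly GOAL) under a fixed policy."""
    rng = random.Random(seed)
    wins = sum(run_episode(policy, rng, start) for _ in range(episodes))
    return wins / episodes


def always_roll(score):
    return "roll"


if __name__ == "__main__":
    p = estimate_win_probability(always_roll)
    print(f"estimated P(exactly {GOAL} | always roll) = {p:.4f}")
```

Since dynamic programming sweeps every state with a known model, epsilon-greedy selection only matters when the agent samples episodes; for the final evaluation, epsilon would be set to 0 so the learned policy is followed greedily.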
Design a DiceGame Environment (1M)
Define the reward function
Policy Iteration Function Definition (0.5M)
Value Iteration Function Definition (0.5M)
Executing Policy Iteration and Value Iteration Functions
Print the Learned Optimal Policy and Optimal Value Function
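One possible Python sketch of these deliverables is shown below. It uses an explicit transition model, policy evaluation plus policy improvement (policy iteration), value iteration, and a printout of the learned policy and values. All function names and the reward model are assumptions for illustration: +1 for landing exactly on 100, 0 for stopping, 0 for rolling a 1, an assumed bust on overshoot, and no discounting (gamma = 1, which is safe here because every game terminates).

```python
GOAL = 100
ACTIONS = ("roll", "stop")


def transitions(s, a):
    """Return (prob, next_state, reward, done) tuples for action a at score s.

    Assumed model: rolling a 1 ends the game with reward 0, overshooting GOAL is a
    bust with reward 0, landing exactly on GOAL pays +1, and stopping pays 0.
    """
    if a == "stop":
        return [(1.0, s, 0.0, True)]
    outcomes = [(1 / 6, 0, 0.0, True)]  # rolled a 1: points lost, terminal state 0
    for d in range(2, 7):
        nxt = s + d
        if nxt == GOAL:
            outcomes.append((1 / 6, nxt, 1.0, True))  # exactly 100: win
        else:
            # Overshooting is an assumed bust; otherwise play continues.
            outcomes.append((1 / 6, nxt, 0.0, nxt > GOAL))
    return outcomes


def q_value(V, s, a, gamma=1.0):
    """One-step Bellman backup: expected reward plus discounted value of the next state."""
    return sum(p * (r + (0.0 if done else gamma * V[nxt]))
               for p, nxt, r, done in transitions(s, a))


def policy_evaluation(policy, gamma=1.0, theta=1e-10):
    """Iteratively compute V^pi; sweeping high-to-low converges fast since rolls only move up."""
    V = [0.0] * (GOAL + 1)  # V[GOAL] stays 0: the win reward is paid on the transition
    while True:
        delta = 0.0
        for s in range(GOAL - 1, -1, -1):
            v_new = q_value(V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V


def policy_improvement(V, policy, gamma=1.0):
    """Make the policy greedy with respect to V; report whether it was already stable."""
    stable = True
    for s in range(GOAL):
        q = {a: q_value(V, s, a, gamma) for a in ACTIONS}
        best = max(ACTIONS, key=q.get)
        # Switch only on a strict improvement so ties (e.g. score 99) cannot oscillate.
        if q[best] > q[policy[s]] + 1e-12:
            policy[s] = best
            stable = False
    return policy, stable


def policy_iteration(gamma=1.0):
    policy = ["stop"] * GOAL  # start from the "always stop" policy
    while True:
        V = policy_evaluation(policy, gamma)
        policy, stable = policy_improvement(V, policy, gamma)
        if stable:
            return policy, V


def value_iteration(gamma=1.0, theta=1e-10):
    V = [0.0] * (GOAL + 1)
    while True:
        delta = 0.0
        for s in range(GOAL - 1, -1, -1):
            v_new = max(q_value(V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            break
    policy = [max(ACTIONS, key=lambda a: q_value(V, s, a, gamma)) for s in range(GOAL)]
    return policy, V


if __name__ == "__main__":
    pi_policy, pi_V = policy_iteration()
    vi_policy, vi_V = value_iteration()
    print("score  PI    VI    V*(s)")
    for s in range(GOAL):
        print(f"{s:5d}  {pi_policy[s]:<4}  {vi_policy[s]:<4}  {vi_V[s]:.4f}")
```

With this reward, stopping never earns more than rolling, so both methods learn "roll" in every state from which exactly 100 is still reachable (all scores below 99). At 99 the two actions tie at value 0. A reward that pays out the kept score on "stop", for example to model a betting scenario, would make the trade-off non-trivial.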
