Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Homework (2/2) 2-a What are the optimal state-values and state-action-values for this environment? 2-b What is the optimal policy for this environment? 2-c Assume
Homework (2/2) 2-a What are the optimal state-values and state-action-values for this environment? 2-b What is the optimal policy for this environment? 2-c Assume we introduce a discount factor of 0.95 into our value functions. Determine the new values of the state-value and state-action-value functions as well as the new optimal policy. Describe the effect of the discount factor on the optimal policy. 3. (2pts) we will formulate Tic-Tac-Toe as an environment in which we can train a reinforcement learning agent. You will play as X's, and your opponent will be O's. Two-player games such as Tic-Tac-Toe are often modeled using game theory, in which we try and predict the moves of our opponent as well. For simplicity, we ignore the modeling of the opponent moves and treat our opponent's actions as a source of randomness within the environment. Assume you always go first. What are the states and actions within the Tic-Tac-Toe reinforcement learning environment? How does the current state affect the actions you can take?
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started