python: Description of Part III of Project
In Part III of the project, you will train a Q-learning agent to play Nim. The agent will be trained by playing thousands of games against a RandomPlayer agent, but will eventually be able to consistently defeat the stronger MinimaxPlayer agents.
Define Nim Class
Copy the definition for the Nim class from Part I of the project into the cell below.
New Classes
You will work with two new classes in this notebook. These are named BotPlayerEnv and PolicyPlayer. These classes are described below.
The BotPlayerEnv class provides an interface that can be used with reinforcement learning algorithms to train agents to play games by having them compete against a "bot player" controlled by an adversarial search agent such as RandomPlayer. Instances of BotPlayerEnv combine an instance of a game environment with an instance of an adversarial agent to create an environment that can be used with our RL algorithms. When an action is taken in this environment, the BotPlayerEnv class will apply that action, and then generate and apply an action for the bot player. The code block below demonstrates how to create an instance of BotPlayerEnv and how to use it with an instance of TDAgent.
nim = Nim(piles=..., stones=..., limit=...)
bot = RandomPlayer('Bot')
bot_env = BotPlayerEnv(game_env=nim, agent=bot)
td = TDAgent(bot_env, gamma=..., random_state=...)
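To make the "apply the agent's action, then the bot's action" behavior concrete, here is a minimal sketch. It does not use the project's classes: `TinyNim` and `SketchBotEnv` are hypothetical stand-ins, the game is reduced to a single pile, and the rule that taking the last stone wins (reward +1 for the learner, -1 if the bot takes it) is an assumption made only for illustration.

```python
import random


class TinyNim:
    """Hypothetical single-pile Nim (not the project's Nim class)."""

    def __init__(self, stones, limit):
        self.stones = stones
        self.limit = limit

    def actions(self):
        # Legal moves: take between 1 and `limit` stones, never more than remain.
        return list(range(1, min(self.limit, self.stones) + 1))

    def take(self, n):
        self.stones -= n

    def is_over(self):
        return self.stones == 0


class SketchBotEnv:
    """Hypothetical wrapper: one step = learner's move, then the bot's reply."""

    def __init__(self, game, bot_rng):
        self.game = game
        self.rng = bot_rng

    def step(self, action):
        # Apply the learner's action first.
        self.game.take(action)
        if self.game.is_over():
            return self.game.stones, 1.0, True   # assumed rule: last stone wins
        # Then generate and apply a random action for the bot player.
        self.game.take(self.rng.choice(self.game.actions()))
        if self.game.is_over():
            return self.game.stones, -1.0, True  # the bot took the last stone
        return self.game.stones, 0.0, False


env = SketchBotEnv(TinyNim(stones=5, limit=3), random.Random(0))
state, reward, done = env.step(2)
```

From the learner's point of view, the bot's reply is simply part of the environment's transition, which is what lets a single-agent RL algorithm be used on a two-player game.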
An instance of the PolicyPlayer class represents an adversarial search agent that follows a policy that maps game states to actions. We will use this class to create agents that follow the policies learned by applying the Q-learning algorithm. Note that since PolicyPlayer agents don't have to perform a search when selecting actions, they will always select their actions very quickly. It might take a significant amount of time to run the Q-learning algorithm that learns the policy, but once the policy is learned, the agent will play very quickly.
The code block below demonstrates how to create an instance of PolicyPlayer.
p = PolicyPlayer('Policy Player', policy=some_policy)
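The speed of a policy-following agent comes from the fact that selecting an action is a single lookup. The sketch below illustrates this with a hypothetical `SketchPolicyPlayer` class and a hand-written dictionary policy; neither the `select_action` method name nor the state encoding (number of stones remaining) is taken from the project's PolicyPlayer.

```python
class SketchPolicyPlayer:
    """Hypothetical policy-following player (not the project's PolicyPlayer)."""

    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # mapping: state -> action

    def select_action(self, state):
        # No search is performed; the action is read directly from the policy.
        return self.policy[state]


# Illustrative policy for a single pile: state = stones left, action = stones to take.
some_policy = {1: 1, 2: 2, 3: 3, 5: 1}
player = SketchPolicyPlayer('Policy Player', some_policy)
chosen = player.select_action(5)
```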
Part : Basic Q-Learning Agent
In this part, we will use Q-learning to learn a policy for playing Nim. The policy will be learned by having the Q-learning algorithm play many games against a RandomPlayer agent, and will be tested by having it play against RandomPlayer and MinimaxPlayer agents. Our eventual goal is to find a policy that can be used to consistently defeat a Minimax agent with a depth of
A Training the Agent
Create the following objects:
An instance of Nim with piles, stones per pile, and with a limit of stones per action.
An instance of RandomPlayer.
An instance of BotPlayerEnv using the Nim and RandomPlayer instances you created above.
An instance of TDAgent that uses your instance of BotPlayerEnv. Set gamma and random_state
After creating the objects above, use your TDAgent instance to apply Q-learning to learn a policy for Nim. Run episodes of Q-learning with an exploration rate of and a learning rate of Also set track_history=False when calling the Q-learning method. This will significantly reduce the memory requirements of the algorithm.
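For readers who want to see what the training step does internally, the following is a minimal tabular Q-learning sketch against a random opponent. It is not the project's TDAgent: the single-pile game, the rule that taking the last stone wins, the function names, and every hyperparameter value (`episodes`, `alpha`, `epsilon`, `gamma`, `seed`) are illustrative assumptions, not the values required by the assignment.

```python
import random
from collections import defaultdict


def legal_actions(stones, limit=3):
    """Stones that may be removed from the single pile in one move."""
    return list(range(1, min(limit, stones) + 1))


def q_learning_sketch(start=7, episodes=5000, alpha=0.1, epsilon=0.1,
                      gamma=0.9, seed=0):
    """Tabular Q-learning versus a random bot (all defaults are illustrative)."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], unseen entries default to 0.0
    for _ in range(episodes):
        state = start
        while state > 0:
            actions = legal_actions(state)
            # Epsilon-greedy: explore with probability epsilon (exploration rate).
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state = state - action
            if next_state == 0:
                target = 1.0                      # learner took the last stone
            else:
                # The random bot replies before the learner sees the next state.
                next_state -= rng.choice(legal_actions(next_state))
                if next_state == 0:
                    target = -1.0                 # bot took the last stone
                else:
                    best_next = max(Q[(next_state, a)]
                                    for a in legal_actions(next_state))
                    target = gamma * best_next    # reward is 0 mid-game
            # Q-learning update with learning rate alpha.
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    # Greedy policy extracted from the learned Q-values.
    policy = {s: max(legal_actions(s), key=lambda a: Q[(s, a)])
              for s in range(1, start + 1)}
    return Q, policy


Q, policy = q_learning_sketch()
```

Only the Q-table is kept between episodes here, which mirrors why disabling history tracking reduces memory use: per-episode records are not needed to extract the final policy.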
B Create Agents
Create the following agents:
A PolicyPlayer instance using the policy found by Qlearning.
A RandomPlayer instance.
A MinimaxPlayer instance with depth
A MinimaxPlayer instance with depth
A MinimaxPlayer instance with depth
C Versus RandomPlayer
Run a round tournament between the PolicyPlayer agent and the RandomPlayer agent. Set random_state When creating the agent list, please list the PolicyPlayer agent first.
D Versus Minimax
Run a round tournament between the PolicyPlayer agent and the MinimaxPlayer agent with depth Set random_state When creating the agent list, please list the PolicyPlayer agent first.
E Versus Minimax
Run a round tournament between the PolicyPlayer agent and the MinimaxPlayer agent with depth Set random_state When creating the agent list, please list the PolicyPlayer agent first.
F Versus Minimax
Run a round tournament between the PolicyPlayer agent and the MinimaxPlayer agent with depth Set random_state When creating the agent list, please list the PolicyPlayer agent first.
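The tournaments above rely on the project's own tournament function, whose signature is not shown here. As a rough sketch of what a two-agent tournament does, the code below plays repeated single-pile games and tallies wins. The function names, the alternation of who moves first, the last-stone-wins rule, and the use of a hand-written optimal player in place of a learned policy are all assumptions for illustration.

```python
import random


def optimal_move(stones, rng, limit=3):
    # Leave a multiple of (limit + 1) when possible; otherwise take one stone.
    return stones % (limit + 1) or 1


def random_move(stones, rng, limit=3):
    return rng.randint(1, min(limit, stones))


def play_game(players, start, rng):
    """Return the index (0 or 1) of the player who takes the last stone."""
    stones, turn = start, 0
    while True:
        stones -= players[turn](stones, rng)
        if stones == 0:
            return turn
        turn = 1 - turn


def tournament(player_a, player_b, rounds, start=7, seed=1):
    """Play `rounds` games, alternating the first mover; return [wins_a, wins_b]."""
    rng = random.Random(seed)
    wins = [0, 0]
    for i in range(rounds):
        if i % 2 == 0:
            winner = play_game([player_a, player_b], start, rng)
        else:
            # player_b moves first, so map the result back to (a, b) order.
            winner = 1 - play_game([player_b, player_a], start, rng)
        wins[winner] += 1
    return wins


wins = tournament(optimal_move, random_move, rounds=100)
```

Fixing the seed plays the same role as setting a random state in the project: it makes the tallies reproducible from run to run.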
G Summarizing Results
Indicate the win rates for the PolicyPlayer agent by filling in each of the blanks below. Provide your answer as percentages rounded to decimal place.
The policy player won:
of games played against the RandomPlayer agent.
of games played against the MinimaxPlayer agent with depth
of games played against the MinimaxPlayer agent with depth
of games played against the MinimaxPlayer agent with depth
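Converting a win count into a rounded percentage can be sketched as follows. The helper name `win_rate_percent`, the default of one decimal place, and the counts used in the example are made up for illustration; they are not results from this project.

```python
def win_rate_percent(wins, games, decimals=1):
    """Win rate as a percentage, rounded to `decimals` places (assumed default)."""
    return round(100 * wins / games, decimals)


# Made-up counts, used only to show the arithmetic: 87 wins in 120 games.
rate = win_rate_percent(87, 120)
line = f"{rate}% of games played against the RandomPlayer agent."
```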