
Question

5. (20 points) In the aima-python/mdp.ipynb code, the GridMDP class provides all the tools required for solving grid-world problems, along with four cases that demonstrate how the agent should behave in each. The Jupyter notebook also has the utility function print_table to output the optimal policy. Now consider the 4x4 grid-world problem shown in the following figure.

[Figure: a 4x4 grid; the two goal corner cells are labeled 0.0.]

The agent's goal is to reach either one of the two corners, (1,4) or (4,1). The agent can choose any action from the action set A = {left, right, up, down}, each of which causes a transition from the current state to the next state, except that the agent stays put when it is on a boundary cell and the chosen action would take it off the grid. The agent follows an equal-probability random policy, there is no discount, and the reward is -1 for every transition.

a) Output the best policy.

b) Output the best policy when the discount is set to 0.1.

6. (20 points) In the aima-python/games4e.ipynb code, the TicTacToe class and the Board class provide tools for players to play games. You want to collect the results (i.e., win, draw, loss) of 500 games played by two players on a 4x4 board. The two players are:

a) alpha-beta vs. random

b) minimax vs. random

c) alpha-beta vs. minimax

Step by Step Solution

There are 3 steps involved

Step: 1

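A minimal sketch for problem 5, assuming the aima-python repo is on the import path. The grid layout is my reading of the figure (reward 0.0 in the two goal corners, -1 everywhere else), and the DeterministicGridMDP subclass is my addition: the stock GridMDP hard-codes the book's stochastic 0.8/0.1/0.1 transition model, while the problem statement describes deterministic moves. The override assumes the current mdp.py, where GridMDP builds its transition table through calculate_T; older versions define T directly, in which case override T instead.

from mdp import GridMDP, value_iteration, best_policy
from utils import print_table

class DeterministicGridMDP(GridMDP):
    # An action always moves one cell (or stays put at a wall),
    # replacing the stock stochastic 0.8/0.1/0.1 model.
    def calculate_T(self, state, action):
        return [(1.0, self.go(state, action))] if action else [(0.0, state)]

# Rows are listed top to bottom; GridMDP uses (x, y) with (0, 0) at
# the bottom left, so the problem's 1-indexed goal corners (1,4) and
# (4,1) become terminals (0, 3) and (3, 0).
grid = [[0.0,  -1,  -1,  -1],
        [ -1,  -1,  -1,  -1],
        [ -1,  -1,  -1,  -1],
        [ -1,  -1,  -1, 0.0]]
# Caveat: if your mdp.py version tests a cell's truth value rather
# than `is not None` when building the state set, the 0.0 corners
# would be dropped as obstacles; use a tiny value like -1e-9 instead.
terminals = [(0, 3), (3, 0)]

for gamma in (1.0, 0.1):  # a) no discount; b) discount set to 0.1
    # Pass a copy of the grid: GridMDP reverses its rows in place.
    m = DeterministicGridMDP([row[:] for row in grid],
                             terminals=terminals, gamma=gamma)
    pi = best_policy(m, value_iteration(m, epsilon=0.001))
    print('gamma =', gamma)
    print_table(m.to_arrows(pi))

With gamma = 1, value_iteration's stopping threshold epsilon * (1 - gamma) / gamma collapses to zero, but in this deterministic setting the utilities settle to exact integers (negative shortest-path lengths), so delta eventually reaches zero and the loop still terminates.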

Step: 2

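A minimal sketch for problem 6, assuming games4e.py from aima-python. TicTacToe(h=4, v=4, k=4) gives a 4x4 board with four in a row to win. games4e ships random_player and alpha_beta_player but no minimax player, so the minmax_player wrapper around its minmax_decision below is my addition. One practical warning: exhaustive minimax and alpha-beta from an empty 4x4 board are far too slow for 500 games, so you will likely need the depth-limited alpha_beta_cutoff_search that games4e also provides (and an analogous cutoff for minimax); the tallying code is the same either way.

from games4e import TicTacToe, random_player, alpha_beta_player, minmax_decision

def minmax_player(game, state):
    # games4e defines minmax_decision but no player wrapper for it.
    return minmax_decision(state, game)

game = TicTacToe(h=4, v=4, k=4)    # 4x4 board, 4 in a row wins
game.display = lambda state: None  # silence the per-game board printout

def tally(p1, p2, n=500):
    # Count (win, draw, loss) for p1, who always moves first.
    win = draw = loss = 0
    for _ in range(n):
        u = game.play_game(p1, p2)  # utility from the first player's view
        if u > 0:
            win += 1
        elif u < 0:
            loss += 1
        else:
            draw += 1
    return win, draw, loss

matchups = [('a) alpha-beta vs. random', alpha_beta_player, random_player),
            ('b) minimax vs. random', minmax_player, random_player),
            ('c) alpha-beta vs. minimax', alpha_beta_player, minmax_player)]
for name, p1, p2 in matchups:
    print(name, tally(p1, p2))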

Step: 3

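My expectations for the outputs, stated as notes rather than verified results: in problem 5, every step costs -1, so the best policy sends each cell toward its nearest goal corner along a shortest path; discounting at 0.1 shrinks the utilities but cannot make a longer path beat a shorter one, so the arrow tables for (a) and (b) should match, with only the underlying utilities differing. In problem 6, expect the search players to win the large majority of their games against random; and since alpha-beta and minimax are both deterministic, every one of the 500 games in matchup (c) plays out identically, so that tally will be 500 of a single outcome.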

