
Question


Please write the code for part one in Python. No pseudocode please.

Objectives:
- To implement reinforcement learning based on task-based rewards
- To take a continuous environment and discretize it so that it is suitable for a reinforcement learning task
This is the CartPole task. The idea here is to balance this pole using a one-dimensional robot (it
can only move left and right). The robot's state has 4 components:
- x: the location of the robot (0 is the center, -2.4 is the leftmost part of the board, 2.4 is the rightmost part of the board)
- xdot: the velocity of the robot
- theta: the angle of the pole
- thetadot: the angular velocity of the pole
OpenAI Gym:
You do not have to implement the problem domain yourself; there is a resource called OpenAI Gym which has a set of common training examples. Gym can be installed with the following command. After running the provided command, you may also be asked to install some additional packages for the video encoding. You'll see an error message with instructions to follow.
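To make the environment concrete, here is a minimal sketch of creating and stepping CartPole with Gym. It assumes the classic gym API (reset() returns only the observation and step() returns four values); newer gym/gymnasium releases return (obs, info) from reset() and five values from step().

import gym

# Create the CartPole environment (classic gym API assumed; see note above).
env = gym.make('CartPole-v0')

obs = env.reset()                         # obs = [x, xdot, theta, thetadot]
for _ in range(200):
    action = env.action_space.sample()    # random action: 0 = left, 1 = right
    obs, reward, done, info = env.step(action)
    if done:                              # pole fell or episode limit reached
        obs = env.reset()
env.close()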
State Discretization:
We will discretize the space in order to simplify the reinforcement learning algorithm. One
example can be as follows:
- x: one bucket for values below -0.08, one for -0.08 to 0.08, one for values above 0.08
- xdot: one bucket for values below -0.5, one for -0.5 to 0.5, one for values above 0.5
- theta: buckets covering the range from -50 degrees to 50 degrees
The combined discrete state then serves as the index into the Q-table.

Exploration vs. Exploitation:
p = random()
If p < epsilon:
    Choose random action
Else:
    Choose action that gives max Q value
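Here is a minimal sketch of part 1 along these lines. It assumes CartPole-v0 through the classic gym API and uses np.digitize for the bucketing; the bucket boundaries, alpha, gamma, epsilon, the episode count, and the output file name 'cart.npy' are all illustrative placeholders, not values specified in the assignment.

import random

import gym
import numpy as np

# Illustrative bucket boundaries -- tune these for your own discretization.
X_BINS        = [-0.08, 0.08]            # cart position
XDOT_BINS     = [-0.5, 0.5]              # cart velocity
THETA_BINS    = [-0.05, 0.0, 0.05]       # pole angle (radians)
THETADOT_BINS = [-0.5, 0.5]              # pole angular velocity

def discretize(obs):
    """Map a continuous observation to a tuple of bucket indices."""
    x, xdot, theta, thetadot = obs
    return (np.digitize(x, X_BINS),
            np.digitize(xdot, XDOT_BINS),
            np.digitize(theta, THETA_BINS),
            np.digitize(thetadot, THETADOT_BINS))

env = gym.make('CartPole-v0')
n_actions = env.action_space.n                      # 2 actions: left, right
Q = np.zeros((len(X_BINS) + 1, len(XDOT_BINS) + 1,
              len(THETA_BINS) + 1, len(THETADOT_BINS) + 1, n_actions))

alpha, gamma, epsilon = 0.1, 0.99, 0.1              # placeholder hyperparameters

for episode in range(5000):                         # placeholder episode count
    state = discretize(env.reset())
    done = False
    while not done:
        # Exploration vs. exploitation (epsilon-greedy, as in the pseudocode above)
        if random.random() < epsilon:
            action = env.action_space.sample()      # explore: random action
        else:
            action = int(np.argmax(Q[state]))       # exploit: best known action
        obs, reward, done, _ = env.step(action)
        next_state = discretize(obs)
        # Q-learning update rule
        Q[state + (action,)] += alpha * (reward + gamma * np.max(Q[next_state])
                                         - Q[state + (action,)])
        state = next_state

np.save('cart.npy', Q)    # file name assumed; the part 1 name is not given above
env.close()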
Your Task (part 2):
Now that you've implemented q-learning for one task, you will move to the mountain car task.
Instead of 2 actions (left, right), this task has three (left, null, right). The task also has different state variables (only 2):
- x: the location of the robot (-1.2 is the left, -0.45 is approximately the valley, 0.6 is the rightmost part of the board, 0.5 is the location of the flag)
- xdot: the velocity of the robot (this can go from -0.07 to 0.07)
This will require you to change the number of bins for state discretization as well as the alpha and gamma values. Additionally, you need to implement the exploration vs. exploitation part for this problem as well.
Once your model is trained, it will save the Q-table as a 'car.npy' file. Make sure that you don't change this file name.
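A sketch of the part 2 changes, under the same classic-gym assumption and with MountainCar-v0 as the environment; the bin counts and the alpha/gamma values below are placeholders to re-tune, and only the 'car.npy' file name comes from the assignment.

import gym
import numpy as np

env = gym.make('MountainCar-v0')                   # 3 actions: left, null, right

# Placeholder discretization: 20 x 20 bins over the given state ranges.
N_X, N_XDOT = 20, 20
x_bins    = np.linspace(-1.2, 0.6, N_X - 1)        # cart position range
xdot_bins = np.linspace(-0.07, 0.07, N_XDOT - 1)   # velocity range

def discretize(obs):
    x, xdot = obs
    return (np.digitize(x, x_bins), np.digitize(xdot, xdot_bins))

Q = np.zeros((N_X, N_XDOT, env.action_space.n))
alpha, gamma = 0.2, 0.99                           # placeholder values to re-tune

# ... train with the same epsilon-greedy Q-learning loop as in part 1 ...

np.save('car.npy', Q)    # keep this exact file name, as the assignment requires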
