Question

Posted on Sep 25, 2024

Reinforcement Learning - Bongo Board (Using Python)

Environment: This environment is based on the Bongo Board game. We want to train a controller that allows a simulated humanoid robot to balance on the board. Implement the following environment as an OpenAI Gym environment.

Bongo-Board-v0:

First, we model the humanoid robot and the bongo board as a simple linear inverted pendulum problem. We assume that the board does not slip on the roller, and hence the board rotates around a point at the bottom of the circle.

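[Fig. 1: simplified linear inverted pendulum model of a robot on a Bongo board; see the transcribed image text at the end of the question]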

The cylinder of the bongo board has diameter d, and the board B on top of the cylinder has roll angle theta with the horizontal axis. The robot is modeled as a point mass with mass m that sits on top of the bongo board at distance l.

The roll angle is limited by the length b of the board B and the diameter d of the cylinder under the board.

The control actuates the joint at point C, so that the mass rotates around point C.

The simplified dynamics of the system are those of a linear inverted pendulum considering the extra length d and . You can ignore the specifics of the dynamics for this assignment.

You can use the cart-and-pole (CartPole) and two-link Acrobot-v1 environments in OpenAI Gym as a starting point for this part of the assignment.

The main difference is that the angle of the board must be limited. You can assume that the length of the board B is 5 * d. Calculate the range of possible angles for theta.
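One possible way to get the range, as a rough sketch only: assume the board pivots about its midpoint, that the midpoint sits at height d above the ground (the top of the cylinder), and that the board thickness and the shift of the contact point as the board rolls are ignored. None of these assumptions come from the assignment text, and a different geometric model gives a slightly different limit. Under them, a board end touches the ground when (b/2) * sin(theta) = d, so with b = 5 * d the limit is sin(theta_max) = 0.4, or about 0.41 rad (roughly 23.6 degrees).

```python
import math

D = 0.25   # cylinder diameter d [m], from the parameter table
B = 5 * D  # board length b [m], given as 5 * d


def max_roll_angle(d: float, b: float) -> float:
    """Roll angle [rad] at which a board end touches the ground.

    Assumption (not from the assignment): the board midpoint stays at
    height d, so the end drops by (b / 2) * sin(theta).
    """
    return math.asin(d / (b / 2))


THETA_MAX = max_roll_angle(D, B)
print(f"theta in [-{THETA_MAX:.4f}, +{THETA_MAX:.4f}] rad "
      f"(about +/-{math.degrees(THETA_MAX):.1f} deg)")
```

Because the ratio b / d is fixed at 5, the limit under this model does not depend on the actual value of d.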

Assume the following parameters:

[Parameter table; see the transcribed image text at the end of the question]

Modify the two-link Acrobot-v1 environment so that it matches the description above. In particular, you will need to change at least these values:

[Listing of Acrobot link constants; see the transcribed image text at the end of the question]

This version of the environment allows two torques (+1,-1) to be applied at the joint between the two links.

The goal of the Acrobot robot is to swing up, but the goal of the bongo board is to stay upright as long as possible. Modify the reward function to match the bongo board domain.
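A survival-style reward is one option here: the agent earns a reward for every step the board stays inside its angle limit, and the episode ends once the limit is reached. The sketch below is an illustration, not the assignment's required reward. The function name `bongo_reward` and the default limit `THETA_MAX` are hypothetical, and the limit value depends on the geometric model chosen for theta.

```python
import math
from typing import Tuple

# Assumed roll-angle limit [rad]; replace with the value from your own
# geometry calculation.
THETA_MAX = math.asin(0.4)


def bongo_reward(theta: float, theta_max: float = THETA_MAX) -> Tuple[float, bool]:
    """Return (reward, terminated) for one time step.

    +1 for each step the board stays strictly inside the limit,
    0 and termination once |theta| reaches the limit.
    """
    terminated = abs(theta) >= theta_max
    reward = 0.0 if terminated else 1.0
    return reward, terminated
```

With this choice the undiscounted episode return equals the number of steps the robot stayed balanced, which makes learning curves easy to read.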

REINFORCE Algorithm: The assignment includes a description of how the REINFORCE algorithm is applied to solve the cart-and-pole problem. Modify the implementation to solve the bongo board problem as described above.
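Whatever the referenced implementation looks like, REINFORCE weights the gradient of log pi(a_t | s_t) by the discounted return G_t from step t onward, and that part carries over unchanged from cart-and-pole to the bongo board. The helper below is a minimal sketch of the return computation; the name `discounted_returns` is illustrative and not taken from the assignment code.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three steps of reward 1 with gamma = 0.5 give [1.75, 1.5, 1.0].
print(discounted_returns([1.0, 1.0, 1.0], 0.5))
```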

Evaluation: Evaluate the performance of the REINFORCE algorithm on the bongo board environment. Find suitable values for the parameters alpha and gamma so that the system can learn to balance on the board. Try some variations of the neural network used to learn the policy gradients.
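A small grid sweep is one simple way to organize the search over alpha and gamma. In the sketch below, `train_fn` is a hypothetical callable that you would supply yourself: it should train REINFORCE with the given learning rate and discount factor and return a single score, for example the mean episode length over the last episodes.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def sweep(train_fn: Callable[[float, float], float],
          alphas: Iterable[float],
          gammas: Iterable[float]) -> Dict[Tuple[float, float], float]:
    """Score every (alpha, gamma) pair with the user-supplied train_fn."""
    return {(a, g): train_fn(a, g) for a, g in product(alphas, gammas)}


def best_setting(scores: Dict[Tuple[float, float], float]) -> Tuple[float, float]:
    """Return the (alpha, gamma) pair with the highest score."""
    return max(scores, key=scores.get)
```

Since REINFORCE has high variance, averaging the score over several random seeds inside `train_fn` gives a more reliable comparison than a single run.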

Fig. 1: Simplified linear inverted pendulum model of a robot on a Bongo board.

Variable    Value
d           0.25 m
b           1.25 m (5*d)
l           1.1 m
m           5 kg

LINK_LENGTH_1 = 1.  # [m]
LINK_LENGTH_2 = 1.  # [m]
LINK_MASS_1 = 1.  #: [kg] mass of link 1
LINK_MASS_2 = 1.  #: [kg] mass of link 2
LINK_COM_POS_1 = 0.5  #: [m] position of the center of mass of link 1
LINK_COM_POS_2 = 0.5  #: [m] position of the center of mass of link 2