Question
Value Function Approximation. The robot given below is trying to explore the area and find safe routes to resources. The state of the robot is the grid cell it is in. The robot can move in the four cardinal directions. The landmarks, L1 and L2, signify that there is a resource close by. The locations of these landmarks are known to the robot: $L_1 = (x_{L1}, y_{L1})$ and $L_2 = (x_{L2}, y_{L2})$.
I need an answer to this question:
Use the observed transitions to update the weights, starting from zero weights, with learning rate $\alpha = 0.2$ and discount factor $\gamma = 1.0$.
[Figure: grid world showing the robot and the landmarks L1 and L2]
Actions: Up, Left, Right, Down
State: the $(x, y)$ location of the robot, e.g. $(2, 1)$ in the figure
L1 and L2: known landmarks
Discount: 1.0
The robot wants to use function approximation to get the value of each state. It decides to use the following features, given the current state $s = (x, y)$:
Current x-coordinate: $f_1(s) = x$
Current y-coordinate: $f_2(s) = y$
Manhattan distance to L1: $f_3(s) = |x - x_{L1}| + |y - y_{L1}|$
Manhattan distance to L2: $f_4(s) = |x - x_{L2}| + |y - y_{L2}|$
Furthermore, it uses a linear function approximator:
$\hat{V}(s, w) = w_1 f_1(s) + w_2 f_2(s) + w_3 f_3(s) + w_4 f_4(s) = w^\top f(s)$
The robot then observes the following transitions:
$(2, 1),\ -0.1 \;\to\; (2, 2),\ -0.1 \;\to\; (2, 3),\ +1$
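One way to carry out the requested weight update is a semi-gradient TD(0) step, $w \leftarrow w + \alpha\,[r + \gamma \hat{V}(s', w) - \hat{V}(s, w)]\,f(s)$. The question does not name the algorithm, so this is an assumption, as is reading each observation as a (state, reward, next state) triple. The landmark coordinates `L1` and `L2` below are placeholders because the figure that gives them is not reproduced here, and the helper names (`features`, `value`, `td0_update`) are hypothetical. The trailing +1 is left out of the list because the text does not say which state follows (2, 3); passing `s_next=None` would treat it as a terminal exit with value 0.

```python
from typing import List, Optional, Tuple

State = Tuple[int, int]


def features(s: State, l1: State, l2: State) -> List[float]:
    """Feature vector [x, y, Manhattan distance to L1, Manhattan distance to L2]."""
    x, y = s
    return [
        float(x),
        float(y),
        float(abs(x - l1[0]) + abs(y - l1[1])),
        float(abs(x - l2[0]) + abs(y - l2[1])),
    ]


def value(s: Optional[State], w: List[float], l1: State, l2: State) -> float:
    """Linear value estimate w . f(s); a terminal state (None) has value 0."""
    if s is None:
        return 0.0
    return sum(wi * fi for wi, fi in zip(w, features(s, l1, l2)))


def td0_update(
    w: List[float],
    s: State,
    r: float,
    s_next: Optional[State],
    l1: State,
    l2: State,
    alpha: float = 0.2,
    gamma: float = 1.0,
) -> List[float]:
    """Return the weights after one semi-gradient TD(0) step on (s, r, s_next)."""
    delta = r + gamma * value(s_next, w, l1, l2) - value(s, w, l1, l2)
    return [wi + alpha * delta * fi for wi, fi in zip(w, features(s, l1, l2))]


# ASSUMPTION: placeholder landmark positions; replace with the ones in the figure.
L1: State = (4, 3)
L2: State = (3, 2)

# ASSUMPTION: observations read as (state, reward, next state) triples.
transitions = [
    ((2, 1), -0.1, (2, 2)),
    ((2, 2), -0.1, (2, 3)),
]

w = [0.0, 0.0, 0.0, 0.0]  # start from zero weights
for s, r, s_next in transitions:
    w = td0_update(w, s, r, s_next, L1, L2)
    print(s, "->", s_next, "weights:", [round(wi, 4) for wi in w])
```

Because the weights start at zero, the first TD error is simply the reward, so the first update moves each weight by $\alpha \cdot r \cdot f_i(s)$. Later updates depend on the landmark positions through $f_3$ and $f_4$.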