
Question


Posted on Sep 26, 2024

Q4. Model-free Reinforcement Learning: Cycle (20 points)


Consider an MDP with 3 states, A, B and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP; instead, we are given samples of what an agent actually experiences when it interacts with the environment (although we do know that the agent does not remain in the same state after taking an action). In this problem, instead of first estimating the transition and reward functions, we will directly estimate the Q function using Q-learning.

Assume the discount factor, γ, is 0.5 and the step size for Q-learning, α, is 0.5.

Our current Q function, Q(s, a), from the left figure:

                    A        B        C
Clockwise           1.501    -0.451   2.73
Counterclockwise    3.153    -6.055   2.133

The agent encounters the samples from the right figure:

s    a                   s'   r
A    Counterclockwise    C    8.0
C    Counterclockwise    A    0.0

(The transcribed figures omit the column label C and the state C in each sample. They are restored here on the assumption that the transcription dropped every standalone "C".)

Provide the Q-values for all pairs of (state, action) after both samples have been accounted for.
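The Q-learning update the problem asks for is Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max over a' of Q(s', a')], applied once per sample, in order. The Python sketch below works through it under two assumptions that the transcribed figures do not confirm: the Q-table columns are A, B, C in that order, and the two samples are (A, Counterclockwise, C, 8.0) and (C, Counterclockwise, A, 0.0), with C filling the entries missing from the transcription. The names `q`, `q_update`, and `samples` are illustrative only.

```python
# Tabular Q-learning on the two samples from the problem statement.
# ASSUMPTION: table columns are A, B, C, and the state missing from each
# transcribed sample is C. Adjust `q` and `samples` if the figure differs.

GAMMA = 0.5  # discount factor from the problem
ALPHA = 0.5  # Q-learning step size from the problem
ACTIONS = ("Clockwise", "Counterclockwise")

# Current Q function, keyed by (state, action).
q = {
    ("A", "Clockwise"): 1.501,
    ("B", "Clockwise"): -0.451,
    ("C", "Clockwise"): 2.73,
    ("A", "Counterclockwise"): 3.153,
    ("B", "Counterclockwise"): -6.055,
    ("C", "Counterclockwise"): 2.133,
}


def q_update(q, s, a, s_next, r, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    # Greedy value of the successor state under the current estimates.
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    target = r + gamma * best_next
    # Blend the old estimate with the sampled target.
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return q[(s, a)]


# Samples as (s, a, s', r); the "C" entries are assumed, see above.
samples = [
    ("A", "Counterclockwise", "C", 8.0),
    ("C", "Counterclockwise", "A", 0.0),
]

for s, a, s_next, r in samples:
    q_update(q, s, a, s_next, r)

for (state, action), value in sorted(q.items()):
    print(f"Q({state}, {action}) = {value:.3f}")
```

Under these assumptions only two entries change. Q(A, Counterclockwise) becomes 0.5 · 3.153 + 0.5 · (8.0 + 0.5 · 2.73) = 6.259. The second sample then uses that updated value when taking the max over actions at A, so Q(C, Counterclockwise) becomes 0.5 · 2.133 + 0.5 · (0.0 + 0.5 · 6.259) ≈ 2.631. The other four Q-values keep their original values.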

