
Question

Consider a 2-armed bandit instance B in which the rewards from the arms come from uniform distributions (recall that the lectures assumed they came from Bernoulli distributions). The rewards of arm 1 are drawn uniformly at random from [a, b], and the rewards of arm 2 are drawn uniformly at random from [c, d], where 0 < a < c < b < d < 1. Observe that this means there is an overlap: both arms produce some rewards from the interval [c, b]. An algorithm L proceeds as follows. First it pulls arm 1; then it pulls arm 2; whichever of these arms produced a higher reward (or arm 1 in case of a tie) is then pulled a further 20 times. In other words, the algorithm performs round-robin exploration for 2 steps and greedily picks an arm for the subsequent exploitation phase, during which that arm is blindly pulled 20 times. What is the expected cumulative regret of L on B after 22 pulls? (If you have worked out an answer but are not sure about it, consider writing a small program to simulate L and run it many times for fixed a, b, c, d. Is the average regret from these runs close to your answer? The program is for your own sake; no need to submit or to explain to us.)
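The question suggests checking a worked answer by simulation, so here is a minimal sketch of such a program. It is not a posted solution; it rests on the following reasoning, which should be verified independently. Since a < c and b < d, arm 2 has the higher mean, so each pull of arm 1 costs a gap of (c + d - a - b) / 2 in expectation. Arm 1 is always pulled once during exploration, and it is pulled 20 more times only if its first reward beats arm 2's. That requires both rewards to fall in the overlap [c, b], which for independent uniform draws happens with probability (b - c)^2 / (2 (b - a)(d - c)); ties have probability zero. The function names, the default number of runs, and the fixed seed are illustrative choices, not part of the original problem.

```python
import random


def closed_form_regret(a, b, c, d, exploit_pulls=20):
    """Candidate closed-form expected regret of L (an assumption to be checked)."""
    gap = (c + d - a - b) / 2                      # mean(arm 2) - mean(arm 1)
    p_arm1 = (b - c) ** 2 / (2 * (b - a) * (d - c))  # P(arm 1's first reward wins)
    # One forced pull of arm 1, plus exploit_pulls more with probability p_arm1.
    return gap * (1 + exploit_pulls * p_arm1)


def simulate_regret(a, b, c, d, runs=100_000, exploit_pulls=20, seed=0):
    """Average regret of L over many simulated runs of 2 + exploit_pulls pulls."""
    rng = random.Random(seed)
    best_mean = max((a + b) / 2, (c + d) / 2)
    horizon = 2 + exploit_pulls
    total_regret = 0.0
    for _ in range(runs):
        # Exploration: pull arm 1 once, then arm 2 once.
        r1 = rng.uniform(a, b)
        r2 = rng.uniform(c, d)
        reward = r1 + r2
        # Greedy choice; arm 1 wins ties, as the problem specifies.
        lo, hi = (a, b) if r1 >= r2 else (c, d)
        # Exploitation: pull the chosen arm blindly.
        for _ in range(exploit_pulls):
            reward += rng.uniform(lo, hi)
        total_regret += best_mean * horizon - reward
    return total_regret / runs


if __name__ == "__main__":
    a, b, c, d = 0.1, 0.6, 0.4, 0.9  # any values with 0 < a < c < b < d < 1
    print("closed form:", closed_form_regret(a, b, c, d))
    print("simulated:  ", simulate_regret(a, b, c, d))
```

For a = 0.1, b = 0.6, c = 0.4, d = 0.9, the candidate formula evaluates to 0.3 × (1 + 20 × 0.08) = 0.78, and the simulated average should land close to that value, with some sampling noise. Trying a few other parameter settings is a reasonable way to gain confidence in the formula before relying on it.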
