Question: Consider a 2-armed bandit instance B in which the rewards from the arms come from uniform distributions (recall that the lectures assumed they came from Bernoulli distributions). The rewards of arm 1 are drawn uniformly at random from [a, b], and the rewards of arm 2 are drawn uniformly at random from [c, d], where 0 < a < c < b < d < 1. Observe that this means there is an overlap: both arms produce some rewards from the interval [c, b].

An algorithm L proceeds as follows. First it pulls arm 1; then it pulls arm 2; whichever of these arms produced a higher reward (or arm 1 in case of a tie) is then pulled a further 20 times. In other words, the algorithm performs round-robin exploration for 2 steps and greedily picks an arm for the subsequent exploitation phase, during which that arm is blindly pulled 20 times.

What is the expected cumulative regret of L on B after 22 pulls?

(If you have worked out an answer but are not sure about it, consider writing a small program to simulate L and run it many times for fixed a, b, c, d. Is the average regret from these runs close to your answer? The program is for your own sake; no need to submit or to explain to us.)
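The simulation suggested in the question can be sketched as follows. This is a minimal sketch, not the expert's locked solution: the function names, the example values of a, b, c, d, and the candidate closed form are all assumptions introduced here. The closed form comes from a short derivation: arm 2 has the higher mean, so the gap is (c + d - a - b) / 2; the one exploratory pull of arm 1 always costs one gap; and arm 1 is chosen for the 20 exploitation pulls only when its sample is at least arm 2's sample, which needs both samples to land in the overlap [c, b] and happens with probability (b - c)^2 / (2 (b - a)(d - c)). Ties have probability zero for continuous rewards.

```python
import random


def closed_form_regret(a, b, c, d, exploit_pulls=20):
    """Candidate closed-form expected regret (derived here, not from the source)."""
    # Arm 2 is optimal because a < c and b < d; gap between the two means.
    gap = (c + d - a - b) / 2
    # Probability that arm 1's exploratory sample is >= arm 2's sample.
    p_pick_arm1 = (b - c) ** 2 / (2 * (b - a) * (d - c))
    # One gap for the exploratory pull of arm 1, plus the exploitation cost.
    return gap * (1 + exploit_pulls * p_pick_arm1)


def simulate_regret(a, b, c, d, runs=200_000, exploit_pulls=20, seed=0):
    """Monte Carlo estimate of the expected regret of algorithm L."""
    rng = random.Random(seed)
    gap = (c + d - a - b) / 2
    total = 0.0
    for _ in range(runs):
        r1 = rng.uniform(a, b)  # exploratory pull of arm 1
        r2 = rng.uniform(c, d)  # exploratory pull of arm 2
        regret = gap  # pulling arm 1 once costs one gap in expectation
        if r1 >= r2:  # arm 1 wins (ties go to arm 1)
            # Exploitation rewards are replaced by their means, which leaves
            # the expectation unchanged and lowers the variance of the estimate.
            regret += exploit_pulls * gap
        total += regret
    return total / runs


if __name__ == "__main__":
    # Hypothetical parameters satisfying 0 < a < c < b < d < 1.
    a, b, c, d = 0.1, 0.6, 0.4, 0.9
    print("closed form:", closed_form_regret(a, b, c, d))
    print("simulated:  ", simulate_regret(a, b, c, d))
```

If the two printed numbers agree to within sampling noise for several choices of a, b, c, d, that supports the candidate formula; a persistent mismatch would point to an error in the derivation rather than in the simulation.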

