Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

(3 points) Given a strict turnstile stream where every value is an integer in (1, N). Modify the count- sketch algorithm to report the elements

image text in transcribed

image text in transcribed

(3 points) Given a strict turnstile stream where every value is an integer in (1, N). Modify the count- sketch algorithm to report the elements such that (1) All elements that occur more than F1/3 times in the stream are reported. (2) Moreover, all elements that are reported occur at least F1/6 times. (a) Suppose that we don't care about the running time. Describe a simple algorithm that solves the problem by using the Count-Min Sketch. Your algorithm should succeed with probability at least 2/3 and uses (log n + log N). log N) bits of memory. Specify what values of e and 8 are used in your sketch. Explain why your algorithm succeeds with probability at least 2/3. (Hint: We know every key is an integer between 1 and N. Moreover, you can do whatever many queries in the end since we don't care about the running time. However, be careful with the failure probability because you are doing many queries.) (b) To get a running time that is sublinear in N, we need to do the query more efficiently. Assume that N is a power of 2. Define the Dyadic intervals as follows: Lo = {{1}, {2},...,{N}} = {{1,2},{3, 4},...,{N 1, N}} L2 = {{1,2,3,4}, {5,6,7,8},...,{N - 3, N - 2, N -1,N}} 2/3 Lig N = {{1, 2, ...,N}} Note that if [i, j] is an interval in Lx for k > 0, then it must be the case that (i,(i+j)/2], [(i + j)/2+1, j] are intervals in Lk-1. We call these intervals the children intervals of [i,j]. Conversely, [i, j] is the parent of these two intervals. We construct 1+lg N Count-Min Sketches with e=1/6 and 8 = 1/(6N), one for each Lk, to sketch the frequency for every I E Lx (the frequency of an interval refers to the sum of the frequency of the items in the interval). At the end of the stream, we run the following recursive procedure that queries the intervals in a depth-first search manner. We will begin by calling Query([1, N],1g N). procedure Query(Interval I, level) if level = 0 then Report the element belonging to the singleton interval I end if Let 11, 12 be the children intervals of I. Get estimates on the frequencies of 11 and 12. Let fi, and fi, be the estimates. if fi > F1/3 then Execute Query(I, level - 1) recursively. end if if f1, 2 F1/3 then Execute Query(12, level-1) recursively. end if Let & be the event that for all 0 Sk 2/3. Then, show that if & holds, the algorithm uses at most O(log N) queries and it solves the problem. Remarks. In (strict) turnstile streams, it is important to distinguish between n and F1. Here, F1 is the total counts of the elements at the end of the stream, while n is maximum total counts of the elements over any given time during the stream. Also, the update sequence can be arbitrarily long, so the error probability can cumulate if we were to do a query after each update. This rules out the approach of maintaining the top 6 frequent items using a heap. (3 points) Given a strict turnstile stream where every value is an integer in (1, N). Modify the count- sketch algorithm to report the elements such that (1) All elements that occur more than F1/3 times in the stream are reported. (2) Moreover, all elements that are reported occur at least F1/6 times. (a) Suppose that we don't care about the running time. Describe a simple algorithm that solves the problem by using the Count-Min Sketch. Your algorithm should succeed with probability at least 2/3 and uses (log n + log N). log N) bits of memory. Specify what values of e and 8 are used in your sketch. Explain why your algorithm succeeds with probability at least 2/3. (Hint: We know every key is an integer between 1 and N. Moreover, you can do whatever many queries in the end since we don't care about the running time. However, be careful with the failure probability because you are doing many queries.) (b) To get a running time that is sublinear in N, we need to do the query more efficiently. Assume that N is a power of 2. Define the Dyadic intervals as follows: Lo = {{1}, {2},...,{N}} = {{1,2},{3, 4},...,{N 1, N}} L2 = {{1,2,3,4}, {5,6,7,8},...,{N - 3, N - 2, N -1,N}} 2/3 Lig N = {{1, 2, ...,N}} Note that if [i, j] is an interval in Lx for k > 0, then it must be the case that (i,(i+j)/2], [(i + j)/2+1, j] are intervals in Lk-1. We call these intervals the children intervals of [i,j]. Conversely, [i, j] is the parent of these two intervals. We construct 1+lg N Count-Min Sketches with e=1/6 and 8 = 1/(6N), one for each Lk, to sketch the frequency for every I E Lx (the frequency of an interval refers to the sum of the frequency of the items in the interval). At the end of the stream, we run the following recursive procedure that queries the intervals in a depth-first search manner. We will begin by calling Query([1, N],1g N). procedure Query(Interval I, level) if level = 0 then Report the element belonging to the singleton interval I end if Let 11, 12 be the children intervals of I. Get estimates on the frequencies of 11 and 12. Let fi, and fi, be the estimates. if fi > F1/3 then Execute Query(I, level - 1) recursively. end if if f1, 2 F1/3 then Execute Query(12, level-1) recursively. end if Let & be the event that for all 0 Sk 2/3. Then, show that if & holds, the algorithm uses at most O(log N) queries and it solves the problem. Remarks. In (strict) turnstile streams, it is important to distinguish between n and F1. Here, F1 is the total counts of the elements at the end of the stream, while n is maximum total counts of the elements over any given time during the stream. Also, the update sequence can be arbitrarily long, so the error probability can cumulate if we were to do a query after each update. This rules out the approach of maintaining the top 6 frequent items using a heap

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Intelligent Databases Technologies And Applications

Authors: Zongmin Ma

1st Edition

1599041219, 978-1599041216

More Books

Students also viewed these Databases questions