Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

(3 points) In the lecture, we noted that F2 represents the size of a self-join of the same table. In this problem, you will generalize

image text in transcribed

image text in transcribed

(3 points) In the lecture, we noted that F2 represents the size of a self-join of the same table. In this problem, you will generalize the algorithm to compute the estimate of the size of the join between two different tables. For two relations (i.e. tables in a database) r(A, B) and s(A,C), with a common attribute (i.e.,column) A, we define the join rx s to be a relation consisting of all tuples (a, b,c) such that (a,b) er and (a,c) es. Therefore, if fr.j and fs.j denote the frequencies of j in the first columns (i.e., "A"-columns) of r and s, respectively, and j can take values in {1,...,N}, then the size of join is 2-1 fr.j. fs.j. For example, in Figure 2, the third table is the join of the first two tables, joined by the key "MENTOR". The size (# rows) of the table is 2 x 2 + 2x1+3 x 2 = 12. Suppose that the first table and the second table are given as streams 01 and 32 respectively. (You may ignore irrelevant information outside of the first column and assume that each item in the stream is a single token ;). (a) Consider the following procedure: Choose a hash function h from a 4-universal hash family that map (1,N] to (-1, +1}. Consider the following sketching algo- rithm: procedure f(0, h) MENTOR Alice Alice Bob Bob Charlie Charlie Charlie STUDENTS David Edward George Jack Luigi Mike Norman Figure 2: MENTOR EXPERTISE Alice C++ Alice Java Bob Scala Charlie Python Charlie Fortran STUDENTS EXPERTISE David C++ David Java Edward Ct Edward Java George Scala Jack Scala Luigi Python Luigi Fortran Mike Python Mike Fortran Norman Python Norman Fortran Set c+0 On processing token j: c+c+h() return c Let 0 = -1 fr.j fs,j. Use the procedure above to compute the an estimate of 0. Show that the expectation of your output is . (b) Assume that the variance of the above algorithm is upper bounded by 2. 02. Describe how to produce an estimate such that (1 - )

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

MySQL Crash Course A Hands On Introduction To Database Development

Authors: Rick Silva

1st Edition

1718503008, 978-1718503007

More Books

Students also viewed these Databases questions

Question

How do Dimensional Database Models differ from Relational Models?

Answered: 1 week ago

Question

What type of processing do Relational Databases support?

Answered: 1 week ago

Question

Describe several aggregation operators.

Answered: 1 week ago