Question

Suppose we have a stream of tuples with the schema Grades(university, courseID, studentID, grade). Assume universities are unique, but a courseID is unique only within a university (i.e., different universities may have different courses with the same ID, e.g., CS101). Likewise, studentIDs are unique only within a university (different universities may assign the same ID to different students). Suppose we want to answer certain queries approximately from a 1/15th sample of the data. For each of the queries below, indicate how you would construct the sample.

That is, tell what the key attributes should be. (a) Estimate the average number of courses per university. (b) Estimate the fraction of students who have a GPA of 3.7 or more. Explain briefly but clearly how you will create the sample and why.
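One common way to build such a sample is to hash the chosen key attributes into 15 buckets and keep only the tuples whose key lands in one designated bucket, so that every tuple sharing a key is kept or dropped together. The sketch below illustrates that idea. The key choices shown (university alone for (a), the pair university and studentID for (b)) are one plausible reading of the question, not a confirmed solution, and the function names, the SHA-256 hash, and the tuple layout are all assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 15  # keep 1 bucket out of 15, i.e. roughly a 1/15th sample


def in_sample(key_parts, buckets=NUM_BUCKETS):
    """Return True if the key hashes to bucket 0 (hypothetical helper)."""
    # Join the key components with a separator unlikely to occur in the data.
    key = "\x1f".join(str(part) for part in key_parts).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % buckets == 0


def keep_for_courses_per_university(grades_tuple):
    """Query (a): assumed key is university, so a sampled university
    contributes all of its tuples and hence all of its courses."""
    university, _course_id, _student_id, _grade = grades_tuple
    return in_sample((university,))


def keep_for_gpa_fraction(grades_tuple):
    """Query (b): assumed key is (university, studentID), because a
    studentID alone does not identify a student across universities."""
    university, _course_id, student_id, _grade = grades_tuple
    return in_sample((university, student_id))
```

Because the decision depends only on the key, a sampled student keeps every grade needed to compute a GPA, and a sampled university keeps every course; sampling individual tuples at random would not preserve either property.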

And

DGIM Algorithm 1) Suppose the window is as shown in Fig. 4.2. Estimate the number of 1s in the last k positions, for k = (a) 5 and (b) 15. In each case, how far off the correct value is your estimate?
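The DGIM estimate for a query of length k adds the sizes of all buckets that end inside the last k positions, except the oldest such bucket, which contributes only half its size. The sketch below applies that rule to the bits of Fig. 4.2. The bit string and the bucket sizes (two of size 1, one of size 2, two of size 4, newest first) are transcribed from the figure, so they are assumptions that depend on that transcription being right; the bucket of size 8 is left out because it extends beyond the bits shown. All function names are hypothetical.

```python
# Bits of Fig. 4.2, oldest first (assumed transcription of the figure).
BITS = "1011011000101110110010110"

# Bucket sizes from newest to oldest, as read from the figure labels.
SIZES_NEWEST_FIRST = [1, 1, 2, 4, 4]


def build_buckets(bits, sizes):
    """Return (offset_of_right_end, size) pairs; offset 0 is the newest bit."""
    newest_first = bits[::-1]
    buckets = []
    pos = 0
    for size in sizes:
        # A bucket's right (most recent) end must be a 1.
        while newest_first[pos] != "1":
            pos += 1
        right_end = pos
        ones = 0
        # Extend the bucket to the left until it holds `size` ones.
        while ones < size:
            if newest_first[pos] == "1":
                ones += 1
            pos += 1
        buckets.append((right_end, size))
    return buckets


def dgim_estimate(k, buckets):
    """Sum the newer buckets fully, plus half of the oldest one in range."""
    in_range = [size for right_end, size in buckets if right_end < k]
    if not in_range:
        return 0.0
    return sum(in_range[:-1]) + in_range[-1] / 2


def true_count(k, bits=BITS):
    """Exact number of 1s among the k most recent bits."""
    return bits[::-1][:k].count("1")


BUCKETS = build_buckets(BITS, SIZES_NEWEST_FIRST)
for k in (5, 15):
    print(k, dgim_estimate(k, BUCKETS), true_count(k))
```

Under these assumptions the sketch gives an estimate of 3 against a true count of 3 for k = 5, and an estimate of 10 against a true count of 9 for k = 15.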

2) Study the example in Section 4.6.7, Extensions to the Counting of Ones. Use the technique of Section 4.6.6 to estimate the total error. Show that if each c_i has fractional error at most e, then the estimate of the true sum has error at most e. Briefly explain each step in your solution. (Book: Mining Massive Datasets, Chapter 4)
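The claim in part 2 can be sketched as a short chain of inequalities. The setup here is an assumption based on the book's example: each integer in the window has m bits, c_i is the count of 1s in the i-th bit position, so the true sum is the weighted sum of the c_i, and "fractional error at most e" is read as a relative error bound on each estimated count. The symbols \hat{c}_i, S, \hat{S}, and m are introduced for illustration, and the question's e is written as \varepsilon.

```latex
\begin{aligned}
S &= \sum_{i=0}^{m-1} c_i\,2^i, \qquad
\hat{S} = \sum_{i=0}^{m-1} \hat{c}_i\,2^i, \qquad
|\hat{c}_i - c_i| \le \varepsilon\, c_i \\[4pt]
|\hat{S} - S|
  &= \Bigl|\sum_{i=0}^{m-1} (\hat{c}_i - c_i)\,2^i\Bigr|
  \le \sum_{i=0}^{m-1} |\hat{c}_i - c_i|\,2^i
  \le \varepsilon \sum_{i=0}^{m-1} c_i\,2^i
  = \varepsilon\, S
\end{aligned}
```

The first inequality is the triangle inequality, the second applies the per-bit error bound term by term, and the weights 2^i are non-negative, so the relative error of the sum is no worse than the relative error of the individual counts.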

1 0 1 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1 0

Buckets, oldest to newest: at least one of size 8, two of size 4, one of size 2, two of size 1.

Figure 4.2: A bit-stream divided into buckets following the DGIM rules
