Question
Problem 2 (10 points) (Exercise 6.3.4, MMDS book) Suppose we perform the PCY Algorithm to find frequent pairs, with market-basket data meeting the following specifications:

1. The support threshold is 10,000.
2. There are one million items, represented by the integers 0, 1, ..., 999999.
3. There are 250,000 frequent items, that is, items that occur 10,000 times or more.
4. There are one million pairs that occur 10,000 times or more.
5. There are P pairs that occur exactly once and consist of two frequent items.
6. No other pairs occur at all.
7. Integers are always represented by 4 bytes.
8. When we hash pairs, they distribute among buckets randomly, but as evenly as possible; i.e., you may assume that each bucket gets exactly its fair share of the P pairs that occur once.

Suppose there are S bytes of main memory. In order to run the PCY Algorithm successfully, the number of buckets must be sufficiently large that most buckets are not frequent. In addition, on the second pass, there must be enough room to count all the candidate pairs. As a function of S, what is the largest value of P for which we can successfully run the PCY Algorithm on this data?
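The question leaves the memory accounting to the reader. Below is a minimal Python sketch of one common way to set it up. The assumptions are ours, not the exercise's:

- Pass 1 spends 4 bytes per item on item counts and the rest of S on 4-byte bucket counters.
- Pass 2 keeps a 4-byte-per-item frequent-item table and a bitmap with one bit per bucket, and counts candidate pairs as 12-byte (i, j, count) triples.
- Each of the one million frequent pairs lands in its own bucket (the worst case for candidates).
- "Most buckets" means "more than half".
- Each bucket receives a whole number of the P singleton pairs.

The function name `max_p` and the constant names are illustrative only.

```python
# Sketch: largest P for which PCY fits in S bytes, under the assumptions
# stated above (these are NOT given by the exercise itself).
ITEMS = 1_000_000             # items 0..999999
FREQ_PAIRS = 1_000_000        # pairs with count >= support
SUPPORT = 10_000              # support threshold
INT_BYTES = 4                 # every integer is 4 bytes
TRIPLE_BYTES = 3 * INT_BYTES  # assumed (i, j, count) triple per candidate pair


def max_p(S: int) -> int:
    """Largest P (a multiple of the bucket count) under this sketch's assumptions."""
    # Pass 1: item counts take ITEMS * 4 bytes; the rest holds bucket counters.
    buckets = (S - ITEMS * INT_BYTES) // INT_BYTES
    # Assumption: "most buckets are not frequent" means more than half of them.
    # At most FREQ_PAIRS buckets contain a frequent pair.
    if buckets <= 2 * FREQ_PAIRS:
        return 0

    # Pass 2: item table (assumed 4 bytes per item) plus a 1-bit-per-bucket bitmap.
    bitmap_bytes = (buckets + 7) // 8
    free = S - ITEMS * INT_BYTES - bitmap_bytes
    max_candidates = free // TRIPLE_BYTES

    # Worst case: each frequent pair sits in its own bucket, and each of those
    # buckets also holds `load` singleton pairs of frequent items, all of which
    # become candidates. Need FREQ_PAIRS * (1 + load) <= max_candidates.
    load_mem = max_candidates // FREQ_PAIRS - 1

    # Pass 1 condition: a bucket holding only singleton pairs must stay below
    # the support threshold, so load <= SUPPORT - 1 (integer loads assumed).
    load = min(load_mem, SUPPORT - 1)

    # Each bucket gets exactly `load` of the P singleton pairs.
    return max(load, 0) * buckets


if __name__ == "__main__":
    for s in (10**9, 2 * 10**9):
        print(s, max_p(s))
```

With S = 10^9 bytes, this accounting gives 249,000,000 buckets and a per-bucket load of 79, so P = 19,671,000,000. Under these assumptions, the pass-2 space limit is the binding one for any S up to roughly 10^11 bytes. If the item table, the bitmap and the "minus one" are ignored, P grows roughly like S²/(4.8 × 10^7).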