Question
2. (75 points) Download the file "trans.txt" and implement a streaming algorithm for mining the top-$k$ most frequent patterns. In the data file "trans.txt", every line is a transaction represented by a set of item ids, and the largest transaction contains 15 items.

a) (15 points) Prove that to mine the top-$k$ most frequent patterns, we do not need to consider patterns of size greater than $m = \log_2(k+1)$.

b) (60 points) Apply the idea of the Misra-Gries Algorithm to mine approximate frequent patterns by scanning each transaction only once. Specifically, implement your algorithm as follows.

(1). Maintain at most $C$ counters. Each counter is a (key, value) pair where "key" represents a specific pattern and "value" indicates the corresponding (approximate) support of the pattern.

(2). When reading a transaction, enumerate all its subsets of size at most $m$. Suppose for the $i$-th transaction we have $L_i$ such valid subsets; clearly, $L_i = \sum_{j=1}^{\min(l_i, m)} \binom{l_i}{j}$, where $l_i$ is the size of the $i$-th transaction. Transform the $i$-th transaction to a stream of $L_i$ subsets (the order could be arbitrary) and use the Misra-Gries Algorithm to count each subset's number of appearances (support).

b.1) (8 points) Suppose in total we have $M$ transactions. Let $L = \sum_{i=1}^{M} L_i$. Suppose $f_S$ is the real support of a pattern $S$ and $\hat{f}_S$ is the approximate support maintained by your Misra-Gries Algorithm. Prove that for any pattern $S$, we have $f_S \ge \hat{f}_S \ge f_S - \frac{L}{C+1}$.

b.2) (7 points) Suppose $S_k$ is the real $k$-th most frequent pattern. Let $\hat{f}_k$ be the $k$-th largest (approximate) support obtained by your Misra-Gries Algorithm. Prove that $f_{S_k} \ge \hat{f}_k \ge f_{S_k} - \frac{L}{C+1}$.

b.3) (15 points) Since we only have the approximate supports of patterns obtained by our Misra-Gries Algorithm, we can only use such approximate supports to return approximate top-$k$ patterns. We hope to collect all the true top-$k$ patterns by returning a collection of patterns $A = \{S \mid \hat{f}_S \ge t\}$, where $t$ is a threshold for us to filter out nonfrequent patterns. Prove that when $t = \hat{f}_k - \frac{L}{C+1}$:
(1) $A$ contains all the true top-$k$ patterns. (6 points)
(2) The minimum support of patterns in $A$ satisfies $\mathrm{minSup}(A) = \min_{S \in A} f_S \ge f_{S_k} - \frac{2L}{C+1}$. (9 points)

b.4) (30 points) Set $k = 500$. Run your Misra-Gries Algorithm on the "trans.txt" dataset and report the values of $L$ and $\mathrm{minSup}(A)$ when setting $C = 500000, 750000, 1000000$. To compute $\mathrm{minSup}(A)$, you can refer to the file
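
The counting procedure described in part b) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the graded solution. The function names (`read_transactions`, `enumerate_subsets`, `misra_gries_patterns`, `approximate_top_k`) are hypothetical. The item ids on each line are assumed to be separated by whitespace, which the question does not specify. The size limit is taken as the floor of log2(k+1), since a pattern size must be an integer, and the threshold is read as t = f̂_k - L/(C+1), which is one interpretation of part b.3.

```python
from itertools import combinations
from math import floor, log2


def read_transactions(path):
    """Yield one transaction (a list of item ids) per non-empty line.

    Assumption: item ids on a line are separated by whitespace.
    """
    with open(path) as handle:
        for line in handle:
            items = line.split()
            if items:
                yield items


def enumerate_subsets(transaction, m):
    """Yield every non-empty subset of size at most m as a sorted tuple."""
    items = sorted(set(transaction))
    for size in range(1, min(len(items), m) + 1):
        yield from combinations(items, size)


def misra_gries_patterns(transactions, k, C):
    """Scan each transaction once, keeping at most C (pattern, count) pairs.

    Returns the counters and L, the total number of subsets streamed.
    """
    m = floor(log2(k + 1))  # assumption: round the size limit down
    counters = {}
    L = 0
    for transaction in transactions:
        for pattern in enumerate_subsets(transaction, m):
            L += 1
            if pattern in counters:
                counters[pattern] += 1
            elif len(counters) < C:
                counters[pattern] = 1
            else:
                # All C counters are in use: decrement every counter and
                # drop those that reach zero. The new pattern is not stored.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
    return counters, L


def approximate_top_k(counters, k, C, L):
    """Return patterns whose approximate support is at least the threshold t."""
    ranked = sorted(counters.values(), reverse=True)
    f_hat_k = ranked[k - 1] if len(ranked) >= k else 0
    t = f_hat_k - L / (C + 1)
    return {pattern: count for pattern, count in counters.items() if count >= t}
```

A run for part b.4 would call `misra_gries_patterns(read_transactions("trans.txt"), k=500, C=C)` for each requested value of `C` and pass the result to `approximate_top_k`. The full decrement pass costs time proportional to `C` each time it fires, so this sketch favors readability over speed. Computing minSup(A) additionally needs the true supports, which this sketch does not produce.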