Question:
(Implementation project) There are four typical data cube computation methods: MultiWay [ZDN97], BUC [BR99], H-Cubing [HPDW01], and Star-Cubing [XHLW03].
a. Implement any one of these cube computation algorithms and describe your implementation, experimentation, and performance. Find another student who has implemented a different algorithm on the same platform (e.g., C++ on Linux) and compare your algorithm's performance with his or hers.
Input:
i. An \(n\)-dimensional base cuboid table (for \(n < 20\)), which is essentially a relational table with \(n\) attributes.
ii. An iceberg condition: \(\operatorname{count}(C) \geq k\), where \(k\) is a positive integer given as a parameter.
Output:
i. The set of computed cuboids that satisfy the iceberg condition, in the order of your output generation.
ii. Summary of the set of cuboids in the form of "cuboid ID: the number of nonempty cells," sorted in alphabetical order of cuboids (e.g., \(A\): 155, \(AB\): 120, \(ABC\): 22, \(ABCD\): 4, \(ABCE\): 6, \(ABD\): 36), where the number after the colon represents the number of nonempty cells. (This is used to quickly check the correctness of your results.)
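To make the input and output formats concrete, here is a minimal brute-force sketch in Python. It is not one of the four algorithms named above; it simply groups the base table by every nonempty subset of dimensions and keeps the cells whose count is at least \(k\), so it can serve as a reference for checking the summary line produced by a real implementation. The function names `iceberg_cube` and `format_summary`, the single-letter dimension names, and the choice to omit the apex (all-`*`) cuboid and any cuboid with no qualifying cell are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, dim_names, k):
    """Brute-force iceberg cube (hypothetical helper, for checking only).

    rows      -- list of tuples, one value per dimension (the base cuboid)
    dim_names -- one name per dimension, e.g. "ABC"
    k         -- iceberg threshold: keep a cell only if its count >= k
    """
    n = len(dim_names)
    cells = []    # (cuboid_id, cell_values, count), in generation order
    summary = {}  # cuboid_id -> number of cells that satisfy count >= k
    for size in range(1, n + 1):
        for dims in combinations(range(n), size):
            # Group the base table by the chosen dimensions and count tuples.
            counts = Counter(tuple(row[d] for d in dims) for row in rows)
            kept = [(vals, c) for vals, c in counts.items() if c >= k]
            if kept:  # assumption: cuboids with no qualifying cell are omitted
                cuboid_id = "".join(dim_names[d] for d in dims)
                summary[cuboid_id] = len(kept)
                cells.extend((cuboid_id, vals, c) for vals, c in kept)
    return cells, summary


def format_summary(summary):
    """Render 'cuboid ID: cell count' pairs in alphabetical order of cuboid ID."""
    return ", ".join(f"{cid}: {summary[cid]}" for cid in sorted(summary))


if __name__ == "__main__":
    base = [("a1", "b1", "c1"), ("a1", "b1", "c2"),
            ("a1", "b2", "c1"), ("a2", "b1", "c1")]
    cells, summary = iceberg_cube(base, "ABC", k=2)
    print(format_summary(summary))
```

Because this sketch scans the base table once per cuboid and performs no pruning, its cost grows with \(2^n\) and it is only practical for small test tables. That is enough for comparing its summary against the output of an optimized algorithm on the same data.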
b. Based on your implementation, discuss the following:
i. What challenging computation problems are encountered as the number of dimensions grows large?
ii. How can iceberg cubing solve the problems of part (a) for some data sets (and characterize such data sets)?
iii. Give one simple example to show that sometimes iceberg cubes cannot provide a good solution.
c. Instead of computing a high-dimensionality data cube, we may choose to materialize the cuboids that have only a small number of dimension combinations. For example, for a 30-D data cube, we may only compute the 5-D cuboids for every possible 5-D combination. The resulting cuboids form a shell cube. Discuss how easy or hard it is to modify your cube computation algorithm to facilitate such computation.
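The size of the shell in the 30-D example can be worked out directly. The Python sketch below enumerates the dimension combinations of a given size and compares their number with the \(2^n\) cuboids of the full cube (ignoring concept hierarchies). The helper name `shell_cuboids` and the dimension labels are hypothetical.

```python
from itertools import combinations
from math import comb


def shell_cuboids(dim_names, shell_size):
    """Return an iterator over every cuboid with exactly shell_size dimensions.

    Each cuboid is a tuple of dimension names (hypothetical representation).
    """
    return combinations(dim_names, shell_size)


if __name__ == "__main__":
    n, d = 30, 5
    dims = [f"D{i}" for i in range(1, n + 1)]  # illustrative dimension labels
    shell = list(shell_cuboids(dims, d))
    print(len(shell), comb(n, d))  # number of 5-D cuboids in a 30-D cube
    print(2 ** n)                  # number of cuboids in the full 30-D cube
```

For \(n = 30\) and 5-D cuboids this gives \(\binom{30}{5} = 142{,}506\) cuboids, against \(2^{30} = 1{,}073{,}741{,}824\) for the full cube. A driver loop over these combinations is one simple way to restrict which cuboids an algorithm computes, although it does not by itself share work between overlapping cuboids.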
Data Mining Concepts And Techniques
ISBN: 9780128117613
4th Edition
Authors: Jiawei Han, Jian Pei, Hanghang Tong