Question:
Section 5.1.4 introduced a core Pattern-Fusion method for mining high-dimensional data. Explain why a long pattern, if existing in the data set, is likely to be discovered by this method.
Section 5.1.4
Our discussions of mining multidimensional patterns in the above two subsections are confined to patterns involving a small number of dimensions. However, some applications need to mine high-dimensional data (i.e., data with hundreds or thousands of dimensions). It is not easy to extend the previous multidimensional pattern mining methods to such data, because their search spaces grow exponentially with the number of dimensions. One interesting direction is to extend a pattern-growth approach by exploring the vertical data format, which suits data sets with a large number of dimensions (also called features or items, e.g., genes) but a small number of rows (also called transactions or tuples, e.g., samples).
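The vertical data format mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the data layout only, not the specific mining algorithm the text alludes to; the function names `to_vertical` and `support_set` and the toy gene/sample data are hypothetical. Each item (gene) maps to the set of row IDs (samples) that contain it, and the support set of an itemset is the intersection of its items' row-ID sets. With only a few rows, every such set is small, so intersections stay cheap however many items there are.

```python
def to_vertical(rows):
    """Turn horizontal data (row_id -> set of items) into vertical data (item -> set of row_ids)."""
    vertical = {}
    for row_id, items in rows.items():
        for item in items:
            vertical.setdefault(item, set()).add(row_id)
    return vertical


def support_set(itemset, vertical):
    """Row IDs containing every item of a nonempty itemset: intersect the items' row-ID sets."""
    items = list(itemset)
    result = set(vertical.get(items[0], set()))  # copy, so the vertical index is never mutated
    for item in items[1:]:
        result &= vertical.get(item, set())
    return result


# Hypothetical toy data: 4 samples (rows), 5 genes (items).
rows = {
    "s1": {"g1", "g2", "g3", "g5"},
    "s2": {"g1", "g2", "g4"},
    "s3": {"g1", "g2", "g3"},
    "s4": {"g2", "g3", "g5"},
}
vertical = to_vertical(rows)
print(sorted(vertical["g3"]))                             # ['s1', 's3', 's4']
print(sorted(support_set({"g1", "g2", "g3"}, vertical)))  # ['s1', 's3']
```

The support count of an itemset is simply the size of its support set, so no pass over the full horizontal table is needed once the vertical index is built.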
This is useful in applications such as gene expression analysis in bioinformatics, where we often need to analyze microarray data that contain a large number of genes (e.g., 10,000 to 100,000) but only a small number of samples (e.g., dozens to hundreds). Another direction is to develop a new methodology that focuses its mining effort on colossal patterns, that is, patterns of rather long length, instead of the complete set of patterns. One such method, called Pattern-Fusion, takes leaps in the pattern search space, leading to a good approximation of the complete set of colossal frequent patterns. We briefly outline the idea of Pattern-Fusion here and refer interested readers to the detailed technical paper. In some applications (e.g., bioinformatics), a researcher may be more interested in finding colossal patterns (e.g., long DNA and protein sequences) than small (i.e., short) ones, since colossal patterns usually carry more significant meaning.
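The excerpt stops before defining the notion Pattern-Fusion relies on. The full section defines a τ-core pattern: for a pattern α, a subpattern β ⊆ α is a τ-core pattern of α if |D_α| / |D_β| ≥ τ (0 < τ ≤ 1), where |D_X| is the number of transactions containing X. The sketch below is a minimal brute-force illustration of that definition, not the Pattern-Fusion algorithm itself; the helper names `support` and `core_patterns`, the toy data set, and the choice τ = 0.5 are illustrative assumptions.

```python
from itertools import combinations


def support(itemset, transactions):
    """|D_X|: number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def core_patterns(alpha, transactions, tau):
    """Brute force: every nonempty beta ⊆ alpha with |D_alpha| / |D_beta| >= tau."""
    sup_alpha = support(alpha, transactions)
    if sup_alpha == 0:
        return []  # alpha never occurs, so it has no core patterns
    cores = []
    for k in range(1, len(alpha) + 1):
        for combo in combinations(sorted(alpha), k):
            beta = frozenset(combo)
            # beta ⊆ alpha, so |D_beta| >= |D_alpha| > 0 and the ratio is at most 1
            if sup_alpha / support(beta, transactions) >= tau:
                cores.append(beta)
    return cores


# Toy data: four transaction types, 100 copies each.
transactions = (
    [frozenset("abe")] * 100
    + [frozenset("bcf")] * 100
    + [frozenset("acf")] * 100
    + [frozenset("abcef")] * 100
)

for pattern in ("abe", "abcef"):
    cores = core_patterns(frozenset(pattern), transactions, tau=0.5)
    print(pattern, len(cores))
```

On this toy data, the five-item pattern abcef has 26 τ-core patterns (every one of its 31 nonempty subpatterns except a, b, c, f, and cf), while the three-item pattern abe has only 7. This is the intuition behind the question above: because a long pattern has many core patterns, a seed drawn at random from a pool of small frequent patterns is likely to be one of them, and fusing core patterns whose supporting transaction sets largely overlap jumps straight to (or close to) the long pattern instead of growing it one item at a time.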
Finding colossal patterns is challenging because incremental mining tends to get “trapped” by an explosive number of midsize patterns before it can even reach candidate patterns of large size. All of the pattern mining strategies we have studied so far, such as Apriori and FP-growth, use an incremental growth strategy by nature; that is, they increase the length of candidate patterns by one item at a time. Breadth-first search methods like Apriori cannot avoid generating an explosive number of midsize patterns, which makes it practically impossible to reach colossal patterns. Even depth-first search methods like FP-growth can easily be trapped in a huge number of subtrees before reaching colossal patterns. Clearly, a completely new mining methodology is needed to overcome such a hurdle. [...]
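To see why level-wise growth gets trapped, recall the Apriori property: every subpattern of a frequent pattern is also frequent. A miner that grows patterns one item at a time must therefore generate the subpatterns of a colossal pattern on its way to it. The short calculation below counts those subpatterns for a hypothetical 100-item pattern; the length 100 and the helper name `subpattern_counts` are illustrative assumptions.

```python
from math import comb


def subpattern_counts(n, sizes):
    """Number of distinct k-item subpatterns of an n-item pattern, for each k in sizes."""
    return {k: comb(n, k) for k in sizes}


n = 100  # length of one hypothetical colossal pattern
counts = subpattern_counts(n, [1, 2, 3, 50])
for k, c in counts.items():
    print(f"size {k:>2}: {c:.3e} subpatterns")
print(f"all nonempty proper subpatterns: {2**n - 2:.3e}")
```

The counts peak at the midsize level (k = 50, on the order of 10^29), which is exactly the explosive number of midsize patterns that blocks incremental miners.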
Source: Data Mining: Concepts and Techniques, 4th Edition, by Jiawei Han, Jian Pei, and Hanghang Tong (ISBN: 9780128117613)