Question
1. If {1,2,3}, {1,2,4}, {1,3,4}, {1,2,5}, and {3,4,5} are ALL the large 3-itemsets, list all of the large 2-itemsets out of the original data set {1,2,3,4,5}. (2 pts)

2. Given the following (incomplete) data set about users of a web site and its document search and retrieval history, describe how you will apply association rule mining to the given data set by identifying the equivalents of items, item-grouping attribute (e.g., transaction) as well as the potential use of the patterns you obtain from the data set. Hint: You don't have to use all of the attributes in each answer. (3 pts)

CLUSTERING

Given the distance function of the following data set, show the dendrogram generated by bottom-up hierarchical clustering methods (using min cluster distance). (5 pts)
I will give you the first two steps for "free":
Step 1. Find the minimal distance, which is 45, and form the first cluster.
Step 2. Re-build the table as follows:

1. Naive Bayes (5 pts). Below is a data set about stolen cars. We have a YELLOW SUV DOMESTIC. Predict whether it is stolen using Naive Bayes. Show your process.

2. Consider the training dataset given below. In the dataset, "Purchase" is the label. Which attribute has the highest information gain, i.e., reduction in Gini Index? Justify your answer. Use Gini Index ($1 - p_1^2 - p_2^2$) as measurement criteria. (5 pts)

3. Two image recognition classifiers are being compared for their performances using confusion matrix:
a. (5 pts) Which classifier has the higher recall rate? Show your computation.
b. (5 pts) If this is an app developed for self-driving cars to recognize bikes in the road so as to avoid accidents, which measurement metric should we optimize? Precision or recall? Why?

4. Selecting Data Mining Tasks and Methods (20 pts)
The following describe the demographic, online browsing/shopping session and purchasing transaction information collected from a set of voluntary Internet users.
Table Name: Demographics

Attribute             Type  Description
household id          num   Unique household identifier
hoh most education    num   Household Most Education
census region         num   Census Region of Residence
household size        num   Household Size
hoh oldest age        num   Household Eldest Age
household income      num   Household Income
child present         num   Child Present
racial background     num   Racial Background
connection speed      num   Connection Speed
country of origin     num   Country of Origin

Table Name: Session

Answer the following questions:

a) Can you perform association rule analysis in this data set? Justify your answer and state any assumption you make about the attributes and/or attribute val.. . if manama.. ic points)

b) eBay is interested in finding out what it is about an eBay customer that can help predict if the customer will buy large, moderate or small amounts (in dollars) of pre-owned products. Please recommend a data mining task that should be performed to find out what eBay is interested in and state the selection of all necessary attributes and their roles in the data mining task. Justify the recommendation and state any assumption you make about the attributes and/or attribute values if necessary. Please note that it is possible to find the insights in which eBay is interested via different data mining tasks with different choices of attributes respectively. You only need to specify one. As always, be creative and relevant at the same time. (15 points)
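For the first itemset question, one way to reason is through the downward-closure (Apriori) property: every subset of a large itemset must itself be large. The sketch below is only an illustration of that reasoning, not a posted solution. It enumerates the 2-item subsets of the five large 3-itemsets given in the question; the function name `large_subsets` is a hypothetical helper introduced here.

```python
from itertools import combinations

# The large 3-itemsets exactly as listed in the question.
large_3_itemsets = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 5}, {3, 4, 5}]


def large_subsets(itemsets, k):
    """Collect every k-item subset of the given large itemsets.

    By downward closure, each such subset is guaranteed to be large.
    (Hypothetical helper, named for this illustration only.)
    """
    subsets = set()
    for itemset in itemsets:
        for combo in combinations(sorted(itemset), k):
            subsets.add(combo)
    return sorted(subsets)


large_2_itemsets = large_subsets(large_3_itemsets, 2)
print(large_2_itemsets)
```

Five items admit only ten distinct pairs, so comparing the length of the result against ten shows whether any pair is left unaccounted for by the closure argument.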
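The Gini question asks for the attribute with the largest reduction in Gini Index, where the index of a node is $1 - p_1^2 - p_2^2$. The training table is not reproduced on this page, so the rows below are hypothetical placeholders (the attribute names `Age` and `Income` are assumptions; only the label name "Purchase" comes from the question). The sketch shows one common way to compute the reduction: the parent's Gini minus the size-weighted Gini of the child groups.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(rows, attribute, label):
    """Parent Gini minus the size-weighted Gini after splitting on `attribute`."""
    parent = gini([row[label] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


# Hypothetical rows for illustration only; NOT the data set from the question.
rows = [
    {"Age": "young", "Income": "high", "Purchase": "no"},
    {"Age": "young", "Income": "low", "Purchase": "no"},
    {"Age": "old", "Income": "high", "Purchase": "yes"},
    {"Age": "old", "Income": "low", "Purchase": "yes"},
]

for attribute in ("Age", "Income"):
    print(attribute, gini_reduction(rows, attribute, "Purchase"))
```

In this made-up table, `Age` separates the labels perfectly while `Income` does not separate them at all, so `Age` would be the attribute chosen; with the real table, the same comparison would be run across all of its attributes.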
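For the classifier comparison, recall is TP / (TP + FN) and precision is TP / (TP + FP). The confusion matrices are not included on this page, so the counts below are hypothetical and serve only to show the arithmetic; the dictionary names `classifier_a` and `classifier_b` are likewise assumptions.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical confusion-matrix counts; NOT the values from the question.
classifier_a = {"tp": 40, "fp": 10, "fn": 20}
classifier_b = {"tp": 50, "fp": 30, "fn": 10}

for name, cm in (("A", classifier_a), ("B", classifier_b)):
    print(
        name,
        "precision:", precision(cm["tp"], cm["fp"]),
        "recall:", recall(cm["tp"], cm["fn"]),
    )
```

With these made-up counts the two metrics disagree about which classifier is better, which is the kind of trade-off part b asks about: a missed bike is a false negative, and false negatives lower recall rather than precision.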