Question
ASSOCIATION RULE MINING
1. If {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 5}, and {3, 4, 5} are ALL the large 3-itemsets, list all of the large 2-itemsets over the original item set {1, 2, 3, 4, 5}.
2. Given the following (incomplete) data set about users of a web site and their document search and retrieval history, describe how you would apply association rule mining to it by identifying the equivalents of items and of the item-grouping attribute (e.g., transaction), as well as the potential use of the patterns you obtain from the data set. Hint: You don't have to use all of the attributes in each answer.
User ID | Session ID | Key words | Documents |
U1 | S1 | K1 K2 | D1 |
U1 | S2 | K2 K3 K4 | D2 |
U1 | S3 | K2 | D1 |
U1 | S4 | K3 K4 K5 | D3 |
U1 | S5 | K4 K5 | D2 |
U2 | S6 | K1 K2 K6 | D1 |
U2 | S7 | K2 K6 K7 | D2 |
U2 | S8 | K2 K6 | D1 |
U2 | S9 | K8 K6 K2 | D3 |
U2 | S10 | K2 K6 K9 | D2 |
... | ... | ... | ... |
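The two association rule questions above can be explored with a short sketch. For question 1 it relies on the downward-closure (Apriori) property: every subset of a large itemset is itself large. For question 2 it assumes one possible mapping, which is only one of several valid choices: each session is a transaction, and its key words plus the retrieved document are the items. The helper names `support` and `confidence` are illustrative, and only the ten visible rows of the incomplete table are used.

```python
from itertools import combinations

# Question 1: collect every 2-item subset of the given large 3-itemsets.
large_3_itemsets = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 5}, {3, 4, 5}]
implied_large_2_itemsets = sorted(
    {pair
     for itemset in large_3_itemsets
     for pair in combinations(sorted(itemset), 2)}
)
print(implied_large_2_itemsets)

# Question 2 (assumed mapping): session = transaction,
# key words and the retrieved document = items.
sessions = {
    "S1": (["K1", "K2"], "D1"),
    "S2": (["K2", "K3", "K4"], "D2"),
    "S3": (["K2"], "D1"),
    "S4": (["K3", "K4", "K5"], "D3"),
    "S5": (["K4", "K5"], "D2"),
    "S6": (["K1", "K2", "K6"], "D1"),
    "S7": (["K2", "K6", "K7"], "D2"),
    "S8": (["K2", "K6"], "D1"),
    "S9": (["K8", "K6", "K2"], "D3"),
    "S10": (["K2", "K6", "K9"], "D2"),
}
transactions = [set(keywords) | {doc} for keywords, doc in sessions.values()]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the visible transactions."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"K2"}), confidence({"K2"}, {"D1"}))
```

Under this mapping, a rule such as {K2} -> {D1} could be used to suggest documents for a set of key words. Grouping by User ID instead of Session ID would be an equally reasonable alternative that yields per-user patterns.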
CLUSTERING
Given the distance function for the following data set, show the dendrogram generated by bottom-up (agglomerative) hierarchical clustering, using the minimum cluster distance.
| 1 | 2 | 3 | 4 | 5 |
1 | 0 | 3 | 9 | 9 | 5 |
2 | 3 | 0 | 8 | 7 | 6 |
3 | 9 | 8 | 0 | 2 | 2 |
4 | 9 | 7 | 2 | 0 | 1 |
5 | 5 | 6 | 2 | 1 | 0 |
I will give you the first two steps for "free":
Step 1. Find the minimal distance, which is between 4 and 5 (distance 1), and merge them to form the first cluster.
Step 2. Rebuild the table as follows:
| 1 | 2 | 3 | 4/5 |
1 | 0 | 3 | 9 | 7 |
2 | 3 | 0 | 8 | 6.5 |
3 | 9 | 8 | 0 | 2 |
4/5 | 7 | 6.5 | 2 | 0 |
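The merge procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the `linkage` parameter are assumptions introduced here. Note that the rebuilt table in Step 2 appears to use the average of the two merged rows (for example, (9 + 5) / 2 = 7), whereas the minimum cluster distance named in the question would give min(9, 5) = 5, so the sketch accepts either rule.

```python
from itertools import combinations
from statistics import mean

# Pairwise distances copied from the original 5 x 5 table (upper triangle).
DIST = {
    (1, 2): 3, (1, 3): 9, (1, 4): 9, (1, 5): 5,
    (2, 3): 8, (2, 4): 7, (2, 5): 6,
    (3, 4): 2, (3, 5): 2,
    (4, 5): 1,
}


def point_distance(a, b):
    return DIST[(min(a, b), max(a, b))]


def cluster_distance(c1, c2, linkage=min):
    """Combine all cross-cluster point distances with `linkage` (min or mean)."""
    return linkage([point_distance(a, b) for a in c1 for b in c2])


def agglomerate(points, linkage=min):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], linkage),
        )
        height = cluster_distance(c1, c2, linkage)
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2, height))
    return merges


for left, right, height in agglomerate([1, 2, 3, 4, 5], linkage=min):
    print(left, "+", right, "at distance", height)
```

Each entry in the merge history corresponds to one horizontal bar of the dendrogram, drawn at the recorded height. Passing `linkage=mean` instead reproduces the averaged values shown in the Step 2 table.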
CLASSIFICATION
1. Naive Bayes. Below is a data set about stolen cars. We have a YELLOW SUV DOMESTIC. Predict whether it is stolen using Naive Bayes. Show your process.
2. Consider the training dataset given below. In the dataset, "Purchase" is the label.
Customer | Age_group | Income_level | Education_level | Purchase? |
1 | 0 | 1 | 0 | Yes |
2 | 0 | 0 | 0 | No |
3 | 1 | 1 | 1 | Yes |
4 | 1 | 0 | 1 | No |
5 | 0 | 1 | 1 | No |
Which attribute has the highest information gain, i.e., reduction in Gini Index? Justify your answer. Use the Gini Index ($1 - p_1^2 - p_2^2$) as the measurement criterion.
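The Gini computation for the five training rows can be checked with a small script. This is a sketch under the usual definition of a split's quality: the parent Gini minus the size-weighted Gini of the child nodes. The function names `gini` and `gini_reduction` are illustrative.

```python
from collections import Counter

# The five training rows from the table above.
rows = [
    {"Age_group": 0, "Income_level": 1, "Education_level": 0, "Purchase": "Yes"},
    {"Age_group": 0, "Income_level": 0, "Education_level": 0, "Purchase": "No"},
    {"Age_group": 1, "Income_level": 1, "Education_level": 1, "Purchase": "Yes"},
    {"Age_group": 1, "Income_level": 0, "Education_level": 1, "Purchase": "No"},
    {"Age_group": 0, "Income_level": 1, "Education_level": 1, "Purchase": "No"},
]


def gini(labels):
    """Gini index 1 - sum(p_i^2); equals 1 - p1^2 - p2^2 for two classes."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(rows, attribute, label="Purchase"):
    """Parent Gini minus the size-weighted Gini of the split on `attribute`."""
    parent = gini([r[label] for r in rows])
    weighted_children = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == value]
        weighted_children += len(subset) / len(rows) * gini(subset)
    return parent - weighted_children


attributes = ("Age_group", "Income_level", "Education_level")
reductions = {a: gini_reduction(rows, a) for a in attributes}
best_attribute = max(reductions, key=reductions.get)

for attribute, reduction in reductions.items():
    print(f"{attribute}: {reduction:.4f}")
print("best:", best_attribute)
```

On these rows the parent Gini is 0.48, and splitting on Income_level gives the largest reduction (about 0.213), compared with about 0.013 for each of the other two attributes.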
3. Two image recognition classifiers are being compared on their performance using confusion matrices:
a. Which classifier has the higher recall rate? Show your computation.
b. If this is an app developed for self-driving cars to recognize bikes on the road so as to avoid accidents, which metric should we optimize: precision or recall? Why?
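The two metrics in question 3 follow directly from the confusion-matrix counts: recall = TP / (TP + FN) and precision = TP / (TP + FP). The confusion matrices themselves are not reproduced in this text, so the counts below are purely HYPOTHETICAL placeholders chosen for illustration and must be replaced with the real values.

```python
def recall(tp, fn):
    """Share of actual positives (e.g. real bikes) that the classifier found."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predicted positives that really are positives."""
    return tp / (tp + fp)


# HYPOTHETICAL counts, not taken from the original question.
classifier_a = {"tp": 80, "fp": 30, "fn": 20}
classifier_b = {"tp": 60, "fp": 5, "fn": 40}

for name, cm in (("A", classifier_a), ("B", classifier_b)):
    print(
        name,
        "recall:", round(recall(cm["tp"], cm["fn"]), 3),
        "precision:", round(precision(cm["tp"], cm["fp"]), 3),
    )
```

A missed bike is a false negative, and false negatives lower recall, which is the distinction part b asks about.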
4. Selecting Data Mining Tasks and Methods
The following describe the demographic, online browsing/shopping session and purchasing transaction information collected from a set of voluntary Internet users.
Table Name: Demographics
Household_id | num | Unique household identifier |
hoh_most_education | num | Household Most Education |
census_region | num | Census Region of Residence |
household_size | num | Household Size |
hoh_oldest_age | num | Household Eldest Age |
household_income | num | Household Income |
child_present | num | Child Present |
racial_background | num | Racial Background |
connection_speed | num | Connection Speed |
country_of_origin | num | Country of Origin |
Table Name: Session
Session_id | num | Unique session identifier |
Household_id | num | Household that signed on the session |
date | num | Date |
time | num | Time |
domain_id | num | Domain ID |
domain_name | char | Domain Name |
site_category_id | num | Site Category ID |
duration | num | Duration |
page_viewed | num | Page Viewed |
ref_domain_name | char | Referring Domain Name |
Table Name: Transaction
Trans_id | num | Unique transaction identifier |
Session_id | num | Session in which the transaction was completed |
prod_category_id | num | Product Category ID |
prod_totprice | num | Product Total Price |
prod_name | char | Product Name |
basket_tot | num | Basket Total |
Answer the following questions:
a. Can you perform association rule analysis in this data set? Justify your answer and state any assumption you make about the attributes and/or attribute values if necessary.
b. eBay is interested in finding out what it is about an eBay customer that can help predict if the customer will buy large, moderate or small amounts (in dollars) of pre-owned products. Please recommend a data mining task that should be performed to find out what eBay is interested in and state the selection of all necessary attributes and their roles in the data mining task. Justify the recommendation and state any assumption you make about the attributes and/or attribute values if necessary. Please note that it is possible to find the insights in which eBay is interested via different data mining tasks with different choices of attributes respectively. You only need to specify one. As always, be creative and relevant at the same time.
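For question 4a, one way to see whether association rule analysis is feasible is to check that purchased products can be grouped into baskets. The sketch below assumes that each Transaction row records one purchased product and that rows sharing a Session_id belong to the same basket; the schema suggests this through basket_tot, but it is an assumption. The sample rows are HYPOTHETICAL and are not taken from the data set.

```python
from collections import defaultdict

# HYPOTHETICAL rows in the form (Trans_id, Session_id, prod_name).
transaction_rows = [
    (1, 101, "used camera"),
    (2, 101, "camera bag"),
    (3, 102, "used camera"),
    (4, 102, "camera bag"),
    (5, 102, "memory card"),
    (6, 103, "memory card"),
]


def build_baskets(rows):
    """Group product names by Session_id so each session becomes one basket."""
    baskets = defaultdict(set)
    for _trans_id, session_id, prod_name in rows:
        baskets[session_id].add(prod_name)
    return dict(baskets)


baskets = build_baskets(transaction_rows)
for session_id, items in baskets.items():
    print(session_id, sorted(items))
```

With baskets in this form, prod_name (or prod_category_id, for coarser patterns) plays the role of the item and Session_id the role of the item-grouping attribute. Joining through Session to Household_id would instead group purchases per household.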