
Question

ASSOCIATION RULE MINING

1. If {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 5}, and {3, 4, 5} are ALL the large 3-itemsets, list all of the large 2-itemsets out of the original data set - {1, 2, 3, 4, 5}.
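One way to approach question 1 is the downward-closure (Apriori) property: every subset of a large itemset is itself large. The sketch below enumerates the 2-item subsets of the given large 3-itemsets. The function name is an illustrative choice, not part of the assignment. This approach only yields pairs that are guaranteed to be large; here they happen to cover all 10 possible pairs over {1, 2, 3, 4, 5}.

```python
from itertools import combinations

def two_item_subsets(large_3_itemsets):
    """Collect every 2-item subset of the given large 3-itemsets."""
    pairs = set()
    for itemset in large_3_itemsets:
        # Sort first so each pair has one canonical form, e.g. (1, 2) not (2, 1).
        pairs.update(combinations(sorted(itemset), 2))
    return sorted(pairs)

large_3 = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 5}, {3, 4, 5}]
large_2 = two_item_subsets(large_3)
print(large_2)
```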

2. Given the following (incomplete) data set about users of a web site and their document search and retrieval history, describe how you would apply association rule mining to it by identifying the equivalents of items and of the item-grouping attribute (e.g., transaction), as well as the potential use of the patterns you obtain from the data set. Hint: You don't have to use all of the attributes in each answer.

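| User ID | Session ID | Key words | Documents |
|---|---|---|---|
| U1 | S1 | K1 K2 | D1 |
| U1 | S2 | K2 K3 K4 | D2 |
| U1 | S3 | K2 | D1 |
| U1 | S4 | K3 K4 K5 | D3 |
| U1 | S5 | K4 K5 | D2 |
| U2 | S6 | K1 K2 K6 | D1 |
| U2 | S7 | K2 K6 K7 | D2 |
| U2 | S8 | K2 K6 | D1 |
| U2 | S9 | K8 K6 K2 | D3 |
| U2 | S10 | K2 K6 K9 | D2 |
| ... | ... | ... | ... |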
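As a minimal sketch of one possible mapping (an assumption; other mappings, such as documents as items grouped by user, are equally valid), the code below treats each session as a transaction and each key word as an item, then counts how many sessions contain each key word pair. The function and variable names are illustrative only. Frequently co-occurring key words could be used, for example, to suggest related search terms.

```python
from collections import Counter
from itertools import combinations

# Sessions from the table above: Session ID -> key words used in that session.
sessions = {
    "S1": ["K1", "K2"],
    "S2": ["K2", "K3", "K4"],
    "S3": ["K2"],
    "S4": ["K3", "K4", "K5"],
    "S5": ["K4", "K5"],
    "S6": ["K1", "K2", "K6"],
    "S7": ["K2", "K6", "K7"],
    "S8": ["K2", "K6"],
    "S9": ["K8", "K6", "K2"],
    "S10": ["K2", "K6", "K9"],
}

def pair_support(transactions):
    """Count the number of transactions containing each unordered item pair."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

pair_counts = pair_support(sessions.values())
print(pair_counts.most_common(3))
```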

CLUSTERING

Given the distance function of the following data set, show the dendrogram generated by the bottom-up hierarchical clustering method (using min cluster distance).

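|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 3 | 9 | 9 | 5 |
| 2 | 3 | 0 | 8 | 7 | 6 |
| 3 | 9 | 8 | 0 | 2 | 2 |
| 4 | 9 | 7 | 2 | 0 | 1 |
| 5 | 5 | 6 | 2 | 1 | 0 |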

I will give you the first two steps for "free":

Step 1. Find the minimal distance, which is between points 4 and 5, and merge them to form the first cluster.

Step 2. Rebuild the table as follows:

|   | 1 | 2 | 3 | 4/5 |
|---|---|---|---|---|
| 1 | 0 | 3 | 9 | 7 |
| 2 | 3 | 0 | 8 | 6.5 |
| 3 | 9 | 8 | 0 | 2 |
| 4/5 | 7 | 6.5 | 2 | 0 |
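The merge order can be checked with a small sketch of bottom-up clustering under min cluster distance (single linkage), written here in plain Python with illustrative function names. Note that the rebuilt table in Step 2 lists 7 and 6.5 for the 4/5 cluster, which match the average of the two original distances; under a strict minimum the entries would be 5 and 6. The sketch follows the stated min cluster distance, so its merge heights may differ from a dendrogram built from the averaged table.

```python
from itertools import combinations

# Pairwise distances from the original 5 x 5 table (upper triangle only).
D = {
    (1, 2): 3, (1, 3): 9, (1, 4): 9, (1, 5): 5,
    (2, 3): 8, (2, 4): 7, (2, 5): 6,
    (3, 4): 2, (3, 5): 2,
    (4, 5): 1,
}

def point_distance(a, b):
    return D[(a, b)] if a < b else D[(b, a)]

def cluster_distance(c1, c2):
    """Min cluster distance: the closest pair of points across two clusters."""
    return min(point_distance(a, b) for a in c1 for b in c2)

def single_linkage(points):
    """Return the merges as (left cluster, right cluster, merge distance)."""
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1]),
        )
        height = cluster_distance(left, right)
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left | right)
        merges.append((sorted(left), sorted(right), height))
    return merges

merges = single_linkage([1, 2, 3, 4, 5])
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```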

CLASSIFICATION

1. Naive Bayes. Below is a data set about stolen cars. We have a YELLOW SUV DOMESTIC. Predict whether it is stolen using Naive Bayes. Show your process.
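The stolen-car table is not reproduced here, so the following is only a generic sketch of the Naive Bayes computation, assuming the rows are supplied as dictionaries. The function name, the argument names, and the absence of smoothing are all assumptions made for illustration. For each class label it multiplies the prior by the conditional frequency of each queried attribute value; the label with the larger score is the prediction.

```python
from collections import Counter

def naive_bayes_scores(rows, label_key, query):
    """Return unnormalised P(label) * product of P(value | label) per label.

    rows      -- list of dicts, one per training example (hypothetical format)
    label_key -- name of the class column
    query     -- dict of attribute -> value for the example to classify
    No smoothing is applied, so an unseen value gives a score of zero.
    """
    label_counts = Counter(row[label_key] for row in rows)
    scores = {}
    for label, n_label in label_counts.items():
        score = n_label / len(rows)  # prior P(label)
        for attr, value in query.items():
            matches = sum(
                1 for row in rows
                if row[label_key] == label and row[attr] == value
            )
            score *= matches / n_label  # conditional P(value | label)
        scores[label] = score
    return scores
```

With the table loaded into `rows`, the query for this question would contain the three attribute values Yellow, SUV, and Domestic; the exact column names depend on the table and are not assumed here.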

2. Consider the training dataset given below. In the dataset, "Purchase" is the label.

| Customer | Age_group | Income_level | Education_level | Purchase? |
|---|---|---|---|---|
| 1 | 0 | 1 | 0 | Yes |
| 2 | 0 | 0 | 0 | No |
| 3 | 1 | 1 | 1 | Yes |
| 4 | 1 | 0 | 1 | No |
| 5 | 0 | 1 | 1 | No |

Which attribute has the highest information gain, i.e., reduction in Gini Index? Justify your answer. Use the Gini Index ($1 - p_1^2 - p_2^2$) as the measurement criterion.
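The Gini reduction for each attribute can be checked with a short sketch. The function names are illustrative, and the rows are copied from the five-customer training set above. For this data the parent Gini is 0.48, and the computed reduction is about 0.213 for Income_level versus about 0.013 for each of the other two attributes.

```python
# Training rows copied from the five-customer table in the question.
rows = [
    {"Age_group": 0, "Income_level": 1, "Education_level": 0, "Purchase": "Yes"},
    {"Age_group": 0, "Income_level": 0, "Education_level": 0, "Purchase": "No"},
    {"Age_group": 1, "Income_level": 1, "Education_level": 1, "Purchase": "Yes"},
    {"Age_group": 1, "Income_level": 0, "Education_level": 1, "Purchase": "No"},
    {"Age_group": 0, "Income_level": 1, "Education_level": 1, "Purchase": "No"},
]

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_reduction(rows, attr, label="Purchase"):
    """Parent Gini minus the size-weighted Gini of the split on attr."""
    parent = gini([r[label] for r in rows])
    weighted = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == value]
        weighted += len(subset) / len(rows) * gini(subset)
    return parent - weighted

attributes = ("Age_group", "Income_level", "Education_level")
reductions = {a: gini_reduction(rows, a) for a in attributes}
best = max(reductions, key=reductions.get)
print(reductions, best)
```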

3. Two image recognition classifiers are being compared for their performances using confusion matrix:

a. Which classifier has the higher recall rate? Show your computation.

b. If this is an app developed for self-driving cars to recognize bikes on the road so as to avoid accidents, which metric should we optimize: precision or recall? Why?
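The confusion matrices are not reproduced here, so the sketch below only shows how the two metrics are computed from the cells of a confusion matrix. The counts passed in at the end are hypothetical placeholders, not the values from the question. In the bike-detection setting, a false negative corresponds to a bike that the classifier failed to recognize.

```python
def recall(tp, fn):
    """Share of actual positives that the classifier found: TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of predicted positives that were correct: TP / (TP + FP)."""
    return tp / (tp + fp)

# Hypothetical placeholder counts, NOT taken from the question's matrices.
tp, fp, fn = 8, 4, 2
print("recall:", recall(tp, fn))
print("precision:", precision(tp, fp))
```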

4. Selecting Data Mining Tasks and Methods

The following tables describe the demographic, online browsing/shopping session, and purchasing transaction information collected from a set of voluntary Internet users.

Table Name: Demographics

| Attribute | Type | Description |
|---|---|---|
| Household_id | num | Unique household identifier |
| hoh_most_education | num | Household Most Education |
| census_region | num | Census Region of Residence |
| household_size | num | Household Size |
| hoh_oldest_age | num | Household Eldest Age |
| household_income | num | Household Income |
| child_present | num | Child Present |
| racial_background | num | Racial Background |
| connection_speed | num | Connection Speed |
| country_of_origin | num | Country of Origin |

Table Name: Session

| Attribute | Type | Description |
|---|---|---|
| Session_id | num | Unique session identifier |
| Household_id | num | Household that signed on the session |
| date | num | Date |
| time | num | Time |
| domain_id | num | Domain ID |
| domain_name | char | Domain Name |
| site_category_id | num | Site Category ID |
| duration | num | Duration |
| page_viewed | num | Page Viewed |
| ref_domain_name | char | Referring Domain Name |

Table Name: Transaction

| Attribute | Type | Description |
|---|---|---|
| Trans_id | num | Unique transaction identifier |
| Session_id | num | Session in which the transaction was completed |
| prod_category_id | num | Product Category ID |
| prod_totprice | num | Product Total Price |
| prod_name | char | Product Name |
| basket_tot | num | Basket Total |

Answer the following questions:

a. Can you perform association rule analysis in this data set? Justify your answer and state any assumption you make about the attributes and/or attribute values if necessary.

b. eBay is interested in finding out what it is about an eBay customer that can help predict if the customer will buy large, moderate or small amounts (in dollars) of pre-owned products. Please recommend a data mining task that should be performed to find out what eBay is interested in and state the selection of all necessary attributes and their roles in the data mining task. Justify the recommendation and state any assumption you make about the attributes and/or attribute values if necessary. Please note that it is possible to find the insights in which eBay is interested via different data mining tasks with different choices of attributes respectively. You only need to specify one. As always, be creative and relevant at the same time.
