Question

Consider the following clustering algorithm. It resembles the k-means algorithm but is definitely different.
Step 1. Randomly select k objects as initial representative objects.
Step 2. For each of the non-representative (unselected) objects, compute the distances to the k
representative (selected) objects and assign it to the closest one to obtain a clustering
result.
Step 3. Find a new representative object for each cluster, namely the object that minimizes the
sum of the distances to the other objects in its cluster. Update the current representative
object of each cluster by replacing it with this new one.
Step 4. If the newly updated representative objects are the same as the previous ones,
then stop. Otherwise, go to Step 2.
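The four steps can be sketched as follows; this is a minimal Python version assuming Euclidean distance (the choice of distance measure is not specified in the question):

```python
import numpy as np

def representative_clustering(X, k, seed=0):
    """Sketch of the algorithm above (a k-medoids-style method).
    Assumes Euclidean distance; each cluster always contains its own
    representative, so no cluster can become empty."""
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    medoids = rng.choice(n, size=k, replace=False)              # Step 1: random representatives
    while True:
        assign = np.argmin(D[:, medoids], axis=1)               # Step 2: assign to closest
        new = []
        for j in range(k):                                      # Step 3: best object per cluster
            members = np.flatnonzero(assign == j)
            within = D[np.ix_(members, members)].sum(axis=1)
            new.append(members[np.argmin(within)])
        new = np.array(new)
        if set(new) == set(medoids):                            # Step 4: stop when unchanged
            return medoids, assign
        medoids = new
```

For two well-separated groups of points, the loop converges in a few iterations and the returned assignment separates the groups.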
(1)(3pts) What would be the strength(s) of this algorithm over the original k-means
algorithm? Explain why.
(2)(3pts) What would be the strength(s) of this algorithm over the PAM (Partitioning
Around Medoids) algorithm? Explain why.
(3pts) Suppose that we perform PCA using the five-dimensional dataset shown below.
X1    X2    X3    X4    X5
2     4     0.4   0.2   0.02
5     10    1.0   0.5   0.05
1     2     0.2   0.1   0.01
6     12    1.2   0.6   0.06
8     16    1.6   0.8   0.08
3     6     0.6   0.3   0.03
4     8     0.8   0.4   0.04
7     14    1.4   0.7   0.07
9     18    1.8   0.9   0.09
10    20    2.0   1.0   0.10
How much variability of the dataset can be explained by the first principal component? Explain why.
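As a check on the explanation, note that every column of the table is an exact scalar multiple of X1 (X2 = 2·X1, X3 = 0.2·X1, X4 = 0.1·X1, X5 = 0.01·X1), so all ten points lie on a single line. A minimal NumPy sketch of the covariance eigenvalue split:

```python
import numpy as np

# Reconstruct the table: each column is a scalar multiple of X1.
x1 = np.array([2, 5, 1, 6, 8, 3, 4, 7, 9, 10], dtype=float)
X = np.column_stack([x1, 2 * x1, 0.2 * x1, 0.1 * x1, 0.01 * x1])

cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # largest eigenvalue first
explained = eigvals[0] / eigvals.sum()             # fraction explained by PC1
print(explained)
```

Because the data are rank one, the largest eigenvalue carries essentially all of the variance, so `explained` comes out at (numerically) 1.0.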
(6pts) Consider the similarity matrix of four data points (A,B,C,D) shown below.
(1)(3pts) Find the optimal clustering result that maximizes the following quantity,
$$Z = \sum_{k=1}^{3} \overline{s(i,j)}_{\,i,j \in C_k},$$
where s(i,j) is the similarity between objects i and j, the bar denotes averaging over the pairs (i, j) within a cluster, and C_k indicates the k-th cluster.
Notice that the number of clusters is 3. If there are multiple optimal results, find them all.
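With four points and three clusters, a partition is fixed by choosing the single pair that shares a cluster, so Z can be maximized by brute force. A sketch with a hypothetical similarity matrix (the actual matrix is in the figure, which is not reproduced here), reading the bar as the average of s(i, j) over all ordered pairs in a cluster:

```python
import itertools
import numpy as np

# Hypothetical 4x4 similarity matrix for points A, B, C, D.
S = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.2, 0.4, 1.0],
])
labels = "ABCD"

def Z(partition):
    """Sum over clusters of the average s(i, j) over all ordered pairs
    (i, j) in the cluster (one reading of the bar notation)."""
    total = 0.0
    for cluster in partition:
        pairs = [S[i, j] for i in cluster for j in cluster]
        total += sum(pairs) / len(pairs)
    return total

# Every partition into exactly 3 clusters merges exactly one pair (6 candidates).
best = max(
    (((i, j),) + tuple((m,) for m in range(4) if m not in (i, j))
     for i, j in itertools.combinations(range(4), 2)),
    key=Z,
)
print([tuple(labels[i] for i in c) for c in best])  # → [('A', 'B'), ('C',), ('D',)]
```

With this hypothetical matrix the maximizer merges the most similar pair (A, B); with the matrix from the figure, the same enumeration would reveal any ties.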
(2)(3pts) Convert the similarities to distances and cluster the four points using complete linkage.
Draw a dendrogram.

(5pts) Answer the following questions using the datasets in the figure shown below. Note that each dataset contains 1,000 items and 10,000 transactions. Dark cells indicate ones (presence of items) and white cells indicate zeros (absence of items). We will apply the apriori algorithm to extract frequent itemsets with minsup = 10% (i.e., itemsets must be contained in at least 1,000 transactions).
(1)(1pt) Which dataset(s) will produce the greatest number of frequent itemsets? Explain why.
(2)(1pt) Which dataset(s) will produce the fewest frequent itemsets? Explain why.
(3)(1pt) Which dataset(s) will produce the longest frequent itemset? Explain why.
(4)(1pt) Which dataset(s) will produce the frequent itemset with the highest support? Explain why.
(5)(1pt) Which dataset(s) will produce frequent itemsets with widely varying support levels? Explain why.
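The support counting and level-wise candidate generation behind all five sub-questions can be sketched on toy data (the actual datasets are in the figure, which is not reproduced here; the transactions below are hypothetical):

```python
from itertools import combinations

# Hypothetical transactions standing in for the datasets in the figure;
# minsup is 10% of the transaction count, as in the question.
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"},
    {"b", "c"}, {"a"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"a", "c"},
]
minsup = 0.10 * len(transactions)

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions)

# Level-wise (apriori-style) enumeration of frequent itemsets.
items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if support(frozenset([i])) >= minsup]
frequent = []
while level:
    frequent.extend(level)
    # Join k-itemsets into (k+1)-item candidates, then filter by support.
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= minsup]

print(sorted(tuple(sorted(s)) for s in frequent))
```

Denser datasets (more dark cells per row) push more and longer candidates past the minsup filter, which is the effect the five sub-questions probe.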

