Question

Consider the following clustering algorithm. It resembles the k-means algorithm but is definitely different.
Step 1. Randomly select k objects as initial representative objects.
Step 2. For each of the non-representative (unselected) objects, compute the distances to the k
representative (selected) objects and assign it to the closest one to obtain a clustering
result.
Step 3. Find a new representative object for each cluster, namely the object that minimizes the
sum of the distances to the other objects in its cluster. Update the current representative
object of each cluster by replacing it with this new one.
Step 4. If the newly updated representative objects are the same as the previous ones,
then stop. Otherwise, go to Step 2.
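The four steps can be sketched as follows; this is a minimal Python version assuming Euclidean distance (the choice of distance measure is not specified in the question):

```python
import numpy as np

def representative_clustering(X, k, seed=0):
    """Sketch of the algorithm above (a k-medoids-style method).
    Assumes Euclidean distance; each cluster always contains its own
    representative, so no cluster can become empty."""
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    medoids = rng.choice(n, size=k, replace=False)              # Step 1: random representatives
    while True:
        assign = np.argmin(D[:, medoids], axis=1)               # Step 2: assign to closest
        new = []
        for j in range(k):                                      # Step 3: best object per cluster
            members = np.flatnonzero(assign == j)
            within = D[np.ix_(members, members)].sum(axis=1)
            new.append(members[np.argmin(within)])
        new = np.array(new)
        if set(new) == set(medoids):                            # Step 4: stop when unchanged
            return medoids, assign
        medoids = new
```

For two well-separated groups of points, the loop converges in a few iterations and the returned assignment separates the groups.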
(1)(3pts) What would be the strength(s) of this algorithm over the original k-means
algorithm? Explain why.
(2)(3pts) What would be the strength(s) of this algorithm over the PAM (Partitioning
Around Medoids) algorithm? Explain why.
(3pts) Suppose that we perform PCA using the five-dimensional dataset shown below.
X1    X2    X3    X4    X5
2     4     0.4   0.2   0.02
5     10    1.0   0.5   0.05
1     2     0.2   0.1   0.01
6     12    1.2   0.6   0.06
8     16    1.6   0.8   0.08
3     6     0.6   0.3   0.03
4     8     0.8   0.4   0.04
7     14    1.4   0.7   0.07
9     18    1.8   0.9   0.09
10    20    2.0   1.0   0.10
How much variability of the dataset can be explained by the first principal component? Explain why.
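As a check on the explanation, note that every column of the table is an exact scalar multiple of X1 (X2 = 2·X1, X3 = 0.2·X1, X4 = 0.1·X1, X5 = 0.01·X1), so all ten points lie on a single line. A minimal NumPy sketch of the covariance eigenvalue split:

```python
import numpy as np

# Reconstruct the table: each column is a scalar multiple of X1.
x1 = np.array([2, 5, 1, 6, 8, 3, 4, 7, 9, 10], dtype=float)
X = np.column_stack([x1, 2 * x1, 0.2 * x1, 0.1 * x1, 0.01 * x1])

cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # largest eigenvalue first
explained = eigvals[0] / eigvals.sum()             # fraction explained by PC1
print(explained)
```

Because the data are rank one, the largest eigenvalue carries essentially all of the variance, so `explained` comes out at (numerically) 1.0.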
(6pts) Consider the similarity matrix of four data points (A,B,C,D) shown below.
(1)(3pts) Find the optimal clustering result that maximizes the following quantity,
$$Z = \sum_{k=1}^{3} \overline{s(i,j)}_{\,i,j \in C_k},$$
where s(i,j) is the similarity between objects i and j, the bar denotes averaging over the pairs (i, j) within a cluster, and C_k indicates the k-th cluster.
Notice that the number of clusters is 3. If there are multiple optimal results, find them all.
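With four points and three clusters, a partition is fixed by choosing the single pair that shares a cluster, so Z can be maximized by brute force. A sketch with a hypothetical similarity matrix (the actual matrix is in the figure, which is not reproduced here), reading the bar as the average of s(i, j) over all ordered pairs in a cluster:

```python
import itertools
import numpy as np

# Hypothetical 4x4 similarity matrix for points A, B, C, D.
S = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.2, 0.4, 1.0],
])
labels = "ABCD"

def Z(partition):
    """Sum over clusters of the average s(i, j) over all ordered pairs
    (i, j) in the cluster (one reading of the bar notation)."""
    total = 0.0
    for cluster in partition:
        pairs = [S[i, j] for i in cluster for j in cluster]
        total += sum(pairs) / len(pairs)
    return total

# Every partition into exactly 3 clusters merges exactly one pair (6 candidates).
best = max(
    (((i, j),) + tuple((m,) for m in range(4) if m not in (i, j))
     for i, j in itertools.combinations(range(4), 2)),
    key=Z,
)
print([tuple(labels[i] for i in c) for c in best])  # → [('A', 'B'), ('C',), ('D',)]
```

With this hypothetical matrix the maximizer merges the most similar pair (A, B); with the matrix from the figure, the same enumeration would reveal any ties.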
(2)(3pts) Convert the similarities to distances and cluster the four points using complete linkage.
Draw a dendrogram.

(5pts) Answer the following questions using the datasets in the figure shown below. Note that each dataset contains 1,000 items and 10,000 transactions. Dark cells indicate ones (presence of items) and white cells indicate zeros (absence of items). We will apply the apriori algorithm to extract frequent itemsets with minsup = 10% (i.e., itemsets must be contained in at least 1,000 transactions).
(1)(1pt) Which dataset(s) will produce the greatest number of frequent itemsets? Explain why.
(2)(1pt) Which dataset(s) will produce the fewest frequent itemsets? Explain why.
(3)(1pt) Which dataset(s) will produce the longest frequent itemset? Explain why.
(4)(1pt) Which dataset(s) will produce the frequent itemset with the highest support? Explain why.
(5)(1pt) Which dataset(s) will produce frequent itemsets with widely varying support levels? Explain why.
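The support counting and level-wise candidate generation behind all five sub-questions can be sketched on toy data (the actual datasets are in the figure, which is not reproduced here; the transactions below are hypothetical):

```python
from itertools import combinations

# Hypothetical transactions standing in for the datasets in the figure;
# minsup is 10% of the transaction count, as in the question.
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"},
    {"b", "c"}, {"a"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"a", "c"},
]
minsup = 0.10 * len(transactions)

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions)

# Level-wise (apriori-style) enumeration of frequent itemsets.
items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if support(frozenset([i])) >= minsup]
frequent = []
while level:
    frequent.extend(level)
    # Join k-itemsets into (k+1)-item candidates, then filter by support.
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= minsup]

print(sorted(tuple(sorted(s)) for s in frequent))
```

Denser datasets (more dark cells per row) push more and longer candidates past the minsup filter, which is the effect the five sub-questions probe.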

