
Question



We want to cluster categorical data, i.e., data whose attributes have categorical domains. The k-medoid algorithm can be applied to any dataset with a given pairwise distance function and is therefore also applicable to categorical data. The k-means algorithm, on the other hand, is much more efficient than the k-medoid algorithm, but it requires numeric data. The task of this assignment is to develop the equivalent of the k-means algorithm for categorical data. We assume the following distance function (Hamming distance) for pairs of categorical objects $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$:

$$dist(x, y) = \sum_{i=1}^{d} \delta(x_i, y_i) \quad \text{with} \quad \delta(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{else} \end{cases}$$

(a) What is the equivalent $m$ of the mean of a cluster $C$ in categorical data? Note that $m$ must be computable by scanning the set of objects of $C$ once (similar to the computation of the cluster mean).

(b) Prove that $m$ according to your definition in (a) is the object minimizing the cluster cost

$$TD(C, m) = \sum_{p \in C} dist(p, m)$$

Hint: first formulate the intuition of the proof, then formalize the proof. The proof can be performed by contradiction.
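The quantities defined above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not a solution to the proof in (b): the function names `hamming`, `cluster_cost`, and `attribute_modes` are hypothetical, objects are assumed to be equal-length tuples of categorical values, and the attribute-wise mode (most frequent value per attribute, as used in the k-modes algorithm) is offered as one natural candidate for $m$ in (a) rather than something the question itself states.

```python
from collections import Counter


def hamming(x, y):
    """Hamming distance: number of attributes on which x and y differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def cluster_cost(cluster, m):
    """TD(C, m): sum of Hamming distances from every object in C to m."""
    return sum(hamming(p, m) for p in cluster)


def attribute_modes(cluster):
    """Candidate representative (assumption): the most frequent value of
    each attribute, found in a single pass over the cluster."""
    counts = None
    for p in cluster:
        if counts is None:
            # One frequency counter per attribute position.
            counts = [Counter() for _ in p]
        for counter, value in zip(counts, p):
            counter[value] += 1
    if counts is None:
        raise ValueError("cluster must not be empty")
    return tuple(counter.most_common(1)[0][0] for counter in counts)


# Usage with a made-up cluster of three objects over three attributes.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "square"),
]
m = attribute_modes(cluster)
cost = cluster_cost(cluster, m)
```

Because the cost is a sum over attributes, each attribute can be treated independently, which is the intuition the hint in (b) points toward. Note that the resulting $m$ need not be an object that actually occurs in $C$, just as a numeric mean need not be a data point.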

