Question
We want to cluster categorical data, i.e. data whose attributes have categorical domains. The k-medoid algorithm can be applied to any dataset with a given pairwise distance function and is therefore also applicable to categorical data. The k-means algorithm, on the other hand, is much more efficient than the k-medoid algorithm, but it requires numeric data. The task of this assignment is to develop the equivalent of the k-means algorithm for categorical data.

We assume the following distance function (Hamming distance) for pairs of categorical objects $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$, where $d$ is the number of attributes:

$$\mathrm{dist}(x, y) = \sum_{i=1}^{d} \delta(x_i, y_i) \quad \text{with} \quad \delta(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{else} \end{cases}$$

(a) What is the equivalent $m$ of the mean of a cluster $C$ for categorical data? Note that $m$ must be computable by scanning the set of objects of $C$ once (similar to the computation of the cluster means).

(b) Prove that $m$ according to your definition in (a) is the object minimizing the cluster cost

$$TD(C, m) = \sum_{p \in C} \mathrm{dist}(p, m)$$

Hint: first formulate the intuition of the proof, then formalize the proof. The proof can be performed by contradiction.
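To make the definitions concrete, here is a minimal Python sketch of the Hamming distance and the cluster cost TD(C, m) as stated in the question. It also computes one natural candidate for m, the attribute-wise mode (the most frequent value of each attribute, as used in k-modes clustering). That candidate is an assumption of this sketch, not something the question states. The function names, the tuple representation of objects, and the toy cluster are likewise hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple

Obj = Sequence[Hashable]  # one categorical object: a fixed-length sequence of values


def hamming_dist(x: Obj, y: Obj) -> int:
    """Number of attribute positions where x and y differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def cluster_cost(cluster: Iterable[Obj], m: Obj) -> int:
    """TD(C, m): sum of Hamming distances from every object in C to m."""
    return sum(hamming_dist(p, m) for p in cluster)


def attribute_modes(cluster: Iterable[Obj]) -> Tuple[Hashable, ...]:
    """Candidate representative: the most frequent value per attribute.

    Uses a single pass over the cluster, keeping one counter per attribute.
    Ties are broken by whichever value Counter.most_common returns first.
    """
    counters = None
    for p in cluster:
        if counters is None:
            counters = [Counter() for _ in p]
        for counter, value in zip(counters, p):
            counter[value] += 1
    if counters is None:
        raise ValueError("cluster must not be empty")
    return tuple(c.most_common(1)[0][0] for c in counters)


# Toy cluster with three attributes (colour, size, shape); made up for illustration.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
    ("red", "small", "square"),
]

m = attribute_modes(cluster)
cost = cluster_cost(cluster, m)
print(m, cost)
```

Because the Hamming distance is a sum over attributes, the cost decomposes attribute by attribute, which is why a per-attribute counter is enough to build the candidate in one scan. The sketch only evaluates the cost on one example; it does not replace the proof asked for in (b).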