Question

Posted on Oct 15, 2024

Here is the link to the data set I am using: https://files.catbox.moe/8rj2hu.csv

Just copy and paste the link into your browser and it will download the csv dataset file.

The second image just tests that the function works.

Thank you so much

2) Write a Python function entropy(a, b) that computes the entropy and the percentage of outliers of a clustering result based on an a priori given set of class labels, where a gives the assignment of objects in O to clusters, and b contains the class labels of the examples in O.

The entropy function H is defined as follows. Assume we have m classes in our clustering problem; for each cluster $C_i$ we have proportions $p_i = (p_{i1}, \ldots, p_{im})$ of examples belonging to the m different classes (for cluster numbers $i = 1, \ldots, k$). The entropy of a cluster $C_i$ is computed as follows:

$$H(p_i) = \sum_{j=1}^{m} p_{ij} \log_2\left(\frac{1}{p_{ij}}\right)$$

(H is called the entropy function.) Moreover, if $p_{ij} = 0$, then $p_{ij} \log_2(1/p_{ij})$ is defined to be 0.

The entropy of a clustering X is the size-weighted sum of the entropies of the individual clusters:

$$H(X) = \sum_{r=1}^{k} \frac{|C_r|}{\sum_{p=1}^{k} |C_p|} \, H(p_r)$$

In the above formulas, $|\ldots|$ represents the set cardinality function. Moreover, we assume that $X = \{C_1, \ldots, C_k\}$ is a clustering with k clusters $C_1, \ldots, C_k$.

You can assume that cluster 0 contains all the outliers, and clusters 1, 2, ..., k represent "true" clusters; therefore, you should ignore cluster 0 and its instances when computing H(X).

The entropy function returns a vector (entropy, percentage of outliers); e.g., if the function returns (0.11, 0.2), this would indicate that the entropy is 0.11, but 20% of the objects in the dataset O have been classified as outliers.

    #DO NOT EDIT OR DELETE THIS CELL
    #1ST TEST CASE
    a1 = (0, 1, 1, 1, 1, 2, 2, 3)
    b1 = ('A', 'A', 'A', 'E', 'E', 'D', 'D', 'C')
    #2ND TEST CASE
    a2 = (1, 1, 1, 0, 0, 2, 2, 2)
    b2 = ('A', 'A', 'A', 'E', 'E', 'D', 'D', 'C')
    #testing function
    print('1st case test: ', entropy(a1, b1))
    print('2nd case test: ', entropy(a2, b2))

Expected output:

    1st case test: (0.5714285714285714, 0.125)
    2nd case test: (0.4591479170272448, 0.25)
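One possible way to write the function is sketched below. It is a minimal sketch rather than the official solution, and it relies on a few assumptions that are not spelled out in the assignment: the "percentage of outliers" is returned as a fraction of all objects (which matches the 0.125 and 0.25 in the expected output), an empty input returns (0.0, 0.0), and a length mismatch between a and b raises an error. It uses only the standard library.

```python
from collections import Counter
from math import log2


def entropy(a, b):
    """Return (H(X), outlier_fraction) for cluster assignments a and class labels b.

    Cluster 0 is treated as the outlier cluster and is ignored when computing H(X).
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")  # assumption: fail loudly
    n = len(a)
    if n == 0:
        return (0.0, 0.0)  # assumption: nothing to cluster

    # Count class labels per cluster, skipping cluster 0 (the outliers).
    counts = {}
    for cluster, label in zip(a, b):
        if cluster != 0:
            counts.setdefault(cluster, Counter())[label] += 1

    n_clustered = sum(sum(c.values()) for c in counts.values())
    outlier_fraction = (n - n_clustered) / n

    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        # Only classes that actually occur are stored, so every proportion is > 0
        # and the "0 * log2(1/0) = 0" special case never has to be evaluated.
        h = sum((cnt / size) * log2(size / cnt) for cnt in c.values())
        total += (size / n_clustered) * h  # size-weighted sum over true clusters
    return (total, outlier_fraction)


if __name__ == "__main__":
    a1 = (0, 1, 1, 1, 1, 2, 2, 3)
    b1 = ('A', 'A', 'A', 'E', 'E', 'D', 'D', 'C')
    a2 = (1, 1, 1, 0, 0, 2, 2, 2)
    b2 = ('A', 'A', 'A', 'E', 'E', 'D', 'D', 'C')
    print('1st case test: ', entropy(a1, b1))
    print('2nd case test: ', entropy(a2, b2))
```

As a hand check on the first test case: cluster 1 holds two 'A' and two 'E' (entropy 1), clusters 2 and 3 are pure (entropy 0), and 7 of the 8 objects are not outliers, so H(X) = 4/7 ≈ 0.5714 with an outlier fraction of 1/8 = 0.125. The second case should agree with the expected 0.4591 and 0.25 up to floating-point rounding.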