Question

Solve the following set of problems using Python and submit the code file
with extension .ipynb.
1. Load the Obesity dataset. Remove unwanted features if required.
2. Select the optimum k value using Silhouette Coefficient and plot the optimum k
values.
3. Create clusters using the KMeans and KMeans++ algorithms with the optimal k value found in
the previous problem. Report performance using appropriate evaluation metrics.
Compare the results.
4. Now repeat clustering using KMeans 50 times and report the average
performance. Again compare with the results that you obtained in Q3 using
KMeans++ and explain the difference.
5. Apply DBSCAN to the same Obesity dataset and find the optimum "eps" and
"min_samples" values. Is the number of clusters the same as the number found in Q2?
Explain the similarities or differences that you find between the two solutions.
6. Load the gene expression dataset. Apply PCA on the genes for generating 3
principal components. Plot the first three components of the PCA.
7. Continuing from question 6, what is the variance (%) covered by the first three
components? Explain how this percentage of variance is computed.
8. Continuing from question 6, apply KMeans to the original features of the gene dataset
and to the first three components returned by PCA. Compare the results using the given labels.

Kindly share the Jupyter notebook (containing the code, outputs, and explanations of the answers).
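As a starting point for questions 2 and 7, here is a minimal sketch, not a full solution. It assumes scikit-learn is installed and uses synthetic blobs as a stand-in, because the Obesity and gene expression files are not part of the question. The helper name `best_k_by_silhouette`, the synthetic centers, and the range of k values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data (4 well-separated blobs in 4 dimensions).
# Replace with the real dataset after dropping unwanted or non-numeric columns.
centers = 8.0 * np.eye(4)
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)


def best_k_by_silhouette(X, k_values, init="k-means++", seed=0):
    """Fit KMeans for each k and return the k with the highest mean silhouette score.

    Hypothetical helper for illustration; `init` may be "k-means++" or "random".
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, init=init, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Question 2: choose k by the silhouette coefficient (silhouette needs k >= 2).
best_k, scores = best_k_by_silhouette(X, range(2, 9))

# Question 7: percentage of variance covered by the first three components.
pca = PCA(n_components=3).fit(X)
variance_pct = 100 * pca.explained_variance_ratio_.sum()

# The same percentage computed by hand: variance along the kept components
# divided by the total variance of all features (sample variance, ddof=1).
manual_pct = 100 * pca.explained_variance_.sum() / X.var(axis=0, ddof=1).sum()

print("best k:", best_k)
print("variance covered by 3 components (%):", round(variance_pct, 2))
```

Passing `init="random"` to the helper gives plain KMeans initialisation for the comparison in questions 3 and 4. Plotting `scores` against k (for example with matplotlib) would give the silhouette plot asked for in question 2.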
