
Question

Google Colab Clustering Assignment Instructions
Objective: Perform unsupervised learning on a vehicular dataset using k-means clustering to identify cluster centroids for three different ECU signatures, namely steering, speed, and RPM.
Note: This dataset was obtained from three sedans of a single make (Nissan). It has been preprocessed to retain only the columns relevant to the signatures you will use as inputs: ECU300 (steering), ECU1F9 (tachometer), and ECU280 (speed). These columns contain the physical (actual) values of the signatures at different time instants. For those interested, speed is in miles per hour (mph) and the tachometer reading is in revolutions per minute (RPM).
Instructions:
1) Navigate to colab.research.google.com in your browser and open the ECU Clustering.ipynb Python notebook using Google Colab (File > Open Notebook, or Ctrl+O).
2) Execute cells individually by clicking the Run cell icon. Alternatively, select a cell and press Ctrl+Enter to execute it.
3) The notebook is divided into three sections: Sections 1, 2, and 3 contain the k-means clustering implementations for the speed, tachometer, and steering ECU signatures, respectively.
4) The following are the cells where you need to make modifications to complete the tables.
a) Code cells 3, 7, and 11 need to be modified to apply a MinMax scaling function that normalizes the input data (ECU signatures). Use the same scaling function for all the ECU signatures.
b) Identify the optimal number of clusters (K) and the sum of squared error (SSE) using the elbow method for each of the three ECU signatures.
c) Verify your choice of clusters by comparing the results of a clustering metric called the Calinski-Harabasz (CH) score by using different numbers of clusters.
d) Provide descriptive statistics, that is, the minimum, maximum, and mean, for each cluster and for each ECU signature. Use subscripts to designate the statistics for that particular cluster. For instance, the mean value for cluster 1 could be written as Mean1.
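Steps 4a and 4b can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the speed values are synthetic stand-ins for the ECU280 column, and the helper names min_max_scale and kmeans_1d are hypothetical. The quantile-based starting centroids are an assumption made to keep the run deterministic; k-means++ chooses its seeds randomly. In scikit-learn the same quantities are available through MinMaxScaler and the inertia_ attribute of a fitted KMeans model, though whether the notebook uses them is not stated.

```python
import random

def min_max_scale(values):
    """Rescale values linearly to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on 1-D data; returns (centroids, labels, sse)."""
    ordered = sorted(values)
    # Assumption: deterministic start at k evenly spaced quantiles.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: (v - centroids[j]) ** 2) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    # SSE: squared distance of every point to its own cluster centroid.
    sse = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    return centroids, labels, sse

# Synthetic stand-in for the speed signal (NOT the assignment data).
random.seed(0)
speed_mph = ([random.gauss(5, 1.5) for _ in range(100)]
             + [random.gauss(35, 3) for _ in range(100)]
             + [random.gauss(65, 3) for _ in range(100)])

scaled = min_max_scale(speed_mph)
sse_by_k = {k: kmeans_1d(scaled, k)[2] for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(f"K={k}: SSE={sse:.4f}")
```

Plotting SSE against K and looking for the point where the curve stops dropping sharply gives the elbow. The same two functions would be applied unchanged to the tachometer and steering columns, as step 4a asks for a single scaling function.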
Populate the following three tables with your observations.
Table I: Evaluating number of clusters for speed ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K = 3 | | |
K = 4 | | |
K = 5 | | |
Table II: Evaluating number of clusters for RPM ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K = 3 | | |
K = 4 | | |
K = 5 | | |
Table III: Evaluating number of clusters for steering ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K = 3 | | |
K = 4 | | |
K = 5 | | |
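The CH Score and Min, Max, Mean columns could be filled with something like the sketch below, which computes the Calinski-Harabasz score from its standard definition (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom) and then summarizes each cluster. The values and labels are a small hand-made example, not assignment data, and the function names are hypothetical. scikit-learn offers the same metric as sklearn.metrics.calinski_harabasz_score; whether the notebook calls it is an assumption.

```python
from statistics import mean

def group_by_label(values, labels):
    """Collect the values belonging to each cluster label."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    return clusters

def calinski_harabasz(values, labels):
    """CH = [B / (K - 1)] / [W / (N - K)] for 1-D data."""
    clusters = group_by_label(values, labels)
    n, k = len(values), len(clusters)
    overall = mean(values)
    between = 0.0  # B: size-weighted squared distance of centroids to the overall mean
    within = 0.0   # W: squared distance of points to their own centroid (the SSE)
    for members in clusters.values():
        centroid = mean(members)
        between += len(members) * (centroid - overall) ** 2
        within += sum((v - centroid) ** 2 for v in members)
    return (between / (k - 1)) / (within / (n - k))

def cluster_stats(values, labels):
    """Return {label: (min, max, mean)} for each cluster."""
    return {lab: (min(m), max(m), mean(m))
            for lab, m in group_by_label(values, labels).items()}

# Hand-made example with three obvious groups (NOT the assignment data).
values = [2.0, 4.0, 6.0, 30.0, 32.0, 34.0, 60.0, 62.0, 64.0]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

print("CH score:", calinski_harabasz(values, labels))
for lab, (lo, hi, avg) in sorted(cluster_stats(values, labels).items()):
    print(f"cluster {lab + 1}: Min{lab + 1}={lo}, Max{lab + 1}={hi}, Mean{lab + 1}={avg}")
```

The assignment does not say whether the statistics should be reported on the scaled or the physical values; the sketch simply summarizes whatever values it is given, so passing the original column alongside the labels yields statistics in mph, RPM, or steering units.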
Answer the following questions based on your findings.
1. How does the CH score change as the number of clusters (K) is increased? Provide a justification for your answer.
2. Why can't metrics such as precision or recall be used to evaluate the performance of clustering algorithms like k-means++?
3. What is the optimal number of clusters (K) that shows consensus among the elbow evaluation method and the CH score?
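As background for questions 1 and 3, the standard definition of the CH score for N points split into K clusters may help. The symbols below are notation chosen here, not taken from the notebook: C_k is cluster k, n_k its size, c_k its centroid, and \bar{x} the mean of all points.

```latex
\mathrm{CH}(K) = \frac{B_K / (K - 1)}{W_K / (N - K)},
\qquad
B_K = \sum_{k=1}^{K} n_k \,\lVert c_k - \bar{x} \rVert^2,
\qquad
W_K = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - c_k \rVert^2
```

W_K is the same sum of squared errors that the elbow method plots, so the two criteria draw on the same within-cluster quantity, while the CH score additionally rewards well-separated centroids and penalizes larger K through the (K - 1) and (N - K) factors.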
