Question
Google Colab Clustering Assignment Instructions
Objective: Perform unsupervised learning on a vehicular dataset using k-means clustering to identify cluster centroids for three different ECU signatures, namely steering, speed, and RPM.
Note: This data set was obtained from three sedan vehicles of a single make (Nissan). It has been pre-processed to retain the columns relevant to the signatures you will need to use as inputs. These columns are ECUsteering, ECUFtachometer, and ECUspeed. They contain the physical (actual) values of these signatures at different time instants. For those interested, the units of speed and tachometer are miles per hour (mph) and revolutions per minute (RPM), respectively.
Instructions:
Navigate to colab.research.google.com in your browser and open the ECU Clustering.ipynb Python notebook file using Google Colab (File > Open Notebook, or Ctrl+O).
Execute cells individually by clicking on the Run cell icon. Alternatively, after you select a cell, you can press Ctrl+Enter to execute it.
The notebook is divided into three sections, which contain the k-means clustering implementations for the ECU signatures speed, tachometer, and steering, respectively.
The following are the cells where you need to make modifications in order to complete the tables.
a. The relevant code cells need to be modified to include a MinMax scaling function that normalizes the input data (the ECU signatures). Use the same scaling function for all the ECU signatures.
b. Identify the optimal number of clusters (K) and the sum of squared errors (SSE) using the elbow method for each of the three ECU signatures.
c. Verify your choice of clusters by comparing the values of a clustering metric called the Calinski-Harabasz (CH) score for different numbers of clusters.
d. Provide descriptive statistics, that is, the minimum, maximum, and mean, for each cluster and for each ECU signature. Use subscripts to designate the statistics for a particular cluster. For instance, the mean value for cluster i could be written as Mean_i.
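Steps a to c can be sketched roughly as follows. This is a minimal illustration and not the notebook's own code: the data here is synthetic, the variable names (`df`, `results`) are made up for the example, and the range of K values tried is an arbitrary choice. Only the column name ECUspeed comes from the assignment text.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for one ECU signature (assumption: in the notebook
# the values would come from the real ECUspeed column instead).
rng = np.random.default_rng(0)
speed = np.concatenate([
    rng.normal(10, 2, 200),
    rng.normal(35, 3, 200),
    rng.normal(65, 4, 200),
])
df = pd.DataFrame({"ECUspeed": speed})

# (a) MinMax scaling maps the signature onto the interval [0, 1].
scaler = MinMaxScaler()
X = scaler.fit_transform(df[["ECUspeed"]])

# (b), (c) Fit k-means for several K and record the SSE (inertia_)
# and the Calinski-Harabasz score for each fit.
rows = []
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    rows.append({
        "K": k,
        "SSE": km.inertia_,
        "CH": calinski_harabasz_score(X, km.labels_),
    })
results = pd.DataFrame(rows)
print(results)
```

Plotting `results["SSE"]` against `results["K"]` gives the elbow curve; the K at which the curve visibly flattens is the usual elbow choice, and it can then be compared with the K that gives the highest CH score.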
Populate the following three tables with your observations.
Table I: Evaluating the number of clusters for the speed ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K |  |  |
K |  |  |
K |  |  |
Table II: Evaluating the number of clusters for the RPM ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K |  |  |
K |  |  |
K |  |  |
Table III: Evaluating the number of clusters for the steering ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K |  |  |
K |  |  |
K |  |  |
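The "Min, Max, Mean" column of these tables could be filled in with a per-cluster group-by, sketched below. The data is synthetic, the names `df` and `stats` are illustrative, and computing the statistics on the physical (unscaled) values rather than the scaled ones is an assumption, since the assignment text does not say which is expected.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the ECUspeed column (assumption, not real data).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "ECUspeed": np.concatenate([
        rng.normal(15, 2, 150),
        rng.normal(60, 3, 150),
    ])
})

# Cluster on the MinMax-scaled values, as the assignment requires.
X = MinMaxScaler().fit_transform(df[["ECUspeed"]])
k = 2  # illustrative choice of K
df["cluster"] = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Min, max, and mean of the unscaled values within each cluster.
stats = df.groupby("cluster")["ECUspeed"].agg(["min", "max", "mean"])
print(stats)
```

Each row of `stats` corresponds to one cluster, so its entries can be reported with the cluster index as a subscript. Note that k-means assigns cluster indices arbitrarily, so sorting by the mean may make the rows easier to compare across runs.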
Answer the following questions based on your findings.
How does the CH score change as the number of clusters K is increased? Provide a justification for your answer.
Why can't metrics such as precision or recall be used to evaluate the performance of clustering algorithms like k-means?
What is the optimal number of clusters K that shows consensus among the elbow evaluation method and the CH score?
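For reference when reasoning about the CH score in these questions, the score is commonly defined as a ratio of between-cluster to within-cluster dispersion. The symbols below (N for the number of samples, c_k for the centroid of cluster C_k, c for the overall mean, n_k for the cluster size) are notation chosen for this note, not taken from the assignment.

```latex
\mathrm{CH}(K) = \frac{B_K / (K - 1)}{W_K / (N - K)},
\qquad
B_K = \sum_{k=1}^{K} n_k \,\lVert c_k - c \rVert^2,
\qquad
W_K = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - c_k \rVert^2
```

Here W_K is the same quantity as the SSE used in the elbow method, so the two evaluation columns in the tables are directly related.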