Load the data into a DataFrame
import pandas as pd
df = pd.read_csv('seeds.csv')
## Correlation Analysis
Calculate the correlation values
Reduce the Nf x Nf correlation DataFrame (where Nf is the number of features) to non-redundant, non-identical feature pairs with the corresponding correlation value
Examine and show the feature pairings with a correlation value greater than the specified cutoff
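A minimal sketch of one way to do this, not part of the original prompt: mask the correlation matrix to its upper triangle and stack it into pairs. The column name 'target' and the 0.8 cutoff are assumptions; substitute whatever the assignment specifies.

```python
import numpy as np

# Nf x Nf correlation matrix over the feature columns
# (assumes the label column is named 'target')
corr = df.drop(columns=['target']).corr()

# Keep only the upper triangle (k=1 also drops the diagonal),
# then stack into (feature_1, feature_2, correlation) rows
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().reset_index()
pairs.columns = ['feature_1', 'feature_2', 'correlation']

# Placeholder threshold -- replace with the cutoff given in the assignment
print(pairs[pairs['correlation'] > 0.8])
```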
## Partition Data
Extract the features into a new DataFrame named X
Extract the target labels into a DataFrame named y
Make sure the target labels are in a DataFrame, not an array or Series
Model predictions will be saved to this DataFrame
Use double brackets: df[['target']]
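A short sketch of the partition step, again assuming the label column is named 'target' as the double-bracket hint suggests:

```python
# Features: every column except the label column
X = df.drop(columns=['target'])

# Target labels as a DataFrame (double brackets), copied so the
# prediction columns added later do not trigger chained-assignment warnings
y = df[['target']].copy()
```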
## Identify best k with elbow method
Construct a function that produces the plot of SSE versus k. The function should take a feature set X as input and return only the plot.
Consider the following pseudo code:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

def calculate_sse_vs_k(X):
    ## instantiate a list (or array) to hold the k and SSE values at each iteration
    sse_v_k = []
    ## iterate over values of k
    for k in np.arange(...):
        ## instantiate and fit KMeans with the value of k and a fixed random_state
        ## add the k value and SSE (inertia) to the list
    ## convert to an array so the columns can be sliced below
    sse_v_k = np.array(sse_v_k)
    ## plot the resulting SSE versus k pairs
    plt.figure(figsize=(...))
    plt.scatter(x=sse_v_k[:, 0], y=sse_v_k[:, 1])  ## show as points
    plt.plot(sse_v_k[:, 0], sse_v_k[:, 1])
    plt.xlabel('Cluster number $k$')
    plt.ylabel('SSE (Inertia)')
    plt.xticks(ticks=np.arange(...))
    plt.show()
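The pseudo code leaves the k range, random_state, and figure size unspecified. Purely as an illustration, a filled-in version might look like the sketch below, which assumes k from 1 to 10, random_state=42, and an 8x5 figure; adjust these to whatever the assignment expects.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def calculate_sse_vs_k(X, k_values=np.arange(1, 11), random_state=42):
    """Plot SSE (inertia) versus k for KMeans fit on the feature set X."""
    sse_v_k = []
    for k in k_values:
        # fit KMeans for this k with a fixed random_state
        km = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        km.fit(X)
        sse_v_k.append([k, km.inertia_])
    sse_v_k = np.array(sse_v_k)

    plt.figure(figsize=(8, 5))
    plt.scatter(x=sse_v_k[:, 0], y=sse_v_k[:, 1])   # show as points
    plt.plot(sse_v_k[:, 0], sse_v_k[:, 1])
    plt.xlabel('Cluster number $k$')
    plt.ylabel('SSE (Inertia)')
    plt.xticks(ticks=sse_v_k[:, 0])
    plt.show()
```

Calling calculate_sse_vs_k(X) then produces the elbow plot for the unscaled data.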
Show the SSE versus k plot for the unscaled data
What is the optimal value of k when clustering unscaled data?
Now, instantiate a StandardScaler, fit and transform X, and show the SSE versus k plot for the scaled data
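A sketch of the scaling step, reusing the calculate_sse_vs_k sketch above:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_ss = scaler.fit_transform(X)   # standardized copy of the feature set
calculate_sse_vs_k(X_ss)         # SSE versus k for the scaled data
```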
What is the optimal value of k when clustering scaled data?
## With optimal k values found, extract the cluster labels from KMeans Clustering
#### Unscaled data
Instantiate KMeans with the optimal k value found from the unscaled data. Be sure to employ the same random_state that was used in the calculate_sse_vs_k function above.
Fit this KMeans with the unscaled data
Extract the labels from this KMeans and add these as a new column named 'km_label' to the target DataFrame
You may need to align the predicted labels to the actual labels
#### Scaled Data
Instantiate KMeans with the optimal k value found from the scaled data. Be sure to employ the same random_state that was used in the calculate_sse_vs_k function above
Fit this KMeans with the scaled data
Extract the labels from this KMeans and add these as a new column named 'km_ss_label' to the target DataFrame
You may need to align the predicted labels to the actual labels
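One possible sketch covering both KMeans fits. The optimal k values, random_state=42, and the mode-based align_labels helper are all illustrative assumptions; the helper simply renames each cluster to the most common actual label inside it, which is one way to align predicted and actual labels.

```python
import pandas as pd
from sklearn.cluster import KMeans

def align_labels(pred, actual):
    """Map each predicted cluster id to the most common actual label in that cluster."""
    pred = pd.Series(pred, index=actual.index)
    mapping = {c: actual[pred == c].mode()[0] for c in pred.unique()}
    return pred.map(mapping)

k_opt, k_opt_ss = 3, 3   # placeholders: use the values read from the elbow plots

# Unscaled data
km = KMeans(n_clusters=k_opt, random_state=42, n_init=10).fit(X)
y['km_label'] = align_labels(km.labels_, y['target'])

# Scaled data (X_ss from the scaling sketch above)
km_ss = KMeans(n_clusters=k_opt_ss, random_state=42, n_init=10).fit(X_ss)
y['km_ss_label'] = align_labels(km_ss.labels_, y['target'])
```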
## With optimal k values found, extract the cluster labels from Agglomerative Clustering
#### Unscaled data
Instantiate AgglomerativeClustering with the optimal k value found from the unscaled data, and linkage='complete'.
Fit this AgglomerativeClustering with the unscaled data
Extract the labels from this AgglomerativeClustering and add these as a new column named 'agg_label' to the target DataFrame
You may need to align the predicted labels to the actual labels
#### Scaled Data
Instantiate AgglomerativeClustering with the optimal k value found from the scaled data, and linkage='complete'.
Fit this AgglomerativeClustering with the scaled data
Extract the labels from this AgglomerativeClustering and add these as a new column named 'agg_ss_label' to the target DataFrame
You may need to align the predicted labels to the actual labels
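A parallel sketch for the agglomerative fits, reusing the placeholder k values and the align_labels helper from the KMeans sketch:

```python
from sklearn.cluster import AgglomerativeClustering

# Unscaled data
agg = AgglomerativeClustering(n_clusters=k_opt, linkage='complete')
y['agg_label'] = align_labels(agg.fit_predict(X), y['target'])

# Scaled data
agg_ss = AgglomerativeClustering(n_clusters=k_opt_ss, linkage='complete')
y['agg_ss_label'] = align_labels(agg_ss.fit_predict(X_ss), y['target'])
```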
## Compare the clustering results to a k-Nearest Neighbors Classifier
Instantiate KNeighborsClassifier from sklearn.neighbors with default settings, fit to the unscaled data
Tip: The target DataFrame y now has additional columns from the clustering predictions. Be sure to pass y['target'] (not all of y) when fitting
Add the predictions from the kNN classifier fit on the unscaled data to the target DataFrame as 'knn_label'
Repeat for the scaled data. Add those predictions as 'knn_ss_label'
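A sketch of the kNN step under the same column-name assumptions; note that the classifier is fit and then evaluated on the same data, exactly as the exercise describes:

```python
from sklearn.neighbors import KNeighborsClassifier

# Unscaled data
knn = KNeighborsClassifier()
knn.fit(X, y['target'])
y['knn_label'] = knn.predict(X)

# Scaled data
knn_ss = KNeighborsClassifier()
knn_ss.fit(X_ss, y['target'])
y['knn_ss_label'] = knn_ss.predict(X_ss)
```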
## Calculate the Accuracy Score and show the Confusion Matrix for all Predictors
There are six predictions to consider (see the scoring sketch after this list):
KMeans clustering with unscaled and scaled data
Agglomerative clustering with unscaled and scaled data
kNN Classifier with unscaled and scaled data
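Assuming the six prediction columns created above, a compact scoring loop might look like:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

pred_cols = ['km_label', 'km_ss_label',
             'agg_label', 'agg_ss_label',
             'knn_label', 'knn_ss_label']

for col in pred_cols:
    acc = accuracy_score(y['target'], y[col])
    print(f"{col}: accuracy = {acc:.3f}")
    print(confusion_matrix(y['target'], y[col]))
```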
Which predictor performed the best?
Was there any case when the predictor with unscaled data outperformed the same predictor with scaled data?