Question


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample data (replace this with your actual data)
data = {
    "Passenger Count": [27271, 29131, 5415, 35156, 34090],
    "Price Category Code_Low Fare": [True, True, True, False, False],
    "Price Category Code_Other": [False, False, False, True, True],
    "GEO Summary_Domestic": [True, True, True, False, False],
    "GEO Summary_International": [False, False, False, True, True],
    "Cluster": [0, 0, 0, 3, 3]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Extract features for clustering (you can select specific columns based on your requirement)
features = df[["Passenger Count", "Price Category Code_Low Fare", "Price Category Code_Other", "GEO Summary_Domestic", "GEO Summary_International"]]
# Fit K-means clustering model
kmeans = KMeans(n_clusters=4) # Change the number of clusters as per your analysis
kmeans.fit(features)
# Add cluster labels to the DataFrame
df['Cluster'] = kmeans.labels_
# Visualize the clusters for all pairs of features
fig, axs = plt.subplots(2, 3, figsize=(18, 12))
# Flatten the axs array for easy iteration
axs = axs.flatten()
# Initialize a counter for the subplot index
subplot_index = 0
# Plot each pair of features with color-coded clusters
for i, feature1 in enumerate(features.columns):
    for j, feature2 in enumerate(features.columns):
        if i < j:  # This ensures that each pair is plotted only once
            ax = axs[subplot_index]
            ax.scatter(df[feature1], df[feature2], c=df['Cluster'], cmap='viridis', s=50, alpha=0.7)
            ax.set_xlabel(feature1)
            ax.set_ylabel(feature2)
            subplot_index += 1  # Increment the subplot index
# Plot cluster centroids for each pair of features
for cluster in range(kmeans.n_clusters):
    subplot_index = 0  # Reset the subplot index for centroids
    for i, feature1 in enumerate(features.columns):
        for j, feature2 in enumerate(features.columns):
            if i < j:  # This ensures that each pair is plotted only once
                ax = axs[subplot_index]
                ax.scatter(kmeans.cluster_centers_[cluster][i], kmeans.cluster_centers_[cluster][j], c='red', marker='x', s=200, label=f'Cluster {cluster}')
                subplot_index += 1  # Increment the subplot index
plt.tight_layout()
plt.show()
For the above code, I am getting the following error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[178], line 42
     40     for j, feature2 in enumerate(features.columns):
     41         if i < j:  # This ensures that each pair is plotted only once
---> 42             ax = axs[subplot_index]
     43             ax.scatter(df[feature1], df[feature2], c=df['Cluster'], cmap='viridis', s=50, alpha=0.7)
     44             ax.set_xlabel(feature1)

IndexError: index 6 is out of bounds for axis 0 with size 6
Kindly resolve this. I am also sharing the DataFrame information:

Dimensions of DataFrame (rows, columns): (5, 6)
Column labels: Index(['Passenger Count', 'Price Category Code_Low Fare',
       'Price Category Code_Other', 'GEO Summary_Domestic',
       'GEO Summary_International', 'Cluster'],
      dtype='object')
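The traceback is a counting problem rather than a clustering problem. The nested loops visit every pair of columns (i, j) with i < j in `features`, which holds 5 columns (the DataFrame's sixth column, `Cluster`, is not selected), so there are 5 × 4 / 2 = 10 pairs. `plt.subplots(2, 3)` creates only 2 × 3 = 6 axes, so once indices 0 to 5 are used, the seventh pair asks for `axs[6]` and raises the IndexError. The check below is a small sketch using only the standard library and the column names from the question; it counts the pairs in the same order as the `if i < j` loop:

```python
from itertools import combinations
from math import comb

# The five feature columns used for clustering (taken from the question)
feature_columns = [
    "Passenger Count",
    "Price Category Code_Low Fare",
    "Price Category Code_Other",
    "GEO Summary_Domestic",
    "GEO Summary_International",
]

# Same pairs the nested `if i < j` loop visits, in the same order
pairs = list(combinations(feature_columns, 2))
n_axes = 2 * 3  # plt.subplots(2, 3) creates 6 axes

print(len(pairs), comb(len(feature_columns), 2), n_axes)  # 10 10 6
# The seventh pair, the one that needs axs[6]:
print(pairs[n_axes])  # ('Price Category Code_Low Fare', 'GEO Summary_International')
```

Any grid with fewer than 10 axes fails the same way, so the grid has to be sized from the number of pairs (for example `plt.subplots(2, 5)`) rather than hard-coded at 2 × 3.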

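One way to fix the script is to stop hard-coding `plt.subplots(2, 3)`, which has room for only 6 of the 10 feature pairs, and derive the grid from the pair count instead. The sketch below keeps the question's sample data and feature selection, builds the list of column-index pairs once with `itertools.combinations`, and draws the points and the centroids for each pair on the same axes in a single loop, so no separate counter has to be reset. The `n_init=10`, `random_state=0`, `astype(float)`, `ncols = 5` layout and `legend()` call are choices of this sketch, not part of the original code.

```python
import math
from itertools import combinations

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# Sample data from the question (replace this with your actual data)
data = {
    "Passenger Count": [27271, 29131, 5415, 35156, 34090],
    "Price Category Code_Low Fare": [True, True, True, False, False],
    "Price Category Code_Other": [False, False, False, True, True],
    "GEO Summary_Domestic": [True, True, True, False, False],
    "GEO Summary_International": [False, False, False, True, True],
    "Cluster": [0, 0, 0, 3, 3],
}
df = pd.DataFrame(data)

# Same feature selection as the question; booleans become 0.0 / 1.0
feature_names = [
    "Passenger Count",
    "Price Category Code_Low Fare",
    "Price Category Code_Other",
    "GEO Summary_Domestic",
    "GEO Summary_International",
]
features = df[feature_names].astype(float)

# n_init and random_state are added only to make the labels reproducible
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
df["Cluster"] = kmeans.fit_predict(features)
centers = kmeans.cluster_centers_  # shape (n_clusters, n_features)

# Every (i, j) column-index pair with i < j, in the same order as the original loops
pairs = list(combinations(range(len(feature_names)), 2))

# Size the grid from the number of pairs instead of hard-coding 2 x 3
ncols = 5  # arbitrary layout choice
nrows = math.ceil(len(pairs) / ncols)
fig, axs = plt.subplots(nrows, ncols, figsize=(4 * ncols, 4 * nrows), squeeze=False)
axs = axs.flatten()

for ax, (i, j) in zip(axs, pairs):
    x_name, y_name = feature_names[i], feature_names[j]
    # Data points, colored by cluster label
    ax.scatter(df[x_name], df[y_name], c=df["Cluster"], cmap="viridis", s=50, alpha=0.7)
    # All centroids projected onto this pair of features
    ax.scatter(centers[:, i], centers[:, j], c="red", marker="x", s=200, label="Centroids")
    ax.set_xlabel(x_name)
    ax.set_ylabel(y_name)
    ax.legend()

# Turn off any axes the pairs do not fill (none for 10 pairs in a 2 x 5 grid)
for ax in axs[len(pairs):]:
    ax.axis("off")

plt.tight_layout()
plt.show()
```

If you select a different number of feature columns, `nrows` adapts automatically and any leftover axes in the last row are switched off. Note also that the original `label=f'Cluster {cluster}'` on the centroid markers had no visible effect, because `legend()` was never called.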