Question

Classify each point in the testing dataset based on the fitted cluster centers. Report the number of points classified into each center and the corresponding MSE. Use two pie charts, one for the training data and one for the testing data, to convey the same information. Does the model fitted on the training data fit the test data well? Draw conclusions about model performance. Use Python, referring to the code below:
import sys

import numpy as np
import matplotlib.pyplot as plt
from pyspark import SparkConf, SparkContext
from pyspark.mllib.clustering import KMeans

# Helpers
def parse_vector(line, sep=','):
    """Parses a line.
    Returns: numpy array of the latitude and longitude
    """
    fields = line.strip().split(sep)
    latitude = float(fields[1])
    longitude = float(fields[2])
    return np.array([latitude, longitude])

# Main
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kmeans ", file=sys.stderr)
        sys.exit(-1)

    # Configure Spark
    conf = SparkConf().setMaster("local") \
        .setAppName("Earthquake Clustering") \
        .set("spark.executor.memory", "2g")
    sc = SparkContext(conf=conf)

    # Create training RDD of (lat, long) vectors; cache it because it is reused for every k
    earthquakes_file = sys.argv[1]
    training = sc.textFile(earthquakes_file).map(parse_vector).cache()
    n_train = training.count()

    # Train KMeans models for different values of k
    k_values = range(2, 11)
    mse_values = []
    for k in k_values:
        model = KMeans.train(training, k, maxIterations=10, initializationMode="random")
        # computeCost returns the SUM of squared distances to the nearest center,
        # so divide by the number of points to get a mean squared error
        mse = model.computeCost(training) / n_train
        mse_values.append(mse)

    # Find the optimal k using the elbow method
    plt.plot(k_values, mse_values)
    plt.xlabel('Number of Clusters (k)')
    plt.ylabel('Mean Squared Error (MSE)')
    plt.title('Elbow Method for Optimal k')
    plt.show()

    # Train the model with the optimal k
    optimal_k = 3  # You can change this based on the elbow plot
    model = KMeans.train(training, optimal_k, maxIterations=10, initializationMode="random")

    # Print the cluster centers
    print("Earthquake cluster centers:")
    print(model.clusterCenters)

    # Plot the cluster centers on a map (using a library like Folium)
    # This part requires an additional library (Folium) and code for map visualization
    sc.stop()
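In `pyspark.mllib`, `KMeansModel.computeCost` returns the sum of squared distances from each point to its nearest center (often called WSSSE), not a mean. The per-cluster MSE the question asks for, and its relation to that cost, can be written as follows, with $C_j$ the set of points assigned to center $\mu_j$, $n_j = |C_j|$, $k$ the number of centers and $n$ the total number of points:

$$
\mathrm{MSE}_j = \frac{1}{n_j}\sum_{x_i \in C_j}\lVert x_i - \mu_j\rVert^2,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{k} n_j\,\mathrm{MSE}_j = \frac{\mathrm{WSSSE}}{n}
$$

The same formulas apply to the testing data, with the centers $\mu_j$ kept fixed at the values fitted on the training data.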

Step by Step Solution

Step: 1
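A minimal sketch of the classification step, written in NumPy rather than Spark and assuming the points fit in driver memory (for example after `testing.collect()`). It assigns each point to its nearest center by squared Euclidean distance, the same rule `KMeansModel.predict` applies, and reports the count and mean squared distance for each center. The helper name `cluster_report` is hypothetical, not part of PySpark.

```python
import numpy as np


def cluster_report(points, centers):
    """Assign each point to its nearest center and summarize every cluster.

    Hypothetical helper (not a PySpark API).
    points:  (n, 2) array-like of [latitude, longitude] rows
    centers: (k, 2) array-like, e.g. model.clusterCenters
    Returns a list of (count, mse) tuples, one per center, where mse is the
    mean squared Euclidean distance from the cluster's points to its center.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Squared distance from every point to every center, shape (n, k)
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dist.argmin(axis=1)                    # nearest center per point
    nearest = sq_dist[np.arange(len(points)), labels]  # squared distance to it
    report = []
    for c in range(len(centers)):
        in_cluster = labels == c
        count = int(in_cluster.sum())
        # An empty cluster has no defined MSE
        mse = float(nearest[in_cluster].mean()) if count else float("nan")
        report.append((count, mse))
    return report


if __name__ == "__main__":
    # Tiny illustrative data, not the earthquake dataset
    demo_points = [[0, 0], [0, 2], [10, 10], [10, 12]]
    demo_centers = [[0, 1], [10, 11]]
    for i, (count, mse) in enumerate(cluster_report(demo_points, demo_centers)):
        print(f"Center {i}: {count} points, MSE = {mse:.2f}")
```

With the script above, `cluster_report(np.array(training.collect()), model.clusterCenters)` gives the training summary. The testing summary comes from the same call on an RDD of test points, for example `sc.textFile(test_file).map(parse_vector)`, where `test_file` is a hypothetical path to the held-out data.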

Step: 2
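A sketch of the two pie charts with matplotlib, which the original script already imports. It assumes one `(count, mse)` tuple per cluster for each dataset, listed in center order; the helper names `pie_labels` and `plot_pies` are hypothetical. Wedge sizes are the point counts, and each label carries the count and that cluster's MSE, so each chart conveys both numbers.

```python
def pie_labels(report):
    """One wedge label per cluster from a list of (count, mse) tuples (hypothetical helper)."""
    return [f"Cluster {i}\nn={count}, MSE={mse:.2f}"
            for i, (count, mse) in enumerate(report)]


def plot_pies(train_report, test_report):
    """Draw training and testing pie charts side by side (hypothetical helper)."""
    # Imported here so pie_labels can be used without matplotlib installed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for ax, report, name in zip(axes, (train_report, test_report),
                                ("Training", "Testing")):
        counts = [count for count, _ in report]
        ax.pie(counts, labels=pie_labels(report), autopct="%1.1f%%",
               startangle=90)
        ax.set_title(f"{name}: points per cluster")
        ax.axis("equal")  # keep the pies circular
    fig.tight_layout()
    plt.show()
```

For example, `plot_pies(train_report, test_report)` draws both charts in one figure, where each argument is a list such as `[(count_0, mse_0), (count_1, mse_1), ...]`. Putting them side by side makes differences in cluster proportions between training and testing easy to spot.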

Step: 3
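One way to judge whether the model fitted on the training data fits the test data is to compare the overall test MSE with the training MSE, and to compare how the points split across clusters. The sketch below computes both from per-cluster `(count, mse)` tuples; `overall_mse` and `compare_fit` are hypothetical helper names. A ratio near 1 and small share gaps suggest the centers generalize, while a test MSE well above the training MSE or very different cluster proportions suggest they do not. The question gives no threshold for "well", so any cutoff you apply is an assumption.

```python
def overall_mse(report):
    """Mean squared distance over all points, from per-cluster (count, mse) tuples."""
    total = sum(count for count, _ in report)
    # Weight each cluster's MSE by its size; skip empty clusters (their MSE is NaN)
    return sum(count * mse for count, mse in report if count) / total


def compare_fit(train_report, test_report):
    """Summarize how well centers fitted on training data describe test data (hypothetical helper)."""
    train_mse = overall_mse(train_report)
    test_mse = overall_mse(test_report)
    n_train = sum(count for count, _ in train_report)
    n_test = sum(count for count, _ in test_report)
    # Absolute difference in the fraction of points each cluster receives
    share_gaps = [abs(a / n_train - b / n_test)
                  for (a, _), (b, _) in zip(train_report, test_report)]
    return {
        "train_mse": train_mse,
        "test_mse": test_mse,
        "mse_ratio": test_mse / train_mse,  # > 1 means test points sit farther from the centers
        "max_share_gap": max(share_gaps),
    }
```

Reporting `mse_ratio` and `max_share_gap` alongside the two pie charts gives concrete numbers to support the conclusion about model performance.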