Question
In Python 3, modify the following k-means algorithm to analyze the earthquake data in the following CSV file (save a local copy; you don't have to read it from the internet): https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv. Create clusters for earthquakes that happen on the East Coast, the West Coast, and in the Midwest of the United States. Once you have the clusters, use the turtle module to draw a dot on this map of the United States (http://i64.tinypic.com/2i739t4.jpg) wherever each earthquake occurred, with each cluster in a different color. You can only use the math, random, and turtle modules. Here's the code that needs to be modified:
import math
import random
import turtle
def euclid(p1, p2):
    total = 0
    for i in range(len(p1)):
        total += (p2[i] - p1[i])**2
    return math.sqrt(total)
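def getData(afile):
    # read one integer score per line into {1: [score1], 2: [score2], ...}
    thedict = {}
    key = 1
    with open(afile, 'r') as datafile:  # closes the file when done
        for line in datafile:
            score = int(line)
            thedict[key] = [score]
            key += 1
    return thedict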
def centroids(k, datadict):
    centroidL = []
    centroidCount = 0
    centroidKeys = []
    while centroidCount < k:
        randomkey = random.randint(1, len(datadict))
        if randomkey not in centroidKeys:
            centroidL.append(datadict[randomkey])
            centroidKeys.append(randomkey)
            centroidCount += 1
    return centroidL
def createClusters(k, centroidL, datadict, repeat):
    for apass in range(repeat):
        clusterL = []
        for i in range(k):
            clusterL.append([])  # add an empty list for each cluster
        # assign every point to the cluster with the nearest centroid
        for akey in datadict:
            distances = []
            for cindex in range(k):
                dist = euclid(datadict[akey], centroidL[cindex])
                distances.append(dist)
            minD = min(distances)  # smallest distance
            index = distances.index(minD)
            clusterL[index].append(akey)
        # move each centroid to the mean of the points assigned to it
        dimension = len(datadict[1])
        for cindex in range(k):
            totals = [0]*dimension  # repeat 0 dimension times, in a list
            for item in clusterL[cindex]:
                points = datadict[item]  # get data from dictionary
                for ind in range(len(points)):
                    totals[ind] += points[ind]
            clusterLen = len(clusterL[cindex])
            if clusterLen != 0:  # an empty cluster keeps its old centroid
                for ind in range(len(totals)):
                    totals[ind] /= clusterLen
                centroidL[cindex] = totals
        # print the clusters found in this pass
        for cindex, c in enumerate(clusterL):
            print("Pass", apass, "cluster", cindex)
            for akey in c:  # don't reuse k here: it holds the number of clusters
                print(datadict[akey], end=" ")
            print()  # newline
    return clusterL
#testing
point1 = [4, 6, 12]
point2 = [-3, 4, -2]
#print(euclid(point1,point2))
data = getData('scores.txt')
#print(data)
cent = centroids(5, data)
#print(cent)
CL = createClusters(5, cent, data, 3)
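The getData function above reads one integer per line, so it cannot read the earthquake feed directly. A minimal sketch of a replacement, assuming the saved file has a header row naming `latitude` and `longitude` and that both columns come before the quoted `place` column (so a plain `split(",")` still reads them correctly even when `place` contains commas). It uses only built-in string handling, since the csv module is off-limits. The bounding box for the contiguous United States and the names `parseQuakes` and `getQuakeData` are hypothetical choices, not part of the original code.

```python
# Sketch: load the USGS all_month.csv feed into the {key: [lon, lat]} format
# that createClusters expects, without the csv module.
# ASSUMPTION: "latitude"/"longitude" precede the quoted "place" column.

# Hypothetical rough box for the contiguous US (includes some ocean and border
# area; Alaska, Hawaii and Puerto Rico fall outside it and are dropped).
MIN_LAT, MAX_LAT = 24.5, 49.5
MIN_LON, MAX_LON = -125.0, -66.9


def parseQuakes(lines):
    """Return {1: [lon, lat], 2: [lon, lat], ...} for quakes inside the box."""
    thedict = {}
    key = 1
    latCol = lonCol = None
    for line in lines:
        fields = line.strip().split(",")
        if latCol is None:  # the first line is the header row
            latCol = fields.index("latitude")
            lonCol = fields.index("longitude")
            continue
        if len(fields) <= max(latCol, lonCol):
            continue  # skip blank or truncated lines
        try:
            lat = float(fields[latCol])
            lon = float(fields[lonCol])
        except ValueError:
            continue  # skip rows whose coordinates are not numbers
        if MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON:
            thedict[key] = [lon, lat]
            key += 1
    return thedict


def getQuakeData(afile):
    """Drop-in replacement for getData, reading a saved copy of the feed."""
    with open(afile, "r") as datafile:
        return parseQuakes(datafile)
```

Keys still start at 1, so `centroids` and `createClusters` can consume the result unchanged, e.g. `data = getQuakeData('all_month.csv')`. Distances are then measured in degrees, which is only an approximation of real distance but is usually enough to separate coasts from the interior.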
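The `centroids` function above seeds k-means with random points, so cluster 0 might be the East Coast on one run and the West Coast on the next. One way to make each cluster correspond to a named region is to seed the same Lloyd's k-means iteration with fixed starting points. The sketch below assumes points stored as `[longitude, latitude]`; the three seed coordinates and the names `REGION_SEEDS` and `regionClusters` are hypothetical and can be tuned.

```python
import math

# Hypothetical starting centroids, [longitude, latitude], one per region.
# Seeding here (instead of with random picks) keeps cluster 0 = West Coast,
# 1 = Midwest and 2 = East Coast from run to run.
REGION_NAMES = ["West Coast", "Midwest", "East Coast"]
REGION_SEEDS = [[-120.0, 38.0], [-95.0, 40.0], [-77.0, 38.0]]


def euclid(p1, p2):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p1, p2)))


def regionClusters(datadict, repeat=10):
    """Lloyd's k-means from fixed seeds; returns (clusters, centroids)."""
    centroidL = [list(c) for c in REGION_SEEDS]  # copy so the seeds stay intact
    k = len(centroidL)
    clusterL = [[] for _ in range(k)]
    for apass in range(repeat):
        clusterL = [[] for _ in range(k)]
        # assignment step: each key joins the cluster with the nearest centroid
        for akey in datadict:
            distances = [euclid(datadict[akey], c) for c in centroidL]
            clusterL[distances.index(min(distances))].append(akey)
        # update step: move each non-empty cluster's centroid to its mean
        for cindex in range(k):
            members = clusterL[cindex]
            if members:  # an empty cluster keeps its previous centroid
                centroidL[cindex] = [
                    sum(datadict[m][d] for m in members) / len(members)
                    for d in range(len(centroidL[cindex]))
                ]
    return clusterL, centroidL
```

With the fixed seeds, `REGION_NAMES[i]` labels `clusters[i]`. This version skips the per-pass printing of `createClusters` so it can run quietly on a month of data.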
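To draw the result, each earthquake's longitude and latitude must be converted to turtle screen coordinates on top of the map image. A minimal sketch, assuming the map has been saved locally as a GIF (turtle's `bgpic` loads images through tkinter, which handles GIF, and PNG on Tk 8.6+, but not JPG) and that the image edges line up with the longitudes and latitudes below under a simple linear (equirectangular) mapping. Most printed US maps use a different projection, so the dots will only be approximately placed. The file name, image size, edge coordinates, and the names `toScreen` and `drawClusters` are all hypothetical.

```python
# Hypothetical map settings: change these to match the saved map image.
MAP_FILE = "usmap.gif"             # GIF copy of the map (assumed file name)
MAP_WIDTH, MAP_HEIGHT = 800, 500   # image size in pixels (assumed)
MAP_LON = (-125.0, -66.9)          # longitudes at the left/right edges (assumed)
MAP_LAT = (24.5, 49.5)             # latitudes at the bottom/top edges (assumed)
COLORS = ["red", "green", "blue"]  # one color per cluster


def toScreen(lon, lat):
    """Linearly map lon/lat to turtle coordinates (origin at image center)."""
    x = (lon - MAP_LON[0]) / (MAP_LON[1] - MAP_LON[0]) * MAP_WIDTH - MAP_WIDTH / 2
    y = (lat - MAP_LAT[0]) / (MAP_LAT[1] - MAP_LAT[0]) * MAP_HEIGHT - MAP_HEIGHT / 2
    return x, y


def drawClusters(clusterL, datadict):
    """Draw one dot per earthquake, colored by the cluster it belongs to."""
    import turtle  # imported here so toScreen can be used without a display

    screen = turtle.Screen()
    screen.setup(MAP_WIDTH, MAP_HEIGHT)
    screen.bgpic(MAP_FILE)
    screen.tracer(0)  # draw everything at once instead of animating each dot
    pen = turtle.Turtle()
    pen.hideturtle()
    pen.penup()
    for cindex in range(len(clusterL)):
        color = COLORS[cindex % len(COLORS)]
        for akey in clusterL[cindex]:
            lon, lat = datadict[akey]
            pen.goto(toScreen(lon, lat))
            pen.dot(5, color)
    screen.update()  # show the dots drawn while tracing was off
    screen.mainloop()
```

Usage, assuming `data` holds `[lon, lat]` points and `CL` is the list returned by a clustering function: `drawClusters(CL, data)`. Turning tracing off matters here because a month of earthquakes can mean thousands of dots, and animating each `goto` would be very slow.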