Question

In Python 3, modify the following k-means algorithm to analyze the earthquake data from the following CSV file (save it locally; you don't have to import it from the internet): https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv and create clusters for earthquakes that happen on the east coast, the west coast, and the midwest of the United States. Once you have the clusters, use the turtle module to draw a dot on this map of the United States (http://i64.tinypic.com/2i739t4.jpg) where each earthquake occurred; each cluster should be a different color. You can only use the math, random, and turtle modules. Here's the code that needs to be modified:

import math
import random
import turtle

def euclid(p1, p2):
    # Euclidean distance between two points of equal dimension
    total = 0
    for i in range(len(p1)):
        total += (p2[i] - p1[i])**2
    return math.sqrt(total)

def getData(afile):
    # read one integer per line into a dictionary keyed 1..n
    datafile = open(afile,'r')
    thedict = {}
    key = 1
    for line in datafile:
        score = int(line)
        thedict[key] = [score]
        key += 1
    datafile.close()
    return thedict

def centroids(k, datadict):
    # pick k distinct data points at random as the starting centroids
    centroidL = []
    centroidCount = 0
    centroidKeys = []
    while centroidCount < k:
        randomkey = random.randint(1, len(datadict))
        if randomkey not in centroidKeys:
            centroidL.append(datadict[randomkey])
            centroidKeys.append(randomkey)
            centroidCount += 1
    return centroidL

def createClusters(k, centroidL, datadict, repeat):
    for apass in range(repeat):
        clusterL = []
        for i in range(k):
            clusterL.append([])#add an empty list for each cluster
        for akey in datadict:
            distances = []
            for cindex in range(k):
                dist = euclid(datadict[akey],centroidL[cindex])
                distances.append(dist)
            minD = min(distances) # smallest distance
            index = distances.index(minD)
            clusterL[index].append(akey)
        dimension = len(datadict[1])
        for cindex in range(k):
            totals = [0]*dimension #repeat 0 dimension times, in a list
            for item in clusterL[cindex]:
                points = datadict[item] #get data from dictionary
                for ind in range(len(points)):
                    totals[ind] += points[ind]
            for ind in range(len(totals)):
                clusterLen = len(clusterL[cindex])
                if clusterLen != 0:
                    totals[ind] /= clusterLen
            centroidL[cindex] = totals
        #print the clusters
        for c in clusterL:
            print("Cluster", apass)
            for key in c: #named key, not k, so the cluster count k is not overwritten
                print(datadict[key], end=" ")
            print()#newline
    return clusterL

#testing
point1 = [4, 6, 12]
point2 = [-3, 4, -2]
#print(euclid(point1,point2))
data = getData('scores.txt')
#print(data)
cent = centroids(5, data)
#print(cent)
CL = createClusters(5, cent, data, 3)
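
One possible starting point for the modification is sketched below. It is not a verified solution, and it rests on several assumptions: that the saved USGS file has a header row with latitude in column 1 and longitude in column 2 (counting from 0), that a rough latitude/longitude box is good enough to pick out the contiguous United States, and that the map image is laid out so longitude and latitude scale linearly to x and y. The names parseQuakes, getQuakeData, toScreen and drawClusters, the bounding-box values, and the file name usmap.gif are all hypothetical. The file is split by hand because the csv module is not among the allowed modules.

```python
# Assumed approximate bounding box for the contiguous United States.
LAT_MIN, LAT_MAX = 24.0, 50.0
LON_MIN, LON_MAX = -125.0, -66.0


def parseQuakes(lines):
    """Build {key: [longitude, latitude]} from USGS CSV lines (header first)."""
    thedict = {}
    key = 1
    for line in list(lines)[1:]:          # skip the header row
        fields = line.strip().split(',')
        if len(fields) < 3:
            continue                      # blank or malformed line
        try:
            lat = float(fields[1])        # assumed column 1: latitude
            lon = float(fields[2])        # assumed column 2: longitude
        except ValueError:
            continue                      # missing or non-numeric value
        if LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX:
            thedict[key] = [lon, lat]     # keys stay 1..n, as centroids() expects
            key += 1
    return thedict


def getQuakeData(afile):
    """Read the saved CSV file and return the point dictionary."""
    with open(afile, 'r') as datafile:
        return parseQuakes(datafile)


def toScreen(lon, lat, width, height):
    """Linearly map lon/lat onto a width x height window centred on (0, 0)."""
    x = (lon - LON_MIN) / (LON_MAX - LON_MIN) * width - width / 2
    y = (lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * height - height / 2
    return x, y


def drawClusters(clusterL, datadict, width=800, height=500):
    """Draw one dot per earthquake, cycling through three colors by cluster."""
    import turtle                         # imported here so parsing runs without a display
    colors = ['red', 'green', 'blue']     # reused in order if there are more than 3 clusters
    turtle.setup(width, height)
    # turtle.bgpic('usmap.gif')           # hypothetical file name; bgpic expects a GIF
    pen = turtle.Turtle()
    pen.hideturtle()
    pen.penup()
    pen.speed(0)
    for cindex in range(len(clusterL)):
        for akey in clusterL[cindex]:
            lon, lat = datadict[akey]
            pen.goto(toScreen(lon, lat, width, height))
            pen.dot(5, colors[cindex % len(colors)])
    turtle.done()
```

With these helpers, the testing lines could become data = getQuakeData('all_month.csv'), cent = centroids(3, data), CL = createClusters(3, cent, data, 10) and drawClusters(CL, data), where the file name and the pass count of 10 are again assumptions. Because the starting centroids are chosen at random, the three clusters are not guaranteed to line up with the east coast, west coast, and midwest on every run, and the dots will only sit correctly on the map once the bounding box is calibrated to the actual image.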
