
Question


Implement k-means as an iterative algorithm in Spark, written in Python, to cluster a set of points that are stored in a file. Do not use the K-means implementation in Spark's MLlib to solve the problem. Set the number of center points to k = 5.
Follow this pattern:
1. Randomly assign a centroid to each of the k clusters (k = 5).
2. Calculate the distance from every observation to each of the k centroids.
3. Assign each observation to its closest centroid.
4. Find the new location of each centroid by taking the mean of all the observations in its cluster.
5. Repeat steps 2-4 until the centroids do not change position.
Note: You need a variable that decides when the k-means calculation is done: stop when the amount by which the locations of the means change between iterations is less than the variable. Set the variable to 0.1.
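
The pattern above could be sketched roughly as follows. This is one possible reading of the task, not a reference answer. It assumes the points are already in an RDD of (x, y) tuples, that "randomly assign a centroid" means sampling k observations as starting centroids, and that the "amount of change" is the sum of the distances each centroid moved. The names closest_index and kmeans, the seed, and the max_iter safety cap are all hypothetical choices made for illustration.

```python
import math


def closest_index(point, centroids):
    """Return the index of the centroid nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def kmeans(points_rdd, k=5, threshold=0.1, seed=42, max_iter=100):
    """Iterate k-means on an RDD of (x, y) tuples without using MLlib."""
    points_rdd = points_rdd.cache()  # the same data is scanned every iteration

    # Random initial centroids: assumed here to be k sampled observations.
    centroids = [
        (float(x), float(y)) for x, y in points_rdd.takeSample(False, k, seed)
    ]

    for _ in range(max_iter):  # max_iter is an assumed safety cap
        # Key every observation by its closest centroid and carry (x, y, count)
        # so that per-cluster sums and sizes come out of a single reduceByKey.
        sums = (
            points_rdd
            .map(lambda p: (closest_index(p, centroids), (p[0], p[1], 1)))
            .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]))
            .collectAsMap()
        )

        # New centroid = mean of its cluster. A cluster that received no
        # observations keeps its previous centroid (an assumption).
        new_centroids = [
            (sums[i][0] / sums[i][2], sums[i][1] / sums[i][2])
            if i in sums else centroids[i]
            for i in range(k)
        ]

        # Total distance moved by all centroids in this iteration.
        shift = sum(math.dist(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            break

    return centroids
```

With a live SparkContext named sc (not created in this sketch), a call such as kmeans(sc.parallelize(points)) would return a list of five (x, y) centroids. For larger k, broadcasting the centroid list each iteration instead of capturing it in the closure may be worth considering.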
Example of input file (an RDD):
[(7869,8696),(8676,-4746),(9484,112526),(-1827,5958),(987,900087),(18127,9383),(298,272),(91716,2827),(12625,92827)........]
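
The question does not say how the file is laid out on disk. As a minimal sketch, assuming the file holds text in the form shown above with integer coordinates, the pairs could be pulled out with a regular expression. The helper name parse_points is hypothetical.

```python
import re

# Matches "(x,y)" where x and y are optionally negative integers.
PAIR = re.compile(r"\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)")


def parse_points(text):
    """Extract every (x, y) integer pair found in a string of text."""
    return [(int(x), int(y)) for x, y in PAIR.findall(text)]
```

Under the same assumption, and with a SparkContext named sc and a placeholder path, sc.textFile("points.txt").flatMap(parse_points) would yield an RDD of (x, y) tuples.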
