Question
I'm looking for some help and guidance on this coding assignment. I'm new to programming, and although I understand the idea, I'm behind on the language.
Thanks
One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the notion of distance between two points. For points with only one dimension (just single values), we can define the distance between two points x and y as |x - y|. The k-means algorithm works by placing points into clusters and computing their centroids, where a centroid is defined as the average of the data points in the cluster. Specifically, the algorithm works as follows:
1. Pick k, the number of clusters.
2. Initialize clusters by picking one point (centroid) per cluster. For this assignment, you can pick the first k points as initial centroids for each corresponding cluster.
3. For each point, place it in the cluster whose current centroid it is nearest.
4. After all points are assigned, update the locations of the centroids of the k clusters.
5. Reassign all points to their closest centroid. This sometimes moves points between clusters.
6. Repeat steps 4 and 5 until convergence. Convergence occurs when points don't move between clusters and centroids stabilize.
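The six steps above can be sketched in plain Python without any imports. This is only one possible reading of the assignment, not the instructor's reference solution: the function names (distance, nearest_cluster, k_means) are made up for illustration, ties go to the lowest cluster number, and an empty cluster is assumed to keep its previous centroid.

```python
def distance(a, b):
    """Distance between two one-dimensional points."""
    return abs(a - b)


def nearest_cluster(point, centroids):
    """Return the index of the closest centroid (lowest index wins ties)."""
    best = 0
    for index in range(1, len(centroids)):
        if distance(point, centroids[index]) < distance(point, centroids[best]):
            best = index
    return best


def k_means(points, k):
    """Cluster 1-D points; return one list of cluster assignments per iteration."""
    # Step 2: the first k points seed the centroids (copied, so points is untouched).
    centroids = list(points[:k])
    # Step 3: initial assignment of every point to its nearest centroid.
    assignments = [nearest_cluster(p, centroids) for p in points]
    history = [assignments]
    while True:
        # Step 4: move each centroid to the average of its members.
        for cluster in range(k):
            members = [p for p, a in zip(points, assignments) if a == cluster]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[cluster] = sum(members) / len(members)
        # Step 5: reassign every point to its closest centroid.
        new_assignments = [nearest_cluster(p, centroids) for p in points]
        history.append(new_assignments)
        # Step 6: converged once no point changed cluster.
        if new_assignments == assignments:
            return history
        assignments = new_assignments
```

Calling k_means(points, 5) on the ten values from the sample run returns three assignment lists, which lines up with the three iterations shown in the sample output. The order of the points in the input file is not given in the question, so that check relies on a guessed order.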
Requirements
You are to create a program using Python that does the following:
1. Asks the user for a filename which contains the point data to be clustered (see the Data File Format section for details).
2. Asks the user for the name of the output file.
3. Asks the user for the number of clusters. This is the parameter k that will be used for k-means.
4. Reads the input file and stores the points into a list.
5. Applies the k-means algorithm to find the cluster for each point.
6. Displays the points that each cluster contains after each iteration of the algorithm.
7. Writes the final cluster assignments to the output file.
YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, ...) - NO IMPORT STATEMENTS.
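For the three prompts in requirements 1 to 3, a small helper along these lines might be enough. The prompt strings are copied from the sample program output; the function name read_settings and its ask parameter are hypothetical, added only so the prompts can be exercised without typing at a keyboard.

```python
def read_settings(ask=input):
    """Prompt for the input file, output file, and number of clusters.

    ask defaults to the built-in input(); passing another function
    is an illustrative convenience for testing, not a requirement.
    """
    input_name = ask("Enter the name of the input file: ")
    output_name = ask("Enter the name of the output file: ")
    k = int(ask("Enter the number of clusters: "))  # k must be a whole number
    return input_name, output_name, k
```

In the real program you would just call read_settings() and let it use input(). Note that int() raises a ValueError if the user types something that is not a whole number; the assignment text does not say whether that case needs handling.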
Additional Requirements
1. The name of your source code file should be kMeans.py. All your code should be within a single file.
2. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.
3. You need to use meaningful identifier names that conform to standard naming conventions.
4. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.
5. The output of your program should exactly match the sample program output given at the end.
Data File Format
Let N be the number of points and Pi be the value of point i. The input file should be of the following format:
P1 P2 ... PN
Example:
1.2 2.1 4.56 2.113 2.2
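Reading a file in this format can be done with the built-in open() and str.split(), so no import is needed. The sketch below assumes the values are separated by any whitespace (spaces or newlines), since the question does not make clear whether the points sit on one line or one per line. The names parse_points and read_points are illustrative.

```python
def parse_points(text):
    """Convert whitespace-separated text into a list of floats."""
    # split() with no argument splits on spaces, tabs, and newlines alike.
    return [float(token) for token in text.split()]


def read_points(filename):
    """Read every point from the named file into a list."""
    with open(filename) as data_file:
        return parse_points(data_file.read())
```

Keeping the parsing separate from the file access makes it easy to try the parser on a string such as the example line before pointing it at a real file.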
Sample Program Output
70-510, [semester] [year]
NAME: [put your name here]
PROGRAMMING ASSIGNMENT #2
Enter the name of the input file: prog2-input-data.txt
Enter the name of the output file: prog2-output-data.txt
Enter the number of clusters: 5
Iteration 1
0 [1.8]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [2.1, 3.2]
4 [9.8, 7.6, 11.32]
Iteration 2
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]
Iteration 3
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]
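The per-iteration display above happens to match what Python prints for a list, so one way to produce it is to gather each cluster's members and convert the list with str(). This is a sketch under that assumption; format_iteration is a made-up name, and assignments is assumed to be a list holding the cluster number of each point, in the same order as points.

```python
def format_iteration(number, points, assignments, k):
    """Build the text for one iteration: a header plus one line per cluster."""
    lines = ["Iteration " + str(number)]
    for cluster in range(k):
        # Members keep their input-file order, as in the sample output.
        members = [p for p, a in zip(points, assignments) if a == cluster]
        lines.append(str(cluster) + " " + str(members))
    return "\n".join(lines)
```

Printing the returned string once per iteration would give blocks in the same layout as the sample. A cluster with no members would show up as an empty list, a case the sample does not cover.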
Output File Contents
Point 1.8 in cluster 0
Point 4.5 in cluster 1
Point 1.1 in cluster 2
Point 2.1 in cluster 0
Point 9.8 in cluster 4
Point 7.6 in cluster 4
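Writing the output file in this layout could look roughly like the following. The function names are hypothetical, and assignments is again assumed to be a list with one cluster number per point, in input order.

```python
def format_assignments(points, assignments):
    """Return one 'Point X in cluster Y' line per point, in input order."""
    lines = []
    for point, cluster in zip(points, assignments):
        lines.append("Point " + str(point) + " in cluster " + str(cluster))
    return lines


def write_assignments(filename, points, assignments):
    """Write the final cluster assignments to the named output file."""
    with open(filename, "w") as output_file:
        for line in format_assignments(points, assignments):
            output_file.write(line + "\n")
```

Building the lines first and writing them second keeps the formatting easy to check against the expected file contents before anything touches the disk.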