
Question

So I'm looking for some help and guidance on this coding assignment. I'm new to programming, and although I understand the idea, I'm behind on the language.

Thanks

One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the notion of distance between two points. For points with only one dimension (just single values), we can define the distance between two points as the absolute difference of their values. The k-means algorithm works by placing points into clusters and computing their centroids, where a centroid is defined as the average of the data points in its cluster. Specifically, the algorithm works as follows:

1. Pick k, the number of clusters.

2. Initialize clusters by picking one point (centroid) per cluster. For this assignment, you can pick the first k points as initial centroids for each corresponding cluster.

3. For each point, place it in the cluster whose current centroid it is nearest.

4. After all points are assigned, update the locations of the centroids of the k clusters.

5. Reassign all points to their closest centroid. This sometimes moves points between clusters.

6. Repeat steps 4 and 5 until convergence. Convergence occurs when points don't move between clusters and centroids stabilize.
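
The six steps above can be sketched in plain Python with no imports. This is a minimal sketch rather than the expected solution: the function names (`distance`, `centroid`, `nearest`, `k_means`) are hypothetical, the distance is taken to be the absolute difference, ties go to the lowest-numbered cluster, and an empty cluster keeping its previous centroid is an assumption the assignment does not address.

```python
def distance(a, b):
    """Distance between two one-dimensional points (absolute difference)."""
    return abs(a - b)


def centroid(cluster):
    """Average of the points in a non-empty cluster."""
    return sum(cluster) / len(cluster)


def nearest(point, centroids):
    """Index of the centroid closest to point; ties go to the lowest index."""
    best = 0
    for index in range(1, len(centroids)):
        if distance(point, centroids[index]) < distance(point, centroids[best]):
            best = index
    return best


def k_means(points, k):
    """Return (labels, history).

    labels[i] is the final cluster index of points[i]; history holds the
    list of clusters produced by each iteration, in order.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = points[:k]              # step 2: first k points seed the clusters
    labels = None
    history = []
    while True:
        # steps 3 and 5: assign every point to its nearest centroid
        new_labels = [nearest(point, centroids) for point in points]
        clusters = [[] for _ in range(k)]
        for point, label in zip(points, new_labels):
            clusters[label].append(point)
        history.append(clusters)
        if new_labels == labels:        # step 6: no point changed cluster
            break
        labels = new_labels
        # step 4: recompute centroids (assumption: an empty cluster keeps its old one)
        centroids = [centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return labels, history
```

The loop stops on the first iteration that leaves every assignment unchanged. When no point moves, the centroids cannot change either, so comparing assignments is enough to detect convergence.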

Requirements

You are to create a program using Python that does the following:

1. Asks the user for a filename which contains the point data which is to be clustered (see Data File Format section for details).

2. Asks the user for the name of the output file.

3. Asks the user for the number of clusters. This is the parameter k that will be used for k-means.

4. Reads the input file and stores the points into a list.

5. Applies the k-means algorithm to find the cluster for each point.

6. Displays the points that each cluster contains after each iteration of the algorithm.

7. Writes the final cluster assignments to the output file.

YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, ...) - NO IMPORT STATEMENTS.
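
The three prompts in requirements 1 to 3 could be gathered in one small function, sketched here without imports. The prompt strings are copied from the sample program output at the end of the question. The function name `ask_settings`, the `ask` parameter (which only exists so the function can be exercised without a keyboard), and the check that k is at least 1 are assumptions, not part of the assignment.

```python
def ask_settings(ask=input):
    """Prompt for the input file name, the output file name, and k."""
    input_name = ask("Enter the name of the input file: ").strip()
    output_name = ask("Enter the name of the output file: ").strip()
    k = int(ask("Enter the number of clusters: "))
    if k < 1:
        # Assumption: the assignment does not say how to handle bad input.
        raise ValueError("the number of clusters must be at least 1")
    return input_name, output_name, k
```

Called as `ask_settings()`, it reads from the keyboard through the built-in `input`.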

Additional Requirements

1. The name of your source code file should be kMeans.py. All your code should be within a single file.

2. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.

3. You need to use meaningful identifier names that conform to standard naming conventions.

4. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.

5. The output of your program should exactly match the sample program output given at the end.

Data File Format

Let N be the number of points and Pi be the value of point i. The input file should be of the following format:

P1 P2 ... PN

Example:

1.2 2.1 4.56 2.113 2.2
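
Reading a file in this format might look like the sketch below. Because the question does not make clear whether the points are separated by spaces or by line breaks, it splits on any whitespace, which handles both. The names `parse_points` and `read_points` are hypothetical.

```python
def parse_points(text):
    """Convert whitespace-separated numbers into a list of floats."""
    return [float(token) for token in text.split()]


def read_points(filename):
    """Read every point stored in the named file, in file order."""
    with open(filename) as data_file:
        return parse_points(data_file.read())
```

Keeping the parsing separate from the file handling makes it easy to try the parser on a plain string, for example `parse_points("1.2 2.1 4.56 2.113 2.2")`.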

Sample Program Output

70-510, [semester] [year]

NAME: [put your name here]

PROGRAMMING ASSIGNMENT #2

Enter the name of the input file: prog2-input-data.txt

Enter the name of the output file: prog2-output-data.txt

Enter the number of clusters: 5

Iteration 1

0 [1.8]

1 [4.5, 6.5]

2 [1.1, 0.5]

3 [2.1, 3.2]

4 [9.8, 7.6, 11.32]

Iteration 2

0 [1.8, 2.1]

1 [4.5, 6.5]

2 [1.1, 0.5]

3 [3.2]

4 [9.8, 7.6, 11.32]

Iteration 3

0 [1.8, 2.1]

1 [4.5, 6.5]

2 [1.1, 0.5]

3 [3.2]

4 [9.8, 7.6, 11.32]
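
The per-iteration display above can be produced with Python's default list formatting. A minimal sketch, assuming each cluster is a list of floats and that the blank lines in the listing are not part of the real output; `format_iteration` is a hypothetical name.

```python
def format_iteration(number, clusters):
    """Build the display lines for one iteration: a title, then one line per cluster."""
    lines = ["Iteration " + str(number)]
    for index, cluster in enumerate(clusters):
        # str() of a list gives the bracketed, comma-separated form seen in the sample
        lines.append(str(index) + " " + str(cluster))
    return lines
```

Printing each returned line, for instance with `for line in format_iteration(1, clusters): print(line)`, gives one block of the display.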

Output File Contents

Point 1.8 in cluster 0

Point 4.5 in cluster 1

Point 1.1 in cluster 2

Point 2.1 in cluster 0

Point 9.8 in cluster 4

Point 7.6 in cluster 4
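
Writing the output file in this format could be sketched as follows. It assumes `points` and `labels` are parallel lists (one cluster index per point, in input order); `format_assignments` and `write_assignments` are hypothetical names.

```python
def format_assignments(points, labels):
    """One 'Point P in cluster C' line per point, in input order."""
    return ["Point " + str(point) + " in cluster " + str(label)
            for point, label in zip(points, labels)]


def write_assignments(filename, points, labels):
    """Write the final cluster assignments to the named file."""
    with open(filename, "w") as output_file:
        for line in format_assignments(points, labels):
            output_file.write(line + "\n")
```

Points are written with Python's default float formatting, so a value stored in the input file as 2.10 would be written as 2.1.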
