Question

1 Approved Answer

Posted on Aug 30, 2024

I keep getting a "file not found in directory" error, even when I save prog2-input-data.txt in the same folder as kMeans.py. Please explain how to fix this error and help with the homework. Step-by-step hints are at the end.

The k-means algorithm works by placing points into clusters and computing their centroids. A centroid is defined as the average of the data points in its cluster. Specifically, the algorithm works as follows:

1. Pick k, the number of clusters.
2. Initialize clusters by picking one point (centroid) per cluster. For this assignment, you can pick the first k points as the initial centroids for the corresponding clusters.
3. For each point, place it in the cluster whose current centroid is nearest to it.
4. After all points are assigned, update the locations of the centroids of the k clusters.
5. Reassign all points to their closest centroid. This sometimes moves points between clusters.
6. Repeat steps 4 and 5 until convergence. Convergence occurs when points do not move between clusters and centroids stabilize.
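The assignment and centroid-update steps described above can be sketched for one-dimensional points as follows. This is a minimal sketch, not the required solution: the function names `closest_cluster` and `centroid_of` are illustrative, and sending ties to the lowest cluster number is an assumption, since the assignment does not say how to break ties. The data is the sample input implied by the sample output, with k = 5.

```python
def closest_cluster(point, centroids):
    """Return the cluster number whose centroid is nearest to point."""
    # min() keeps the first minimum it sees, so ties go to the lowest
    # cluster number (an assumption; the assignment does not specify this).
    return min(centroids, key=lambda cluster: abs(point - centroids[cluster]))


def centroid_of(cluster_points):
    """Return the average of the points in one non-empty cluster."""
    return sum(cluster_points) / len(cluster_points)


points = [1.8, 4.5, 1.1, 2.1, 9.8, 7.6, 11.32, 3.2, 0.5, 6.5]
k = 5

# The first k points serve as the initial centroids.
centroids = dict(zip(range(k), points[:k]))
clusters = {cluster: [] for cluster in range(k)}

# Place every point in the cluster with the nearest centroid.
for point in points:
    clusters[closest_cluster(point, centroids)].append(point)

# Move each centroid to the average of its cluster.
centroids = {cluster: centroid_of(members) for cluster, members in clusters.items()}

print(clusters)
print(centroids)
```

After this single pass the clusters agree with "Iteration 1" of the sample program output; repeating the two steps until nothing moves gives the remaining iterations.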

Requirements

You are to create a program using Python that does the following:

Asks the user for the number of clusters. This is the parameter k that will be used for k-means.
Reads the input file (prog2-input-data.txt) and stores the points into a list
Applies the k-means algorithm to find the cluster for each point.
Displays the points that each cluster contains after each iteration of the algorithm
Writes the final cluster assignments to the screen and the output file (prog2-output-data.txt).

YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, ...) - NO IMPORT STATEMENTS.

Additional Requirements

The name of your source code file should be kMeans.py. All your code should be within a single file.
Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.
You need to use meaningful identifier names that conform to standard naming conventions.
At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.
The output of your program should exactly match the sample program output given at the end. That is, for the same input, it should generate the same output. Note that I may use other test cases for grading your program, and your code needs to work correctly in all cases.

Data File Format

Let N be the number of points and Pi be the value of point i. The input file should be of the following format:

P1 P2 ... PN

Example:

1.2 2.1 4.56 2.113 2.2

The name of the input file is always:

prog2-input-data.txt

What to Turn In

You will turn in a screenshot of your output and a single kMeans.py file using BlackBoard.

HINTS

Make use of list comprehensions for reading lines from a file and then converting the strings into a list of floats.
Use pwd (available as the %pwd magic command in IPython/Jupyter) to check the current working directory, which is where you should place your input file.
Use dict data structures for storing centroids and clusters. The centroids dict will be a mapping from cluster number to centroid. The clusters dict will be a mapping from cluster number to a list of points in the cluster.
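The working-directory hint is the likely key to the "file not found" error: `open()` resolves a relative file name against the current working directory, that is, the directory the program was started from, and many editors and notebooks start somewhere other than the folder that holds kMeans.py. The sketch below is one possible way to read the file without any imports and to report the problem clearly. The function names `parse_points` and `read_points` are illustrative, and returning an empty list on failure is an assumption, not a stated requirement.

```python
def parse_points(text):
    """Convert whitespace-separated numbers into a list of floats."""
    # split() with no argument splits on spaces and newlines alike, so this
    # works whether the points share one line or sit on separate lines.
    return [float(token) for token in text.split()]


def read_points(file_name):
    """Read the points from file_name, or return [] if it cannot be found."""
    try:
        with open(file_name) as input_file:
            return parse_points(input_file.read())
    except FileNotFoundError:
        # A relative name is looked up in the current working directory,
        # which is not necessarily the folder that contains kMeans.py.
        print("Could not find", file_name, "in the current working directory.")
        return []


print(parse_points("1.2 2.1 4.56 2.113 2.2\n"))
```

If the message appears, either move prog2-input-data.txt into the working directory or start the program from the folder that contains the file. It is also worth checking that the file was not saved with a doubled extension such as prog2-input-data.txt.txt.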

Sample Program Output

DATA-51100, [semester] [year]
NAME: [put your name here]
PROGRAMMING ASSIGNMENT #2

Enter the number of clusters: 5

Iteration 1
0 [1.8]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [2.1, 3.2]
4 [9.8, 7.6, 11.32]

Iteration 2
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]

Iteration 3
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]

Point 1.8 in cluster 0
Point 4.5 in cluster 1
Point 1.1 in cluster 2
Point 2.1 in cluster 0
Point 9.8 in cluster 4
Point 7.6 in cluster 4
Point 11.32 in cluster 4
Point 3.2 in cluster 3
Point 0.5 in cluster 2
Point 6.5 in cluster 1

Output File Contents

Point 1.8 in cluster 0
Point 4.5 in cluster 1
Point 1.1 in cluster 2
Point 2.1 in cluster 0
Point 9.8 in cluster 4
Point 7.6 in cluster 4
Point 11.32 in cluster 4
Point 3.2 in cluster 3
Point 0.5 in cluster 2
Point 6.5 in cluster 1

Step by step

k-Means Clustering Step by Step Directions

# Initialization

1. Print header info to screen
2. Get input/output file names and number of clusters
3. Read file:
-use open() and a list comprehension to strip all lines of ending char (using rstrip method) and convert to floats
4. Create variables to store centroids, clusters, and point assignments. Initially, pick one point (centroid) per cluster:
-create and initialize a variable to store centroids for each cluster: a mapping (dict) from range(k) to data[0:k]
-create and initialize another variable to store all points for each cluster: a mapping (dict) of range(k) to k empty lists
-use zip when creating the dict
-create and initialize a dict mapping points to clusters
-create a variable to store old point assignments (from previous iteration)
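The variables from step 4 might be set up as sketched below. The variable names are illustrative rather than prescribed, and the data list is the sample input implied by the sample output. One detail worth noting is an easy mistake with zip: the empty lists must be distinct objects.

```python
data = [1.8, 4.5, 1.1, 2.1, 9.8, 7.6, 11.32, 3.2, 0.5, 6.5]
k = 5

# Cluster number -> centroid, seeded with the first k points.
centroids = dict(zip(range(k), data[0:k]))

# Cluster number -> list of member points. The comprehension builds a new
# list per cluster; zip(range(k), [[]] * k) would make every cluster share
# the same list object.
clusters = dict(zip(range(k), [[] for _ in range(k)]))

# Point -> cluster number; filled in when points are assigned.
point_assignments = {}

# Copy of the assignments from the previous iteration, for the convergence check.
old_point_assignments = {}

print(centroids)
print(clusters)
```

Because the dict is keyed by point value, duplicate points in the input collapse to one entry; that is harmless here, since equal values always land in the same cluster.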

# Algorithm

5. Repeat the following until the point assignments stop changing:
a) Save current point assignment into old point assignment variable (create a new dict from current assignment variable)
b) Place each point in the closest cluster (you should make a function that does this)
c) Update the locations of centroids of the k clusters (make a function for this also)
d) Reinitialize the clusters variable to empty lists
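One way to arrange sub-steps a) through d) into a loop is sketched below. It is an illustration under stated assumptions, not the reference solution: the function names are made up, the loop stops when the new point assignments equal the old ones, an empty cluster keeps its previous centroid, and the exact spacing of the printed lines is a guess at the sample format.

```python
def assign_points(data, centroids, clusters, point_assignments):
    """Sub-step b: place each point in the cluster with the nearest centroid."""
    for point in data:
        nearest = min(centroids, key=lambda c: abs(point - centroids[c]))
        clusters[nearest].append(point)
        point_assignments[point] = nearest


def update_centroids(centroids, clusters):
    """Sub-step c: move each centroid to the average of its cluster."""
    for cluster, members in clusters.items():
        if members:  # an empty cluster keeps its old centroid (assumption)
            centroids[cluster] = sum(members) / len(members)


def run_k_means(data, k):
    """Return the final point assignments and the number of iterations."""
    centroids = dict(zip(range(k), data[0:k]))
    clusters = {cluster: [] for cluster in range(k)}
    point_assignments = {}
    iteration = 0
    while True:
        iteration += 1
        old_point_assignments = dict(point_assignments)  # sub-step a
        assign_points(data, centroids, clusters, point_assignments)
        update_centroids(centroids, clusters)
        print("Iteration", iteration)
        for cluster in range(k):
            print(cluster, clusters[cluster])
        # Converged: no point changed cluster since the last iteration.
        if point_assignments == old_point_assignments:
            return point_assignments, iteration
        clusters = {cluster: [] for cluster in range(k)}  # sub-step d


sample = [1.8, 4.5, 1.1, 2.1, 9.8, 7.6, 11.32, 3.2, 0.5, 6.5]
final_assignments, iterations = run_k_means(sample, 5)
```

On the sample data with k = 5 this runs three iterations, the last of which repeats the second, which is what signals convergence.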

# Output

6. Print the point assignments
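A possible shape for the output step is sketched below. The function names `format_assignments` and `write_assignments` are hypothetical, and the one-line-per-point layout is inferred from the sample output rather than confirmed. Iterating over the data list instead of the dict keeps the original point order and repeats any duplicate points.

```python
def format_assignments(data, point_assignments):
    """Build one 'Point x in cluster n' line per input point, in input order."""
    return ["Point " + str(point) + " in cluster " + str(point_assignments[point])
            for point in data]


def write_assignments(data, point_assignments, output_file_name):
    """Print the assignment lines and write the same lines to a file."""
    lines = format_assignments(data, point_assignments)
    # "w" creates the file in the current working directory, or overwrites it.
    with open(output_file_name, "w") as output_file:
        for line in lines:
            print(line)
            output_file.write(line + "\n")


example_data = [1.8, 4.5, 11.32]
example_assignments = {1.8: 0, 4.5: 1, 11.32: 4}
print(format_assignments(example_data, example_assignments))
```

Calling `write_assignments(data, point_assignments, "prog2-output-data.txt")` at the end of the program would cover both the screen and the file requirement in one place.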