Question

1 Approved Answer

Posted on Sep 22, 2024

Python 2.7. The text is clear and here is an image, so don't tell me the context is unclear. I pay for this service and you're ripping me off. The question has been answered on here once; do your job, or I want a refund!


The input data file (prog2-input-data.txt) is

1.8 4.5 1.1 2.1 9.8 7.6 11.32 3.2 0.5 6.5

(The output would be - prog2-output-data.txt)

Writes the final cluster assignments to the output file. YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, ...) - NO IMPORT STATEMENTS

From the guidance I got yesterday, I'm assuming this is what "no import statements" means:

import os.path

import sys

Any ideas on this would be great.

Thanks

One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the notion of distance between two points. For points with only one dimension (just single values), we can define the distance between two points p and q as d(p, q) = |p - q|. The k-means algorithm works by placing points into clusters and computing their centroids, where a centroid is defined as the average of the data points in its cluster. Specifically, the algorithm works as follows:

1. Pick k, the number of clusters.

2. Initialize clusters by picking one point (centroid) per cluster. For this assignment, you can pick the first k points as initial centroids for each corresponding cluster.

3. For each point, place it in the cluster whose current centroid it is nearest.

4. After all points are assigned, update the locations of centroids of the k clusters.

5. Reassign all points to their closest centroid. This sometimes moves points between clusters.

6. Repeat 4,5 until convergence. Convergence occurs when points don't move between clusters and centroids stabilize.
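The six steps above can be sketched in plain Python with no import statements. This is only a minimal sketch, not the assignment's reference solution: the function names (`distance`, `nearest_centroid`, `k_means`) are made up for illustration, ties are assumed to go to the lowest cluster number, and an empty cluster is assumed to keep its previous centroid.

```python
def distance(p, q):
    """One-dimensional distance between two values."""
    return abs(p - q)


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (assumption: lowest index wins ties)."""
    best = 0
    for index in range(1, len(centroids)):
        if distance(point, centroids[index]) < distance(point, centroids[best]):
            best = index
    return best


def k_means(points, k):
    """Return (assignments, iterations) for one-dimensional k-means."""
    # Step 2: the first k points seed the clusters (slicing copies the list).
    centroids = list(points[:k])
    previous = None
    iterations = 0
    while True:
        # Steps 3 and 5: put every point in the cluster of its nearest centroid.
        assignments = [nearest_centroid(p, centroids) for p in points]
        iterations += 1
        # Step 6: stop once no point has moved since the last pass.
        if assignments == previous:
            break
        previous = assignments
        # Step 4: move each centroid to the average of its members.
        for cluster_id in range(k):
            members = [p for p, a in zip(points, assignments) if a == cluster_id]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[cluster_id] = sum(members) / float(len(members))
    return assignments, iterations
```

Called on the ten values from prog2-input-data.txt with k = 5, this sketch stops after three passes, which agrees with the three iterations in the sample output.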

Requirements

You are to create a program using Python that does the following:

1. Asks the user for a filename which contains the point data which is to be clustered (see Data File Format section for details).

2. Asks the user for the name of the output file.

3. Asks the user for the number of clusters. This is the parameter k that will be used for k-means.

4. Reads the input file and stores the points into a list.

5. Applies the k-means algorithm to find the cluster for each point.

6. Displays the points that each cluster contains after each iteration of the algorithm.

7. Writes the final cluster assignments to the output file.

YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, ...) - NO IMPORT STATEMENTS.
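The file handling in steps 4 and 7 needs no imports, because `open` is a built-in (as is `raw_input`, which Python 2.7 would use for the three prompts). Below is a hedged sketch with the hypothetical helper names `read_points` and `write_assignments`. It assumes the values are separated by any whitespace, so it accepts one value per line as well as all values on one line.

```python
def read_points(filename):
    """Read whitespace-separated numbers from a text file into a list of floats."""
    with open(filename) as data_file:
        lines = data_file.readlines()
    # Nested list comprehension: every token on every line becomes a float.
    return [float(token) for line in lines for token in line.split()]


def write_assignments(filename, points, assignments):
    """Write one 'Point <value> in cluster <number>' line per point."""
    with open(filename, "w") as out_file:
        for point, cluster_id in zip(points, assignments):
            out_file.write("Point %s in cluster %d\n" % (point, cluster_id))
```

In Python 2.7 the prompts could be read with something like `raw_input("Enter the number of clusters: ")` wrapped in `int(...)`, since `raw_input` always returns a string.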

Additional Requirements

1. The name of your source code file should be kMeans.py. All your code should be within a single file.

2. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.

3. You need to use meaningful identifier names that conform to standard naming conventions.

4. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.

5. The output of your program should exactly match the sample program output given at the end.

Data File Format

Let N be the number of points and Pi be the value of point i.

The input file should be of the following format: P1 P2 ... PN

Example: 1.2 2.1 4.56 2.113 2.2

Sample Program Output

70-510, [semester] [year]

NAME: [put your name here]

PROGRAMMING ASSIGNMENT #2

Enter the name of the input file: prog2-input-data.txt

Enter the name of the output file: prog2-output-data.txt

Enter the number of clusters: 5

Iteration 1
0 [1.8]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [2.1, 3.2]
4 [9.8, 7.6, 11.32]

Iteration 2
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]

Iteration 3
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]

Output File Contents

Point 1.8 in cluster 0
Point 4.5 in cluster 1
Point 1.1 in cluster 2
Point 2.1 in cluster 0
Point 9.8 in cluster 4
Point 7.6 in cluster 4
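For the per-iteration display, one possible approach is to build the text first and print it afterwards, so the same code runs under Python 2.7 and 3. The helper name `format_iteration` is hypothetical, and the layout (a header line, then one "number [points]" line per cluster) is an assumption, because the sample output on this page has lost its original line breaks.

```python
def format_iteration(number, clusters):
    """Build the display text for one iteration.

    clusters maps a cluster number to the list of points it holds.
    """
    lines = ["Iteration %d" % number]
    for cluster_id in sorted(clusters):
        # %s on a list gives Python's own list formatting, e.g. [4.5, 6.5]
        lines.append("%d %s" % (cluster_id, clusters[cluster_id]))
    return "\n".join(lines)
```

A call such as `print(format_iteration(1, clusters))` works in both Python versions because only a single string is passed to `print`.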


70-511: Statistical Programming
Programming Assignment 2 - k-Means Clustering

Clustering is a process of identifying groupings, i.e. clusters, within the data. For example, the figure below shows three clusters of two-dimensional data points (Xs):

[Figure: three clusters of two-dimensional data points, each point marked with an X]

Clustering has many applications, including inferring population structures from genetic data, recognizing communities within social networks, or segmenting customers for market research.

One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the notion of distance between two points. For points with only one dimension (just single values), we can define the distance between two points p and q as d(p, q) = |p - q|.

The k-means algorithm works by placing points into clusters and computing their centroids, where a centroid is defined as the average of the data points in its cluster. Specifically, the algorithm works as follows:

1. Pick k, the number of clusters.
2. Initialize clusters by picking one point (centroid) per cluster. For this assignment, you can pick the first k points as initial centroids for each corresponding cluster.
3. For each point, place it in the cluster whose current centroid it is nearest.
4. After all points are assigned, update the locations of centroids of the k clusters.
5. Reassign all points to their closest centroid. This sometimes moves points between clusters.
6. Repeat 4,5 until convergence. Convergence occurs when points don't move between clusters and centroids stabilize.

Requirements

You are to create a program using Python that does the following:

1. Asks the user for a filename which contains the point data which is to be clustered (see Data File Format section for details).
2. Asks the user for the name of the output file.
3. Asks the user for the number of clusters. This is the parameter k that will be used for k-means.
4. Reads the input file and stores the points into a list.
5. Applies the k-means algorithm to find the cluster for each point.
6. Displays the points that each cluster contains after each iteration of the algorithm.
7. Writes the final cluster assignments to the output file.

YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, ...) - NO IMPORT STATEMENTS.

Additional Requirements

1. The name of your source code file should be kMeans.py. All your code should be within a single file.
2. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.
3. You need to use meaningful identifier names that conform to standard naming conventions.
4. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.
5. The output of your program should exactly match the sample program output given at the end.

Data File Format

Let N be the number of points and Pi be the value of point i. The input file should be of the following format: P1 P2 ... PN

Example: 1.2 2.1 4.56 2.113

What to Turn In

You will turn in the single kMeans.py file using BlackBoard.

HINTS

- Make use of list comprehensions for reading lines from a file and then converting the strings into a list of floats.
- Use pwd to check the directory where you should place your input file.
- Use dict data structures for storing centroids and clusters. The centroids dict will be a mapping from cluster number to centroids. The clusters dict will be a mapping from cluster number to a list of points in the cluster.
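The dict hint can be illustrated with a short, import-free sketch. The helper names `build_clusters` and `compute_centroids` are invented here, and skipping empty clusters when averaging is an assumption rather than something the assignment specifies.

```python
def build_clusters(points, assignments, k):
    """Map each cluster number to the list of points assigned to it."""
    clusters = dict((cluster_id, []) for cluster_id in range(k))
    for point, cluster_id in zip(points, assignments):
        clusters[cluster_id].append(point)
    return clusters


def compute_centroids(clusters):
    """Map each non-empty cluster number to the average of its points."""
    centroids = {}
    for cluster_id in clusters:
        members = clusters[cluster_id]
        if members:  # assumption: empty clusters are skipped
            centroids[cluster_id] = sum(members) / float(len(members))
    return centroids
```

With the points 1.8, 4.5, 1.1, 2.1 assigned to clusters 0, 1, 2, 0, the clusters dict holds [1.8, 2.1] under key 0 and the centroids dict maps key 1 to 4.5.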