Question: I keep getting a "file not found in directory" error even though the input file is saved in the same folder as the .py file. Please show how I can fix this error.

Clustering is a process of identifying groupings (i.e. clusters) within the data. For example, the figure below shows three clusters of two-dimensional data points (Xs):

Clustering has many applications, including inferring population structures from genetic data, recognizing communities within social networks, or segmenting of customers for market research.

One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the notion of distance between two points. For points with only one dimension (just single values), we can define the distance between two points $x_1$ and $x_2$ as

$d(x_1, x_2) = |x_1 - x_2|$

The k-means algorithm works by placing points into clusters and computing their centroids, where a centroid is defined as the average of the data points in its cluster. Specifically, the algorithm works as follows:

1. Pick k, the number of clusters.

2. Initialize clusters by picking one point (centroid) per cluster. For this assignment, you can pick the first k points as initial centroids for each corresponding cluster.

3. For each point, place it in the cluster whose current centroid it is nearest to.

4. After all points are assigned, update the locations of the centroids of the k clusters.

5. Reassign all points to their closest centroid. This sometimes moves points between clusters.

6. Repeat steps 4 and 5 until convergence. Convergence occurs when points don't move between clusters and centroids stabilize.
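The six steps above can be sketched compactly as follows. This is a minimal illustration for one-dimensional data, not the required program structure: the function name `k_means_1d` and the list-based bookkeeping are assumptions made here for brevity, and it assumes the data contains at least k points. It uses no imports, in line with the assignment's restriction.

```python
def k_means_1d(points, k):
    """Cluster 1-D points with k-means; return (assignments, history).

    assignments[i] is the cluster number of points[i].
    history holds the cluster contents after each iteration.
    """
    centroids = list(points[:k])            # step 2: first k points seed the centroids
    assignments = [None] * len(points)      # no point belongs to a cluster yet
    history = []

    while True:
        previous = list(assignments)        # copy, so we can detect movement
        clusters = [[] for _ in range(k)]   # fresh, independent empty lists

        # steps 3 and 5: put every point in the cluster with the nearest centroid
        for index, point in enumerate(points):
            nearest = min(range(k), key=lambda c: abs(point - centroids[c]))
            assignments[index] = nearest
            clusters[nearest].append(point)
        history.append(clusters)

        # step 6: stop once no point changed cluster
        if assignments == previous:
            break

        # step 4: move each centroid to the mean of its points
        for c in range(k):
            if clusters[c]:                 # an empty cluster keeps its old centroid
                centroids[c] = sum(clusters[c]) / len(clusters[c])

    return assignments, history
```

Called with the ten values from the sample input file and k = 5, this sketch runs three iterations and ends with the same assignments as the sample output. Ties between equally distant centroids go to the lowest cluster number, which is a choice made here and is not stated in the assignment.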

Requirements

You are to create a program using Python that does the following:

1. Asks the user for the number of clusters. This is the parameter k that will be used for k-means.

2. Reads the input file (prog2-input-data.txt) and stores the points into a list

3. Applies the k-means algorithm to find the cluster for each point.

4. Displays the points that each cluster contains after each iteration of the algorithm

5. Writes the final cluster assignments to the screen and the output file (prog2-output-data.txt).

YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, ...) - NO IMPORT STATEMENTS.

Additional Requirements

1. The name of your source code file should be kMeans.py. All your code should be within a single file.

2. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.

3. You need to use meaningful identifier names that conform to standard naming conventions.

4. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.

5. The output of your program should exactly match the sample program output given at the end. That is, for the same input, it should generate the same output. Note that I may use other test cases for grading your program and your code needs to work correctly in all cases.

Data File Format

Let N be the number of points and Pi be the value of point i. The input file has one value per line, in the following format:

P1
P2
...
PN

Example:

1.2
2.1
4.56
2.113
2.2

The name of the input file is always:

prog2-input-data.txt

What to Turn In

You will turn in a screenshot of your output and a single kMeans.py file using BlackBoard.

HINTS

Make use of list comprehensions for reading lines from a file and then converting the strings into a list of floats.

Use pwd() to check the directory where you should place your input file.

Use dict data structures for storing centroids and clusters. The centroids dict will be a mapping from cluster number to centroid. The clusters dict will be a mapping from cluster number to a list of points in the cluster.
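On the "file not found" error itself: a relative name such as prog2-input-data.txt is resolved against the current working directory of the Python process, which is not necessarily the folder that contains kMeans.py. IDEs often start the program from the project root or the home directory. Another common cause is a hidden extension, so the file is really named prog2-input-data.txt.txt. The sketch below is a temporary diagnostic only, and the helper name `explain_missing_file` is made up for this illustration. It imports `os`, which the assignment forbids, so it should be removed before submitting.

```python
import os  # diagnostic only: remove before submitting, the assignment forbids imports


def explain_missing_file(file_name):
    """Return None if file_name exists in the working directory, else a hint string."""
    if os.path.isfile(file_name):
        return None
    working_dir = os.getcwd()
    # Strip the last extension so "data.txt" also matches "data.txt.txt" or "data"
    stem = file_name.rsplit(".", 1)[0].lower()
    similar = [name for name in os.listdir(working_dir)
               if name.lower().startswith(stem)]
    return ("Cannot find '{}'. Python is looking in: {}\n"
            "Similar names found there: {}".format(file_name, working_dir, similar))


hint = explain_missing_file("prog2-input-data.txt")
if hint:
    print(hint)
```

Once the printed directory is known, the usual fixes are to move the input file into that directory, to run the script from the folder that holds both files (for example `cd` there and run `python kMeans.py`), or to set the working directory in the IDE's run configuration. If a near-miss name shows up in the list, rename the file so it matches exactly.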

Sample Program Output

DATA-51100, [semester] [year]
NAME: [put your name here]
PROGRAMMING ASSIGNMENT #2

Enter the number of clusters: 5

Iteration 1
0 [1.8]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [2.1, 3.2]
4 [9.8, 7.6, 11.32]

Iteration 2
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]

Iteration 3
0 [1.8, 2.1]
1 [4.5, 6.5]
2 [1.1, 0.5]
3 [3.2]
4 [9.8, 7.6, 11.32]

Point 1.8 in cluster 0
Point 4.5 in cluster 1
Point 1.1 in cluster 2
Point 2.1 in cluster 0
Point 9.8 in cluster 4
Point 7.6 in cluster 4
Point 11.32 in cluster 4
Point 3.2 in cluster 3
Point 0.5 in cluster 2
Point 6.5 in cluster 1

Output File Contents

Point 1.8 in cluster 0
Point 4.5 in cluster 1
Point 1.1 in cluster 2
Point 2.1 in cluster 0
Point 9.8 in cluster 4
Point 7.6 in cluster 4
Point 11.32 in cluster 4
Point 3.2 in cluster 3
Point 0.5 in cluster 2
Point 6.5 in cluster 1

# Initialization

1. Print header info to screen

2. Get input/output file names and number of clusters

3. Read file:

-use open() and a list comprehension to strip all lines of ending char (using rstrip method) and convert to floats

4. Create variables to store centroids, clusters, and point assignments. Initially, pick one point (centroid) per cluster:

-create and initialize a variable to store centroids for each cluster: a mapping (dict) from range(k) to data[0:k]

-create and initialize another variable to store all points for each cluster: a mapping (dict) of range(k) to k empty lists

-use zip when creating the dict

-create and initialize a dict mapping points to clusters

-create a variable to store old point assignments (from previous iteration)

# Algorithm

5. Repeat the following:

a) Save current point assignment into old point assignment variable (create a new dict from current assignment variable)

b) Place each point in the closest cluster (you should make a function that does this)

c) Update the locations of centroids of the k clusters (make a function for this also)

d) Reinitialize the clusters variable to empty lists

# Output

6. Print the point assignments

DETAILED HINTS

# Initialization

1. Print header info to screen

Use print function.

2. Get input/output file names and number of clusters

Use raw_input (Python 2) or input (Python 3, where raw_input no longer exists). For the number of clusters (k), make sure to use int() to convert to an integer.

3. Read file:

-use open() and a list comprehension to strip all lines of ending char (using rstrip method) and convert to floats

I basically showed this in the video:

data = [float(x.rstrip()) for x in open(input_file)]
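One caveat about the one-liner above: `float("")` raises a ValueError, so a blank line at the end of the input file will crash it, and the file handle is never closed explicitly. The variant below is a hedged alternative, not the instructor's version. The function name `read_points` is an assumption introduced here. It still uses a list comprehension and no imports.

```python
def read_points(file_name):
    """Return every number in the file as a float, ignoring blank lines."""
    with open(file_name) as input_file:     # the with-block closes the file for us
        # split() with no arguments breaks on any whitespace, so this accepts
        # one value per line as well as several values on the same line
        return [float(token) for line in input_file for token in line.split()]
```

A typical call would be `data = read_points("prog2-input-data.txt")`. If the file is not in the working directory, `open` raises FileNotFoundError before any parsing happens.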

4. Create variables to store centroids, clusters, and point assignments. Initially, pick one point (centroid) per cluster:

-create and initialize a variable to store centroids for each cluster: a mapping (dict) from range(k) to data[0:k]

centroids = dict(zip(range(k),data[0:k]))

-create and initialize another variable to store all points for each cluster: a mapping (dict) of range(k) to k empty lists

-use zip when creating the dict

clusters = dict(zip(range(k),[[] for i in range(k)]))

-create and initialize a dict mapping points to clusters

First, think about how points are represented: as numbers in a list, with each number having an index value. So you can do a mapping between each of the index values (0..N-1, where N is the number of points) and the cluster to which the point having this index value belongs. Initially, the points won't belong to any cluster, so you can just map to dummy values. Later, you will update these with appropriate values (after each iteration).

-create a variable to store old point assignments (from previous iteration)

Since there is no previous assignment on first iteration, just assign an empty dict to an old_point_assignments variable
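Taken together, the initialization in step 4 might look like the sketch below. Wrapping it in a function named `initialize_state` and using -1 as the dummy cluster value are assumptions made for this illustration; the hints only ask for the four variables.

```python
def initialize_state(data, k):
    """Build the four bookkeeping variables described in step 4."""
    # cluster number -> centroid, seeded with the first k points
    centroids = dict(zip(range(k), data[0:k]))
    # cluster number -> list of points; the comprehension makes k separate lists
    clusters = dict(zip(range(k), [[] for _ in range(k)]))
    # point index -> cluster number; -1 is a dummy meaning "not assigned yet"
    point_assignments = dict(zip(range(len(data)), [-1] * len(data)))
    # nothing to compare against on the first iteration
    old_point_assignments = {}
    return centroids, clusters, point_assignments, old_point_assignments
```

The list comprehension matters: writing `[[]] * k` instead would put the same list object in every cluster, so appending a point to one cluster would appear to add it to all of them.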

# Algorithm

5. Repeat the following:

a) Save current point assignment into old point assignment variable (create a new dict from current assignment variable)

This should create a copy of the dict holding the point assignments into the old_point_assignments variable

b) Place each point in the closest cluster (you should make a function that does this)

Make a function called assign_to_clusters that takes as input: data, clusters, centroids, and point_assignments. The function should go through each point and the index of that point in data (use enumerate()) and find the closest centroid. Then add that point to the list of points for that cluster in the clusters variable. Also, do:

point_assignments[j] = closest_index
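A minimal sketch of assign_to_clusters along the lines of this hint is shown below. It assumes one-dimensional points, so the distance is the absolute difference, and it resolves ties in favor of the cluster that appears first in the centroids dict, which is a choice made here and not something the hint specifies.

```python
def assign_to_clusters(data, clusters, centroids, point_assignments):
    """Place each point in the cluster whose centroid is closest (modifies the dicts in place)."""
    for index, point in enumerate(data):
        # iterate over cluster numbers and keep the one with the smallest distance
        closest_index = min(centroids, key=lambda c: abs(point - centroids[c]))
        clusters[closest_index].append(point)
        point_assignments[index] = closest_index
```

Because dicts are mutable, the function does not need to return anything: the caller's clusters and point_assignments are updated directly.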

c) Update the locations of centroids of the k clusters (make a function for this also)

Make a function that takes as input: data, clusters, and centroids. It should go through each list contained in clusters variable and recompute the centroid by averaging over all the points in that list (use sum() and len() functions). After computing, update the corresponding centroids value.
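The centroid update can be sketched as follows. The name `update_centroids` is an assumption, since the hint does not name the function. The guard for empty clusters is also an addition: without it, an empty cluster would cause a division by zero.

```python
def update_centroids(data, clusters, centroids):
    """Recompute each centroid as the mean of the points currently in its cluster."""
    # data is accepted to match the signature in the hint but is not needed here
    for cluster_number, members in clusters.items():
        if members:                          # skip empty clusters to avoid dividing by zero
            centroids[cluster_number] = sum(members) / len(members)
```

For example, a cluster holding 4.5 and 6.5 gets the new centroid (4.5 + 6.5) / 2 = 5.5. A cluster with no points simply keeps its previous centroid in this sketch.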

d) Reinitialize the clusters variable to empty lists

clusters = dict(zip(range(k),[[] for i in range(k)]))

# Output

6. Print the point assignments

Go through all values in point_assignments and print them.
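Since the requirements ask for the final assignments on the screen and in prog2-output-data.txt, one way to do both in a single pass is sketched below. The function name `write_assignments` and its return value are assumptions added for illustration. The line format follows the sample output.

```python
def write_assignments(data, point_assignments, output_file_name):
    """Print 'Point X in cluster C' for every point and write the same lines to a file."""
    lines = ["Point {} in cluster {}".format(data[index], point_assignments[index])
             for index in range(len(data))]
    with open(output_file_name, "w") as output_file:
        for line in lines:
            print(line)                      # screen copy
            output_file.write(line + "\n")   # file copy
    return lines
```

Like the input file, the output file is created relative to the current working directory, so it will appear wherever Python is running from and not necessarily next to kMeans.py.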

Contents of prog2-input-data.txt:

1.8
4.5
1.1
2.1
9.8
7.6
11.32
3.2
0.5
6.5
