Question
Could anyone help add to my python code? I now need to calculate the mean and median.
In this programming assignment you are to extend the program you wrote for Number Stats to determine the median and mode of the numbers read from the file.
You are to create a program called numstat2.py that reads a series of integer numbers from a file and determines and displays the following:
The name of the file.
The sum of the numbers.
The count of how many numbers are in the file.
The average of the numbers. The average is the sum of the numbers divided by how many there are.
The maximum value.
The minimum value.
The range of the values. The range is the maximum value minus the minimum value.
The median of the numbers.
The mode of the numbers.
The output from the program is to display the information described above using the following strings preceding the values. There is to be a space between the colon and the value.
File name:
Sum:
Count:
Average:
Maximum:
Minimum:
Range:
Median:
Mode:
At the end of one attempt at reading, or a successful read, of a file, the user is to be asked if they would like to evaluate another file of numbers. Use the prompt: Would you like to evaluate another file? (y/n) If the user answers y, then the program is to accept input for another file name. If the user answers with anything other than y, the program is to exit.
The program you write must be able to handle four different circumstances for the file that contains the number data:
The count of the numbers in the file being even.
The count of the numbers in the file being odd.
The file containing only one number.
The file containing no numbers. (An empty file.) The user is to be told that no numbers could be found in the file.
The file containing an even or odd count of numbers is an issue when calculating the median which is discussed below in the Useful Information section.
Here is my code so far:
# import sys package
import sys
# loop begins
while True:
    try:
        # getting filename from the user
        filename=input('Enter the name of the file ')
        # opening the file
        f=open(filename)
        # variable for counting the integers
        c=0
        # variable to store the maximum number
        maxi=-(sys.maxsize)
        # variable to store the minimum number
        mini=sys.maxsize
        # variable to store the total
        total=0
        # variable to store the average
        avg=0.0
        # reading the file
        for line in f:
            # converting the line into integer
            i=int(line.strip())
            # incrementing the count
            c+=1
            # adding the number to the total variable
            total+=i
            # checking for maximum
            if i > maxi:
                maxi=i
            # checking for minimum
            if i < mini:
                mini=i
        # calculating average
        avg=total/c
        # printing the data
        print('File name: %s'% filename)
        print('Sum: %d' % total)
        print('Count: %d' % c)
        print('Average: %.2f' % avg)
        print('Maximum: %d' % maxi)
        print('Minimum: %d' % mini)
        print('Range: %d' % (maxi-mini))
        # closing the file
        f.close()
    except:
        # prints when file not found
        print('File not found')
    # getting user choice of continuing or quitting
    choice=input('Would you like to evaluate another file? (y/n) ')
    choice=choice.lower()
    # if the choice is y , then loop continues
    if choice== 'y':
        continue
    #otherwise quits
    else:
        break
Useful Information
A list is necessary because the numbers must be sorted to determine the median, and a list is also convenient for determining the mode. In addition, Python has min() and max() functions that can be used to determine the minimum and maximum values in the list rather than having to implement code to do it yourself.
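A minimal sketch of reading the numbers into a list and using the built-in functions, assuming the file holds one integer per line (as the code above does). The helper name read_numbers, the choice to skip blank lines, and the exact wording of the empty-file message are assumptions for illustration, not part of the assignment.

```python
def read_numbers(lines):
    """Convert an iterable of text lines into a list of ints."""
    number_list = []
    for line in lines:
        text = line.strip()
        if text:  # assumption: blank lines are ignored rather than treated as errors
            number_list.append(int(text))
    return number_list


# A file object can be passed directly, for example:
#     with open(filename) as f:
#         number_list = read_numbers(f)
# Here a list of strings stands in for the file contents.
number_list = read_numbers(["7\n", "3\n", "\n", "10\n"])

if number_list:
    total = sum(number_list)
    count = len(number_list)
    print('Sum: %d' % total)
    print('Count: %d' % count)
    print('Average: %.2f' % (total / count))
    print('Maximum: %d' % max(number_list))
    print('Minimum: %d' % min(number_list))
    print('Range: %d' % (max(number_list) - min(number_list)))
else:
    # An empty list means there is nothing to average, so report it instead.
    print('No numbers could be found in the file.')
```

Checking for an empty list before dividing avoids a ZeroDivisionError when the file contains no numbers.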
The median is the middle value in an ordered list of values. If there is an odd number of values then it is the number at the middle position. If there is an even number of values then it is the average of the two values around the midpoint.
The mode is the number that is repeated most often. It is possible to have more than one mode.
Example 1
Numbers: 6, 6, 7, 8, 10, 11, 15, 15, 17, 17, 17
There are 11 numbers in this example. Because the count is odd, the median is the middle value. The middle value is at position (11+1)/2 = 6. The number 11 is at the 6th position and is therefore the median. Note that to calculate the position of the middle value, add 1 to the quantity of numbers and divide by 2.
The mode is 17 because it occurs the most frequently. The number 17 occurs three times. The numbers 15 and 6 occur two times. All the other numbers occur one time.
Example 2
Numbers: 5, 5, 6, 7, 8, 9, 9, 10, 10, 11
There are 10 numbers in this example. Because there is an even count of numbers, the median is the average of the two numbers around the middle. The middle position is (10+1)/2 = 5.5, so the numbers at position 5 and position 6 need to be averaged. The number 8 is at position 5 and the number 9 is at position 6. The average of 8 and 9 is (8+9)/2 = 17/2 = 8.5.
The list of numbers has three modes: 5, 9, and 10. These three numbers occur two times. The other numbers occur one time.
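The two examples can be cross-checked with Python's standard statistics module. This is only a way to verify expected answers, since the assignment presumably expects the median and mode to be computed by hand; statistics.multimode requires Python 3.8 or later.

```python
import statistics

example_1 = [6, 6, 7, 8, 10, 11, 15, 15, 17, 17, 17]
example_2 = [5, 5, 6, 7, 8, 9, 9, 10, 10, 11]

# median() sorts internally and averages the two middle values for even counts
median_1 = statistics.median(example_1)   # 11
median_2 = statistics.median(example_2)   # 8.5

# multimode() returns every value tied for the highest frequency, as a list
modes_1 = statistics.multimode(example_1)  # [17]
modes_2 = statistics.multimode(example_2)  # [5, 9, 10]

print('Example 1:', median_1, modes_1)
print('Example 2:', median_2, modes_2)
```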
Calculating the Median
To calculate the median, the list of numbers needs to be ordered from lowest to highest. The numbers in the file are not ordered. The easiest way to order the numbers is to read them from the file and put them in a Python list. The list object in Python has a method that sorts the list in place. If the variable for the Python list is called number_list then use number_list.sort() to sort it.
The other piece of information needed to calculate the median is the length of the list. The len() function returns the length of a list. To get the length of number_list and store the value in a variable called count use:
count = len(number_list)
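Putting the sort and the length together, the median calculation might look like the sketch below. The function name median is an assumption for illustration. Note that the positions used in the examples count from 1, while Python list indexes count from 0, so position 6 in Example 1 is index 5.

```python
def median(number_list):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(number_list)  # sorted copy; number_list.sort() would sort in place
    count = len(ordered)
    middle = count // 2            # zero-based index of the (upper) middle element
    if count % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[middle]
    # Even count: average the two values around the midpoint.
    return (ordered[middle - 1] + ordered[middle]) / 2


print('Median:', median([6, 6, 7, 8, 10, 11, 15, 15, 17, 17, 17]))
print('Median:', median([5, 5, 6, 7, 8, 9, 9, 10, 10, 11]))
print('Median:', median([4]))
```

A file with a single number falls under the odd case, so it needs no special handling; an empty list must be ruled out before calling the function.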
Calculating the Mode
To determine the mode we need to know how frequently each number occurs in the list of numbers. What is needed is something that looks like a two column table where one column holds the numbers and the other column holds how frequently each of the numbers occurs.
Example:
Number | Frequency
5      | 2
6      | 1
7      | 1
8      | 1
9      | 2
10     | 2
11     | 1
What is needed is called a dictionary in Python. In the above table the numbers are keys in the dictionary and the frequencies are values. Dictionaries are formed out of key / value pairs.
To create a new, empty dictionary use {}. To create a dictionary called number_counts use:
number_counts = {}
To fill the dictionary, you are going to have to step through the list of numbers read from the file and:
See if the number is already in the dictionary.
If it is, you need to increment the count stored in the dictionary for that number.
If it is not, you need to create an entry in the dictionary and set the count to 1.
To step through a list of numbers in a variable number_list we can use:
for number in number_list:
In the for loop number holds the number that has been read from number_list for that iteration of the loop. The for loop ends when there are no more numbers to read.
To determine if a key is in a dictionary use:
if key in dictionary:
See page 371 (Section 9.1) of the textbook: Using the in and not in Operators to Test for a Value in a Dictionary.
If we have a dictionary called number_counts then we can check if a number is already in the dictionary by using:
if number in number_counts:
If the number is already in the dictionary we can increment the count of how many there are of that number by:
number_counts[number] += 1
If the number is not already in the dictionary we can create an entry in the dictionary and set the count to 1 by:
number_counts[number] = 1
Once we have gone through all the numbers in number_list we have a dictionary, number_counts, that holds the information about how many times each number appeared in the list.
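The counting steps described above can be assembled as follows. This is a sketch using the numbers from Example 2 in place of numbers read from a file.

```python
number_list = [5, 5, 6, 7, 8, 9, 9, 10, 10, 11]

number_counts = {}
for number in number_list:
    if number in number_counts:
        # Seen before: increment the stored count.
        number_counts[number] += 1
    else:
        # First occurrence: create the entry with a count of 1.
        number_counts[number] = 1

print(number_counts)
```

The resulting dictionary matches the frequency table shown earlier: 5, 9, and 10 each map to 2, and every other number maps to 1.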
The final steps in determining the mode are figuring out the maximum count and which numbers are associated with that maximum count.
Remember that number_counts is a dictionary and is comprised of key / value pairs. To step through the dictionary we can get the keys (the numbers) one at a time by using:
for number in number_counts:
For each number (dictionary key) obtained from the dictionary we can get the count (dictionary value) by:
count = number_counts[number]
The goal is to determine the maximum count, so each count has to be compared to a variable that holds the maximum found so far. If the count is greater than the current maximum, the maximum is set to that count.
Once the maximum count is determined, it is used to look in the dictionary for the numbers (dictionary keys) that are associated with that count (dictionary values). Those numbers are the mode values. To do this, step through the dictionary, as shown before, comparing each count to the maximum count. If they match, then the number associated with that count is a mode value.
Remember, this steps you through the dictionary:
for number in number_counts:
This gets the count from the dictionary associated with the number:
count = number_counts[number]
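A sketch of the two passes over the dictionary, using the counts from the frequency table above. The variable names max_count and modes are assumptions, and so is the comma-separated output format, since the assignment does not say how multiple modes should be displayed.

```python
number_counts = {5: 2, 6: 1, 7: 1, 8: 1, 9: 2, 10: 2, 11: 1}

# First pass: find the largest count.
max_count = 0
for number in number_counts:
    count = number_counts[number]
    if count > max_count:
        max_count = count

# Second pass: collect every number whose count equals the maximum.
modes = []
for number in number_counts:
    if number_counts[number] == max_count:
        modes.append(number)

# Assumed display format: modes separated by commas.
print('Mode: ' + ', '.join(str(m) for m in modes))
```

Collecting the modes in a list handles the single-mode and multiple-mode cases the same way.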