Question: The program must provide the following functions to extract some statistics. Note that the data_list parameter specified in these functions may be the same for all functions or different for different functions; that is your choice. A skeleton file is provided on Mirmir.
a) open_file() prompts the user to enter a year number for the data file. The program will check whether the year is between 1990 and 2015 (both inclusive). If the year number is valid, the program will try to open the data file with file name yearXXXX.txt, where XXXX is the year. An appropriate error message should be shown if the data file cannot be opened or if the year number is invalid. This function will loop until it receives proper input and successfully opens the file. It returns a file pointer and the year.
i. Hint: use string concatenation to construct the file name.
b) read_file(fp) has one parameter, a file pointer to read from. This function returns a list of your choosing containing the data you need for other parts of this project.
c) find_average(data_list) takes a list of data (of some organization of your choosing) and returns the average salary. The function does not print anything. Hints:
This is NOT (!) the average of the last column of data. It is not mathematically valid to find an average by averaging the averages: in this case, for example, there are many more earners in the lowest category than in the highest category, so each category's average would have to be weighted by its number of earners.
How many wage earners are considered in finding the average (denominator)? There are a couple of ways to determine this. I think the easiest uses the cumulative number column (Column 4), but using Column 3 is not hard and may make more sense to some students.
How does one find the total dollar value of income (numerator)? Notice that Column 6 is the combined income of all the individuals in this range of income.
For testing your function, notice that for the 2014 data the average should be $44,569.20. As a check, note that this value is listed on the web page referenced above.
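A minimal sketch of read_file and find_average, assuming one possible organization of data_list: a list of rows where row[c] holds Column c of a data line, so the indexes match the column numbers used in this assignment. The file layout is also an assumption, not something the assignment states: each data line is taken to have eight whitespace-separated fields, numbers may contain thousands separators (such as 44,569.20), Column 1 is a separator such as "-", and lines that do not fit this shape (headers, blank lines) are skipped. The average follows the hints: total combined income (Column 6) divided by the number of earners, read here from the cumulative number (Column 4) on the last line.

```python
def to_float(text):
    """Convert a numeric field such as '44,569.20' to a float."""
    return float(text.replace(",", ""))


def read_file(fp):
    """Read the income data from an open file (or any iterable of lines).

    Returns a list of rows where row[c] is Column c of one data line.
    Assumed layout: 8 whitespace-separated fields per data line, with
    Column 1 being a separator such as '-'. Other lines are skipped.
    """
    data_list = []
    for line in fp:
        fields = line.split()
        if len(fields) != 8:
            continue  # blank line, header, or other non-data line
        try:
            low = to_float(fields[0])
        except ValueError:
            continue  # first field is not a number, so not a data line
        try:
            high = to_float(fields[2])
        except ValueError:
            high = float("inf")  # assumed: an open-ended top bracket
        numbers = [to_float(field) for field in fields[3:]]  # Columns 3-7
        data_list.append([low, fields[1], high] + numbers)
    return data_list


def find_average(data_list):
    """Return total combined income divided by the total number of earners."""
    total_income = sum(row[6] for row in data_list)  # Column 6: combined income
    total_earners = data_list[-1][4]  # Column 4 on the last line: cumulative count
    return total_income / total_earners
```

In main this would be called as data_list = read_file(fp), with fp coming from open_file(). Summing Column 3 over all rows is the alternative denominator the hint mentions; for consistent data it should give the same total as the last line's Column 4.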
d) find_median(data_list) takes a list of data (of some organization of your choosing) and returns the median income. The function does not print anything. Unfortunately, this file of data is not sufficient to find the true median, so we need to approximate it.
Here is the rule we will use: find the data line whose cumulative percentage (Column 5) is closest to 50% and return its average income (Column 7). If two data lines are equally close, return either one.
Hint: Python's abs() function (absolute value) is potentially useful here.
Hint: your get_range() function should be useful here.
For testing your function: using our rule, the median income for the 2014 data is $27,457.00.
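One way to apply this rule is min() with an abs()-based key. This sketch assumes data_list is a list of rows where row[c] holds Column c; it searches the rows directly rather than going through get_range(), which the hint suggests as another route.

```python
def find_median(data_list):
    """Approximate the median income using the assignment's rule.

    Picks the data line whose cumulative percentage (Column 5) is
    closest to 50 and returns its average income (Column 7).
    On a tie, min() keeps the first such line, which the rule allows.
    """
    closest = min(data_list, key=lambda row: abs(row[5] - 50))
    return closest[7]
```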
e) get_range(data_list, percent) takes a list of data (of some organization of your choosing) and a percent (float). For the first data line whose cumulative percentage (Column 5) is greater than or equal to the percent parameter, it returns the salary range as a tuple (Columns 0 and 2), the cumulative percentage value (Column 5), and the average income (Column 7). Stated another way: ((col_0, col_2), col_5, col_7). The function does not print anything.
i. For testing: using the 2014 data and a percent value of 90, your function will return ((90000.0, 94999.99), 90.80624, 92420.5).
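A minimal sketch of get_range, assuming data_list is a list of rows where row[c] holds Column c and that the rows are in file order, so cumulative percentages increase; the first line that reaches the requested percent is the one returned. Returning None when no line qualifies is an assumption, since the assignment does not cover that case.

```python
def get_range(data_list, percent):
    """Return ((col_0, col_2), col_5, col_7) for the first data line
    whose cumulative percentage (Column 5) is >= percent.

    Returns None if no line qualifies (assumed behavior).
    """
    for row in data_list:
        if row[5] >= percent:
            return (row[0], row[2]), row[5], row[7]
    return None
```

Per the assignment, get_range(data_list, 90) on the 2014 data should give ((90000.0, 94999.99), 90.80624, 92420.5).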
f) get_percent(data_list, income) takes a list of data (of some organization of your choosing) and an income (float), finds the data line whose income range (Columns 0 and 2) contains the specified income, and returns that income range and its cumulative percentage (Column 5). Stated another way: ((col_0, col_2), col_5). The function does not print anything.
i. For testing: using the 2014 data and an income value of 150,000, your function will return ((150000.0, 154999.99), 96.87301).
g) do_plot(x_vals, y_vals, year), provided by us, takes two equal-length lists of numbers and plots them. Note that if you plot the whole file of data, the income ranges are so skewed that the result is a nearly vertical line so close to the left edge that you cannot see it in the plot; it looks like nothing was plotted. Plotting the lowest 40 income ranges results in a more easily readable plot.
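A sketch of get_percent from item f), assuming data_list is a list of rows where row[c] holds Column c, ordered by lower bound as in the file. Rather than testing col_0 <= income <= col_2 literally, it takes the last line whose lower bound does not exceed the income. For incomes inside a listed range this picks the same line; it also assigns incomes that fall in the one-cent gaps between ranges (such as 4999.995) to the lower range. That is a design choice of this sketch, not something the assignment specifies.

```python
def get_percent(data_list, income):
    """Return ((col_0, col_2), col_5) for the income range containing income.

    Uses the last line whose lower bound (Column 0) is <= income.
    Returns None if income is below the first lower bound.
    """
    found = None
    for row in data_list:
        if row[0] <= income:
            found = ((row[0], row[2]), row[5])
        else:
            break  # rows are assumed to be sorted by lower bound
    return found
```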
2. main()
a) Open the data file
b) Read the data file (using the file pointer from the opened file).
c) Print the year, the average income, and the median income (and a header). Here is the output format that I used: "{:<6d}${:<14,.2f}${:<14,.2f}"
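The format string can be checked in isolation. summary_lines is a hypothetical helper, not part of the skeleton; the data line uses the assignment's format, while the header widths (6, 15, 15) are an assumption chosen so the column titles line up with the "$" plus 14-character amount fields.

```python
def summary_lines(year, average, median):
    """Return (header, data_line) for the year/mean/median summary.

    The data line uses the assignment's format string; the header
    widths are an assumption chosen to line up with it.
    """
    header = "{:<6s}{:<15s}{:<15s}".format("Year", "Mean", "Median")
    line = "{:<6d}${:<14,.2f}${:<14,.2f}".format(year, average, median)
    return header, line


if __name__ == "__main__":
    # The 2014 values quoted in the assignment.
    for text in summary_lines(2014, 44569.20, 27457.00):
        print(text)
```

Each amount is left-aligned and padded with trailing spaces to a fixed width, which is why the columns stay aligned from year to year.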
d) Prompt whether to plot the data and, if yes, plot cumulative percentage (Column 5) vs. income (Column 0) for only the lowest 40 income ranges.
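A sketch of the data preparation for step d), using a hypothetical helper plot_values (not part of the skeleton). It assumes data_list is a list of rows where row[c] holds Column c, in file order with the lowest income range first. The two lists it returns would be passed to the provided do_plot(x_vals, y_vals, year); do_plot itself is left out so the sketch runs without pylab.

```python
def plot_values(data_list, count=40):
    """Return (x_vals, y_vals) for the lowest `count` income ranges.

    x_vals: lower bound of each income range (Column 0)
    y_vals: cumulative percentage (Column 5)
    """
    lowest = data_list[:count]  # assumes rows are ordered lowest range first
    x_vals = [row[0] for row in lowest]
    y_vals = [row[5] for row in lowest]
    return x_vals, y_vals
```

In main, the "yes" branch would then be x_vals, y_vals = plot_values(data_list) followed by do_plot(x_vals, y_vals, year).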
e) Loop, prompting for either r for range, p for percent, or nothing:
i. r: prompt for a percent (float) and output the income below which that percent of incomes fall. Print an error message if an invalid number is entered (a percent must be between 0 and 100). Here is the output format that I used:
"{:4.2f}% of incomes are below ${:<13,.2f}."
ii. p: prompt for an income (float) and output the percent that earn more. Print an error message if an invalid income is entered (income must be positive). Here is the output format that I used:
"An income of ${:<13,.2f} is in the top {:4.2f}% of incomes."
iii. If only a carriage return is entered, halt the program.
3. Call main() using
if __name__ == "__main__":
    main()
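A sketch of step e) as a standalone function. run_menu, to_number, and the ask/show parameters are hypothetical additions: ask defaults to input and show to print, so the same loop can be driven by scripted answers when testing, and get_range/get_percent are passed in only so the sketch is self-contained. The prompts, error messages, and format strings are copied from the assignment and its sample transcripts. It assumes get_range finds a line for any percent in [0, 100] and get_percent finds one for any positive income.

```python
def to_number(text):
    """Return text converted to a float, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def run_menu(data_list, get_range, get_percent, ask=input, show=print):
    """Handle (r)ange / (p)ercent requests until an empty line is entered."""
    prompt = "Enter a choice to get (r)ange, (p)ercent, or nothing to stop: "
    choice = ask(prompt)
    while choice:  # an empty string (bare carriage return) ends the loop
        if choice == "r":
            percent = to_number(ask("Enter a percent: "))
            if percent is None or not 0 <= percent <= 100:
                show("Error in percent. Please try again")
            else:
                # get_range returns ((col_0, col_2), col_5, col_7)
                (low, _high), _cum_pct, _avg = get_range(data_list, percent)
                show("{:4.2f}% of incomes are below ${:<13,.2f}.".format(percent, low))
        elif choice == "p":
            income = to_number(ask("Enter an income: "))
            if income is None or income <= 0:
                show("Error: income must be positive")
            else:
                # get_percent returns ((col_0, col_2), col_5)
                _income_range, cum_pct = get_percent(data_list, income)
                show("An income of ${:<13,.2f} is in the top {:4.2f}% of incomes."
                     .format(income, cum_pct))
        else:
            show("Error in selection.")
        choice = ask(prompt)
```

Injecting ask and show keeps the loop identical to what main() would run while making it possible to check the exact output strings without typing at a console.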
Initial Program:
import pylab

def do_plot(x_vals,y_vals,year):
    '''Plot x_vals vs. y_vals where each is a list of numbers of the same length.'''
    pylab.xlabel('Income')
    pylab.ylabel('Cumulative Percent')
    pylab.title("Cumulative Percent for Income in "+str(year))
    pylab.plot(x_vals,y_vals)
    pylab.show()

def open_file():
    '''You fill in the doc string'''
    year_str = input("Enter a year where 1990 <= year <= 2015: ")
    pass # replace this line with your code

def read_file(fp):
    '''You fill in the doc string'''
    pass # replace this line with your code

def find_average(data_lst):
    '''You fill in the doc string'''
    pass # replace this line with your code

def find_median(data_lst):
    '''You fill in the doc string'''
    pass # replace this line with your code

def get_range(data_lst, percent):
    '''You fill in the doc string'''
    pass # replace this line with your code

def get_percent(data_lst,salary):
    '''You fill in the doc string'''
    pass # replace this line with your code

def main():
    # Insert code here to determine year, average, and median

    print("For the year {:4d}:".format(year))
    print("The average income was ${:<13,.2f}".format(avg))
    print("The median income was ${:<13,.2f}".format(median))

    response = input("Do you want to plot values (yes/no)? ")
    if response.lower() == 'yes':
        pass # replace this line
        # determine x_vals, a list of floats -- use the lowest 40 income ranges
        # determine y_vals, a list of floats of the same length as x_vals
        # do_plot(x_vals,y_vals,year)

    choice = input("Enter a choice to get (r)ange, (p)ercent, or nothing to stop: ")
    while choice:
        # Insert code here to handle choice
        choice = input("Enter a choice to get (r)ange, (p)ercent, or nothing to stop: ")

if __name__ == "__main__":
    main()
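A sketch of one way to fill in the open_file stub, following item a): validate the year, build the file name by string concatenation, and loop until the file opens. The error messages copy the Test 3 transcript. Treating any OSError (not only a missing file) as "cannot be opened" is a choice made here.

```python
def open_file():
    """Prompt for a year between 1990 and 2015 (inclusive) and open yearXXXX.txt.

    Loops until a valid year is entered and the file opens successfully.
    Returns (file pointer, year).
    """
    while True:
        year_str = input("Enter a year where 1990 <= year <= 2015: ").strip()
        if year_str.isdigit() and 1990 <= int(year_str) <= 2015:
            file_name = "year" + year_str + ".txt"  # string concatenation, per the hint
            try:
                return open(file_name), int(year_str)
            except OSError:  # missing file, permission problem, etc.
                print("Error in file name:", file_name, "Please try again.")
        else:
            print("Error in year. Please try again.")
```

Checking isdigit() before int() means input such as "xxxx" is reported as an invalid year instead of raising ValueError.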
Tests
There are unit tests for functions: find_average, find_median, get_range, and get_percent. The tests all call your read_file function to get your data structure to pass to those functions. The file read for these unit tests is the 2014 data.
Test 1
Enter a year where 1990 <= year <= 2015: 2014
Year Mean Median
2014 $44,569.20 $27,457.00
Do you want to plot values (yes/no)? no
Enter a choice to get (r)ange, (p)ercent, or nothing to stop:
Test 2
Enter a year where 1990 <= year <= 2015: 2014
Year Mean Median
2014 $44,569.20 $27,457.00
Do you want to plot values (yes/no)? no
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r
Enter a percent: 90
90.00% of incomes are below $90,000.00 .
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p
Enter an income: 100000
An income of $100,000.00 is in the top 92.57% of incomes.
Enter a choice to get (r)ange, (p)ercent, or nothing to stop:
Test 3
Enter a year where 1990 <= year <= 2015: xxxx
Error in year. Please try again.
Enter a year where 1990 <= year <= 2015: 1900
Error in year. Please try again.
Enter a year where 1990 <= year <= 2015: 1999
Error in file name: year1999.txt Please try again.
Enter a year where 1990 <= year <= 2015: 2015
Year Mean Median
2015 $46,119.78 $27,459.59
Do you want to plot values (yes/no)? no
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: x
Error in selection.
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r
Enter a percent: 104
Error in percent. Please try again
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r
Enter a percent: -2
Error in percent. Please try again
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r
Enter a percent: 90
90.00% of incomes are below $90,000.00 .
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p
Enter an income: -20
Error: income must be positive
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p
Enter an income: 100000
An income of $100,000.00 is in the top 92.03% of incomes.
Enter a choice to get (r)ange, (p)ercent, or nothing to stop:
Test 4
Enter a year where 1990 <= year <= 2015: 2000
Year Mean Median
2000 $30,846.09 $17,471.75
Do you want to plot values (yes/no)? no
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r
Enter a percent: 40
40.00% of incomes are below $15,000.00 .
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p
Enter an income: 20000
An income of $20,000.00 is in the top 56.96% of incomes.
Enter a choice to get (r)ange, (p)ercent, or nothing to stop:
Test 5 (not on Mirmir because this tests the plot; TAs will run this test.)
Enter a year where 1990 <= year <= 2015: 2015
Year Mean Median
2015 $46,119.78 $27,459.59
Do you want to plot values (yes/no)? yes
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: