Question

Posted on Sep 26, 2024

CSE 231 Spring 2021

Computer Project #07

Assignment Overview

This assignment focuses on the implementation of Python programs to read files and process data by using lists and functions.

It is worth 55 points (5.5% of the course grade) and must be completed no later than 11:59 PM on Monday, March 15.

Assignment Deliverable

The deliverable for this assignment is the following file:

proj07.py the source code for your Python program

Be sure to use the specified file name and to submit it for grading via Mimir before the project deadline.

Assignment Background

One commonly hears references to "the one percent," meaning the people whose incomes are in the top 1% of all incomes. What data lies behind that number, and where do others fall? Using the National Average Wage Index (AWI), an index the Social Security Administration uses to gauge individuals' earnings for the purpose of calculating their retirement benefits, we can answer such questions.

In this project, you will process AWI data. Example data for 2019 is provided in the file year2019.txt (2019 is the most recent year of complete data). The data is a table with the first row as the title and the second row defining the data fields; remaining rows are data. The URL for the data is: https://www.ssa.gov/cgi-bin/netcomp.cgi?year=2019

Here is the second line of data from the file followed by descriptions of the data. Notice that some data are ints and some are floats:

5,000.00 - 9,999.99 12,620,757 32,801,513 19.37150 93,403,927,820.81 7,400.82

Column 0 is the bottom of this income range.

Column 1 is the dash separating the bottom of the range from the top (see note below).

Column 2 is the top of this income range (see note below).

Column 3 is the number of individuals in the income range.

Column 4 is the cumulative number of individuals in this income range and all lower ranges.

Column 5 is the Column 4 value represented as a cumulative percentage of all individuals.

Column 6 is the combined income of all the individuals in this range of income.

Column 7 is the average income of individuals in this range of income.

Note: The final row of the file is different from all the others. You must account for that.

Assignment Specifications

The program must provide the following functions to extract some statistics.

a) def open_file(): Prompts the user to enter a year for the data file. The program checks whether the year is between 1990 and 2019 (both inclusive). If the year is valid, the program tries to open the data file named yearXXXX.txt, where XXXX is the year. An appropriate error message should be shown if the year is invalid or if the data file cannot be opened. The year is invalid if it is not a number between 1990 and 2019, inclusive; the invalid-year error is shown in this case. If the year is valid but the file does not exist, the file-name error is output instead. This function loops until it receives proper input and successfully opens the file. It returns a file pointer and the year. Hint: use string concatenation to construct the file name.

i. Parameters: None

ii. Display: prompt and error message

iii. Return: file pointer and int
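A minimal sketch of the open_file() loop, assuming the error messages should match the Test 3 sample output below. It validates the year first, then builds the file name by concatenation as the hint suggests. Catching OSError (the parent of FileNotFoundError) is a choice made here to cover any failure to open the file.

```python
def open_file():
    """Prompt until a valid year's data file opens; return (file pointer, year)."""
    while True:
        year_str = input("Enter a year where 1990 <= year <= 2019: ")
        try:
            year = int(year_str)
        except ValueError:
            # Not a number at all, e.g. "xxx"
            print("Error in year. Please try again.")
            continue
        if not 1990 <= year <= 2019:
            # A number, but outside the allowed range, e.g. 1900
            print("Error in year. Please try again.")
            continue
        file_name = "year" + str(year) + ".txt"  # string concatenation, per the hint
        try:
            fp = open(file_name)
        except OSError:
            # Valid year, but the data file is missing or unreadable
            print("Error in file name:", file_name, "Please try again.")
            continue
        return fp, year
```

Using str(year) rather than the raw input keeps stray spaces (e.g. " 2019") out of the file name.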

b) def handle_commas(s,T) → int or float or None

The parameters are s, a string, and T, a string. The expected values of T are "int" and "float"; any other value returns None. If T is "int", the commas are removed from s, the result is converted to an int, and that int is returned; similarly for "float". If s cannot be converted to an int or float, None is returned (hint: use try-except). Note: this is the same function we had in Project 5.

i. Parameters: str, str

ii. Display: nothing

iii. Returns: int or float or None
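One way handle_commas() could look, as a sketch: strip the commas, attempt the conversion named by T, and fall back to None when the conversion fails or T is unexpected.

```python
def handle_commas(s, T):
    """Convert a comma-formatted string to int or float; return None on failure."""
    cleaned = s.replace(",", "")  # "12,620,757" -> "12620757"
    try:
        if T == "int":
            return int(cleaned)
        if T == "float":
            return float(cleaned)
    except ValueError:
        # s was not a valid number for the requested type, e.g. "over" or "9,999.99" as int
        return None
    return None  # T was neither "int" nor "float"
```

For example, handle_commas("12,620,757", "int") gives 12620757, while handle_commas("9,999.99", "int") gives None because "9999.99" is not an int literal.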

c) def read_file(fp): The function uses the file pointer parameter to read the data file. This function returns a list of tuples where each tuple is the data on one line of the file, and is a mix of ints and floats as follows: tup = ((float, float), int, int, float, float, float)

the tuple is filled with the following data:

( (column 0, column 2), column 3, column 4, column 5, column 6, column 7)

Note that the numbers contain commas that you must handle (hint: use the handle_commas function). There are also two header lines to skip. Also, the last line of the file has words where data is supposed to be; find which column this affects, and record that column as None.

i. Parameter: file pointer

ii. Display: nothing

iii. Return: list of tuples
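A sketch of read_file(), under two assumptions: every data line splits on whitespace into the eight columns described above, and the words on the last line occupy Columns 1 and 2 (for example "and over"; the exact wording is assumed here). Because handle_commas() already returns None for non-numeric text, the word in Column 2 becomes None without a special case. handle_commas() is repeated so the block runs on its own.

```python
def handle_commas(s, T):
    """Strip commas and convert to int or float; return None on failure."""
    try:
        if T == "int":
            return int(s.replace(",", ""))
        if T == "float":
            return float(s.replace(",", ""))
    except ValueError:
        pass
    return None


def read_file(fp):
    """Return a list of tuples ((col0, col2), col3, col4, col5, col6, col7)."""
    fp.readline()  # skip the title row
    fp.readline()  # skip the field-name row
    data_list = []
    for line in fp:
        cols = line.split()
        if len(cols) < 8:
            continue  # assumption: ignore blank or malformed lines
        bottom = handle_commas(cols[0], "float")
        top = handle_commas(cols[2], "float")  # a word here (last row) becomes None
        tup = ((bottom, top),
               handle_commas(cols[3], "int"),    # individuals in range
               handle_commas(cols[4], "int"),    # cumulative individuals
               handle_commas(cols[5], "float"),  # cumulative percentage
               handle_commas(cols[6], "float"),  # combined income
               handle_commas(cols[7], "float"))  # average income
        data_list.append(tup)
    return data_list
```

Passing any file-like object works, so the function can be tested with io.StringIO instead of a real file.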

d) def get_range(data_list, percent): Takes a list of data (output from the read_file function) and a percent, and returns data for the first data line whose cumulative percentage (Column 5 in the data file) is greater than or equal to the percent parameter. The function should return a tuple of the salary range (Columns 0 and 2 in the file data), the cumulative percentage value (Column 5 in the data file), and the average income (Column 7 in the data file): ((column 0, column 2), column 5, column 7). For testing, using the 2014 data and a percent value of 90, your function will return ((90000.0, 94999.99), 90.80624, 92420.5).

i. Parameters: list of tuples, float

ii. Display: nothing

iii. Return: tuple
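A sketch of get_range() that scans the rows in order and stops at the first one meeting the cumulative-percentage threshold. It assumes the tuple layout produced by read_file(), so the cumulative percentage sits at index 3 and the average income at index 5.

```python
def get_range(data_list, percent):
    """Return ((col0, col2), col5, col7) for the first row whose Column 5 >= percent."""
    for income_range, _count, _cum_count, cum_pct, _total, avg in data_list:
        if cum_pct >= percent:
            return (income_range, cum_pct, avg)
    return None  # only reachable if percent exceeds every cumulative value
```

Since the last row's cumulative percentage is 100, any percent in the valid 0 to 100 range finds a row.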

e) def get_percent(data_list, income): Takes a list of data (output from the read_file function) and an income, and returns the income range (Columns 0 and 2 in the file) that contains the specified income, together with the corresponding cumulative percentage (Column 5 in the file): ((column 0, column 2), column 5). For testing, using the 2014 data and an income value of 150,000, your function will return ((150000.0, 154999.99), 96.87301).

i. Parameters: list of tuples, float

ii. Display: nothing

iii. Return: tuple
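A sketch of get_percent(), assuming the read_file() tuple layout and that the last row's top is None. It returns the first row whose top is at or above the income; the open-ended last row catches anything larger.

```python
def get_percent(data_list, income):
    """Return ((col0, col2), col5) for the income range containing income."""
    for income_range, _count, _cum_count, cum_pct, _total, _avg in data_list:
        top = income_range[1]
        # top is None only on the final, open-ended row
        if top is None or income <= top:
            return (income_range, cum_pct)
    return None  # not reached when the last row has top None
```

Comparing against the top of each range (rather than testing bottom <= income <= top) also places incomes that fall in the one-cent gaps between ranges, such as 4,999.995, into the next range up.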

f) def find_average(data_list): Takes a list of data (output from the read_file function) and returns the average salary. Round the result to cents (i.e. two decimal places) before returning the value.

Hints:

i. This is NOT (!) the average of the last column of data. It is not mathematically valid to find an average by averaging averages; for example, in this case there are many more earners in the lowest category than in the highest category.

ii. How many wage earners are considered in finding the average (denominator)? There are a couple of ways to determine this. I think the easiest way uses the cumulative number column (Column 4 in the file), whose value on the last line is the total number of earners, but summing Column 3 is not hard and may make more sense to some students.

iii. How does one find the total dollar value of income (numerator)? Notice that Column 6 in the file is the combined income of all the individuals in this range of income.

For testing your function, note that for the 2014 data the average should be $44,569.20. That value is listed on the web page referenced above.

iv. Parameters: list of tuples

v. Display: nothing

vi. Return: float # rounded to two decimal places
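Following hints ii and iii, a sketch of find_average(): total combined income (Column 6, tuple index 4) divided by the total number of earners, taken here from the last row's cumulative count (Column 4, tuple index 2).

```python
def find_average(data_list):
    """Average income = combined income / number of earners, rounded to cents."""
    total_income = sum(row[4] for row in data_list)  # Column 6 summed over all rows
    total_people = data_list[-1][2]                  # Column 4 on the last row
    return round(total_income / total_people, 2)
```

Summing Column 3 (row[1]) instead of reading the last cumulative count should give the same denominator.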

g) def find_median(data_list): Takes a list of data (output from the read_file function) and returns the median income. Unfortunately, the data in this file is not sufficient to find the true median, so we need to approximate it.

i. Here is the rule we will use: find the data line whose cumulative percentage (Column 5) is closest to 50% and return its average income (Column 7). If two data lines are equally close, return the smaller.

ii. Hint: Python's abs() function (absolute value) is potentially useful here.

iii. Hint: your get_range() function should be useful here. The get_range() function returns the first tuple whose cumulative percentage is greater than or equal to a given percentage; for the median, that percentage is 50%. However, we need the line closest to 50%, which could be above or below 50%. Think about what number is below 50, and what would happen if you call get_range() with that number.

iv. For testing your function, using our rule, the median income for the 2014 data is $27,457.00

v. Parameters: list of tuples

vi. Display: nothing

vii. Return: float
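A compact sketch of the median rule. Rather than the get_range() route the hint suggests, it swaps in Python's built-in min() with a tuple key: the first key element is the distance from 50%, and the second is the average income, so ties resolve to the smaller income as the rule requires.

```python
def find_median(data_list):
    """Approximate median: Column 7 of the row whose Column 5 is closest to 50%."""
    best = min(
        data_list,
        # Sort key: (distance from 50%, average income); ties pick the smaller income
        key=lambda row: (abs(row[3] - 50.0), row[5]),
    )
    return best[5]
```

Either approach applies the same rule; the get_range() approach only needs to compare two neighboring rows, while this one checks every row.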

h) def do_plot(x_vals,y_vals,year): provided by us; it takes two equal-length lists of numbers and plots them. Note that if you plot the whole file of data, the income ranges are so skewed that the result is a nearly vertical line at the leftmost edge, so close to the edge that you cannot see it in the plot; it looks like nothing was plotted. Plotting the lowest 40 income ranges results in a more easily readable plot.

i) def main():

a) Open the file

b) Print the year.

c) Read the file

d) Print the average income.

e) Print the median income.

f) Prompt for plotting (yes/no). If yes, plot the data: cumulative percentage (Column 5 in the file, the y values) vs. income (Column 0 in the file, the x values). Call the do_plot() function to plot the data. Plot the lowest 40 income ranges.

g) Loop, prompting for either r for range, p for percent, or nothing:

i. r: prompt for a percent and output the income that is below that percent. The percent must be valid (between 0 and 100 inclusive). Hint: call the get_range() function to get the income range at that percentage; the bottom of that income range is what we are looking for.

ii. p: prompt for an income and output the percent that earned more. The income needs to be valid (positive). Hint: Call the get_percent() function to get the corresponding cumulative percentage.

iii. If only a carriage return is entered, halt the program. This is a new and different requirement. Hint: if someone simply hits the Enter key, what will be the value input?
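A sketch of steps f) and g) of main(). The names plot_lists, read_number, and choice_loop are hypothetical helpers, not part of the required interface, and get_range/get_percent are passed in as parameters only so the block runs on its own. The output lines mirror the sample output; the error messages for invalid input are placeholders, since the spec does not give their wording, and returning to the menu after an error is an assumption.

```python
def plot_lists(data_list, n=40):
    """Hypothetical helper for step f): x = Column 0, y = Column 5, lowest n ranges."""
    rows = data_list[:n]
    x_vals = [row[0][0] for row in rows]  # bottom of each income range (Column 0)
    y_vals = [row[3] for row in rows]     # cumulative percentage (Column 5)
    return x_vals, y_vals


def read_number(prompt):
    """Hypothetical helper: return the typed value as a float, or None if not numeric."""
    try:
        return float(input(prompt))
    except ValueError:
        return None


def choice_loop(data_list, get_range, get_percent):
    """Sketch of step g): handle r/p choices until the user just presses Enter."""
    menu = "Enter a choice to get (r)ange, (p)ercent, or nothing to stop: "
    choice = input(menu)
    while choice != "":  # hitting Enter alone gives the empty string
        if choice == "r":
            percent = read_number("Enter a percent: ")
            if percent is not None and 0 <= percent <= 100:
                low = get_range(data_list, percent)[0][0]  # bottom of the range
                print("{:.2f}% of incomes are below ${:,.2f} .".format(percent, low))
            else:
                print("Error in percent. Please try again.")  # placeholder wording
        elif choice == "p":
            income = read_number("Enter an income: ")
            if income is not None and income > 0:
                cum_pct = get_percent(data_list, income)[1]
                print("An income of ${:,.2f} is in the top {:.2f}% of incomes."
                      .format(income, cum_pct))
            else:
                print("Error in income. Please try again.")  # placeholder wording
        else:
            print("Error in choice. Please try again.")  # placeholder wording
        choice = input(menu)
```

In main(), the plotting step would then be roughly x_vals, y_vals = plot_lists(data_list) followed by do_plot(x_vals, y_vals, year).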

Assignment Notes

1. Items 1-9 of the Coding Standard will be enforced for this project.

2. The files year2000.txt, year2014.txt, and year2019.txt are provided so that you can test your program.

3. For output you need to insert commas. The format specification supports this: if you would have formatted a floating-point value without commas as {:<12.2f}, you can simply insert a comma before the dot, as in {:<12,.2f}.
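A quick illustration of the comma option in Python's format specification, using the 2019 average income from the sample output. The "|" characters are only there to make the left-aligned padding visible.

```python
value = 51916.27

without_commas = "{:<12.2f}|".format(value)   # '51916.27    |'
with_commas = "{:<12,.2f}|".format(value)     # '51,916.27   |'
sentence = "The average income was ${:,.2f}".format(value)

print(without_commas)
print(with_commas)
print(sentence)  # The average income was $51,916.27
```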

Sample Output

Test 1

Enter a year where 1990 <= year <= 2019: 2019

For the year 2019:

The average income was $51,916.27

The median income was $32,452.59

Do you want to plot the data (yes/no): no

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r

Enter a percent: 90

90.00% of incomes are below $100,000.00 .

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p

Enter an income: 100000

An income of $100,000.00 is in the top 90.01% of incomes.

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:

Test 2 (no plotting)

Enter a year where 1990 <= year <= 2019: 2000

For the year 2000:

The average income was $30,846.09

The median income was $22,458.80

Do you want to plot the data (yes/no): no

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r

Enter a percent: 40

40.00% of incomes are below $15,000.00 .

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p

Enter an income: 50000

An income of $50,000.00 is in the top 87.41% of incomes.

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:

Test 2 (plotting)

Enter a year where 1990 <= year <= 2019: 2000

For the year 2000:

The average income was $30,846.09

The median income was $22,458.80

Do you want to plot the data (yes/no): yes

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:

Test 3

Enter a year where 1990 <= year <= 2019: xxx

Error in year. Please try again.

Enter a year where 1990 <= year <= 2019: 1900

Error in year. Please try again.

Enter a year where 1990 <= year <= 2019: 1999

Error in file name: year1999.txt Please try again.

Enter a year where 1990 <= year <= 2019: 2014

For the year 2014:

The average income was $44,569.20

The median income was $27,457.00

Do you want to plot the data (yes/no): no

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r

Enter a percent: 70

70.00% of incomes are below $45,000.00 .

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p

Enter an income: 150000

An income of $150,000.00 is in the top 96.87% of incomes.

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:
