Question


Posted on Aug 25, 2024

Python code, thank you :) In this question, you will use information from Statistics Canada to analyse percentage changes in Canada's national population grouped by age



In this question, you will use information from Statistics Canada to analyse percentage changes in Canada's national population, grouped by age, from 2013 to 2017. On Moodle, you will find a CSV file, age_statistics.csv, with all the information that you require. A starter Python file, a6q1-starter.py, is also provided; write your assignment based on it. The starter file opens the CSV file and reads the data it contains, so you can just run it to take a look at the data. The CSV file is a tabular file with commas as delimiters. (For a better view of the data, open it in Excel, OpenOffice, or a similar spreadsheet program, which will nicely display the data in columns.)

To complete the assignment, do the following:

(a) To analyse data in a file, you first need to separate the data rows from the header rows. If you examine the CSV file, you'll see that the first actual data row is the 10th row. Write Python code to extract only these data rows from the variable data and assign the result to the variable data1.

(b) The first column of data in data1 gives the age group as a string, e.g. "0 to 4 years". We do not need this column for our analysis. Write Python code to remove the first column, convert all of the remaining data to integers (instead of strings), and assign the result to the variable data2.

(c) Although we don't want the age-group strings in the data we use for computation, we still want to refer back to them when we output the results of our analysis. Write Python code to define a dictionary row_age_dct mapping each row index to its age-group string.
The dictionary should look like this:

    {0: '0 to 4 years', 1: '5 to 9 years', 2: '10 to 14 years', 3: '15 to 19 years', 4: '20 to 24 years', 5: '25 to 29 years', 6: '30 to 34 years', 7: '35 to 39 years', 8: '40 to 44 years', 9: '45 to 49 years', 10: '50 to 54 years', 11: '55 to 59 years', 12: '60 to 64 years', 13: '65 to 69 years', 14: '70 to 74 years', 15: '75 to 79 years', 16: '80 to 84 years', 17: '85 to 89 years', 18: '90 to 94 years', 19: '95 to 99 years', 20: '100 years and over'}

(d) Convert the list data2 to a 2D NumPy array and assign the result to the variable data_array.

(e) NumPy arrays have some commonly used attributes, such as the number of dimensions, the shape, the size, and the data type. Print out these 4 attributes of data_array.

(f) Now let's do a little calculation on data_array. The columns of data that we retained are the populations of each age group for the years 2013 through 2017. Use the sum() function defined in the numpy module (np.sum(), not the built-in sum() function) to get the total population of each year (summed over all age groups), and print the totals to the console. You can use help(np.sum) to check how to use this function to get the required results. You should print out something like this:

    The total population in year 2013 is 35152370
    The total population in year 2014 is 35535348
    The total population in year 2015 is 35832513
    The total population in year 2016 is 36264604
    The total population in year 2017 is 36708083

(g) We would now like to determine the year-over-year percentage change in population for the different age groups. Write a function called percentage_change() that takes two parameters: a 2D array containing the population data, and an integer that refers to a row index in that 2D array. The function should calculate the year-over-year percentage change of the age group at the given row index for all years, and return the result as a 1-D array.
Document it with a docstring. The year-over-year percentage change for a population is:

$$\text{percentage change} = \frac{\text{current year population} - \text{previous year population}}{\text{previous year population}} \times 100$$

Hint: You can't compute the year-over-year percentage change for 2013 because you don't have the population data for 2012, so your returned array should be of length 4 and contain the year-over-year percentage changes for 2014 through 2017.

Hint: Try to use operators on arrays instead of loops to calculate the year-over-year percentage changes.

(h) Print the following examples to the console by calling the function:

    print(percentage_change(data_array, 0))
    print(percentage_change(data_array, 10))
    print(percentage_change(data_array, 19))
    print(percentage_change(data_array, 20))

If you did everything right, the function calls above should produce:

    [0.11459547 0.1811961  0.90501734 0.56734682]
    [ 0.71633971 -0.3930734   1.87950579 -2.61922781]
    [ 8.15072093 10.67187831  0.95626277  8.53062629]
    [2.8125567  1.74726438 6.6782307  7.64227642]

(i) Finally, write code to determine which age group, across ALL age groups (not just the 4 from part (h)), had the largest absolute (positive or negative) year-over-year population change from 2013 to 2017. That is, find the age group with the SINGLE largest change in any year, as opposed to the age group with the highest average change. Print this to the console, like this:

    The age group with the highest absolute year-over-year percentage change is 95 to 99 years.

Hint: You can use a loop to call the percentage_change() function to get the year-over-year percentage changes for all age groups. Use the row index of the largest percentage change to look up the age-group string in the row_age_dct dictionary.
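A minimal sketch of parts (a) to (f) follows. It assumes, as the question implies but does not show, that the starter file leaves the CSV contents in a variable data as a list of rows of strings (for example from csv.reader). The helper names extract_data_rows, drop_label_column, make_row_age_dct and yearly_totals are hypothetical, and the toy rows in the demo are made-up numbers, not Statistics Canada data. If the real file has footnote rows after the last age group, the slice in part (a) would also need an end index.

```python
import numpy as np

# ASSUMPTION: `data` is a list of rows (lists of strings), e.g. from csv.reader
# in the starter file, and rows 1-9 are header rows.
FIRST_DATA_ROW = 10  # 1-based row number given in the assignment
YEARS = list(range(2013, 2018))


def extract_data_rows(data):
    """Part (a): keep rows from the 10th row onward (0-based index 9)."""
    return data[FIRST_DATA_ROW - 1:]


def drop_label_column(data1):
    """Part (b): remove the age-group column and convert the rest to int."""
    return [[int(value) for value in row[1:]] for row in data1]


def make_row_age_dct(data1):
    """Part (c): map each row index to its age-group string."""
    return {index: row[0] for index, row in enumerate(data1)}


def yearly_totals(data_array):
    """Part (f): sum down each column (over all age groups) with np.sum."""
    return np.sum(data_array, axis=0)


if __name__ == "__main__":
    # Toy stand-in for the real file: 9 header rows plus 3 made-up data rows.
    # These numbers are NOT Statistics Canada figures.
    data = [[f"header line {i}"] for i in range(1, 10)] + [
        ["0 to 4 years", "100", "101", "102", "103", "104"],
        ["5 to 9 years", "200", "202", "204", "206", "208"],
        ["10 to 14 years", "300", "297", "300", "303", "306"],
    ]

    data1 = extract_data_rows(data)
    data2 = drop_label_column(data1)
    row_age_dct = make_row_age_dct(data1)

    # Part (d): list of lists -> 2-D NumPy array.
    data_array = np.array(data2)

    # Part (e): the four attributes named in the assignment.
    print("ndim:", data_array.ndim)
    print("shape:", data_array.shape)
    print("size:", data_array.size)
    print("dtype:", data_array.dtype)

    for year, total in zip(YEARS, yearly_totals(data_array)):
        print(f"The total population in year {year} is {total}")
```

Passing axis=0 to np.sum collapses the age-group axis (the rows) and leaves one total per year column; without it, np.sum would return a single grand total. The exact dtype printed for part (e) can differ by platform (for example int64 versus int32).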
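Parts (g) to (i) can be sketched as below. percentage_change() applies the formula from part (g) and follows the second hint by slicing one row into "previous year" and "current year" views, so no loop is needed. largest_change_age_group() is a hypothetical helper name for part (i) and uses a loop, as the final hint suggests. The 3-row array in the demo is made-up toy data; with the real 21-row data_array, calling percentage_change() with rows 0, 10, 19 and 20 would give the part (h) examples.

```python
import numpy as np


def percentage_change(population, row_index):
    """Return the year-over-year percentage changes for one age group.

    Parameters:
        population: 2-D array with one row per age group and one column
            per year (2013-2017 in the assignment).
        row_index: index of the age-group row to analyse.

    Returns:
        1-D float array of length (number of years - 1), i.e. the changes
        for 2014 through 2017 when there are five year columns.
    """
    row = population[row_index]
    previous = row[:-1]  # populations for 2013..2016
    current = row[1:]    # populations for 2014..2017
    # Element-wise array arithmetic instead of a loop; "/" yields floats.
    return (current - previous) / previous * 100


def largest_change_age_group(population, row_age_dct):
    """Hypothetical helper for part (i): the age group with the single
    largest absolute year-over-year percentage change in any year."""
    best_row = None
    best_change = -1.0
    for row_index in range(population.shape[0]):
        # Largest |change| within this age group across all years.
        row_max = np.max(np.abs(percentage_change(population, row_index)))
        if row_max > best_change:
            best_row, best_change = row_index, row_max
    return row_age_dct[best_row]


if __name__ == "__main__":
    # Made-up toy data (3 age groups x 5 years), NOT Statistics Canada figures.
    data_array = np.array([
        [100, 101, 102, 103, 104],
        [200, 202, 204, 206, 208],
        [300, 297, 300, 303, 306],
    ])
    row_age_dct = {0: "0 to 4 years", 1: "5 to 9 years", 2: "10 to 14 years"}

    # Part (h) style calls (the real assignment uses rows 0, 10, 19 and 20).
    for row_index in range(data_array.shape[0]):
        print(percentage_change(data_array, row_index))

    group = largest_change_age_group(data_array, row_age_dct)
    print("The age group with the highest absolute year-over-year "
          f"percentage change is {group}.")
```

On the toy data the last line reports 10 to 14 years, because the jump from 297 to 300 (about 1.01%) is the largest absolute change. A loop-free alternative for part (i) would be np.abs(np.diff(data_array, axis=1) / data_array[:, :-1] * 100), followed by np.argmax over each row's maximum; the loop version above simply stays closer to the hint.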