In this question, you will use information from Statistics Canada to analyse percentage changes in Canada's
national population, grouped by age, from 2013 to 2017. On Moodle, you will find a CSV file
age_statistics.csv with all the information that you require. A starter Python file, a6q1_starter.py,
is also provided. Write your assignment based on the given starter file.
CSV file:
"Population estimates on July 1st by age and sex"
Annual
Table: 17-10-0005-01 (formerly CANSIM 051-0001)
Geography: Canada, Province or territory
Canada
Both sexes
Age group 2013 2014 2015 2016 2017
Persons
0 to 4 years 1,918,924 1,921,123 1,924,604 1,942,022 1,953,040
5 to 9 years 1,882,687 1,918,323 1,952,041 1,985,144 2,003,143
10 to 14 years 1,868,495 1,865,818 1,864,760 1,886,340 1,920,898
15 to 19 years 2,178,288 2,137,784 2,097,043 2,066,404 2,056,445
20 to 24 years 2,445,559 2,470,054 2,460,317 2,467,287 2,476,338
25 to 29 years 2,408,813 2,437,377 2,464,184 2,515,993 2,574,384
30 to 34 years 2,434,220 2,479,427 2,499,523 2,529,348 2,553,635
35 to 39 years 2,326,722 2,367,428 2,401,531 2,455,403 2,506,165
40 to 44 years 2,371,026 2,358,616 2,349,528 2,345,732 2,364,959
45 to 49 years 2,568,593 2,492,188 2,432,391 2,415,365 2,405,165
50 to 54 years 2,754,559 2,774,291 2,763,386 2,711,448 2,640,429
55 to 59 years 2,501,797 2,557,158 2,602,741 2,653,893 2,683,302
60 to 64 years 2,110,161 2,167,664 2,234,388 2,300,327 2,374,636
65 to 69 years 1,747,711 1,831,749 1,911,216 1,976,211 1,997,090
70 to 74 years 1,256,700 1,315,039 1,371,962 1,438,585 1,547,668
75 to 79 years 947,393 973,989 1,000,838 1,035,621 1,077,431
80 to 84 years 729,397 738,240 745,302 753,852 763,413
85 to 89 years 452,747 464,667 477,845 492,434 504,232
90 to 94 years 199,304 211,417 220,767 228,925 236,012
95 to 99 years 43,763 47,330 52,381 58,120 63,078
100 years and over 5,511 5,666 5,765 6,150 6,620
Starter file:
import numpy as np
import csv
f = open('age_statistics.csv', 'r')
csvreader = csv.reader(f, delimiter=',')
data = []
for row in csvreader:
    row1 = [item.replace(',', '') for item in row]
    data.append(row1)
print(data)
The provided starter file opens the CSV file and reads the data it contains. You can just run the starter file to
take a look at the data. You can see that the CSV file is a tabular file with commas for delimiters (if you want
a better view of what the data in the CSV file looks like, open it in Excel, OpenOffice, or a similar spreadsheet
program; this will nicely visualize the data in columns for you).
To complete the assignment, do the following:
(a) To analyze data in a file, you first need to separate the data lines/rows from the header lines/rows.
If you examine the CSV file, you'll see that the first actual data row is the 10th row. Write Python code to
extract only these data rows from the variable data and assign the result to a variable data1.
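One way to approach step (a) is list slicing. The sketch below is only an illustration: the `data` list is a hypothetical stand-in for what the starter file reads (nine placeholder header rows, two data rows, one placeholder footer row), and the constants `FIRST_DATA_ROW` and `NUM_AGE_GROUPS` are names introduced here, not part of the assignment. Check the real file for how many rows precede and follow the data.

```python
# Hypothetical stand-in for the `data` list built by the starter file.
# The real file's header and footer rows will look different.
data = [["header"]] * 9 + [
    ["0 to 4 years", "1918924", "1921123", "1924604", "1942022", "1953040"],
    ["5 to 9 years", "1882687", "1918323", "1952041", "1985144", "2003143"],
] + [["footer"]]

FIRST_DATA_ROW = 9   # the 10th row, counting from zero
NUM_AGE_GROUPS = 2   # 21 in the full table; 2 in this toy sample

# Slice out only the data rows, skipping header and footer rows.
data1 = data[FIRST_DATA_ROW:FIRST_DATA_ROW + NUM_AGE_GROUPS]
print(data1)
```

Slicing with an explicit end index also drops any rows that follow the table, which a plain `data[9:]` would keep.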
(b) As you can see, the first column of data in data1 gives the age group as a string, e.g. "0 to 4 years".
We do not need this column for our analysis. Write Python code to remove the data in the first column,
convert all of the remaining data to integers (instead of strings), and assign the result to the variable
data2.
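A nested list comprehension is one possible way to handle step (b). This is a sketch on a two-row sample of `data1` (values taken from the table above, with the thousands separators already removed, as the starter file does); it is not the only valid approach.

```python
# Two sample rows in the shape data1 is expected to have after step (a).
data1 = [
    ["0 to 4 years", "1918924", "1921123", "1924604", "1942022", "1953040"],
    ["100 years and over", "5511", "5666", "5765", "6150", "6620"],
]

# row[1:] skips the age-group label; int() converts each remaining string.
data2 = [[int(value) for value in row[1:]] for row in data1]
print(data2)
```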
(c) Although we don't want the age group strings in the data that we will use for computation, we still
want to refer back to them when we want to output the result of our analysis. Write Python code to
define a dictionary row_age_dct mapping the row index to age group strings. The dictionary should
look like:
{0: '0 to 4 years', 1: '5 to 9 years', 2: '10 to 14 years',
3: '15 to 19 years', 4: '20 to 24 years', 5: '25 to 29 years',
6: '30 to 34 years', 7: '35 to 39 years', 8: '40 to 44 years',
9: '45 to 49 years', 10: '50 to 54 years', 11: '55 to 59 years',
12: '60 to 64 years', 13: '65 to 69 years', 14: '70 to 74 years',
15: '75 to 79 years', 16: '80 to 84 years', 17: '85 to 89 years',
18: '90 to 94 years', 19: '95 to 99 years', 20: '100 years and over'}
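For step (c), `enumerate()` pairs each row with its index, so a dictionary comprehension can build the mapping in one pass. The sketch below uses a two-row sample of `data1`; with the full table the same expression would produce 21 entries.

```python
# Two sample rows in the shape data1 is expected to have after step (a).
data1 = [
    ["0 to 4 years", "1918924", "1921123", "1924604", "1942022", "1953040"],
    ["5 to 9 years", "1882687", "1918323", "1952041", "1985144", "2003143"],
]

# Map each row index to the label stored in that row's first column.
row_age_dct = {index: row[0] for index, row in enumerate(data1)}
print(row_age_dct)
```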
(d) Convert the list data2 to a 2D numpy array and assign the result to the variable data_array.
(e) We know that numpy arrays have some commonly used attributes such as the number of dimensions,
shape, size and the data type of an array. Print out these 4 attributes of the array data_array.
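Steps (d) and (e) can be sketched together. The attributes `ndim`, `shape`, `size` and `dtype` are standard on every numpy array; the two-row `data2` below is a sample, so the shape and size differ from what the full table gives. The exact integer dtype depends on the platform and numpy version, so no specific dtype is claimed here.

```python
import numpy as np

# Sample of data2: one inner list per age group, one integer per year.
data2 = [
    [1918924, 1921123, 1924604, 1942022, 1953040],
    [1882687, 1918323, 1952041, 1985144, 2003143],
]

data_array = np.array(data2)

print("Number of dimensions:", data_array.ndim)
print("Shape:", data_array.shape)
print("Size:", data_array.size)
print("Data type:", data_array.dtype)
```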
(f) Now let's do a little bit of calculation on the array data_array. Note that the columns of data that
we retained are the population for each age group for the years 2013 through 2017. Use the sum()
function defined in the numpy module (not the built-in sum() function) to get the total population of each
year (sum over all age groups), and print them out to the console. You can use help(np.sum) to check
how to use this function to get the results required here. You should print out something like this:
The total population in year 2013 is 35152370
The total population in year 2014 is 35535348
The total population in year 2015 is 35832513
The total population in year 2016 is 36264604
The total population in year 2017 is 36708083
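For step (f), the `axis` argument of `np.sum` controls the direction of the sum: `axis=0` adds down the rows, giving one total per column (per year). The sketch below uses only two age groups, so its totals are much smaller than the full-table figures listed above; the `years` list is a helper introduced here for printing.

```python
import numpy as np

# Two sample age groups; columns are the years 2013 through 2017.
data_array = np.array([
    [1918924, 1921123, 1924604, 1942022, 1953040],  # 0 to 4 years
    [1882687, 1918323, 1952041, 1985144, 2003143],  # 5 to 9 years
])
years = [2013, 2014, 2015, 2016, 2017]

# axis=0 collapses the rows, leaving one total per year.
totals = np.sum(data_array, axis=0)

for year, total in zip(years, totals):
    print("The total population in year", year, "is", total)
```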
(g) We would now like to determine the year-over-year percentage change in population for the different age
groups. Write a function called percentage_change(), which takes two parameters: a 2D array
(containing the population data) and an integer that refers to a row index in the 2D
array. The function calculates the year-over-year percentage change of the age group at the given row index
for all years. The return value of the function percentage_change() should be a 1-D array. Document
it with a docstring.
The year-over-year percentage change for a population is:
(current year population - previous year population) /
previous year population × 100
Hint: you can't compute the year-over-year percentage change for 2013 because you don't have the
population data for 2012, so your returned array should be of length 4 and contain the year-over-year
percentage changes for 2014 through 2017.
Hint: try to use operators on arrays instead of loops to calculate the year-over-year percentage changes.
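The formula and hints in step (g) can be sketched with array slicing: `row[1:]` holds every "current" year and `row[:-1]` every "previous" year, so one vectorized expression covers all four changes. This is a minimal sketch, not the only acceptable solution; the parameter names and the conversion to float are choices made here.

```python
import numpy as np

def percentage_change(population, row_index):
    """Return the year-over-year percentage changes for one age group.

    population: 2D array, one row per age group, one column per year.
    row_index:  index of the row (age group) to analyse.
    Returns a 1-D array with one entry per year after the first.
    """
    row = population[row_index].astype(float)
    previous = row[:-1]   # all years except the last
    current = row[1:]     # all years except the first
    return (current - previous) / previous * 100

# Single-row sample: the "0 to 4 years" group from the table.
sample = np.array([[1918924, 1921123, 1924604, 1942022, 1953040]])
changes = percentage_change(sample, 0)
print(changes)
```

For the "0 to 4 years" row, the first entry works out to (1,921,123 - 1,918,924) / 1,918,924 × 100, roughly 0.1146.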
(h) Print the following examples to the console by calling the function.
print(percentage_change(data_array, 0))
print(percentage_change(data_array, 10))
print(percentage_change(data_array, 19))
print(percentage_change(data_array, 20))
If you did everything right, the function calls above should produce:
[ 0.11459547 0.1811961 0.90501734 0.56734682]
[ 0.71633971 -0.3930734 -1.87950579 -2.61922781]
[ 8.15072093 10.6718783 10.95626277 8.53062629]
[ 2.8125567 1.74726438 6.6782307 7.64227642]
(i) Finally, write code to determine which age group had the largest absolute (positive or negative) year-over-year
percentage change in population from 2013 to 2017 and print this to the console, like this:
The age group with the highest absolute year-over-year-percentage-change is 95 to 99 years.
Hint: You can use loops to call the percentage_change() function to get the year-over-year percentage
changes for all age groups. Use the row index of the largest percentage change to look up the age
group as a string from the row_age_dct dictionary.
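The loop suggested in the hint for step (i) might look like the sketch below. It is self-contained, so it repeats a compact `percentage_change()` and uses only three age groups from the table; as a result the row indices here (0, 1, 2) differ from those in the full 21-row array. The variable names `best_row`, `best_change` and `best_group` are introduced for illustration.

```python
import numpy as np

def percentage_change(population, row_index):
    """Return the year-over-year percentage changes for one row."""
    row = population[row_index].astype(float)
    return (row[1:] - row[:-1]) / row[:-1] * 100

# Three sample age groups taken from the table.
data_array = np.array([
    [1918924, 1921123, 1924604, 1942022, 1953040],
    [2754559, 2774291, 2763386, 2711448, 2640429],
    [43763, 47330, 52381, 58120, 63078],
])
row_age_dct = {0: "0 to 4 years", 1: "50 to 54 years", 2: "95 to 99 years"}

best_row = None
best_change = -1.0
for row_index in range(data_array.shape[0]):
    # np.abs treats increases and decreases alike.
    largest = np.max(np.abs(percentage_change(data_array, row_index)))
    if largest > best_change:
        best_change = largest
        best_row = row_index

best_group = row_age_dct[best_row]
print("The age group with the highest absolute "
      "year-over-year-percentage-change is", best_group + ".")
```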