Question
In this question, you will use information from Statistics Canada to analyse percentage changes in Canada's national population, grouped by age, from 2013 to 2017. On Moodle, you will find a CSV file, age_statistics.csv, with all the information that you require. A starter Python file, a6q1_starter.py, is also provided. Write your assignment based on the given starter file.

CSV file:

"Population estimates on July 1st by age and sex" 1 2 3 4

Annual

Table: 17-10-0005-01 (formerly CANSIM 051-0001)

Geography: Canada, Province or territory

Canada

Both sexes

Age group 2013 2014 2015 2016 2017

Persons

0 to 4 years 1,918,924 1,921,123 1,924,604 1,942,022 1,953,040

5 to 9 years 1,882,687 1,918,323 1,952,041 1,985,144 2,003,143

10 to 14 years 1,868,495 1,865,818 1,864,760 1,886,340 1,920,898

15 to 19 years 2,178,288 2,137,784 2,097,043 2,066,404 2,056,445

20 to 24 years 2,445,559 2,470,054 2,460,317 2,467,287 2,476,338

25 to 29 years 2,408,813 2,437,377 2,464,184 2,515,993 2,574,384

30 to 34 years 2,434,220 2,479,427 2,499,523 2,529,348 2,553,635

35 to 39 years 2,326,722 2,367,428 2,401,531 2,455,403 2,506,165

40 to 44 years 2,371,026 2,358,616 2,349,528 2,345,732 2,364,959

45 to 49 years 2,568,593 2,492,188 2,432,391 2,415,365 2,405,165

50 to 54 years 2,754,559 2,774,291 2,763,386 2,711,448 2,640,429

55 to 59 years 2,501,797 2,557,158 2,602,741 2,653,893 2,683,302

60 to 64 years 2,110,161 2,167,664 2,234,388 2,300,327 2,374,636

65 to 69 years 1,747,711 1,831,749 1,911,216 1,976,211 1,997,090

70 to 74 years 1,256,700 1,315,039 1,371,962 1,438,585 1,547,668

75 to 79 years 947,393 973,989 1,000,838 1,035,621 1,077,431

80 to 84 years 729,397 738,240 745,302 753,852 763,413

85 to 89 years 452,747 464,667 477,845 492,434 504,232

90 to 94 years 199,304 211,417 220,767 228,925 236,012

95 to 99 years 43,763 47,330 52,381 58,120 63,078

100 years and over 5,511 5,666 5,765 6,150 6,620

starter file:

import numpy as np
import csv

f = open('age_statistics.csv', 'r')
csvreader = csv.reader(f, delimiter=',')
data = []
for row in csvreader:
    row1 = [item.replace(',', '') for item in row]
    data.append(row1)
print(data)

The provided starter file opens the CSV file and reads the data it contains. You can run the starter file to take a look at the data. You can see that the CSV file is a tabular file with commas for delimiters. If you want a better view of what the data in the CSV file looks like, open it in Excel, OpenOffice, or a similar spreadsheet program; this will display the data in columns for you.

To complete the assignment, do the following:

(a) To analyze data in a file, you first need to separate the data rows from the header rows. If you examine the CSV file, you'll see that the first actual data row is the 10th row. Write Python code to extract only these data rows from the variable data and assign the result to a variable data1.
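One way to approach step (a) is list slicing. The sketch below is not the official solution: it builds a small stand-in for `data` (placeholder header rows, two real data rows, and an assumed trailing footnote row) so that it runs without the CSV file. The names `FIRST_DATA_ROW` and `NUM_AGE_GROUPS` are hypothetical.

```python
# Hypothetical stand-in for the `data` list produced by the starter file:
# nine header rows (contents abbreviated), the data rows, then an assumed footer.
header_rows = [["header line " + str(i)] for i in range(1, 10)]
data_rows = [
    ["0 to 4 years", "1918924", "1921123", "1924604", "1942022", "1953040"],
    ["5 to 9 years", "1882687", "1918323", "1952041", "1985144", "2003143"],
]
footer_rows = [["Footnotes:"]]  # assumption: the real file may have trailing rows
data = header_rows + data_rows + footer_rows

FIRST_DATA_ROW = 9                 # the 10th row has index 9
NUM_AGE_GROUPS = len(data_rows)    # 21 in the table shown above

# Keep only the rows that hold population figures.
data1 = data[FIRST_DATA_ROW:FIRST_DATA_ROW + NUM_AGE_GROUPS]
print(data1)
```

Giving the slice an explicit end index guards against any non-data rows after the table; if the real file ends with the last age group, `data[9:]` would be enough.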

(b) As you can see, the first column of data in data1 gives the age group as a string, e.g. "0 to 4 years". We do not need this column for our analysis. Write Python code to remove the data in the first column, convert all of the remaining data to integers (instead of strings), and assign the result to the variable data2.
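A nested list comprehension is one possible way to handle step (b). This sketch assumes `data1` already has its thousands separators removed, as the starter file does; the two rows are copied from the table above.

```python
# Two rows in the shape the starter file produces (commas already stripped).
data1 = [
    ["0 to 4 years", "1918924", "1921123", "1924604", "1942022", "1953040"],
    ["5 to 9 years", "1882687", "1918323", "1952041", "1985144", "2003143"],
]

# row[1:] skips the age-group label; int() converts each remaining string.
data2 = [[int(value) for value in row[1:]] for row in data1]
print(data2)
```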

(c) Although we don't want the age group strings in the data that we will use for computation, we still want to refer back to them when we output the result of our analysis. Write Python code to define a dictionary row_age_dct mapping the row index to age group strings. The dictionary should look like:

{0: '0 to 4 years', 1: '5 to 9 years', 2: '10 to 14 years',
 3: '15 to 19 years', 4: '20 to 24 years', 5: '25 to 29 years',
 6: '30 to 34 years', 7: '35 to 39 years', 8: '40 to 44 years',
 9: '45 to 49 years', 10: '50 to 54 years', 11: '55 to 59 years',
 12: '60 to 64 years', 13: '65 to 69 years', 14: '70 to 74 years',
 15: '75 to 79 years', 16: '80 to 84 years', 17: '85 to 89 years',
 18: '90 to 94 years', 19: '95 to 99 years', 20: '100 years and over'}
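A dictionary comprehension over `enumerate()` is one way to build this mapping for step (c). The sketch uses three rows from the table so that it runs on its own; with the full `data1` it would produce all 21 entries.

```python
# Three rows in the shape of data1, copied from the table above.
data1 = [
    ["0 to 4 years", "1918924", "1921123", "1924604", "1942022", "1953040"],
    ["5 to 9 years", "1882687", "1918323", "1952041", "1985144", "2003143"],
    ["10 to 14 years", "1868495", "1865818", "1864760", "1886340", "1920898"],
]

# enumerate() yields (row index, row); row[0] is the age-group label.
row_age_dct = {index: row[0] for index, row in enumerate(data1)}
print(row_age_dct)
```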

(d) Convert the list data2 to a 2D numpy array and assign the result to the variable data_array.

(e) We know that numpy arrays have some commonly used attributes such as the number of dimensions,

shape, size and the data type of an array. Print out these 4 attributes of the array data_array.
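Steps (d) and (e) might look like the following sketch, which uses two rows of the table in place of the full `data2`. The exact integer type reported by `dtype` depends on the platform and NumPy version, so no particular value is claimed here.

```python
import numpy as np

# Two rows standing in for the full data2 list.
data2 = [
    [1918924, 1921123, 1924604, 1942022, 1953040],
    [1882687, 1918323, 1952041, 1985144, 2003143],
]

data_array = np.array(data2)  # a list of equal-length lists becomes a 2D array

print("Number of dimensions:", data_array.ndim)
print("Shape:", data_array.shape)
print("Size:", data_array.size)
print("Data type:", data_array.dtype)
```

With the full data set, the shape would be 21 rows by 5 columns.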

(f) Now let's do a little bit of calculation on the array data_array. Note that the columns of data that we retained are the populations of each age group for the years 2013 through 2017. Use the sum() function defined in the numpy module (not the built-in sum() function) to get the total population of each year (summed over all age groups), and print the totals to the console. You can use help(np.sum) to check how to use this function to get the results required here. You should print out something like this:

The total population in year 2013 is 35152370

The total population in year 2014 is 35535348

The total population in year 2015 is 35832513

The total population in year 2016 is 36264604

The total population in year 2017 is 36708083
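For step (f), passing `axis=0` to `np.sum` adds up each column, which gives one total per year. The sketch below uses only the three oldest age groups from the table, so its totals cover that subset and are not the national totals listed above; the `years` list is a hypothetical helper.

```python
import numpy as np

years = [2013, 2014, 2015, 2016, 2017]  # hypothetical helper for the column labels

# Rows for "90 to 94", "95 to 99" and "100 years and over" only.
data_array = np.array([
    [199304, 211417, 220767, 228925, 236012],
    [43763, 47330, 52381, 58120, 63078],
    [5511, 5666, 5765, 6150, 6620],
])

# axis=0 sums down the rows, leaving one value per column (year).
totals = np.sum(data_array, axis=0)

for year, total in zip(years, totals):
    print("The total population in year", year, "is", total)
```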

(g) We would now like to determine the year-over-year percentage change in population for the different age groups. Write a function called percentage_change(), which takes two parameters: a 2D array (containing the population data) and an integer that refers to a row index in the 2D array. The function should calculate the year-over-year percentage change of the age group at the given row index for all years. The return value of percentage_change() should be a 1-D array. Document it with a docstring.

The year-over-year percentage change for a population is:

(current year population - previous year population) / previous year population × 100

Hint: you can't compute the year-over-year percentage change for 2013 because you don't have the population data for 2012, so your returned array should be of length 4 and contain the year-over-year percentage changes for 2014 through 2017.

Hint: try to use operators on arrays instead of loops to calculate the year-over-year percentage changes.
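The hint about array operators can be followed by subtracting two shifted slices of the same row. This is a sketch of one possible percentage_change(), not the official solution; the parameter names `population` and `row_index` are my own, and the single row comes from the "0 to 4 years" line of the table.

```python
import numpy as np

def percentage_change(population, row_index):
    """Return the year-over-year percentage changes for one age group.

    population: 2D array with one row per age group and one column per year.
    row_index: index of the row (age group) to analyse.
    The result is a 1-D array with one fewer element than there are years.
    """
    row = population[row_index].astype(float)
    current = row[1:]     # every year except the first
    previous = row[:-1]   # every year except the last
    return (current - previous) / previous * 100

# The "0 to 4 years" row, used here as a one-row data_array.
data_array = np.array([[1918924, 1921123, 1924604, 1942022, 1953040]])
result = percentage_change(data_array, 0)
print(result)
```

The two slices line up each year with the year before it, so the whole row is processed without an explicit loop.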

(h) Print the following examples to the console by calling the function.

print(percentage_change(data_array, 0))
print(percentage_change(data_array, 10))
print(percentage_change(data_array, 19))
print(percentage_change(data_array, 20))

If you did everything right, the function calls above should produce:

[ 0.11459547 0.1811961 0.90501734 0.56734682]

[ 0.71633971 -0.3930734 -1.87950579 -2.61922781]

[ 8.15072093 10.6718783 10.95626277 8.53062629]

[ 2.8125567 1.74726438 6.6782307 7.64227642]

(i) Finally, write code to determine which age group had the largest absolute (positive or negative) year-over-year percentage change in population from 2013 to 2017 and print this to the console, like this:

The age group with the highest absolute year-over-year-percentage-change is 95 to 99 years.

Hint: You can use loops to call the percentage_change() function to get the year-over-year percentage changes for all age groups. Use the row index of the largest percentage change to look up the age group as a string from the row_age_dct dictionary.
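The loop described in the hint could be sketched as follows. To stay self-contained, the example repeats a possible percentage_change() and uses only four rows from the table, so the row indices here differ from the real file (where "95 to 99 years" is index 19). The names `largest_change` and `largest_row` are hypothetical.

```python
import numpy as np

def percentage_change(population, row_index):
    """Return the year-over-year percentage changes for one age group."""
    row = population[row_index].astype(float)
    return (row[1:] - row[:-1]) / row[:-1] * 100

# Four sample rows only; indices are local to this example.
row_age_dct = {0: "0 to 4 years", 1: "50 to 54 years",
               2: "95 to 99 years", 3: "100 years and over"}
data_array = np.array([
    [1918924, 1921123, 1924604, 1942022, 1953040],
    [2754559, 2774291, 2763386, 2711448, 2640429],
    [43763, 47330, 52381, 58120, 63078],
    [5511, 5666, 5765, 6150, 6620],
])

largest_change = 0.0
largest_row = 0
for row_index in range(data_array.shape[0]):
    changes = percentage_change(data_array, row_index)
    biggest = np.max(np.abs(changes))  # abs() so large declines count too
    if biggest > largest_change:
        largest_change = biggest
        largest_row = row_index

print("The age group with the highest absolute year-over-year-percentage-change is "
      + row_age_dct[largest_row] + ".")
```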
