Question

1 Approved Answer

Posted on Sep 08, 2024

Language- Python, pandas 3e) Sort zipcodes into Geographic Subdivision The Safe Harbor Method applies to Geographic Subdivisions as opposed to each zipcode itself. Geographic Subdivision:

Language- Python, pandas

image text in transcribed

3e) Sort zipcodes into "Geographic Subdivision" The Safe Harbor Method applies to "Geographic Subdivisions" as opposed to each zipcode itself. Geographic Subdivision: All areas which share the first 3 digits of a zip code Count the total population for each geographic subdivision, storing the first 3 digits of the zip code and its corresponding population in the dictionary zip_dict . (For example, if there were 20 people whose zip code started with 090, the key-value pair in zip_dict would be {'090' : 20} .) You may be tempted to write a gnarly loop to accomplish this. Avoid that temptation. Instead, you'll want to be savy with a dictionary and groupby from pandas here. To get you started... If you wanted to group by whole zip code, you could use something like this: df_zip.groupby (df_zipl'zip']). But, we don't want to group by the entire zip code. Instead, we want to extract the first 3 digits of a zip code, and group by that. To extract the first three digits, you could so something like the following: df_zip['zip'].str[:3] You'll want to combine these two concepts, such that you store this information in a dictionary zip_dict , which stores the first three digits of the zip code as the key and the population of that 3-digit zip code as the value. (If you're stuck and/or to better understand how dictionaries work and how they apply to this concept, check the section materials, use google, and go to discussion sections!) assert isinstance(zip dict, dict) assert zip_dict['100'] == 1502501 3f) Masking the Zip Codes In this part, you should write a for loop, updating the df_users dataframe. Go through each user, and update their zip code, to Safe Harbor specifications: If the user is from a zip code for the which the "Geographic Subdivision" is less than equal to 20,000, change the zip code to O Otherwise, change the zip code to be only the first 3 numbers of the full zip code Do all this rewritting the zip_code columns of the df_users DataFrame Hints: 1. This will be several lines of code, looping through the DataFrame, getting each zip code, checking the geographic subdivision with the population in zip_dict, and setting the zip_code accordingly. 2. Be very aware of your variable types when working with zip codes here. ] : # YOUR CODE HERE raise NotImplementedError(). assert len(df_users) == 943 assert df_users.loc[671, 'zip'] == '285 3e) Sort zipcodes into "Geographic Subdivision" The Safe Harbor Method applies to "Geographic Subdivisions" as opposed to each zipcode itself. Geographic Subdivision: All areas which share the first 3 digits of a zip code Count the total population for each geographic subdivision, storing the first 3 digits of the zip code and its corresponding population in the dictionary zip_dict . (For example, if there were 20 people whose zip code started with 090, the key-value pair in zip_dict would be {'090' : 20} .) You may be tempted to write a gnarly loop to accomplish this. Avoid that temptation. Instead, you'll want to be savy with a dictionary and groupby from pandas here. To get you started... If you wanted to group by whole zip code, you could use something like this: df_zip.groupby (df_zipl'zip']). But, we don't want to group by the entire zip code. Instead, we want to extract the first 3 digits of a zip code, and group by that. To extract the first three digits, you could so something like the following: df_zip['zip'].str[:3] You'll want to combine these two concepts, such that you store this information in a dictionary zip_dict , which stores the first three digits of the zip code as the key and the population of that 3-digit zip code as the value. (If you're stuck and/or to better understand how dictionaries work and how they apply to this concept, check the section materials, use google, and go to discussion sections!) assert isinstance(zip dict, dict) assert zip_dict['100'] == 1502501 3f) Masking the Zip Codes In this part, you should write a for loop, updating the df_users dataframe. Go through each user, and update their zip code, to Safe Harbor specifications: If the user is from a zip code for the which the "Geographic Subdivision" is less than equal to 20,000, change the zip code to O Otherwise, change the zip code to be only the first 3 numbers of the full zip code Do all this rewritting the zip_code columns of the df_users DataFrame Hints: 1. This will be several lines of code, looping through the DataFrame, getting each zip code, checking the geographic subdivision with the population in zip_dict, and setting the zip_code accordingly. 2. Be very aware of your variable types when working with zip codes here. ] : # YOUR CODE HERE raise NotImplementedError(). assert len(df_users) == 943 assert df_users.loc[671, 'zip'] == '285