Question


Posted on Sep 07, 2024


Solve this on Python Jupyter Notebook please!


The Names.csv looks like this:

[Image: sample rows of Names.csv]

And the Salaries.csv looks like this:

[Image: sample rows of Salaries.csv]

In this HW, we will study the pay gap between men and women who have jobs in San Francisco. We will use the following two CSV files to accomplish this task:

Salaries.csv: contains salaries for over 100K employees in SF from 2011 to 2014.
Names.csv: contains baby names from 1980 to 2014, along with counts of how many times each baby name was used.

We would like to find the average salary of men and women over all jobs from 2011 to 2014. The problem, however, is that Salaries.csv does not contain gender. Further, many names are unisex. Since Names.csv has counts, we use a majority vote to label the gender of each name in Names.csv. You will be asked to write a series of functions to implement this task.

Note: Unlike previous homeworks, the problems in this homework are inter-dependent: you can only pass the test for problem n if you have passed the test cases for problem n-1, since problem n normally calls the function from problem n-1.

Problem 1: Read the data

The following function ReadData will read in the salary and names data as pandas DataFrames and return a list containing these two DataFrames.

    # Place your imports here
    import pandas as pd
    import numpy as np

    def ReadData():
        df_salaries = None
        df_names = None
        # YOUR CODE HERE
        raise NotImplementedError()
        return [df_salaries, df_names]

    [df_salaries, df_names] = ReadData()
    assert df_salaries.shape == (27386, 6)

Problem 2: Get name counts

The following function ParseNames takes the names DataFrame as input and outputs two dictionaries called male_names and female_names. The keys in each dictionary are the names (in all lowercase), and each value is the sum of the counts for the given name when it applies to the given gender. Note that the same name may appear under both the male and the female gender. For this function, USE ONLY ITERROWS(), NO GROUPING OR FILTERING YET!
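A minimal sketch of how Problems 1 and 2 could be filled in. It assumes both CSV files sit in the notebook's working directory and that Names.csv has columns named Name, Gender (with values "M" / "F") and Count; those column names are assumptions, since the screenshot of Names.csv is not available, so adjust them to the real headers. The toy DataFrame at the end is made-up data used only to show the output shape.

```python
import pandas as pd


def ReadData(salaries_path="Salaries.csv", names_path="Names.csv"):
    """Read both CSV files into DataFrames and return them as a list.

    The path parameters are an illustrative addition; with the defaults,
    ReadData() can still be called with no arguments as in the test cell.
    """
    df_salaries = pd.read_csv(salaries_path)
    df_names = pd.read_csv(names_path)
    return [df_salaries, df_names]


def ParseNames(df_names):
    """Sum the counts of each lowercase name per gender using only iterrows().

    ASSUMPTION: columns are 'Name', 'Gender' ('M' or 'F') and 'Count'.
    """
    male_names = {}
    female_names = {}
    for _, row in df_names.iterrows():
        name = str(row["Name"]).lower()
        count = int(row["Count"])
        # The same name can appear in many rows (one per year), so accumulate.
        if row["Gender"] == "M":
            male_names[name] = male_names.get(name, 0) + count
        elif row["Gender"] == "F":
            female_names[name] = female_names.get(name, 0) + count
    return male_names, female_names


# Toy example with made-up counts (not from the real Names.csv).
sample = pd.DataFrame({
    "Name": ["Jordan", "Jordan", "Mary", "jordan"],
    "Gender": ["M", "F", "F", "M"],
    "Count": [10, 4, 7, 5],
})
male, female = ParseNames(sample)
print(male)    # {'jordan': 15}
print(female)  # {'jordan': 4, 'mary': 7}
```

Using dict.get with a default of 0 avoids a separate "is this key new?" branch while still touching each row exactly once, which keeps to the iterrows()-only rule.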
    def ParseNames(df_names):
        """
        INPUT: the pandas DataFrame containing Names.csv
        OUTPUT: two dictionaries, male_names and female_names. The keys in each
        dictionary are names (in all lowercase) and the value is the sum of the
        counts for the given name when it applies to the given gender.
        USE ONLY ITERROWS(), NO GROUPING OR FILTERING YET!
        This function will take a minute or two to run.
        """
        # Initialize empty dictionaries for names
        male_names = {}
        female_names = {}
        # YOUR CODE HERE
        raise NotImplementedError()
        return male_names, female_names

    [male_names, female_names] = ParseNames(df_names)
    assert len(male_names) == 9481
    assert len(female_names) == 15230

Problem 3: Get first name

The following function GetFirstName takes a person's name (first and last names separated by spaces) and returns the person's first name in lowercase.

    def GetFirstName(name):
        """
        Gets the first name from a name in the EmployeeName column of Salaries.csv.
        INPUT: name as string
        OUTPUT: first name in all lowercase
        """
        first_name = None
        # YOUR CODE HERE
        raise NotImplementedError()
        return first_name

    assert GetFirstName("Dennis Zhang") == "dennis"

Problem 4: GetGender

This function takes in the dictionaries of male and female names from ParseNames, and a first name. It returns "M" if the first name appears more times in male_names than in female_names, "F" if it appears more times in female_names than in male_names (or the two frequencies are the same), and "NA" if the name appears in neither male_names nor female_names.

    def AddGender(first_name, male_names, female_names):
        """
        Find the most likely gender associated with a first name.
        INPUT: first_name, and male_names and female_names, the dictionaries
        returned from ParseNames().
        OUTPUT: "M" if male_names[name] > female_names[name]
                "F" if male_names[name] < female_names[name]
                "F" if male_names[name]
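A possible sketch of GetFirstName (Problem 3) and AddGender (Problem 4), written in plain Python so it does not depend on the real CSV files. Returning an empty string for a blank name is an illustrative choice, not part of the assignment; the dictionaries in the example are made-up toy counts standing in for the output of ParseNames.

```python
def GetFirstName(name):
    """Return the first whitespace-separated token of name, lowercased."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing spaces, so "  NATHANIEL   FORD " -> "nathaniel".
    parts = str(name).split()
    if not parts:
        return ""  # illustrative choice for blank names
    return parts[0].lower()


def AddGender(first_name, male_names, female_names):
    """Majority-vote gender label for a lowercase first name.

    "NA" if the name is in neither dictionary, "M" if it has more male
    counts, otherwise "F" (more female counts, or a tie).
    """
    if first_name not in male_names and first_name not in female_names:
        return "NA"
    male_count = male_names.get(first_name, 0)
    female_count = female_names.get(first_name, 0)
    if male_count > female_count:
        return "M"
    return "F"


# Toy dictionaries with made-up counts.
male_names = {"dennis": 120, "jordan": 15, "sam": 3}
female_names = {"jordan": 4, "mary": 7, "sam": 3}
for full_name in ["Dennis Zhang", "MARY SMITH", "Sam Lee", "Xyz Abc"]:
    first = GetFirstName(full_name)
    print(first, AddGender(first, male_names, female_names))
# dennis M
# mary F
# sam F
# xyz NA
```

Checking membership before looking up the counts keeps the "NA" case separate from a legitimate tie, so a name present in both dictionaries with equal counts still gets "F" as the assignment specifies. In the notebook, these two functions would typically be applied to the EmployeeName column, for example with pandas Series.apply.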