App Annie

Task 1

Group and sort data using PySpark.

Requirements

You are given a path to a file of comma-separated values (CSV), jobs.csv, which contains people's names and job titles, such as Dancer, Nurse, Pilot, etc. The dataset has two columns: 'name' (a string data type) and 'job' (also a string data type).

name            job
Tony Sullivan   Office manager
Mary Henry      Film editor II
Tiffany Young   Dancer

Implement a group_sort(input_path) method that reads data from the jobs.csv file and returns a dictionary in which the keys are jobs and the values are counts of how many times each job appears within the dataset. The dictionary should be ordered by count (in ascending order), then by job (in ascending order, from A to Z).

The group_sort(input_path) method takes one argument:
- input_path: a path to the CSV file containing the data.

Available packages/libraries
- Python 3.8 and all of its built-in packages
- Spark version 3.1.1

Hints

You can use the reduceByKey and sortByKey operations on a key/value RDD object, or you can use pyspark.sql functions.

Examples

Calling the group_sort(input_path) method should return a dictionary with the following structure:

{'job_title_1': count_job_1, 'job_title_2': count_job_2, ..., 'job_title_3': count_job_3}
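One possible way to meet the requirements above is sketched below using the pyspark.sql route mentioned in the hints. This is an illustration, not a reference solution. It assumes that jobs.csv starts with a header row naming the 'name' and 'job' columns and that every row has a non-empty job. The helper name to_ordered_dict is hypothetical and is not part of the task. It re-applies the (count, job) ordering on the collected rows so that the ordering rule can be checked without a Spark installation.

```python
from typing import Dict, Iterable, Tuple


def to_ordered_dict(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Hypothetical helper: order (job, count) pairs by count, then job.

    Both keys are ascending. Python 3.7+ dicts keep insertion order,
    so the returned dictionary preserves the sorted order.
    """
    return dict(sorted(pairs, key=lambda kv: (kv[1], kv[0])))


def group_sort(input_path: str) -> Dict[str, int]:
    # Imported inside the function so the helper above can be used
    # (and tested) on a machine without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("group_sort").getOrCreate()

    # Assumption: the first line of the file is the header "name,job".
    df = spark.read.csv(input_path, header=True)

    # Count occurrences of each job, then sort by count and job (ascending).
    rows = df.groupBy("job").count().orderBy("count", "job").collect()

    # row["count"] is used instead of row.count, because Row is a tuple
    # subclass and row.count would resolve to the tuple method.
    return to_ordered_dict((row["job"], row["count"]) for row in rows)
```

With Spark available, calling group_sort("jobs.csv") would return the ordered dictionary described in the task. The result holds one entry per distinct job, so collecting it to the driver is reasonable for a dataset of this kind.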