
Question


App Annie

Task 1

Group and sort data using PySpark.

Requirements

You are given a path to a file of comma-separated values (CSV), jobs.csv, which contains people's names and job titles, such as Dancer, Nurse, Pilot, etc. The dataset has two columns: 'name' (a string data type) and 'job' (also a string data type).

name             job
Tony Sullivan    Office manager
Mary Henry       Film editor II
Tiffany Young    Dancer

Implement a group_sort(input_path) method that reads data from the jobs.csv file and returns a dictionary in which the keys are jobs and the values are counts of how many times each job appears within the dataset. The dictionary should be ordered by count (in ascending order), then by job (in ascending order from A to Z).

The group_sort(input_path) method takes one argument:
- input_path: a path to the CSV file containing the data.

Available packages/libraries
- Python 3.8 and all of its built-in packages
- Spark version 3.1.1

Hints

You can use the reduceByKey and sortByKey operations on a key/value RDD object, or you can use pyspark.sql functions.

Examples

Calling the group_sort(input_path) method should return a dictionary with the following structure:

{'job_title_1': count_job_1, 'job_title_2': count_job_2, ..., 'job_title_3': count_job_3}
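One possible approach to the task above, offered as a minimal sketch rather than the reference solution: let Spark count the rows per job with the DataFrame API, then order the small collected result on the driver. It assumes that jobs.csv has a header row naming the 'name' and 'job' columns and that no 'job' value is missing. The helper name order_counts is hypothetical and not part of the task statement.

```python
def order_counts(pairs):
    """Return a dict of job -> count, ordered by count and then by job.

    `pairs` is an iterable of (job, count) tuples. Python 3.7+ dicts keep
    insertion order, so inserting the sorted pairs fixes the key order.
    (Hypothetical helper, introduced only for this sketch.)
    """
    return dict(sorted(pairs, key=lambda kv: (kv[1], kv[0])))


def group_sort(input_path):
    """Count how often each job appears in the CSV file at input_path."""
    # Imported inside the function so order_counts can be used and tested
    # on a machine without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("group_sort").getOrCreate()

    # Assumption: the first line of the file is the header "name,job".
    df = spark.read.csv(input_path, header=True)

    # groupBy(...).count() adds a column named "count" with rows per job.
    rows = df.groupBy("job").count().collect()

    # Use row["count"] rather than row.count, because Row is a tuple
    # subclass and row.count would refer to the built-in tuple method.
    return order_counts((row["job"], row["count"]) for row in rows)
```

Sorting on the driver is reasonable here because the result has only one entry per distinct job. If the ordering should happen inside Spark instead, calling orderBy("count", "job") on the grouped DataFrame before collect() is an alternative, since both columns sort ascending by default.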

