
Question


Check the schemas:

    gameclicks.printSchema()

    root
     |-- timestamp: string (nullable = true)
     |-- clickid: string (nullable = true)
     |-- userId: string (nullable = true)
     |-- userSessionId: string (nullable = true)
     |-- ishit: string (nullable = true)
     |-- teamId: string (nullable = true)
     |-- teamLevel: string (nullable = true)

    adclicks.printSchema()

    root
     |-- timestamp: string (nullable = true)
     |-- txId: string (nullable = true)
     |-- userSessionId: string (nullable = true)
     |-- teamId: string (nullable = true)
     |-- userId: string (nullable = true)
     |-- adid: string (nullable = true)
     |-- adCategory: string (nullable = true)

Question 1: How many users are in each team?

Keywords: DataFrame API, SQL, group by, sort

Use the DataFrame API to group the users by teamId and count the distinct users in each team. Sort the result in descending order.

    team_counts = # your code goes here (Q1a: 4 points)
    team_counts.show()

Now rewrite the above query using pure SQL:

    gameclicks.registerTempTable("gameclicks")
    query = # your code goes here (Q1b: 2 points)
    team_counts = spark.sql(query)
    team_counts.show()

Question 2: Now use the ad-clicks dataset to find the number of ad clicks in each hour.

Keywords: group by, parse timestamp, plot

    timestamp_only = adclicks.selectExpr(["to_timestamp(timestamp) as timestamp"])
    click_count_by_hour = # your code goes here (Q2a: 4 points)
    click_count_by_hour.show(24)
