Question
Check the schemas:

    gameclicks.printSchema()

    root
     |-- timestamp: string (nullable = true)
     |-- clickid: string (nullable = true)
     |-- userId: string (nullable = true)
     |-- userSessionId: string (nullable = true)
     |-- ishit: string (nullable = true)
     |-- teamId: string (nullable = true)
     |-- teamLevel: string (nullable = true)

    adclicks.printSchema()

    root
     |-- timestamp: string (nullable = true)
     |-- txId: string (nullable = true)
     |-- userSessionId: string (nullable = true)
     |-- teamId: string (nullable = true)
     |-- userId: string (nullable = true)
     |-- adid: string (nullable = true)
     |-- adCategory: string (nullable = true)

Question 1: How many users in each team?
Keywords: DataFrame API, SQL, group by, sort

Use the DataFrame API to group the users by teamId and count how many distinct users are in each team. Sort the result in descending order.

    team_counts = # your code goes here (Q1a: 4 points)
    team_counts.show()

Now rewrite the above query using pure SQL:

    gameclicks.registerTempTable("gameclicks")
    query = # your code goes here (Q1b: 2 points)
    team_counts = spark.sql(query)
    team_counts.show()

Question 2: Now use the ad-clicks dataset to find the number of ad clicks in each hour.
Keywords: group by, parse timestamp, plot

    timestamp_only = adclicks.selectExpr(["to_timestamp(timestamp) as timestamp"])
    click_count_by_hour = # your code goes here (Q2a: 4 points)
    click_count_by_hour.show(24)