
Question

Description: Given time series data, a clickstream of user activity stored in flat files, the task is to enrich the data with a session ID.

Session definition:
- A session expires after 30 minutes of inactivity; during inactivity no clickstream record is generated.
- A session remains active for a total duration of at most 2 hours.

Steps:
- Load the data in any flat file format.
- Read the data and use Spark batch (PySpark/Scala) to do the computation.
- Save the enriched results in Parquet.

Note: Please do not use Spark SQL directly.

Given Dataset:

timestamp              userid
2018-01-01T11:00:00Z   u1
2018-01-01T12:00:00Z   u1
2018-01-01T11:00:00Z   u2
2018-01-02T11:00:00Z   u2
2018-01-01T12:15:00Z   u1
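The session rule can be checked on the sample rows with a small plain-Python sketch before porting it to a Spark batch job. This is not the Spark solution itself, only a reference for the expected session IDs. It assumes a new session starts when the gap since the user's previous event is strictly greater than 30 minutes, or when more than 2 hours have elapsed since the session's first event. The function name `assign_sessions` and the `userid-N` session ID format are illustrative choices, not part of the question.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Assumed thresholds, taken from the session definition in the question.
INACTIVITY_LIMIT = timedelta(minutes=30)
MAX_SESSION_LENGTH = timedelta(hours=2)


def parse_ts(text):
    # The sample timestamps are ISO 8601 in UTC with a trailing "Z".
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def assign_sessions(rows):
    """Take (timestamp, userid) pairs and return (timestamp, userid, session_id)
    triples, ordered by user and then by time."""
    parsed = sorted(
        ((uid, parse_ts(ts), ts) for ts, uid in rows),
        key=lambda r: (r[0], r[1]),
    )
    enriched = []
    for uid, events in groupby(parsed, key=itemgetter(0)):
        session_no = 0
        session_start = None
        previous = None
        for _, when, raw in events:
            # Assumption: both comparisons are strict ("more than").
            starts_new = (
                previous is None
                or when - previous > INACTIVITY_LIMIT
                or when - session_start > MAX_SESSION_LENGTH
            )
            if starts_new:
                session_no += 1
                session_start = when
            previous = when
            # Hypothetical ID format: "<userid>-<running number>".
            enriched.append((raw, uid, f"{uid}-{session_no}"))
    return enriched


SAMPLE = [
    ("2018-01-01T11:00:00Z", "u1"),
    ("2018-01-01T12:00:00Z", "u1"),
    ("2018-01-01T11:00:00Z", "u2"),
    ("2018-01-02T11:00:00Z", "u2"),
    ("2018-01-01T12:15:00Z", "u1"),
]

if __name__ == "__main__":
    for row in assign_sessions(SAMPLE):
        print(row)
```

Under these assumptions, u1's 12:00 and 12:15 events share a session, while every other event in the sample opens a new one. In a Spark job the same per-user, time-ordered pass would typically run once per user group.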

QUESTION 3

Description: In addition to the problem statement given in Question 2, assume the scenario below as well and design a schema based on it:
- Get the number of sessions generated in a day.
- Get the total time spent by a user in a day.
- Get the total time spent by a user over a month.

Guidelines and instructions for solving the above queries:
- Design the table in any flat file format.
- Write the script to create the file.
- Load data into the file.
- Write all the queries in Spark SQL.
- Think in the direction of using partitioning, bucketing, etc.
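The three metrics can be pinned down with a small plain-Python reference, which helps when checking the numbers a Spark SQL query returns. This is a sketch under stated assumptions, not the Spark SQL solution: rows are assumed to already carry a session ID (the `SESSIONIZED` values are illustrative), a session's duration is taken as the time between its first and last event, and a session is attributed to the day and month of its first event. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_ts(text):
    # ISO 8601 UTC timestamps with a trailing "Z", as in the sample data.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def session_bounds(rows):
    """Map (userid, session_id) to the (first, last) event time of that session."""
    bounds = {}
    for ts, uid, sid in rows:
        when = parse_ts(ts)
        start, end = bounds.get((uid, sid), (when, when))
        bounds[(uid, sid)] = (min(start, when), max(end, when))
    return bounds


def daily_session_counts(rows):
    """Number of sessions per day, keyed by the date of the session's first event."""
    counts = defaultdict(int)
    for start, _end in session_bounds(rows).values():
        counts[start.strftime("%Y-%m-%d")] += 1
    return dict(counts)


def time_spent(rows, period="day"):
    """Total session time per (userid, day) or per (userid, month)."""
    fmt = "%Y-%m-%d" if period == "day" else "%Y-%m"
    totals = defaultdict(timedelta)
    for (uid, _sid), (start, end) in session_bounds(rows).items():
        # Assumption: duration is last event minus first event, so a
        # single-event session contributes zero time.
        totals[(uid, start.strftime(fmt))] += end - start
    return dict(totals)


# Illustrative input: the sample dataset with an assumed session_id column.
SESSIONIZED = [
    ("2018-01-01T11:00:00Z", "u1", "u1-1"),
    ("2018-01-01T12:00:00Z", "u1", "u1-2"),
    ("2018-01-01T12:15:00Z", "u1", "u1-2"),
    ("2018-01-01T11:00:00Z", "u2", "u2-1"),
    ("2018-01-02T11:00:00Z", "u2", "u2-2"),
]

if __name__ == "__main__":
    print(daily_session_counts(SESSIONIZED))
    print(time_spent(SESSIONIZED, "day"))
    print(time_spent(SESSIONIZED, "month"))
```

The day and month keys computed here are natural candidates for partition columns when designing the table, since all three queries filter or group on them.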
