
Question


For datasets employees.csv and departments.csv:
1st line of employees.csv:
1,Ms. Mossie Hagens V,1
Columns: employee ID, name, department ID
1st line of departments.csv:
1, Braun Ltd
Columns: department ID, department name
Tip: Join the two datasets on the department ID key.
Task 1:
Write code using Map/Reduce jobs that implements a broadcast join (RDD API).
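One possible shape for Task 1, sketched in PySpark under stated assumptions: the departments file is small enough to collect on the driver, neither file has a header row, and an inner join is wanted. The function names (`parse_employee`, `parse_department`, `join_with_lookup`, `broadcast_join`) and the `sc` SparkContext argument are illustrative and not part of the assignment.

```python
def parse_employee(line):
    """Map 'emp_id,name,dep_id' to (dep_id, (emp_id, name))."""
    emp_id, rest = line.split(",", 1)
    name, dep_id = rest.rsplit(",", 1)
    return dep_id.strip(), (emp_id.strip(), name.strip())


def parse_department(line):
    """Map 'dep_id,dep_name' to (dep_id, dep_name)."""
    dep_id, dep_name = line.split(",", 1)
    return dep_id.strip(), dep_name.strip()


def join_with_lookup(record, lookup):
    """Map-side join of one employee record against a department dict."""
    dep_id, (emp_id, name) = record
    if dep_id in lookup:
        return [(emp_id, name, dep_id, lookup[dep_id])]
    return []  # inner join: drop employees whose department is unknown


def broadcast_join(sc, employees_path, departments_path):
    """Assumes `sc` is an existing pyspark SparkContext (hypothetical caller)."""
    departments = sc.textFile(departments_path).map(parse_department)
    # Ship the small table to every executor once, as a read-only dict.
    lookup = sc.broadcast(departments.collectAsMap())
    employees = sc.textFile(employees_path).map(parse_employee)
    # No shuffle: each employee record is joined locally in the map phase.
    return employees.flatMap(lambda rec: join_with_lookup(rec, lookup.value))
```

The join logic lives in plain functions so it can be checked on the sample lines without a cluster; `broadcast_join` only wires those functions into RDD transformations.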
Task 2:
Write code using Map/Reduce jobs that implements a repartition join (RDD API).
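A repartition (reduce-side) join for Task 2 could look roughly like the sketch below: the map phase tags every record with its source table and keys it by department ID, the shuffle groups records by key, and the reduce phase pairs employees with departments. It assumes headerless files and an inner join; the tags "E" and "D", the function names, and the `sc` argument are hypothetical choices made for illustration.

```python
def tag_employee(line):
    """Map 'emp_id,name,dep_id' to (dep_id, ('E', (emp_id, name)))."""
    emp_id, rest = line.split(",", 1)
    name, dep_id = rest.rsplit(",", 1)
    return dep_id.strip(), ("E", (emp_id.strip(), name.strip()))


def tag_department(line):
    """Map 'dep_id,dep_name' to (dep_id, ('D', dep_name))."""
    dep_id, dep_name = line.split(",", 1)
    return dep_id.strip(), ("D", dep_name.strip())


def reduce_join(dep_id, tagged_values):
    """Reduce phase: pair every employee with every department for one key."""
    tagged = list(tagged_values)  # materialise so it can be scanned twice
    employees = [value for tag, value in tagged if tag == "E"]
    departments = [value for tag, value in tagged if tag == "D"]
    return [
        (emp_id, name, dep_id, dep_name)
        for emp_id, name in employees
        for dep_name in departments
    ]


def repartition_join(sc, employees_path, departments_path):
    """Assumes `sc` is an existing pyspark SparkContext (hypothetical caller)."""
    employees = sc.textFile(employees_path).map(tag_employee)
    departments = sc.textFile(departments_path).map(tag_department)
    # union + groupByKey shuffles both tables so equal keys meet in one reducer.
    grouped = employees.union(departments).groupByKey()
    return grouped.flatMap(lambda kv: reduce_join(kv[0], kv[1]))
```

Unlike the broadcast variant, both tables are shuffled here, so this form does not require either table to fit in memory on a single node.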
Task 3:
Spark SQL includes several implementations of join algorithms, along with a query
optimizer called "Catalyst". Using Catalyst, Spark SQL chooses the best join
algorithm for the tables being joined. This automatic choice can be turned off.
You can use the following script to stop Spark from automatically selecting a join
algorithm. Run the following query with the query optimizer enabled and disabled. Create a
bar graph of the execution times and explain the results. Take a screenshot of the SQL plan
printed by spark.sql(query).explain() and give a brief explanation of the plan.
Tips:
You must change each variable defined with > and provide your own custom values.
Remember to restart the Spark cluster before each measurement to avoid hot caches,
or clear the cache instead.
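The script and query that the task refers to are not reproduced in this post, so the following is only a sketch of how the measurement might be set up. It assumes that "disabling" automatic selection means setting `spark.sql.autoBroadcastJoinThreshold` to -1 (which stops Spark from choosing a broadcast join on its own), that `spark` is an existing SparkSession, and that matplotlib is available for the bar graph. The function names and the output file name are hypothetical.

```python
import time


def time_call(action):
    """Return the wall-clock seconds taken by a zero-argument callable."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def measure_query(spark, query, auto_broadcast):
    """Time one run of `query` on an existing SparkSession (assumed)."""
    if auto_broadcast:
        # Restore the default so the optimizer may pick a broadcast join.
        spark.conf.unset("spark.sql.autoBroadcastJoinThreshold")
    else:
        # Assumption: -1 turns off automatic broadcast join selection.
        spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    spark.catalog.clearCache()  # drop cached tables before measuring
    spark.sql(query).explain()  # print the physical plan for the report
    return time_call(lambda: spark.sql(query).collect())


def plot_timings(timings, path="join_timings.png"):
    """Save a bar chart of {label: seconds}; requires matplotlib."""
    import matplotlib.pyplot as plt

    plt.bar(list(timings.keys()), list(timings.values()))
    plt.ylabel("Execution time (s)")
    plt.savefig(path)
```

A typical use would be to call `measure_query` once with `auto_broadcast=True` and once with `False`, collect the two times in a dict, and pass it to `plot_timings`. In the printed plans, a broadcast join is usually reported as `BroadcastHashJoin` and the fallback as `SortMergeJoin`.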
