Question

Spark-Scala Programming Fundamentals [30 marks]

Provide code that is executable in spark-shell for the following tasks in a file named q1.scala (plain text). The program output must be clearly visible in spark-shell (failure to do so may lead to loss of marks). Your file must be appropriately commented so that all significant programming steps are clearly explained.

Create a Spark data frame from a CSV file which has the headers in the first row (create a small CSV file or use ~/ /Documents/Datasets/simple.csv in the bigdata virtual machine) and verify. [4+1 = 5 marks]

Print the data frame's schema. [1 mark]

Convert the data frame to an RDD and display its contents. [1+1 = 2 marks]
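The three data frame tasks above could be approached along the following lines. This is only a sketch, not a model answer: it assumes it is run inside spark-shell, where the `spark` session is predefined, and it writes its own small CSV file with made-up sample rows (the file name, column names and values are illustrative assumptions, not the contents of simple.csv).

```scala
import java.nio.file.Files

// Create a small CSV file with a header row (made-up sample data).
val csvPath = Files.createTempFile("simple", ".csv")
Files.write(csvPath, "name,age\nalice,30\nbob,25\n".getBytes("UTF-8"))

// Read the CSV into a data frame. "header" takes column names from the
// first row; "inferSchema" asks Spark to guess column types from the data.
// toUri yields a file:// URI, so the local file system is used.
val df = spark.read.option("header", "true").option("inferSchema", "true").csv(csvPath.toUri.toString)

// Verify the data frame by displaying its rows.
df.show()

// Print the schema (column names and inferred types).
df.printSchema()

// Convert the data frame to an RDD of Row objects and display its contents.
val rowRdd = df.rdd
rowRdd.collect().foreach(println)
```

The read is kept on a single line because the Scala REPL can misinterpret a continuation line that starts with a dot when a file is loaded line by line.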

Create an RDD by reading from a text file (create a text file or use $SPARK_HOME/README.md in the bigdata VM). [2 marks]

Calculate the total length in characters, including white spaces, for all the lines in the $SPARK_HOME/README.md file. [5 marks]

Count the words that occur in the $SPARK_HOME/README.md file of the bigdata VM and display them as (String, Int) pairs. [5 marks]

Write a program which does word count of the $SPARK_HOME/README.md file using Spark. Explain the reduction operation. [2+3 = 5 marks]
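The README.md tasks above (reading the file into an RDD, total line length, and word count) might be sketched as follows. The sketch assumes spark-shell, where `sc` is predefined, and assumes the SPARK_HOME environment variable is set as it is on the VM described in the question. Splitting on whitespace is one possible definition of a "word"; the question does not fix one.

```scala
import java.io.File

// Build a file:// URI for $SPARK_HOME/README.md (assumes SPARK_HOME is set).
val readmePath = new File(sys.env("SPARK_HOME"), "README.md").toURI.toString

// Create an RDD with one element per line of the file.
val lines = sc.textFile(readmePath)

// Total length in characters: map each line to its length, then reduce
// the lengths to a single sum. Line terminators are not part of the lines,
// so they are not counted.
val totalChars = lines.map(_.length).reduce(_ + _)
println(s"Total characters: $totalChars")

// Word count: split lines into words, pair each word with 1, then let
// reduceByKey add up the 1s for each distinct word.
val wordCounts = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1)).reduceByKey(_ + _)

// Display the (String, Int) pairs.
wordCounts.collect().foreach(println)
```

On the reduction step: `reduceByKey(_ + _)` groups the pairs by key (the word) and merges the values of each key two at a time with the supplied function. Because addition is associative and commutative, Spark can combine values within each partition first and then merge the partial sums across partitions, which keeps the amount of shuffled data small.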

The factorial of a non-negative integer is the product of that integer and all positive integers below it, e.g. the factorial of 3, written 3!, is 3 x 2 x 1 = 6. The factorial of 0 is defined as 1. Using these rules, write a compact program that computes the factorials of the elements of an integer array X(1,2,3,4,5) and then sums them into a single value. [5 marks]
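One compact way to approach the factorial task in plain Scala is sketched below. The helper name `factorial` and the choice of `BigInt` (to avoid overflow for larger inputs) are assumptions made for illustration; the code needs no Spark and also runs in spark-shell.

```scala
// Factorial as the product of 1..n. For n = 0 the range is empty and the
// product of an empty sequence is 1, which matches the rule 0! = 1.
def factorial(n: Int): BigInt = (1 to n).map(BigInt(_)).product

val X = Array(1, 2, 3, 4, 5)

// Compute the factorial of each element, then sum them into one value.
val factorials = X.map(factorial)
val total = factorials.sum

println(s"Factorials: ${factorials.mkString(", ")}") // Factorials: 1, 2, 6, 24, 120
println(s"Sum of factorials: $total")                // Sum of factorials: 153
```

The same computation could be distributed with `sc.parallelize(X).map(factorial).reduce(_ + _)`, although for five elements the local version is simpler.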
