Question
Spark-Scala Programming Fundamentals [30 marks] Provide spark-shell executable code for the following tasks in a plain-text file named q1.scala. The program outputs must show clearly in spark-shell (failure to do so may lead to loss of marks). Your file must be commented so that all significant programming steps are clearly explained.
Create a Spark data frame from a CSV file that has its headers in the first row (create a small CSV file or use ~/ /Documents/Datasets/simple.csv in the bigdata virtual machine) and verify it. [4+1 = 5 marks]
Print the data frame's schema. [1 mark]
Convert the data frame to an RDD and display its contents. [1+1 = 2 marks]
Create an RDD by reading from a text file (create a text file or use $SPARK_HOME/README.md in the bigdata vm). [2 marks]
Calculate the total length in characters, including white spaces, for all the lines in the $SPARK_HOME/README.md file. [5 marks]
Count all the words that occur in the $SPARK_HOME/README.md file of the bigdata vm and display them as (String, Int) pairs. [5 marks]
Write a program which does word count of the $SPARK_HOME/README.md file using Spark. Explain the reduction operation. [2+3 = 5 marks]
The factorial of a non-negative integer is the product of that integer and all positive integers below it, e.g. the factorial of 3, or 3!, is 3 x 2 x 1 = 6. The factorial of 0 is always 1. Using these rules, write a compact program that computes the factorials of the integer array X(1,2,3,4,5) and then sums them into a single value. [5 marks]
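The factorial rules in the last task can be sketched in plain Scala, which also runs as-is inside spark-shell. This is one possible approach, not a model answer. The function name `factorial`, the use of `BigInt` (chosen here to avoid overflow on larger inputs), and the names `factorials` and `total` are assumptions made for illustration.

```scala
// Factorial by the rules stated in the task: n! = n x (n-1) x ... x 1, and 0! = 1.
// BigInt is an assumption of this sketch; Int would overflow for larger n.
def factorial(n: Int): BigInt = {
  require(n >= 0, "factorial is undefined for negative integers")
  // For n = 0 the range (1 to 0) is empty, so the fold returns the seed 1.
  (1 to n).foldLeft(BigInt(1))(_ * _)
}

// The integer array from the task.
val X = Array(1, 2, 3, 4, 5)

// Map each element to its factorial, then sum into a single value.
val factorials = X.map(factorial)
val total = factorials.sum

println(s"Factorials: ${factorials.mkString(", ")}") // prints "Factorials: 1, 2, 6, 24, 120"
println(s"Sum of factorials: $total")                // prints "Sum of factorials: 153"
```

To run the same computation as a Spark job, the function could be mapped over `sc.parallelize(X)` and combined with `reduce(_ + _)`. For five elements the local collection is simpler and gives the same result.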