Question
Problem 1. Rob is installing Spark in Ubuntu 16.04 OS. Please help him with the installation.
Step 1: Since Spark needs Scala, Rob must install Scala first. He downloads Scala (scala-2.12.4.tgz), unzips the file in the /home/rob/ directory, and renames the folder to scala. Therefore /home/rob/scala is the root directory for Scala. Now, please tell Rob how to update the environment variables SCALA_HOME and PATH.
Answer: Open the .bashrc file with the following command:
$ vi ~/.bashrc
Add the following two lines to the end of the file to update the environment variables SCALA_HOME and PATH:
1: export SCALA_HOME=/home/rob/scala
2: export PATH=$PATH:$SCALA_HOME/bin
Step 2: He downloads Spark (spark-2.2.1-bin-hadoop2.7.tgz), unzips the file in the /home/rob/ directory, and renames the folder to spark. Thus /home/rob/spark is the root directory for Spark. Now Rob needs to update the environment variables SPARK_HOME and PATH.
Answer: Open the .bashrc file again and add the following two lines to the end of the file:
1: export SPARK_HOME=/home/rob/spark
2: export PATH=$PATH:$SPARK_HOME/bin
Then reload the file so the changes take effect in the current shell:
$ source ~/.bashrc
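Taken together, Steps 1 and 2 leave the end of Rob's ~/.bashrc looking like this (paths follow the layout described in the question):

```shell
# Scala (Step 1)
export SCALA_HOME=/home/rob/scala
export PATH=$PATH:$SCALA_HOME/bin

# Spark (Step 2)
export SPARK_HOME=/home/rob/spark
export PATH=$PATH:$SPARK_HOME/bin
```

With both bin directories on PATH, the Spark commands in Steps 3 and 4 can be run from any directory.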
Step 3: After the installation, Rob can verify Spark with the following commands. To start the Python Spark shell, he should type:
$ pyspark
To start the Scala Spark shell, he should type:
$ spark-shell
Step 4: Rob wants to run the WordCount example in batch mode. Suppose that the Python source code is in the file WordCount.py; please give the command for running this Python Spark source code file. The input file name and output directory are hard-coded in the source code, so no parameters need to be passed on the command line.
$ spark-submit WordCount.py
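Step 4 assumes WordCount.py already exists. For orientation, the transformation such a job typically performs (split each line into words, pair each word with 1, sum the counts per word) can be sketched in plain Python without Spark; the helper name and sample lines below are illustrative only, not taken from the question:

```python
from collections import Counter

def word_count(lines):
    # Mirrors the usual Spark word-count pipeline:
    #   lines.flatMap(lambda l: l.split())    -> individual words
    #        .map(lambda w: (w, 1))           -> (word, 1) pairs
    #        .reduceByKey(lambda a, b: a + b) -> per-word totals
    words = [w for line in lines for w in line.split()]
    return dict(Counter(words))

# Illustrative input lines (not from the question)
print(word_count(["to be or not to be"]))
# -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In the real WordCount.py, the same logic would run on an RDD or DataFrame read from the hard-coded input file, with the result written to the hard-coded output directory.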