QUESTION 1
_______ is the component of Spark that is responsible for assigning work that will be completed in parallel. In a single Databricks cluster, there will be only one of this component.
QUESTION 2
Select all the different methods for running tasks in parallel.
- Running tasks in multiple threads
- Running tasks in multiple processes
- Increasing the CPU clock speed of the node the task is running on
- Adding additional nodes to run the tasks
- Increasing the disk space of the node the task is running on
- Increasing the memory of the node the task is running on
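The two software-level options above (threads and processes) can be sketched in plain Python, outside Spark. This is an illustrative analogy only, not Spark code; the function names (`square`, `run_in_threads`, `run_in_processes`) are made up for this sketch.

```python
# Plain-Python illustration of running tasks in parallel via threads vs.
# processes. Not Spark code; names here are illustrative only.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def square(n):
    return n * n

def run_in_threads(values):
    # Tasks run in multiple threads inside one process.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, values))

def run_in_processes(values):
    # Tasks run in multiple OS processes, each with its own interpreter.
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, values))

if __name__ == "__main__":
    print(run_in_threads([1, 2, 3]))  # [1, 4, 9]
```

The other two correct routes in the question, adding nodes (scaling out) and, to a lesser degree, faster CPUs (scaling up), happen at the cluster/hardware level rather than in code.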
QUESTION 3
Select all the benefits of the MapReduce algorithm over others.
- The MapReduce approach does not require a central data structure
- Allows multiple tasks to run in parallel in mapping, shuffling, and reducing
- Tasks that require iterative processes over data work well with MapReduce
QUESTION 4
Select all correct information about Spark DataFrames.
- A DataFrame is an immutable, distributed collection of data organized into named columns
- A Spark DataFrame carries important metadata that allows Spark to optimize queries
- Spark DataFrames are conceptually equivalent to a table in a relational database
- Information stored in a Spark DataFrame is automatically saved into the database
QUESTION 5
Actions are statements that are computed AND executed when they are encountered in the developer's code.
True
False
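The lazy-transformation vs. eager-action distinction behind this question can be illustrated outside Spark with Python generators: building the generator defers all work (like a transformation), and consuming it with `list()` forces evaluation (like an action). This is an analogy only; `tag` and `log` are illustrative names, not Spark API.

```python
# Plain-Python analogy for lazy evaluation: nothing runs when the pipeline
# is defined; work happens only when a consuming call forces it.
log = []

def tag(x):
    log.append(x)          # records when each element is actually processed
    return x * 10

nums = [1, 2, 3]
pipeline = (tag(x) for x in nums)   # "transformation": nothing runs yet
assert log == []                    # no element has been processed

result = list(pipeline)             # "action": evaluation happens here
assert result == [10, 20, 30]
assert log == [1, 2, 3]             # all elements processed at consumption
```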
QUESTION 6
What are the five main components of Apache Spark ecosystem?
- Spark Core
- Spark SQL
- Spark Streaming and Structured Streaming
- Machine Learning Library (MLlib)
- Graph Computation (GraphX)
- Spark HDFS
- Spark YARN
QUESTION 7
______ are created by the driver and assigned a partition of data to process. Then, ______ are assigned to slots for parallel execution.
QUESTION 8
Spark can be configured in three different deployment modes: local, client, and cluster mode.
True
False
QUESTION 9
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
True
False
QUESTION 10
Apache Spark is a sophisticated distributed computation framework for executing code in parallel across many different machines.
True
False
QUESTION 11
How many driver programs can run in a single Spark Cluster?
QUESTION 12
Select all correct information about Spark Executors.
- The executors are responsible for carrying out the work assigned by the driver
- Execute code assigned by the driver
- Report the state of the computation back to the driver
- Maintaining information about the Spark Application
QUESTION 13
What are the four main components of Hadoop ecosystem?
- Hadoop Distributed File System (HDFS)
- Yet Another Resource Negotiator (YARN)
- Hadoop MapReduce
- Hadoop Common (Hadoop Core)
- Hadoop Analytics
- Hadoop Spark
QUESTION 14
Select all correct information about the MapReduce algorithm.
- A data set is mapped into a collection of (key, value) pairs in the mapping step
- Mapping step produces intermediate results and associates values with an output key
- Shuffling step produces intermediate results and associates values with an output key
- Shuffling step groups intermediate results associated with the same output key
- Reducing step groups intermediate results associated with the same output key
- Reducing step processes groups of intermediate results with the same output key
- Mapping step processes groups of intermediate results with the same output key
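The mapping, shuffling, and reducing steps named in the options above can be sketched as a single-machine word count in plain Python. This is a conceptual sketch, not a distributed implementation; the function names are illustrative.

```python
# Minimal single-machine sketch of map -> shuffle -> reduce, as a word count.
from collections import defaultdict

def map_step(records):
    # Mapping: emit intermediate (key, value) pairs; here (word, 1).
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle_step(pairs):
    # Shuffling: group intermediate values by their output key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(groups):
    # Reducing: process each group of values sharing the same key.
    return {key: sum(values) for key, values in groups.items()}

data = ["spark is fast", "spark is distributed"]
counts = reduce_step(shuffle_step(map_step(data)))
# counts == {"spark": 2, "is": 2, "fast": 1, "distributed": 1}
```

In a real cluster, many mappers and reducers run these steps in parallel over different partitions, which is exactly why no central data structure is needed.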
QUESTION 15
Select all correct information about transformations.
- Transformations are at the core of how you express your business logic in Spark.
- Transformations have lazy evaluation
- There are 3 types of transformations: narrow, wide, and shuffler.
- Narrow transformations mean that the work happens on the executor without changing the way data is partitioned over the system
QUESTION 16
RDDs use the Catalyst Optimizer to find an efficient plan for applying your transformations and actions.
True
False
QUESTION 17
What is the primary difference between Spark and Hadoop MapReduce?
- Spark processes and retains data in memory for subsequent steps, whereas MapReduce processes data on disk.
- Hadoop processes and retains data in memory for subsequent steps, whereas Spark processes data on disk.
- Hadoop brings compute to datasets, whereas Spark brings data during compute.
- Spark brings compute to datasets, whereas Hadoop brings data during compute.
QUESTION 18
A ______ is a collection of rows that sit on one physical machine in the cluster.
QUESTION 19
If you have 3 executors and each executor has 3 slots, what is the maximum number of tasks that can be executed at any one time?
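The arithmetic here follows from each slot running at most one task at a time, so the ceiling is executors × slots per executor:

```python
# Maximum concurrent tasks = number of executors x slots per executor,
# since each slot executes at most one task at any moment.
executors = 3
slots_per_executor = 3
max_parallel_tasks = executors * slots_per_executor  # 9
```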
QUESTION 20
Sort the transformational phases in the Catalyst Optimizer.
- Code generation
- Analysis
- Physical planning
- Logical optimization