QUESTION 1

_______ is the component of Spark that is responsible for assigning work that will be completed in parallel. In a single Databricks cluster, there will be only one of this component.

QUESTION 2

Select all different methods to run tasks in parallel.

Running tasks in multiple threads

Running tasks in multiple processes

Increasing the CPU clock speed of the node the task is running on

Adding additional nodes to run the tasks

Increasing the disk space of the node the task is running on

Increasing the memory of the node the task is running on
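As an aside on the two software-level options above, a minimal Python sketch (not part of the quiz) of running the same tasks in multiple threads and then in multiple processes:

```python
# Minimal sketch: run tasks in multiple threads, then in multiple
# processes, using Python's standard library.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def task(n):
    return n * n  # stand-in for real work

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:    # multiple threads
        print(list(pool.map(task, range(8))))
    with ProcessPoolExecutor(max_workers=4) as pool:   # multiple processes
        print(list(pool.map(task, range(8))))
```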

QUESTION 3

Select all the benefits of the MapReduce algorithm over others.

The MapReduce approach does not require a central data structure

Allows multiple tasks to run in parallel in the mapping, shuffling, and reducing steps

Tasks that require iterative processing over data work well with MapReduce

QUESTION 4

Select all correct information about Spark DataFrames.

A DataFrame is an immutable, distributed collection of data organized into named columns

A Spark DataFrame carries important metadata that allows Spark to optimize queries

Spark DataFrames are conceptually equivalent to a table in a relational database

Information stored in a Spark DataFrame is automatically saved into the database
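As an illustrative PySpark sketch (assuming a local SparkSession; not part of the quiz) of the named-column, immutable nature of DataFrames:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-demo").getOrCreate()

# Named columns, conceptually like a relational table
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.printSchema()   # schema metadata that Spark's optimizer can use

# DataFrames are immutable: a transformation returns a new DataFrame
upper = df.selectExpr("id", "upper(name) AS name")
upper.show()
```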

QUESTION 5

Actions are statements that are computed AND executed when they are encountered in the developer's code.

True

False
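For context on the transformation/action distinction behind Question 5, a minimal sketch (assuming a local SparkSession): transformations are recorded lazily, and computation happens only when an action is reached.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

df = spark.range(1000)              # DataFrame with a single "id" column
evens = df.filter(df.id % 2 == 0)   # transformation: recorded, not yet run
total = evens.count()               # action: triggers actual execution
print(total)                        # 500
```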

QUESTION 6

What are the five main components of the Apache Spark ecosystem?

Spark Core

Spark SQL

Spark Streaming and Structured Streaming

Machine Learning Library (MLlib)

Graph Computation (GraphX)

Spark HDFS

Spark YARN

QUESTION 7

______ are created by the driver and assigned a partition of data to process. Then, ______ are assigned to slots for parallel execution.

QUESTION 8

Spark can be configured in three different deployment modes: local, client, and cluster mode.

True

False
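For reference on the modes in Question 8, a sketch of a local-mode session (assumed setup, illustrative only); client and cluster mode are typically selected when submitting the application, e.g. with spark-submit's --deploy-mode flag:

```python
from pyspark.sql import SparkSession

# Local mode: driver and executors run on this one machine, here with
# 4 worker threads. Client vs. cluster mode differ in where the driver
# runs and are chosen at submission time, not in code.
spark = (SparkSession.builder
         .master("local[4]")
         .appName("deploy-demo")
         .getOrCreate())
print(spark.sparkContext.master)   # local[4]
```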

QUESTION 9

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.

True

False

QUESTION 10

Apache Spark is a sophisticated distributed computation framework for executing code in parallel across many different machines.

True

False

QUESTION 11

How many driver programs can run in a single Spark Cluster?

QUESTION 12

Select all correct information about Spark Executors.

The executors are responsible for carrying out the work assigned by the driver

Execute code assigned by the driver

Report the state of the computation back to the driver

Maintain information about the Spark application
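To make the driver/executor split concrete, a hypothetical configuration sketch (the property names are standard Spark settings; the values are arbitrary):

```python
from pyspark.sql import SparkSession

# The driver (this program) plans and assigns work; executors carry it
# out and report state back. Each executor core acts as a slot that can
# run one task at a time. Illustrative sizing only.
spark = (SparkSession.builder
         .appName("executor-demo")
         .config("spark.executor.instances", "3")
         .config("spark.executor.cores", "3")
         .config("spark.executor.memory", "2g")
         .getOrCreate())
```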

QUESTION 13

What are the four main components of the Hadoop ecosystem?

Hadoop Distributed File System (HDFS)

Yet Another Resource Negotiator (YARN)

Hadoop MapReduce

Hadoop Common (Hadoop Core)

Hadoop Analytics

Hadoop Spark

QUESTION 14

Select all correct information about map reduce algorithm.

A data set is mapped into a collection of (key, value) pairs in the mapping step

Mapping step produces intermediate results and associates values with an output key

Shuffling step produces intermediate results and associates values with an output key

Shuffling step groups intermediate results associated with the same output key

Reducing step groups intermediate results associated with the same output key

Reducing step processes groups of intermediate results with the same output key

Mapping step processes groups of intermediate results with the same output key
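A tiny word-count sketch (plain Python, illustrative only) of the three steps named in these options:

```python
from collections import defaultdict

docs = ["spark is fast", "hadoop is reliable"]

# Mapping: emit intermediate (key, value) pairs
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffling: group intermediate results that share an output key
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reducing: process each group of values with the same key
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'spark': 1, 'is': 2, 'fast': 1, 'hadoop': 1, 'reliable': 1}
```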

QUESTION 15

Select all correct information about Transformations.

Transformations are at the core of how you express your business logic in Spark.

Transformations have lazy evaluation

There are 3 types of transformations: narrow, wide, and shuffler.

Narrow transformations mean that the work happens on the executor without changing the way data is partitioned over the system
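An illustrative sketch (assuming a local SparkSession) contrasting a narrow transformation with a wide one that redistributes data via a shuffle:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transform-demo").getOrCreate()
df = spark.range(100).withColumn("bucket", F.col("id") % 10)

narrow = df.filter(F.col("id") > 50)  # narrow: each partition processed in place
wide = df.groupBy("bucket").count()   # wide: data is shuffled across partitions

wide.show()   # lazy evaluation: nothing executes until this action
```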

QUESTION 16

RDDs use the Catalyst Optimizer to find an efficient plan for applying your transformations and actions.

True

False

QUESTION 17

What is the primary difference between Spark and Hadoop MapReduce?

Spark processes and retains data in memory for subsequent steps, whereas MapReduce processes data on disk.

Hadoop processes and retains data in memory for subsequent steps, whereas Spark processes data on disk.

Hadoop brings compute to datasets, whereas Spark brings data during compute.

Spark brings compute to datasets, whereas Hadoop brings data during compute.

QUESTION 18

A ______ is a collection of rows that sit on one physical machine in the cluster.
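An illustrative sketch (assuming a local SparkSession) of inspecting and changing how a DataFrame's rows are split across these physical chunks:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[4]")
         .appName("partition-demo")
         .getOrCreate())

df = spark.range(1_000_000)
print(df.rdd.getNumPartitions())   # how many chunks the rows are split into

redistributed = df.repartition(8)  # shuffle the rows into 8 new chunks
print(redistributed.rdd.getNumPartitions())   # 8
```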

QUESTION 19

If you have 3 executors and each executor has 3 slots, what is the maximum number of tasks that can be executed at any one time?

QUESTION 20

Sort the transformational phases in the Catalyst Optimizer (rank each phase from 1 to 4).

Code generation

Analysis

Physical planning

Logical optimization
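To inspect the output of these phases for a real query, a sketch using DataFrame.explain (assuming a local SparkSession); with extended=True it prints the parsed and analyzed logical plans, the optimized logical plan, and the physical plan:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("catalyst-demo").getOrCreate()

query = (spark.range(1000)
         .filter(F.col("id") > 10)
         .groupBy((F.col("id") % 7).alias("k"))
         .count())

# Shows the plans Catalyst produces on the way to executable code
query.explain(extended=True)
```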
