Question
Exercise 3. Using DataFrames, find the full name of every country in Oceania (continent OC).
Show the first 10 country names in ascending alphabetical order.
1) Which of the following sets of DataFrame functions best answers Exercise 3 of Lab 6?
a. filter, orderBy, show(10)
b. filter, select, orderBy, show(10)
c. select, orderBy, show(10)
d. filter, select, join, orderBy, show(10)
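A minimal Scala sketch of how such a chain of DataFrame calls fits together may help when weighing the options above. It assumes a single hypothetical DataFrame, countriesDF, with made-up column names (code, name, continent); the lab's real data may be laid out differently, and if the full name and the continent code live in separate tables, a join would also be needed.

```scala
import org.apache.spark.sql.SparkSession

object OceaniaCountriesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OceaniaCountriesSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the lab's country data; column names are assumptions.
    val countriesDF = Seq(
      ("AU", "Australia", "OC"),
      ("FJ", "Fiji", "OC"),
      ("NZ", "New Zealand", "OC"),
      ("WS", "Samoa", "OC"),
      ("FR", "France", "EU"),
      ("JP", "Japan", "AS")
    ).toDF("code", "name", "continent")

    countriesDF
      .filter($"continent" === "OC") // keep only rows for Oceania
      .select("name")                // keep only the full-name column
      .orderBy("name")               // ascending is the default sort order
      .show(10)                      // print at most the first 10 rows

    spark.stop()
  }
}
```

Each call returns a new DataFrame, so the order of filter and select matters only in that the filtered column must still exist when filter runs.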
Exercise 4. Using the fridgeDF DataFrame as input, calculate the average refrigerator efficiency
for each brand. Order the results in descending order of average efficiency and show the first 5 rows.
Hint: RelationalGroupedDataset has a method called avg() for calculating per-group averages. It
works similarly to count(), except you must pass avg() the names of the columns to be averaged.
2) Which of the following sets of DataFrame functions best answers Exercise 4 of Lab 6?
a. groupBy, avg, orderBy, show(5)
b. avg, orderBy, show(5)
c. groupBy, orderBy, show(5)
d. filter, groupBy, orderBy, show(5)
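The hint in Exercise 4 can be sketched in Scala as follows. The fridgeDF built here is a hypothetical stand-in with assumed column names (brand, efficiency) and made-up values; only the method names groupBy, avg, orderBy and show come from the exercise text.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

object FridgeEfficiencySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FridgeEfficiencySketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the lab's fridgeDF; columns and values are assumptions.
    val fridgeDF = Seq(
      ("BrandA", 3.0),
      ("BrandA", 4.0),
      ("BrandB", 2.0),
      ("BrandB", 2.5),
      ("BrandC", 5.0)
    ).toDF("brand", "efficiency")

    fridgeDF
      .groupBy("brand")                  // returns a RelationalGroupedDataset
      .avg("efficiency")                 // back to a DataFrame with column "avg(efficiency)"
      .orderBy(desc("avg(efficiency)"))  // highest average first
      .show(5)                           // print at most the first 5 rows

    spark.stop()
  }
}
```

Note that groupBy alone does not produce a DataFrame; an aggregation such as avg or count has to follow it before orderBy and show can be called.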
3) Which of the following statements about the Parquet storage format is false?
a. Parquet storage format stores the schema with the data.
b. Given a DataFrame with 100 columns, it is faster to query a single column if the data is stored in the CSV storage format than if it is stored in the Parquet storage format.
c. Given a DataFrame with 100 columns, it is faster to query a single column if the data is stored in the Parquet storage format than if it is stored in the CSV storage format.
d. Parquet storage format stores all values of the same column together.
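To make the Parquet properties above concrete, here is a small Scala sketch that writes a DataFrame to Parquet, reads it back without supplying a schema, and selects a single column. The data, column names and temporary path are all illustrative assumptions; the sketch does not measure performance.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; column names are assumptions for illustration.
    val df = Seq((1, "a", 0.5), (2, "b", 1.5)).toDF("id", "label", "score")

    // Write to a fresh temporary location.
    val path = Files.createTempDirectory("parquet-sketch")
      .resolve("sample.parquet")
      .toString
    df.write.mode("overwrite").parquet(path)

    // No schema is passed on read: Parquet files carry column names and types.
    val loaded = spark.read.parquet(path)
    loaded.printSchema()

    // Because Parquet is columnar, a query like this only needs the "label" column's data.
    loaded.select("label").show()

    spark.stop()
  }
}
```

With a row-oriented text format such as CSV, reading one column still requires scanning and parsing every line of the file.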
4) Which of the following statements is false?
a. Datasets contain schemas whereas DataFrames do not contain schemas.
b. You can add columns to a DataFrame using the withColumn function.
c. After performing a self-join on a DataFrame, the result will contain duplicate column names.
d. Executing queries using Spark SQL DataFrame and Dataset functions is at least as fast as using their RDD counterparts, and often faster.
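Several of the statements above can be checked directly in a Spark shell. The following Scala sketch uses a hypothetical DataFrame (assumed columns id, name, managerId) to show printSchema on a DataFrame, withColumn, and the column names that result from a self-join; the aliases "l" and "r" are one common way to tell the duplicated columns apart.

```scala
import org.apache.spark.sql.SparkSession

object DataFrameStatementsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameStatementsSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; column names are assumptions for illustration.
    val df = Seq((1, "Ann", 2), (2, "Bob", 2)).toDF("id", "name", "managerId")

    // In the Scala API, DataFrame is a type alias for Dataset[Row],
    // so a DataFrame can report its schema like any other Dataset.
    df.printSchema()

    // withColumn returns a new DataFrame with the extra column appended.
    val withExtra = df.withColumn("idPlusOne", $"id" + 1)
    withExtra.show()

    // Self-join: alias each side so the duplicated names can be disambiguated.
    val joined = df.as("l").join(df.as("r"), $"l.managerId" === $"r.id")
    println(joined.columns.mkString(", ")) // id, name, managerId appear twice

    // Qualify columns with the alias and rename to avoid ambiguity downstream.
    joined.select($"l.name", $"r.name".as("managerName")).show()

    spark.stop()
  }
}
```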