Question
You are doing initial exploratory analysis in PySpark, and one of the sources you need to include resides in a PostgreSQL database. Using the provided notebook, answer the following questions within Google Colaboratory and submit your answers on SUNLearn and Git.
1. Using a PySpark DataFrame, print the schema of the customer table in the pagila PostgreSQL database by utilising a JDBC connection. (3)
2. Use the Spark SQL API to query the customer table, compute the number of unique email addresses in that table and print the result in the notebook. (3)
3. Repeat this calculation using only the DataFrame API and print the result. (1)
4. How many partitions are present in the DataFrame resulting from Question 6.3? Additionally, provide the code necessary to determine that. (1)
5. Compute the min and max of customer.create_date and print the result (once more using the Spark DataFrame API and not the Spark SQL API). (1)
6. Determine which first names occur more than once
   1. using the Spark SQL API (printing the result), and (1)
   2. using the Spark DataFrame API (printing the result once more). (1)
7. Port the PostgreSQL query below to the PySpark DataFrame API and execute the query within Spark (not directly on PostgreSQL): (5)
   SELECT
       staff.first_name
       ,staff.last_name
       ,SUM(payment.amount)
   FROM payment
       INNER JOIN staff ON payment.staff_id = staff.staff_id
   WHERE payment.payment_date BETWEEN '2007-01-01' AND '2020-02-01'
   GROUP BY
       staff.last_name
       ,staff.first_name
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Using a PySpark DataFrame, print the schema of the customer table in the pagila PostgreSQL database by utilising a JDBC connection.
Python:
# Import necessary libraries
from pyspark.sql import SparkSession
# Cre...
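A minimal sketch of how the JDBC read for question 1 might look. The host, port, credentials, and driver version below are assumptions made for illustration (they are not given in the question), and the helper names pagila_jdbc_options, build_session and read_table are hypothetical. It assumes a pagila database reachable from the notebook and the PostgreSQL JDBC driver being resolvable through spark.jars.packages.

```python
def pagila_jdbc_options(table, host="localhost", port=5432, database="pagila",
                        user="postgres", password="postgres"):
    """Build the options for Spark's JDBC data source.

    Host, port, user and password are placeholder assumptions;
    replace them with the values used in your notebook.
    """
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def build_session(app_name="pagila-exploration"):
    """Create a SparkSession that can load the PostgreSQL JDBC driver."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Assumed driver version; any recent PostgreSQL JDBC release should do.
        .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
        .getOrCreate()
    )


def read_table(spark, table, **connection):
    """Load one PostgreSQL table as a Spark DataFrame over JDBC."""
    options = pagila_jdbc_options(table, **connection)
    return spark.read.format("jdbc").options(**options).load()
```

In the notebook this could be used as spark = build_session(), then customer = read_table(spark, "customer"), and finally customer.printSchema() to print the schema.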
Step: 2
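One possible way to approach questions 2 to 6 is sketched here, assuming customer is a DataFrame already loaded from the pagila customer table and spark is an active SparkSession. The function names and output column aliases are hypothetical. Note that COUNT(DISTINCT email) ignores NULLs, so the DataFrame variant drops them before counting to stay consistent with the SQL result.

```python
# SQL text for the Spark SQL API variants (questions 2 and 6.1).
UNIQUE_EMAILS_SQL = "SELECT COUNT(DISTINCT email) AS unique_emails FROM customer"
REPEATED_NAMES_SQL = """
    SELECT first_name, COUNT(*) AS occurrences
    FROM customer
    GROUP BY first_name
    HAVING COUNT(*) > 1
"""


def run_sql_variants(spark, customer):
    """Questions 2 and 6.1: register a temp view, then query it with spark.sql."""
    customer.createOrReplaceTempView("customer")
    spark.sql(UNIQUE_EMAILS_SQL).show()
    spark.sql(REPEATED_NAMES_SQL).show()


def run_dataframe_variants(customer):
    """Questions 3, 4, 5 and 6.2 using only DataFrame methods."""
    # Imported here because it needs a working Spark installation.
    from pyspark.sql import functions as F

    # Q3: drop NULL emails first so the count matches COUNT(DISTINCT email).
    distinct_emails = customer.select("email").dropna().distinct()
    print("unique emails:", distinct_emails.count())

    # Q4: number of partitions of the DataFrame built for Q3.
    print("partitions:", distinct_emails.rdd.getNumPartitions())

    # Q5: earliest and latest create_date in a single aggregation.
    customer.agg(
        F.min("create_date").alias("min_create_date"),
        F.max("create_date").alias("max_create_date"),
    ).show()

    # Q6.2: first names that appear on more than one row.
    customer.groupBy("first_name").count().where(F.col("count") > 1).show()
```

The partition count depends on the Spark configuration (for example the shuffle partition setting and adaptive query execution), so the number printed for question 4 may differ between environments.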
Step: 3
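For question 7, the SQL query could be ported to the DataFrame API roughly as follows. This is a sketch under the assumption that payment and staff are DataFrames loaded from the corresponding pagila tables over JDBC, so the join, filter and aggregation run inside Spark. The function name payments_per_staff and the alias total_amount are hypothetical.

```python
def payments_per_staff(payment, staff, start="2007-01-01", end="2020-02-01"):
    """Total payment amount per staff member within a date range.

    Mirrors: SELECT first_name, last_name, SUM(amount)
             FROM payment INNER JOIN staff USING (staff_id)
             WHERE payment_date BETWEEN start AND end
             GROUP BY last_name, first_name
    """
    # INNER JOIN on the shared key; joining by name keeps one staff_id column.
    joined = payment.join(staff, on="staff_id", how="inner")

    # BETWEEN is inclusive on both ends, as in PostgreSQL.
    in_range = joined.where(joined["payment_date"].between(start, end))

    return (
        in_range.groupBy("last_name", "first_name")
        .sum("amount")  # Spark names the result column "sum(amount)"
        .withColumnRenamed("sum(amount)", "total_amount")
        .select("first_name", "last_name", "total_amount")
    )
```

Calling payments_per_staff(payment, staff).show() prints the result in the notebook. The date strings are compared against the payment_date timestamp column, so the upper bound means midnight at the start of 2020-02-01, which matches the behaviour of the original query.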