Question
Consider the following schema of a DataFrame (named retailDF) created in PySpark:
root
|-- InvoiceNo: string (nullable = true)
|-- StockCode: string (nullable = true)
|-- Description: string (nullable = true)
|-- Quantity: integer (nullable = true)
|-- InvoiceDate: timestamp (nullable = true)
|-- UnitPrice: double (nullable = true)
|-- CustomerID: double (nullable = true)
|-- Country: string (nullable = true)
The schema represents online retail sales data. Write code to answer the following:
1. Use PySpark's DataFrame API to compute and show the correlation between the Quantity and UnitPrice variables.
2. Use PySpark's DataFrame API to show the top 10 orders with respect to Quantity ordered.
3. Use PySpark's DataFrame API to list the total number of orders for each country, ordered from highest to lowest.
4. Use PySpark's SQL API to display Description, Quantity, and Country where Quantity is >= 10 and Country is Ireland.
5. Use PySpark's SQL API to show the average ordered Quantity for each country, keeping only countries whose average is greater than 5.