

1 Approved Answer


from pyspark.sql import SparkSession
# Import the functions module under an alias so round/max do not shadow Python builtins
from pyspark.sql import functions as F

# Initialize SparkSession
spark = SparkSession.builder \
    .appName("ChargePointsETLJob") \
    .getOrCreate()

# Input and output paths
input_path = "input_path/electric-chargepoints-2017.csv"
output_path = "output_path/chargepoints_summary.parquet"

# Read the CSV file into a DataFrame, using the first row as column names
df = spark.read \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .csv(input_path)

# Rename the relevant columns and cast duration to double
# (assumes the CSV has "CPID" and "Duration" columns)
cleaned_df = df.select(
    F.col("CPID").alias("chargepoint_id"),
    F.col("Duration").cast("double").alias("duration"),
)

# Calculate max and average duration per chargepoint, rounded to 2 decimal places
summary_df = cleaned_df.groupBy("chargepoint_id") \
    .agg(
        F.round(F.max("duration"), 2).alias("max_duration"),
        F.round(F.avg("duration"), 2).alias("avg_duration"),
    )

# Write summary_df as Parquet, replacing any existing output
summary_df.write.mode("overwrite").parquet(output_path)

# Stop SparkSession
spark.stop()
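Because the job's core logic is just a per-chargepoint max and mean rounded to two decimals, it can be cross-checked on a handful of rows without starting Spark. The following is a minimal plain-Python sketch, not part of the original answer; the function names `summarize` and `round_half_up` and the `cp-*` IDs are hypothetical. It assumes Spark's documented behaviour that `max`/`avg` skip nulls and that `round` uses HALF_UP rounding (applied here to the value's shortest decimal representation), whereas Python's built-in `round` rounds half to even.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Iterable, Optional, Tuple


def round_half_up(value: float, places: int = 2) -> float:
    # Assumption: mirror Spark's HALF_UP rounding by quantizing the float's
    # shortest decimal repr instead of using Python's half-to-even round().
    quantum = Decimal(1).scaleb(-places)  # Decimal('0.01') when places == 2
    return float(Decimal(repr(value)).quantize(quantum, rounding=ROUND_HALF_UP))


def summarize(
    rows: Iterable[Tuple[str, Optional[float]]],
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Hypothetical plain-Python stand-in for groupBy(...).agg(round(max), round(avg))."""
    durations = defaultdict(list)
    for chargepoint_id, duration in rows:
        # Touch the key so groups whose durations are all null still appear.
        group = durations[chargepoint_id]
        # Spark's max() and avg() ignore nulls, so None values are skipped.
        if duration is not None:
            group.append(duration)

    summary = {}
    for chargepoint_id, values in durations.items():
        if not values:
            # Aggregating only nulls yields null max/avg in Spark.
            summary[chargepoint_id] = (None, None)
            continue
        summary[chargepoint_id] = (
            round_half_up(max(values)),
            round_half_up(sum(values) / len(values)),
        )
    return summary


if __name__ == "__main__":
    sample = [
        ("cp-1", 1.5),
        ("cp-1", 2.25),
        ("cp-1", None),
        ("cp-2", 0.125),
        ("cp-3", None),
    ]
    print(summarize(sample))
    # → {'cp-1': (2.25, 1.88), 'cp-2': (0.13, 0.13), 'cp-3': (None, None)}
```

For `cp-2`, Python's built-in `round(0.125, 2)` gives `0.12`, while HALF_UP gives `0.13`; that difference is why the sketch does not simply call `round()` when comparing against the Parquet output.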
