Question

SIT220/731 2023.T3: Task 4P Working with pandas Data Frames (Heterogeneous Data)
1 Introduction
This task is related to Module 4 (see the Learning Resources on the unit site; see also Chapters 10, 11, 12, and 16 of Minimalist Data Wrangling with Python).
This task is due on Week 11 (Friday). However, ideally, you should complete this task by the end of Week 8. Hence, start tackling it as early as possible. If we find your first solution incomplete or otherwise incorrect, you will still be able to amend it based on the generous feedback we will give you (allow 3-5 working days). In case of any problems/questions, do not hesitate to attend our on-campus/online classes or use the Discussion Board on the unit site.
Submitting after the aforementioned due date might incur a late penalty. The cut-off date is Week 12 (Friday). There will be no extensions (this is a Week 8 task, after all) and no solutions will be accepted thereafter. At that time, if your submission is not 100% complete, it will be marked as FAIL, without the possibility of correcting and resubmitting. This task is part of the hurdle requirements in this unit. Not submitting the correct version on time results in failing the unit.
A good data engineer must have fine time management skills. To ensure a fair environment for all, we are always very strict about deadlines. Moreover, all submissions will be checked for plagiarism (my PhD student developed a quite advanced code similarity detection tool, which I will be using from time to time: beware). You are expected to work independently on your task solutions. Never share/show parts of solutions with/to anyone. Luckily, 95% of the students know how to do the right thing. If you are one of them, you are the best and do not worry; thank you.
2 Task
Download the nycflights13_weather.csv.gz data file from our unit site (Learning Resources -> Data). It gives the hourly meteorological data for three airports in New York: LGA, JFK, and EWR for the whole year of 2013. The columns are:
origin: weather station (LGA, JFK, or EWR),
year, month, day, hour: time of recording,
temp, dewp: temperature and dew point in degrees Fahrenheit,
humid: relative humidity,
wind_dir, wind_speed, wind_gust: wind direction (in degrees), speed and gust speed (in mph),
precip: precipitation, in inches,
pressure: sea level pressure in millibars,
visib: visibility in miles,
time_hour: date and hour (based on the year, month, day, hour fields) formatted as YYYY-mm-dd HH:MM:SS (actually, YYYY-mm-dd HH:00:00). However, due to a bug in the dataset, the data in this column are (incorrectly!) shifted by 1 hour. Do not rely on it unless you manually correct it.
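Since time_hour cannot be trusted, one option is to rebuild the timestamp from the year, month, day, and hour fields. The snippet below is a minimal sketch of that idea, not a reference solution: the helper name add_timestamp, the new column name timestamp, and the tiny made-up sample are assumptions introduced only for illustration.

```python
import pandas as pd


def add_timestamp(weather: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a `timestamp` column built from year/month/day/hour."""
    out = weather.copy()
    # pd.to_datetime can assemble datetimes from columns named
    # year, month, day (required) and hour (optional).
    out["timestamp"] = pd.to_datetime(out[["year", "month", "day", "hour"]])
    return out


# Made-up rows standing in for the real file; in the notebook the frame
# would instead come from something like
# pd.read_csv("nycflights13_weather.csv.gz"), loaded once at the top.
sample = pd.DataFrame({
    "origin": ["LGA", "LGA"],
    "year": [2013, 2013],
    "month": [1, 12],
    "day": [1, 31],
    "hour": [0, 23],
})
stamped = add_timestamp(sample)
print(stamped[["origin", "timestamp"]])
```

Working on a copy keeps the loaded frame untouched, which makes it easier to re-run individual notebook cells without reloading the data.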
Then, create a single Jupyter/IPython notebook (see the Artefacts section below for all the requirements), where you perform what follows.
1. Convert all columns so that they use metric (International System of Units, SI) or derived units: temp and dewp to Celsius, precip to millimetres, visib to metres, as well as wind_speed and wind_gust to metres per second. Replace the data in-place (overwrite existing columns with new ones).
2. Compute daily mean wind speeds for the LGA airport (~365 total speed values, for each day separately; you can, for example, group the data by year, month, and day at the same time).
3. Present the daily mean wind speeds at LGA (~365 aforementioned data points) in a single plot, e.g., using the matplotlib.pyplot.plot function. The x-axis labels should be human-readable and intuitive (e.g., month names or dates). Reference result:
[Figure: daily average wind speed [m/s] at LGA (y-axis) against day (x-axis), 2013-01 to 2014-01]
4. Identify the ten windiest days at LGA (dates and the corresponding mean daily wind speeds).
Reference result:
##             wind_speed
## date
## 2013-11-24       11.32
## 2013-01-31       10.72
## 2013-02-17       10.01
## 2013-02-21        9.19
## 2013-02-18        9.17
## 2013-03-14        9.11
## 2013-11-28        8.94
## 2013-05-26        8.85
## 2013-05-25        8.77
## 2013-02-20        8.66
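The unit conversion, the per-day grouping, and the selection of the windiest days can be sketched roughly as follows. This is only one possible approach under stated assumptions: the function names (to_metric, daily_mean_wind, windiest_days) and the small made-up sample are hypothetical, and the conversion factors are the standard ones (1 inch = 25.4 mm, 1 mile = 1609.344 m).

```python
import pandas as pd


def to_metric(weather: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the imperial columns overwritten by metric values."""
    out = weather.copy()
    for col in ["temp", "dewp"]:
        out[col] = (out[col] - 32.0) * 5.0 / 9.0      # Fahrenheit -> Celsius
    out["precip"] = out["precip"] * 25.4              # inches -> millimetres
    out["visib"] = out["visib"] * 1609.344            # miles -> metres
    for col in ["wind_speed", "wind_gust"]:
        out[col] = out[col] * 1609.344 / 3600.0       # mph -> metres per second
    return out


def daily_mean_wind(weather: pd.DataFrame, origin: str = "LGA") -> pd.Series:
    """Mean wind speed per calendar day for one station, indexed by date."""
    station = weather.loc[weather["origin"] == origin]
    daily = (station.groupby(["year", "month", "day"])["wind_speed"]
             .mean()
             .reset_index())
    # Build a proper date so that plots get readable x-axis labels.
    daily["date"] = pd.to_datetime(daily[["year", "month", "day"]])
    return daily.set_index("date")["wind_speed"]


def windiest_days(daily: pd.Series, n: int = 10) -> pd.Series:
    """The n largest daily means, sorted from windiest downwards."""
    return daily.nlargest(n)


# Made-up rows standing in for the real data set.
sample = pd.DataFrame({
    "origin": ["LGA", "LGA", "LGA", "JFK"],
    "year": [2013, 2013, 2013, 2013],
    "month": [1, 1, 1, 1],
    "day": [1, 1, 2, 1],
    "hour": [0, 1, 0, 0],
    "temp": [32.0, 212.0, 50.0, 32.0],
    "dewp": [32.0, 32.0, 32.0, 32.0],
    "precip": [0.0, 1.0, 0.0, 0.0],
    "visib": [10.0, 10.0, 1.0, 10.0],
    "wind_speed": [10.0, 20.0, 5.0, 100.0],
    "wind_gust": [10.0, 20.0, 5.0, 100.0],
})
metric = to_metric(sample)
daily = daily_mean_wind(metric)
top = windiest_days(daily, n=1)
print(top)
```

Because the resulting series is indexed by real dates, it can be handed to matplotlib.pyplot.plot as plot(daily.index, daily.values), and matplotlib will choose date-based tick labels on its own.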
Important. All packages must be imported and data must be loaded at the beginning of the file (only once!).
3 Additional Tasks for Postgraduate (SIT731) Students (*)
Postgraduate students, apart from the above tasks, are additionally required to solve/address/discuss what follows. Integrate these new requirements into the above subtasks (do not create a separate section of the report).
1. Compute the monthly mean wind speeds for all the three airports.
There is one obvious outlier amongst the observed wind speeds. Locate it (programmatically, do not hardcode the date/day/row number) and replace it with np.nan (NaN) before computing the means.
2. Draw the monthly mean wind speeds for the three airports on the same plot (three curves of different colours). Add a readable legend. Reference result:
[Figure: monthly mean wind speeds for LGA, EWR, and JFK (one curve per airport, with legend); x-axis: month, 2013-01 to 2013-11]
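A rough sketch of the outlier handling and the monthly aggregation is given below. It assumes that the outlier is simply the single largest wind_speed value and that the frame has unique row labels (e.g., the default index produced by read_csv); both are assumptions, as are the helper names and the made-up sample. The plotting helper is included for illustration only and is not called here.

```python
import numpy as np
import pandas as pd


def mask_largest_wind_speed(weather: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where the single largest wind_speed is replaced by NaN."""
    out = weather.copy()
    out["wind_speed"] = out["wind_speed"].astype(float)
    # idxmax finds the row label of the maximum, so nothing is hardcoded.
    out.loc[out["wind_speed"].idxmax(), "wind_speed"] = np.nan
    return out


def monthly_mean_wind(weather: pd.DataFrame) -> pd.DataFrame:
    """Monthly means: one row per (year, month), one column per airport."""
    return (weather.groupby(["year", "month", "origin"])["wind_speed"]
            .mean()                      # NaN values are skipped by default
            .unstack("origin"))


def plot_monthly(monthly: pd.DataFrame):
    """Draw one curve per airport on shared axes, with a legend."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    labels = [f"{int(y)}-{int(m):02d}" for y, m in monthly.index]
    fig, ax = plt.subplots()
    for origin in monthly.columns:
        ax.plot(labels, monthly[origin].to_numpy(), label=origin)
    ax.set_xlabel("month")
    ax.set_ylabel("monthly mean wind speed [m/s]")
    ax.legend()
    return fig


# Made-up rows; 450.0 plays the role of the obvious outlier.
sample = pd.DataFrame({
    "origin": ["LGA", "LGA", "JFK", "JFK", "EWR", "EWR"],
    "year": [2013, 2013, 2013, 2013, 2013, 2013],
    "month": [1, 1, 1, 1, 2, 2],
    "wind_speed": [4.0, 6.0, 3.0, 5.0, 2.0, 450.0],
})
cleaned = mask_largest_wind_speed(sample)
monthly = monthly_mean_wind(cleaned)
print(monthly)
```

Treating the maximum as the outlier is the simplest programmatic rule; a more defensive notebook might first inspect the distribution (for example with describe()) to confirm that only one value stands apart.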
