Question
SIT220/731 2023.T3: Task 4P: Working with pandas Data Frames (Heterogeneous Data)
Introduction
This task is related to one of the unit's modules (see the Learning Resources on the unit site; see also the corresponding chapters of Minimalist Data Wrangling with Python).
This task is due on a Friday. However, ideally, you should complete it earlier. Hence, start tackling it as early as possible. If we find your first solution incomplete or otherwise incorrect, you will still be able to amend it based on the generous feedback we will give you (allow some working days for this). In case of any problems/questions, do not hesitate to attend our on-campus/online classes or use the Discussion Board on the unit site.
Submitting after the aforementioned due date might incur a late penalty. The cutoff date is a later Friday. There will be no extensions (this is a task from an earlier week, after all) and no solutions will be accepted thereafter. At that time, if your submission is not complete, it will be marked as FAIL, without the possibility of correcting and resubmitting. This task is part of the hurdle requirements in this unit. Not submitting the correct version on time results in failing the unit.
A good data engineer must have fine time management skills. To ensure a fair environment for all, we are always very strict about deadlines. Moreover, all submissions will be checked for plagiarism (my PhD student developed a quite advanced code similarity detection tool, which I will be using from time to time: beware). You are expected to work independently on your task solutions. Never share/show parts of solutions with/to anyone. Luckily, most of the students know how to do the right thing. If you are one of them, you are the best and do not worry; thank you.
Task
Download the nycflightsweather.csv.gz data file from our unit site (Learning Resources, Data). It gives the hourly meteorological data for three airports in New York (LGA, JFK and EWR) for a whole year. The columns are:
origin: weather station (LGA, JFK or EWR),
year, month, day, hour: time of recording,
temp, dewp: temperature and dew point, in degrees Fahrenheit,
humid: relative humidity,
winddir, windspeed, windgust: wind direction (in degrees), speed and gust speed (in mph),
precip: precipitation, in inches,
pressure: sea level pressure, in millibars,
visib: visibility, in miles,
timehour: date and hour (based on the year, month, day, hour fields) formatted as YYYY-mm-dd HH:MM:SS (actually, YYYY-mm-dd HH:00:00). However, due to a bug in the dataset, the data in this column are incorrectly shifted by one hour. Do not rely on it unless you manually correct it.
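Since the timehour column is unreliable, one option is to rebuild the timestamps from the year, month, day and hour fields. The following is only a sketch on a tiny hand-made sample (not the real file); the `timestamp` column name is a hypothetical choice made for this illustration.

```python
import pandas as pd

# Hand-made sample with the time-of-recording columns from the listing
# above; the values are invented for illustration only.
weather = pd.DataFrame({
    "origin": ["LGA", "LGA", "JFK"],
    "year":   [2013, 2013, 2013],
    "month":  [1, 1, 12],
    "day":    [1, 1, 31],
    "hour":   [0, 23, 5],
})

# pd.to_datetime can assemble timestamps from a data frame whose columns
# are named year, month, day (and, optionally, hour).
# "timestamp" is a hypothetical column name, not part of the dataset.
weather["timestamp"] = pd.to_datetime(weather[["year", "month", "day", "hour"]])

print(weather)
```

Building the timestamp from the four integer fields avoids having to guess in which direction the timehour column was shifted.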
Then, create a single Jupyter/IPython notebook (see the Artefacts section below for all the requirements), where you perform what follows.
Convert all columns so that they use metric (International System of Units, SI) or derived units: temp and dewp to Celsius, precip to millimetres, visib to metres, as well as windspeed and windgust to metres per second. Replace the data in-place (overwrite existing columns with new ones).
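The conversions above boil down to a few arithmetic formulas. Here is a minimal sketch on invented values, assuming the column names exactly as they appear in the listing (the real file may spell them differently, which should be checked). The factors used are the standard ones: 1 inch = 25.4 mm, 1 mile = 1609.344 m, 1 mph = 0.44704 m/s.

```python
import pandas as pd

# Invented sample values in the original (imperial) units.
weather = pd.DataFrame({
    "temp":      [32.0, 212.0],          # degrees Fahrenheit
    "dewp":      [14.0, 50.0],           # degrees Fahrenheit
    "precip":    [0.0, 1.0],             # inches
    "visib":     [10.0, 0.5],            # miles
    "windspeed": [10.0, 0.0],            # mph
    "windgust":  [20.0, float("nan")],   # mph; missing values stay missing
})

# Fahrenheit to Celsius: C = (F - 32) * 5/9
for col in ["temp", "dewp"]:
    weather[col] = (weather[col] - 32.0) * 5.0 / 9.0

# Inches to millimetres and miles to metres.
weather["precip"] = weather["precip"] * 25.4
weather["visib"] = weather["visib"] * 1609.344

# Miles per hour to metres per second (1609.344 m / 3600 s).
for col in ["windspeed", "windgust"]:
    weather[col] = weather[col] * 0.44704

print(weather)
```

Each column is overwritten under its original name, which is one way of reading the "in-place" requirement.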
Compute daily mean wind speeds for the LGA airport (about 365 speed values in total, for each day separately; you can, for example, group the data by year, month, and day at the same time).
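Grouping by year, month and day at the same time could look as follows. This is a sketch on a few invented rows, assuming the column names as listed in the task description.

```python
import pandas as pd

# Invented hourly records; only the columns needed for this step.
weather = pd.DataFrame({
    "origin":    ["LGA", "LGA", "LGA", "JFK", "LGA"],
    "year":      [2013, 2013, 2013, 2013, 2013],
    "month":     [1, 1, 1, 1, 1],
    "day":       [1, 1, 2, 1, 2],
    "hour":      [0, 1, 0, 0, 1],
    "windspeed": [2.0, 4.0, 5.0, 9.0, 7.0],
})

# Keep the LGA rows only, then average wind speed within each calendar day.
lga = weather.loc[weather["origin"] == "LGA"]
daily = lga.groupby(["year", "month", "day"])["windspeed"].mean()

print(daily)
```

The result is a Series indexed by (year, month, day) tuples; missing values are skipped by `mean` by default.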
Present the daily mean wind speeds at LGA (the about 365 aforementioned data points) in a single plot, e.g., using the matplotlib.pyplot.plot function. The x-axis labels should be human-readable and intuitive (e.g., month names or dates). Reference result:
[Figure: daily average wind speed (m/s) at LGA, plotted against day]
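One way to obtain human-readable x-axis labels is to index the daily means by dates and let matplotlib's date locators and formatters print month names. The sketch below uses randomly generated stand-in values, not the real LGA means.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the daily means: one random value per day of a non-leap year.
dates = pd.date_range("2013-01-01", "2013-12-31", freq="D")
rng = np.random.default_rng(0)
daily = pd.Series(4.0 + rng.random(len(dates)), index=dates)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(daily.index.to_numpy(), daily.to_numpy())

# One tick per month, labelled with the abbreviated month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

ax.set_xlabel("day")
ax.set_ylabel("daily average wind speed [m/s] at LGA")
fig.tight_layout()
```

In a notebook, the figure is displayed automatically at the end of the cell, so the backend line can be dropped.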
Identify the ten windiest days at LGA (dates and the corresponding mean daily wind speeds).
Reference result:
[Reference output: a table with ten rows, indexed by date, with a single windspeed column; the values are missing here]
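Selecting the ten windiest days amounts to picking the ten largest daily means. Below is a minimal sketch with twelve invented daily values, assuming the means are stored in a Series indexed by date.

```python
import pandas as pd

# Twelve invented daily means, indexed by date.
dates = pd.date_range("2013-01-01", periods=12, freq="D", name="date")
daily = pd.Series(
    [3.0, 9.5, 4.1, 7.2, 1.0, 8.8, 5.5, 6.3, 2.2, 10.1, 0.5, 4.9],
    index=dates,
    name="windspeed",
)

# nlargest returns the ten biggest values, sorted in decreasing order.
top10 = daily.nlargest(10).to_frame()

print(top10)
```

Sorting in decreasing order and taking the first ten rows (`sort_values(ascending=False).head(10)`) would give the same rows.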
Important. All packages must be imported and the data must be loaded at the beginning of the file, only once!
Additional Tasks for Postgraduate (SIT731) Students
Postgraduate students, apart from the above tasks, are additionally required to solve/address/discuss what follows. Integrate these new requirements into the above subtasks (do not create a separate section of the report).
Compute the monthly mean wind speeds for all the three airports.
There is one obvious outlier amongst the observed wind speeds. Locate it programmatically (do not hardcode the date/day/row number) and replace it with np.nan (NaN) before computing the means.
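The outlier removal and the monthly means can be sketched together as follows. This uses invented rows and assumes that the outlier is simply the single largest wind speed in the whole data frame; that assumption should be verified on the real data (e.g., by inspecting a histogram or the few largest values) before relying on it.

```python
import numpy as np
import pandas as pd

# Invented records; the 470.0 reading plays the role of the obvious outlier.
weather = pd.DataFrame({
    "origin":    ["LGA", "LGA", "EWR", "EWR", "JFK", "JFK", "EWR"],
    "month":     [1, 1, 1, 2, 1, 2, 2],
    "windspeed": [4.0, 6.0, 5.0, 470.0, 3.0, 7.0, 2.0],
})

# Assumption: the outlier is the maximum. idxmax returns its row label,
# so no date or row number is hardcoded.
outlier_label = weather["windspeed"].idxmax()
weather.loc[outlier_label, "windspeed"] = np.nan

# Monthly means per airport: one row per month, one column per airport.
# mean() skips missing values, so the NaN does not distort the result.
monthly = weather.groupby(["month", "origin"])["windspeed"].mean().unstack()

print(monthly)
```

The wide layout (months as rows, airports as columns) is convenient for drawing one curve per airport later on.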
Draw the monthly mean wind speeds for the three airports on the same plot (three curves of different colours). Add a readable legend. Reference result:
[Figure: monthly mean wind speeds for LGA, EWR and JFK (one curve each, with a legend), plotted against month]
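A plot of this kind could be drawn by calling `plot` once per airport on the same axes and passing a `label` for the legend. The numbers below are invented placeholders for three months, not the reference values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented monthly means: months as rows, airports as columns.
monthly = pd.DataFrame(
    {"LGA": [5.0, 5.4, 5.1], "EWR": [4.2, 4.6, 4.4], "JFK": [5.6, 5.9, 5.3]},
    index=pd.Index([1, 2, 3], name="month"),
)

fig, ax = plt.subplots()

# One curve per airport; matplotlib's default colour cycle assigns
# a different colour to each call.
for airport in monthly.columns:
    ax.plot(monthly.index.to_numpy(), monthly[airport].to_numpy(), label=airport)

ax.set_xticks(list(monthly.index))
ax.set_xlabel("month")
ax.set_ylabel("monthly mean wind speed [m/s]")
ax.legend(title="airport")
```

Calling `monthly.plot()` directly would produce a similar figure with less code; the explicit loop just makes the three curves and their labels visible.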