Task 1: Load and clean the data

1. [5 pts] Load the data set as a DataFrame in Pandas. It is available at URL https://raw.githubusercontent.com/asukul/DS201/master/datasets/Lemonade2016-2.csv or https://lidicky.name/DS201/Lemonade2016-2.csv
2. Clean the data
   o [10 pts] Find and remove any duplicate rows in the data.
   o [10 pts] Find and resolve any missing values in the data, either by interpolating missing values that are in an obvious sequence or by filling each missing entry with the average value for its column (rounded to the nearest whole number).

Task 2: Add Derived Columns

(Note: a derived column is a new column whose values depend on other columns' values.)

1. [5 pts] Add a column named sales to the DataFrame, which calculates the total sales count of Lemon and Orange.
2. [5 pts] Add a column named Revenue to the DataFrame, which calculates the sales revenue by multiplying the total sales count by the price.
3. [5 pts] Calculate the total revenue for the summer, and make a note of this value.

Task 3: Create Charts for Data Exploration

Create the following 6 plots.

1. [10pts] Create a line chart that shows Date and Revenue.
2. [10pts] Create a scatter plot that shows Leaflets on the X-axis and sales on the Y-axis. Add axis titles to show which is Leaflets and which is sales, and add the title Leaflets and Sales plot to the plot.
3. [10pts] Create a histogram that shows Revenue distributed into 10 bins. Give it the title Revenue Histograms of Lemonade Sales, and note whether the distribution of this data is normal, left-skewed, or right-skewed.
4. [20pts] Create two Box and Whisker plots in one figure.
   o The first one for Location (one box for Park, one for Beach) showing Total Sales values. Give it the title All temperatures.
   o The second one for Location (one box for Park, one for Beach) showing Total Sales values, but include ONLY the days where the Temperature was at least 75. Give it the title Temperatures >= 75.
   Place them side by side with a common y-axis label. Label the y-axis as sales. Label the entire figure as Sales and Location. Each box must be labeled so that it is clear which box is for Park and which is for Beach.
5. [10pts] Create a seaborn pair plot (http://seaborn.pydata.org/generated/seaborn.pairplot.html) to show the relationship of the following columns: Revenue, Temperature, and Leaflets, grouped by Location.
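The side-by-side box plots in Task 3, item 4 are the most involved chart, so a minimal sketch may help. It assumes the DataFrame has columns named Location, Temperature, and sales (names taken from the task wording, not confirmed against the CSV). The helper name location_boxplots and the sample rows are hypothetical, made up only so the sketch runs without downloading the data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def location_boxplots(df, min_temp=75):
    """Draw two box plots of sales by Location, sharing one y-axis.

    Hypothetical helper: column names are assumptions based on the task text.
    """
    fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
    locations = ["Park", "Beach"]
    subsets = [
        (df, "All temperatures"),
        (df[df["Temperature"] >= min_temp], f"Temperatures >= {min_temp}"),
    ]
    for ax, (subset, title) in zip(axes, subsets):
        # One array of sales values per location, in a fixed order.
        data = [
            subset.loc[subset["Location"] == loc, "sales"].to_numpy()
            for loc in locations
        ]
        ax.boxplot(data)
        # Boxes are drawn at x positions 1 and 2 by default; label them.
        ax.set_xticks([1, 2])
        ax.set_xticklabels(locations)
        ax.set_title(title)
    axes[0].set_ylabel("sales")         # common y-axis label on the shared axis
    fig.suptitle("Sales and Location")  # title for the entire figure
    return fig, axes


# Hypothetical sample rows, not taken from the real data set.
sample = pd.DataFrame({
    "Location": ["Park", "Park", "Park", "Beach", "Beach", "Beach"],
    "Temperature": [70, 76, 80, 72, 78, 85],
    "sales": [150, 170, 190, 160, 200, 220],
})
fig, axes = location_boxplots(sample)
```

With the real data, the same function could be called on the cleaned DataFrame once the sales column from Task 2 exists. Using sharey=True keeps both panels on one scale, which makes the effect of the temperature filter easier to compare.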
Step by Step Solution
Step: 1
Task 1: Load and clean the data

import pandas as pd

# Load the data
url = "https://raw.githubusercontent.com/asukul/DS201/master/datasets/Lemonade2016-2.csv"
# Alternative ...
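The solution text above is cut off after the URL, so the remaining cleaning and derived-column steps are not shown. The following is one possible sketch of those steps, not the original author's solution. It assumes the columns are named Lemon, Orange, Temperature, Leaflets, and Price (inferred from the task wording), uses the column-mean option for missing values rather than interpolation, and runs on a few hypothetical rows instead of the real CSV. The helper name clean_and_derive is made up for illustration.

```python
import numpy as np
import pandas as pd


def clean_and_derive(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, fill numeric gaps, and add sales and Revenue.

    Hypothetical helper: column names are assumptions based on the task text.
    """
    # Task 1: remove exact duplicate rows.
    out = df.drop_duplicates().reset_index(drop=True)

    # Task 1: fill each numeric gap with the column mean, rounded to a whole number.
    for col in ["Lemon", "Orange", "Temperature", "Leaflets"]:
        out[col] = out[col].fillna(round(out[col].mean()))

    # Task 2: derived columns.
    out["sales"] = out["Lemon"] + out["Orange"]
    out["Revenue"] = out["sales"] * out["Price"]
    return out


# Hypothetical sample rows (one duplicate, one missing Leaflets value).
sample = pd.DataFrame({
    "Date": ["7/1/2016", "7/2/2016", "7/2/2016", "7/3/2016"],
    "Location": ["Park", "Park", "Park", "Beach"],
    "Lemon": [100, 90, 90, 110],
    "Orange": [60, 70, 70, 80],
    "Temperature": [70, 72, 72, 80],
    "Leaflets": [90.0, np.nan, np.nan, 100.0],
    "Price": [0.25, 0.25, 0.25, 0.25],
})

cleaned = clean_and_derive(sample)
total_revenue = cleaned["Revenue"].sum()  # Task 2, item 3: total revenue
```

On the real data, the DataFrame returned by pd.read_csv(url) would take the place of sample. A missing Date that sits between two consecutive days is the kind of "obvious sequence" the task mentions, and would need to be filled by hand or by interpolation rather than by a mean.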