Question
Can you write Python code to load these CSV files and answer the questions below? I'm not sure whether some of the dataframes have to be combined to answer some of the questions. I have provided the column descriptions to help. When I tried on my own, I couldn't merge the dataframes without hitting a memory error, and I'm not sure whether I need to merge them in the first place.
The data is available at:
https://github.com/Mcompetitions/M5-methods
https://drive.google.com/drive/folders/1wxz-TAfVE7uKGCjh405eCb2Q_pG3kAm9
sales_train_validation.csv
sell_prices.csv
calendar.csv
File 1: "calendar.csv"
Contains information about the dates the products are sold.
- date: The date in a "y-m-d" format.
- wm_yr_wk: The id of the week the date belongs to.
- weekday: The type of the day (Saturday, Sunday, ..., Friday).
- wday: The id of the weekday, starting from Saturday.
- month: The month of the date.
- year: The year of the date.
- event_name_1: If the date includes an event, the name of this event.
- event_type_1: If the date includes an event, the type of this event.
- event_name_2: If the date includes a second event, the name of this event.
- event_type_2: If the date includes a second event, the type of this event.
- snap_CA, snap_TX, and snap_WI: A binary variable (0 or 1) indicating whether the stores of CA, TX, or WI allow SNAP purchases on the examined date. 1 indicates that SNAP purchases are allowed.
File 2: "sell_prices.csv"
Contains information about the price of the products sold per store and date.
- store_id: The id of the store where the product is sold.
- item_id: The id of the product.
- wm_yr_wk: The id of the week.
- sell_price: The price of the product for the given week/store. The price is provided per week
(average across seven days). If not available, this means that the product was not sold during the
examined week. Note that although prices are constant on a weekly basis, they may change over
time (in both the training and test sets).
File 3: "sales_train.csv"
Contains the historical daily unit sales data per product and store.
- item_id: The id of the product.
- dept_id: The id of the department the product belongs to.
- cat_id: The id of the category the product belongs to.
- store_id: The id of the store where the product is sold.
- state_id: The State where the store is located.
- d_1, d_2, ..., d_i, ... d_1941: The number of units sold at day i, starting from 2011-01-29.
What our columns actually look like:
Index(['date', 'wm_yr_wk', 'weekday', 'wday', 'month', 'year', 'event_name_1', 'event_type_1', 'event_name_2', 'event_type_2', 'snap_CA', 'snap_TX', 'snap_WI'], dtype='object')
Index(['store_id', 'item_id', 'wm_yr_wk', 'sell_price'], dtype='object')
Index(['item_id', 'dept_id', 'cat_id', 'store_id', 'state_id', 'd_1', 'd_2', 'd_3', 'd_4', 'd_5', ... 'd_1904', 'd_1905', 'd_1906', 'd_1907', 'd_1908', 'd_1909', 'd_1910', 'd_1911', 'd_1912', 'd_1913'], dtype='object', length=1918)
Submission
Task A - Exploratory Data Analysis - please include the code that answers the following questions in your Python notebook
- Which state has the highest sales?
- Which department has the highest sales?
- Which department has the highest number of products?
- Which department has the highest mean price?
- Which is the best performing store?
- Which month had the highest sales?
- Which weekday do people prefer for grocery shopping in each state? (3 answers, one per state)
- Which holiday or event recorded the highest sales?
Step by Step Solution
There are 3 steps involved.
Step: 1
To answer the questions using Python, we'll need to load the provided CSV files, perform the necessary data manipulations, and then analyze the data to derive ...
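A minimal sketch of the loading step, assuming the three files sit in one local folder. The `load_m5` helper and its `data_dir` argument are illustrative names, not part of the question. Reading the id columns as categoricals and downcasting the daily counts is one common way to reduce memory use, which may help with the memory error described in the question. The column listing in the question shows no day id in calendar.csv, so the sketch derives a `d` column on the assumption that the rows are consecutive dates starting at d_1.

```python
from pathlib import Path

import pandas as pd

ID_COLS = ["item_id", "dept_id", "cat_id", "store_id", "state_id"]


def load_m5(data_dir="."):
    """Load the three files with compact dtypes (data_dir is an assumed location)."""
    data_dir = Path(data_dir)

    calendar = pd.read_csv(data_dir / "calendar.csv", parse_dates=["date"])
    calendar = calendar.sort_values("date").reset_index(drop=True)
    if "d" not in calendar.columns:
        # Assumption: first row is d_1, second row is d_2, and so on.
        calendar["d"] = ["d_" + str(i + 1) for i in range(len(calendar))]

    prices = pd.read_csv(
        data_dir / "sell_prices.csv",
        dtype={
            "store_id": "category",
            "item_id": "category",
            "wm_yr_wk": "int32",
            "sell_price": "float32",
        },
    )

    sales = pd.read_csv(
        data_dir / "sales_train_validation.csv",
        dtype={col: "category" for col in ID_COLS},
    )
    day_cols = [c for c in sales.columns if c.startswith("d_")]
    # Daily unit counts are non-negative integers; shrink them from int64.
    sales[day_cols] = sales[day_cols].apply(pd.to_numeric, downcast="unsigned")

    return calendar, prices, sales
```

A call such as `calendar, prices, sales = load_m5("path/to/folder")` returns the three dataframes; `sales.memory_usage(deep=True).sum()` can be used to check how much the downcast saved.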
Step: 2
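One possible way to handle the questions about states, departments, stores, and prices is sketched below. It assumes "sales" means units sold (not revenue) and that the dataframes have the columns listed in the question; the function names are illustrative. No large merge is needed for the first three questions: summing the d_ columns per row and grouping by an id column is enough. For mean price, the sketch averages over all store/week price records, which is one of several reasonable readings of "mean price".

```python
import pandas as pd


def total_units_by(sales, key):
    """Total units sold per value of `key` (e.g. state_id, dept_id, store_id)."""
    day_cols = [c for c in sales.columns if c.startswith("d_")]
    row_totals = sales[day_cols].sum(axis=1)  # one total per item/store row
    totals = row_totals.groupby(sales[key], observed=True).sum()
    return totals.sort_values(ascending=False)


def products_per_department(sales):
    """Number of distinct item_id values in each department."""
    counts = sales.groupby("dept_id", observed=True)["item_id"].nunique()
    return counts.sort_values(ascending=False)


def mean_price_by_department(prices, sales):
    """Mean sell_price per department, averaged over all store/week records."""
    # sell_prices.csv has no dept_id, so look it up from the sales file.
    item_dept = sales[["item_id", "dept_id"]].drop_duplicates()
    merged = prices.merge(item_dept, on="item_id", how="left")
    means = merged.groupby("dept_id", observed=True)["sell_price"].mean()
    return means.sort_values(ascending=False)
```

With the dataframes loaded, `total_units_by(sales, "state_id").index[0]` gives the state with the highest unit sales, and the same call with `"dept_id"` or `"store_id"` covers the department and store questions.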
Step: 3
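For the month, weekday, and event questions, the sales data has to be lined up with the calendar. A sketch of one way to do that without running out of memory: aggregate the wide d_ columns down to one total per day (or per state and day) first, and only then join the small result to the calendar. The function names are illustrative, "sales" is again taken to mean units sold, and the `d` day id is derived on the assumption that the calendar rows are consecutive dates starting at d_1.

```python
import pandas as pd


def with_day_id(calendar):
    """Attach d_1, d_2, ... to the calendar rows in date order."""
    cal = calendar.sort_values("date").reset_index(drop=True)
    if "d" not in cal.columns:
        cal["d"] = ["d_" + str(i + 1) for i in range(len(cal))]
    return cal


def daily_totals(sales, by=None):
    """Units per day as a long frame with columns [by,] d, units."""
    day_cols = [c for c in sales.columns if c.startswith("d_")]
    if by is None:
        totals = sales[day_cols].sum().rename_axis("d").rename("units")
        return totals.reset_index()
    wide = sales.groupby(by, observed=True)[day_cols].sum().reset_index()
    return wide.melt(id_vars=by, var_name="d", value_name="units")


def sales_by_month(sales, calendar):
    """Total units per calendar month (1-12), summed over all years."""
    cal = with_day_id(calendar)
    daily = daily_totals(sales).merge(cal[["d", "month"]], on="d")
    return daily.groupby("month")["units"].sum().sort_values(ascending=False)


def favourite_weekday_by_state(sales, calendar):
    """For each state, the weekday with the highest total units."""
    cal = with_day_id(calendar)
    daily = daily_totals(sales, by="state_id").merge(cal[["d", "weekday"]], on="d")
    table = daily.groupby(["state_id", "weekday"], observed=True)["units"].sum()
    return table.unstack("weekday").idxmax(axis=1)


def sales_by_event(sales, calendar):
    """Total units on the dates of each event, over both event columns."""
    cal = with_day_id(calendar)
    cols = ["event_name_1", "event_name_2"]
    daily = daily_totals(sales).merge(cal[["d"] + cols], on="d")
    # A date can carry two events; stack both columns so each one is counted.
    events = daily.melt(id_vars=["d", "units"], value_vars=cols, value_name="event")
    events = events.dropna(subset=["event"])
    return events.groupby("event")["units"].sum().sort_values(ascending=False)
```

Because the data starts on 2011-01-29 and d_1913 falls in April 2016, some calendar months and events occur more often than others in the sample. Replacing `.sum()` with `.mean()` in the final group-by gives an average per day, which may be a fairer comparison depending on how the question is meant.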