
Question


You and your team members are analysts working for a small retail chain. At the end of each day, each store provides a CSV file in which each line is a SKU (stock keeping unit, a primary key that identifies an item of inventory) with three additional variables (i.e., columns, fields, or features):
Number of units sold that day (SOLD)
Number of units received that day (RECD)
Remaining amount of inventory in the store (RMNG)
For example (store1.csv):
SKU SOLD RECD RMNG
1001 4 5 2
1002 5 3 1
1003 2 2 2
Is this data in the WIDE or LONG format?
Because the store personnel are not experienced data analysts, it makes sense to request data from them in this format (it is also easier for them to check). However, as a data analyst, you know that you need to transform the data into long format to prepare to store and archive it in the enterprise data warehouse (DW), which is kept in a relational database management system (RDBMS) accessed using structured query language (SQL).
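The store file above is in WIDE format: one row per SKU and one column per variable. As a minimal sketch, assuming pandas is installed, the example rows can be reshaped with `DataFrame.melt`; the STORE value is hard-coded here purely for illustration.

```python
import pandas as pd

# The store1.csv example from above, in WIDE format: one row per SKU,
# one column per measured variable.
wide = pd.DataFrame(
    {
        "SKU": [1001, 1002, 1003],
        "SOLD": [4, 5, 2],
        "RECD": [5, 3, 2],
        "RMNG": [2, 1, 2],
    }
)

# melt() keeps SKU as the identifier and stacks the other columns
# into VARIABLE/VALUE pairs: one row per (SKU, variable).
long = wide.melt(id_vars=["SKU"], var_name="VARIABLE", value_name="VALUE")

# Record which store the rows came from and order the columns as in long.csv.
long["STORE"] = "store1"
long = long[["SKU", "STORE", "VARIABLE", "VALUE"]]

print(long.shape)  # (9, 4): 3 SKUs x 3 variables
print(long.to_string(index=False))
```

`melt` emits all SOLD rows first, then RECD, then RMNG, so the row order differs from the long.csv example below; that is acceptable because row order does not matter for this assignment.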
In the directory named in, note the three files:
store1.csv
store2.csv
store3.csv
Each member of the team is responsible for reading one file (if there are only two members of the team, then leave the third file, store3.csv, untouched).
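Because each file corresponds to exactly one store, the STORE label needed in the output can be taken from the file name itself. A small sketch, assuming the files sit in the in directory; `store_name` is a hypothetical helper, not something the assignment prescribes.

```python
from pathlib import Path


def store_name(csv_path: str) -> str:
    """Hypothetical helper: "in/store2.csv" -> "store2".

    Path.stem drops the directory part and the ".csv" suffix.
    """
    return Path(csv_path).stem


# Assumed layout: input files live in a directory named "in".
# A two-person team simply leaves store3.csv out of this list.
files = ["in/store1.csv", "in/store2.csv"]
for f in files:
    print(f, "->", store_name(f))
```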
REQUIRED
By the end of this assignment, the remote repo should have:
A directory called src containing the Python script load.py
A directory called out containing the file long.csv
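Before opening a pull request, it can help to confirm the required layout exists. A minimal sketch of such a self-check, assuming it is run from the repository root; the `REQUIRED` list and `missing_paths` function are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Paths the remote repo must contain by the end of the assignment.
REQUIRED = [Path("src") / "load.py", Path("out") / "long.csv"]


def missing_paths(root: Path) -> list[str]:
    """Return the required files that do not exist under root."""
    return [str(p) for p in REQUIRED if not (root / p).is_file()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    missing = missing_paths(Path("."))
    print("missing:", missing if missing else "none")
```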
The format of long.csv should look like this:
SKU STORE VARIABLE VALUE
1001 store1 SOLD 4
1001 store1 RECD 5
1001 store1 RMNG 2
1002 store1 SOLD 5
1002 store1 RECD 3
1002 store1 RMNG 1
1003 store1 SOLD 2
1003 store1 RECD 2
1003 store1 RMNG 2
Note that the above contains only data from store1.csv; your completed long.csv should also include data from store2.csv (and from store3.csv if your group has three members). Rows from those files will have store2 (or store3) in the STORE column, alongside the corresponding VARIABLE and VALUE.
IMPORTANT: The order of the rows in long.csv does NOT matter. What matters is that all rows must be present (in any order).
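Because row order does not matter, any check of long.csv against an expected table should ignore order. One way to do that, sketched here with pandas and a hypothetical helper `same_rows`, is to sort both tables by every column before comparing. `DataFrame.equals` also compares dtypes, so both tables should be loaded the same way (for example, both with `pd.read_csv`).

```python
import pandas as pd

COLS = ["SKU", "STORE", "VARIABLE", "VALUE"]


def same_rows(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """True if a and b contain the same rows, ignoring row order."""
    # Sort both tables by every column and discard the old index so
    # only the row contents are compared.
    a_sorted = a[COLS].sort_values(COLS).reset_index(drop=True)
    b_sorted = b[COLS].sort_values(COLS).reset_index(drop=True)
    return a_sorted.equals(b_sorted)


# Usage: the same two rows in reverse order still count as a match.
expected = pd.DataFrame(
    {
        "SKU": [1001, 1001],
        "STORE": ["store1", "store1"],
        "VARIABLE": ["SOLD", "RECD"],
        "VALUE": [4, 5],
    }
)
shuffled = expected.iloc[::-1]
print(same_rows(expected, shuffled))  # True
```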
The next sections provide details on the tasks each member of the team should perform.
EACH member of the team should create their own branch, perform all of their work in their own branch, and (when done with their part) issue a pull request (to merge their changes back into the main branch). TEAM MEMBERS SHOULD NOT CLOSE THEIR OWN PULL REQUESTS.
Instead, EACH TEAM MEMBER SHOULD CLOSE THE PULL REQUEST OF A DIFFERENT TEAM MEMBER. On two-person teams, this is easy. Three-person teams will need a little coordination so that each member closes (merges) a different person's pull request.
PLEASE DO NOT DELETE BRANCHES. (Deleting merged branches makes sense on large, active remote repositories, but that is not our situation.)
First Member
The first member of the team should (in their own branch):
Create the directory src and create the Python script load.py in src (you may create both src and load.py manually).
If there are only two members, create the DOCSTRING comment. Otherwise, if there are three members, leave the DOCSTRING comment for the third team member to create.
Write Python code in load.py to read the first CSV file, store1.csv, and transform it to long format.
HINT
The first team member should simply print out the results to the console.
The first team member should issue the pull request.
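A minimal sketch of what the first member's src/load.py might look like, assuming pandas is installed, that the script is run from the repository root (`python src/load.py`), and that store1.csv is comma-separated with the header row SKU,SOLD,RECD,RMNG. The helper name `to_long` is hypothetical. The module docstring at the top is the DOCSTRING comment mentioned above; on a three-person team it would be left for the third member. If the files turn out to be whitespace-separated, `pd.read_csv(path, sep=r"\s+")` would be the adjustment.

```python
"""Reshape a store's daily wide-format CSV into long format.

First member's version: reads in/store1.csv and prints the
long-format rows to the console.
"""
from pathlib import Path

import pandas as pd

# Columns expected in each store file (assumed header row).
VARIABLES = ["SOLD", "RECD", "RMNG"]


def to_long(wide: pd.DataFrame, store: str) -> pd.DataFrame:
    """Melt one store's wide table into SKU/STORE/VARIABLE/VALUE rows."""
    long = wide.melt(
        id_vars=["SKU"],
        value_vars=VARIABLES,
        var_name="VARIABLE",
        value_name="VALUE",
    )
    long.insert(1, "STORE", store)  # put STORE right after SKU
    return long


def main() -> None:
    # Assumes the script is run from the repository root.
    path = Path("in") / "store1.csv"
    if not path.is_file():
        print(f"{path} not found; run from the repository root")
        return
    wide = pd.read_csv(path)  # assumes comma-separated with a header
    print(to_long(wide, path.stem).to_string(index=False))


if __name__ == "__main__":
    main()
```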
Second Member
The second member of the team should accept the pull request and merge the first team member's changes into the main branch.
The second team member should (in their own branch):
make sure that the data from store1.csv are stored (in a dataframe)
read store2.csv and transform to long format (OK to reuse code from the first member)
create the output directory out
write the long.csv file into the out directory in long format; it should contain the rows from both store1.csv and store2.csv.
if there is no third member, the second member should be sure to add at least one single-line comment (that begins with #).
The second team member should issue a pull request.
If there are only two team members, the first member should accept the pull request and merge the changes back into the main branch.
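Under the same assumptions as before (pandas installed, run from the repository root, comma-separated files with a header row), a sketch of how src/load.py might look after the second member's changes: both store files are reshaped with the same hypothetical `to_long` helper, kept together in one dataframe, and written to out/long.csv. `Path.mkdir(exist_ok=True)` creates the out directory only if it is missing, and `index=False` keeps pandas' row index out of the file so it has exactly the four columns SKU, STORE, VARIABLE, VALUE. The `#` comments also cover the single-line-comment requirement.

```python
from pathlib import Path

import pandas as pd

VARIABLES = ["SOLD", "RECD", "RMNG"]
# Add "store3.csv" here on a three-person team.
STORE_FILES = ["store1.csv", "store2.csv"]


def to_long(wide: pd.DataFrame, store: str) -> pd.DataFrame:
    # Same reshaping as the first member's version.
    long = wide.melt(id_vars=["SKU"], value_vars=VARIABLES,
                     var_name="VARIABLE", value_name="VALUE")
    long.insert(1, "STORE", store)
    return long


def build_long(in_dir: Path) -> pd.DataFrame:
    # Reshape every store file and stack the results into one dataframe.
    frames = []
    for name in STORE_FILES:
        path = in_dir / name
        frames.append(to_long(pd.read_csv(path), path.stem))
    return pd.concat(frames, ignore_index=True)


def main() -> None:
    in_dir, out_dir = Path("in"), Path("out")
    if not all((in_dir / name).is_file() for name in STORE_FILES):
        print("input files not found; run from the repository root")
        return
    long = build_long(in_dir)
    out_dir.mkdir(exist_ok=True)  # create the out directory if needed
    long.to_csv(out_dir / "long.csv", index=False)
    print(f"wrote {len(long)} rows to {out_dir / 'long.csv'}")


if __name__ == "__main__":
    main()
```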