Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Overall Objective of Group Assignments The objective of group assignments is to provide you with an opportunity to apply data analysis and problem - solving

Overall Objective of Group Assignments
The objective of group assignments is to provide you with an opportunity to apply data analysis and
problem-solving skills in a decision environment of realistic scale and complexity. Your task is to work
with your group to accomplish the task within the short timeframe provided.
Scenario
In the last assignment, you used transaction data from the Steel
Wheels order entry system to answer questions about
customers, products, and changes over time. Based on the
amount of effort required to create these queries against the
operational database, you decided to hire a consultant to
implement a data warehouse. The consultant developed an ETL
(extract, transform, load) process that periodically extracts
normalized data from the order entry system and loads it as
dimensional data in the data warehouse. You are now in a position to use OLAP tools to conduct ad hoc
analyses of the firms data.
You are considering two OLAP tools at this point. MICROSOFT EXCEL and TABLEAU.
1. EXCEL is simple but it is not easy to connect EXCEL to your MYSQL data warehouse1
. You decide
instead to run an extract from the data warehouse and use the extract as the data source for
your EXCEL analysis. The obvious downside of this is the need to run the extract periodically and
make sure everyone on your team is using the same version of the extracted data!
2. TABLEAU can connect directly to the data warehouse as long as you are behind the corporate
firewall. If you are not, you must use the special TABLEAU extract instead.
Create the Extract from the Data Warehouse
In the previous assignment, you wrote queries against the operational database on the server
(steelwheels). In this assignment, you use the new data warehouse (steelwheelsDW). As a first step,
1
Your consultant tells you that you need to install the MYSQL ODBC driver on each computer that accesses the
data warehouse. In addition, you need to create an ODBC connection. This is a headache on WINDOWS and a
nightmare on MAC.
2
you will export data from the data warehouse to a CSV file and use the extract as the basis for pivot
tables in EXCEL.
The granularity of the desired analysis has not yet been determined. For example, you have not yet
decided whether you should look at daily sales, monthly sales, or yearly sales. Accordingly, your extract
should be at the finest level of granularity supported by the data warehouse. You can always aggregate
to a coarser level of granularity in your OLAP tool. For example, you can use a pivot table to sum daily
sales to get weekly or yearly sales. However, you cannot un-sum measures that have already been
aggregated in the extract. You cannot use this data to look for hourly sales trends because the
underlying data does not support that level of granularity.
1. Create an extract from the data warehouse containing all the data you might need for analysis.
2. Question: Describe (in words) the finest level of granularity of the data. What does each row in
the extracted data correspond to in the real world? To put it differently: Each row is the total
sales of what?
Hints:
You create an extract by writing an SQL query against the data warehouse and exporting the
results. The SELECT part of the extract query should include every column that has potential
business meaning. The query should exclude foreign keys and other plumbing fields that will
clutter up the pivot table dialog. For example:
SELECT
P.PRODUCTNAME, P.PRODUCTLINE, P.PRODUCTVENDOR,
C.CUSTOMERNAME, C.CITY, C.STATE, C.COUNTRY,
C.SALESREP, C.SALESOFFICE, C.SALESTERRITORY,
T.ORDERDATE, T.ORDERDAY, T.ORDERMONTH, T.QUARTER, T.YEAR_ID,
S.TOTALSALE, S.QUANTITYORDERED
The FROM portion of the SQL statement should use the star schema. That is, it should join the
fact table to the dimension tables:
FROM factSale S
INNER JOIN dimProduct P ON S.PRODUCTCODE = P.PRODUCTCODE
INNER JOIN dimCustomer C ON S.CUSTOMERNUMBER = C.CUSTOMERNUMBER
INNER JOIN dimTime T ON S.TIMEID = T.TIMEID
Remember to export the results of the query (see below), not the tables or other database
objects.
3
Remember to save the CSV file as an EXCEL workbook. Otherwise you will lose your pages and
pivots!
Replicate SQL Queries in a Pivot Table
You can use a pivot table in EXCEL to re-answer some2
of the questions you answered in the previous
assignment using SQL queries.
3. Question: Which companies are your all-time top-ten best customers? Cut and paste the pivot
table results into your submission document.
4. Question: What is the total number of items sold in 2003 and 2004 for each SKU? Create a new
pivot table within the EXCEL workbook and cut and paste the first few entries from the pivot
table into your submission document. Leave the rows in alphabetical order by product name.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Processing Fundamentals Design And Implementation

Authors: KROENKE DAVID M.

1st Edition

8120322258, 978-8120322257

More Books

Students also viewed these Databases questions