Question
MISM 6205: Data Wrangling for Business
Fall 2022
Data Enrichment Assignment
Instructions
Save a copy of this document and answer the questions below in it.
You may use both Python and Excel for this assignment. Clearly state how you performed each task in the data wrangling process.
Upload this completed document and attach any data and Excel/Python files in Canvas by the due date.
Scoring for assignments will be based not only on the completion of your answers, but also on the organization, clarity, and quality of your written answers and explanations for each question.
You must use the following two datasets for this assignment. The corresponding datasets are attached in Canvas.
Dataset 1 (It is the same dataset as in the data profiling and preprocessing assignments.)
Data: Boston Buildings Inventory
Source: https://data.boston.gov/dataset/boston-buildings-inventory
Dataset 2
Data: Boston Snow Emergency Parking Lots
Source: https://data.boston.gov/dataset/snow-emergency-parking-lots
Suppose guests are staying in Boston during the winter, when snow emergencies are likely. The City of Boston would like to send a list of nearby snow emergency parking lots to various establishments. Assume that stranded guests may need to use the snow emergency parking lots when hotel parking lots become full. Let's take a subset of the buildings inventory dataset and focus on hotels. We also make the following assumptions for this assignment.
Data profiling has been completed for both datasets. (You can run it as well!)
Dates from both datasets are aligned.
Part 1: Data Enrichment (maximum 50 points)
Clean and transform the data to get a new dataset for hotels and their nearby snow emergency parking lots. In your output, each observation represents one individual hotel. You can select the most relevant columns (i.e., the new subset does not need to include all the columns).
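One possible way to define "nearby" is straight-line distance between coordinates. The sketch below is only an illustration under assumptions: it supposes each hotel and each parking lot has (or can be geocoded to) latitude and longitude, and the field names `hotel_id`, `name`, `lat`, `lon`, `nearest_lot`, and `distance_km` are hypothetical, not taken from the two datasets. The sample records are made up. If coordinates are not available, matching on neighborhood or ZIP code would be an alternative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def enrich_hotels(hotels, lots):
    """Return one output record per hotel, extended with its closest parking lot."""
    enriched = []
    for hotel in hotels:
        # Distance from this hotel to every lot, paired with the lot record.
        distances = [
            (haversine_km(hotel["lat"], hotel["lon"], lot["lat"], lot["lon"]), lot)
            for lot in lots
        ]
        dist, lot = min(distances, key=lambda pair: pair[0])
        enriched.append({**hotel, "nearest_lot": lot["name"], "distance_km": round(dist, 3)})
    return enriched


# Made-up sample records (hypothetical field names and coordinates).
hotels = [{"hotel_id": "H1", "name": "Example Hotel", "lat": 42.3500, "lon": -71.0700}]
lots = [
    {"name": "Lot X", "lat": 42.3510, "lon": -71.0710},
    {"name": "Lot Y", "lat": 42.3000, "lon": -71.1000},
]
enriched = enrich_hotels(hotels, lots)
```

Because the loop appends exactly one record per hotel, the output keeps the required grain of one observation per individual hotel. Keeping the k closest lots instead of a single one would need a list-valued or repeated column.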
Part 2: Your Data Enrichment Process (maximum 25 points)
List all your final data enrichment steps and the main functions used to achieve your output. Your explanations of the steps should (at minimum) include:
How you selected the corresponding parking lots for each hotel.
Which columns you kept for the output dataset.
Data formatting and data preprocessing tasks.
Optional: Draw a process diagram.
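As an illustration of the formatting and preprocessing steps listed above, the following sketch filters the buildings to hotels, keeps a few columns, and repairs ZIP codes. Boston ZIP codes start with 0, so the leading zero is often lost when a CSV is opened in Excel or read as a number. The field names `building_id`, `name`, `address`, `zip`, and `building_type` are assumptions for this example and may not match the actual column names in the datasets.

```python
def clean_text(value):
    """Collapse repeated whitespace and upper-case a value for comparison."""
    return " ".join(str(value).split()).upper()


def normalize_zip(value):
    """Return a 5-digit ZIP string (e.g. 2116 -> '02116'), or None if no digits."""
    text = "" if value is None else str(value).strip()
    text = text.split("-")[0]  # drop a ZIP+4 suffix such as '-1234'
    if text.endswith(".0"):  # numeric ZIPs read as floats, e.g. '2116.0'
        text = text[:-2]
    digits = "".join(ch for ch in text if ch.isdigit())
    return digits[:5].zfill(5) if digits else None


def select_hotels(rows, type_field="building_type",
                  keep=("building_id", "name", "address", "zip")):
    """Keep rows whose type mentions HOTEL, restricted to the chosen columns."""
    hotels = []
    for row in rows:
        if "HOTEL" in clean_text(row.get(type_field, "")):
            record = {col: row.get(col) for col in keep}
            record["zip"] = normalize_zip(record.get("zip"))
            hotels.append(record)
    return hotels


# Made-up sample rows (hypothetical column names and values).
rows = [
    {"building_id": 1, "name": "Example Hotel", "building_type": "hotel/motel",
     "address": "1 Main St", "zip": 2116},
    {"building_id": 2, "name": "Example Office", "building_type": "Office",
     "address": "2 Main St", "zip": "02110"},
]
hotel_rows = select_hotels(rows)
```

In Excel, the same steps could be done with a filter on the building type column and a text format for the ZIP column. Either way, each step should be stated in the written answer.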
Part 3: Data Validation (maximum 25 points)
Develop at least five validation rules or checks for your data wrangling process. Your explanations of the rules should (at minimum) include:
What is the purpose of the validation rule or check? You should include the corresponding root condition(s) and data quality pattern(s).
What is the validation rule type (e.g., business or system)?
What is the specific process step and task for the rule?
Provide an example of a specific validation for each rule.
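To make the idea of a validation rule concrete, here is a minimal sketch of five checks on an enriched hotel table. It is only one possible set of rules. The field names (`hotel_id`, `zip`, `lat`, `lon`, `nearest_lot`, `distance_km`), the 5 km threshold, and the approximate bounding box for Boston are all assumptions made for this example, and the sample rows are made up.

```python
import re


def validate(rows, max_km=5.0):
    """Return a list of messages, one per failed check; an empty list means all passed."""
    errors = []

    # Rule 1 (uniqueness): each hotel appears exactly once in the output.
    ids = [r["hotel_id"] for r in rows]
    if len(ids) != len(set(ids)):
        errors.append("duplicate hotel_id values found")

    for r in rows:
        hid = r["hotel_id"]

        # Rule 2 (completeness): every hotel has a parking lot assigned.
        if not r.get("nearest_lot"):
            errors.append(f"{hid}: missing nearest_lot")

        # Rule 3 (format): ZIP code is exactly five digits.
        if not re.fullmatch(r"\d{5}", str(r.get("zip", ""))):
            errors.append(f"{hid}: zip is not 5 digits")

        # Rule 4 (range): coordinates fall inside a rough box around Boston.
        lat, lon = r.get("lat"), r.get("lon")
        if lat is None or lon is None or not (42.2 <= lat <= 42.45 and -71.2 <= lon <= -70.9):
            errors.append(f"{hid}: coordinates outside assumed Boston bounds")

        # Rule 5 (business): the assigned lot is within the assumed distance limit.
        dist = r.get("distance_km")
        if dist is None or dist < 0 or dist > max_km:
            errors.append(f"{hid}: distance_km missing or outside 0..{max_km}")

    return errors


# Made-up sample rows: one valid record and one that breaks every rule.
good = {"hotel_id": "H1", "zip": "02116", "lat": 42.35, "lon": -71.07,
        "nearest_lot": "Lot X", "distance_km": 0.138}
bad = {"hotel_id": "H1", "zip": "2116", "lat": 0.0, "lon": 0.0,
       "nearest_lot": None, "distance_km": -1}
problems = validate([good, bad])
```

Rules 1 to 4 resemble system-level checks on structure and format, while rule 5 encodes a business decision about how far away a lot can be and still count as nearby. The written answer should still state the purpose, rule type, and process step for each rule.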