MISM 6205: Data Wrangling for Business

Fall 2022

Data Enrichment Assignment

Instructions

Save a copy of this document and answer the below questions in this document.

You may use both Python and Excel for this assignment. Clearly state how you performed each task in the data wrangling process.

Upload this completed document to Canvas by the due date, along with any data and Excel/Python files.

Scoring for assignments will be based not only on the completion of your answers, but also on the organization, clarity, and quality of your written answers and explanations for each question.

You must use the following two datasets for this assignment. The corresponding datasets are attached in Canvas.

Dataset 1 (the same dataset used in the data profiling and preprocessing assignments)

Data:

Boston Buildings Inventory

Source:

https://data.boston.gov/dataset/boston-buildings-inventory

Dataset 2

Data:

Boston Snow Emergency Parking Lots

Source:

https://data.boston.gov/dataset/snow-emergency-parking-lots

Suppose guests are staying in Boston during the winter, when snow emergencies are likely. The City of Boston would like to send a list of nearby snow emergency parking lots to various establishments. Assume that stranded guests may need to use the snow emergency parking lots when hotel parking lots become full. Let's take a subset of the buildings inventory dataset and focus on hotels. We also make the following assumptions for this assignment:

Data profiling has been completed for both datasets (you may run it again yourself).

Dates from both datasets are aligned.
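The hotel subset described in the scenario above could be taken with a simple text filter. The sketch below is one possible approach, not a required method: the column name building_type, the helper filter_hotels, and the sample rows are all hypothetical, so confirm the actual building-use column in the Boston Buildings Inventory during profiling before reusing it.

```python
import csv
import io

# Hypothetical column name: confirm the real building-use column during profiling.
TYPE_COLUMN = "building_type"


def filter_hotels(rows, type_column=TYPE_COLUMN):
    """Keep rows whose building-type value contains 'hotel' (case-insensitive)."""
    hotels = []
    for row in rows:
        # Treat a missing or None value as an empty string, then normalize it.
        value = (row.get(type_column) or "").strip().lower()
        if "hotel" in value:
            hotels.append(row)
    return hotels


# Made-up sample rows standing in for the exported CSV.
sample_csv = """name,building_type,latitude,longitude
Harbor Inn,Hotel,42.3601,-71.0589
City Office,Office,42.3550,-71.0650
Back Bay Suites, hotel ,42.3500,-71.0800
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
hotels = filter_hotels(rows)
print([h["name"] for h in hotels])  # ['Harbor Inn', 'Back Bay Suites']
```

A substring match is deliberately loose (it would also keep a value such as "Hotel/Residential"), so the kept categories should be reviewed and documented in Part 2.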

Part 1: Data Enrichment (maximum 50 points)

Clean and transform the data to produce a new dataset of hotels and their nearby snow emergency parking lots. In your output, each observation (row) should represent one individual hotel. You may keep only the most relevant columns (i.e., the new subset does not need to include all the columns).
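One way to attach "nearby" lots while keeping one row per hotel is to compute the great-circle (haversine) distance from each hotel to every lot and keep the k closest, spread across columns. This is a minimal sketch under stated assumptions: records are dicts with hypothetical keys name, lat, and lon (decimal degrees), and k=3 and the optional max_km cutoff are illustrative choices, not values given in the assignment.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def enrich_hotels(hotels, lots, k=3, max_km=None):
    """Return one row per hotel holding its k nearest lots (optionally within max_km)."""
    enriched = []
    for hotel in hotels:
        # Rank every lot by distance; ties are broken by lot name.
        ranked = sorted(
            (haversine_km(hotel["lat"], hotel["lon"], lot["lat"], lot["lon"]), lot["name"])
            for lot in lots
        )
        if max_km is not None:
            ranked = [pair for pair in ranked if pair[0] <= max_km]
        row = {"hotel": hotel["name"]}
        # Wide layout: lot_1 is the closest lot, lot_2 the next, and so on.
        for i, (dist, lot_name) in enumerate(ranked[:k], start=1):
            row[f"lot_{i}"] = lot_name
            row[f"lot_{i}_km"] = round(dist, 2)
        enriched.append(row)
    return enriched


# Made-up coordinates for illustration only.
hotels = [{"name": "Harbor Inn", "lat": 42.36, "lon": -71.06}]
lots = [
    {"name": "Lot C", "lat": 42.40, "lon": -71.06},
    {"name": "Lot A", "lat": 42.36, "lon": -71.06},
    {"name": "Lot B", "lat": 42.37, "lon": -71.06},
]
result = enrich_hotels(hotels, lots, k=2)
print(result)
```

The wide layout (lot_1, lot_2, ...) is what keeps each observation to a single hotel; a long layout with one row per hotel-lot pair would be simpler to compute but would break that requirement. Comparing every hotel with every lot is fine at the size of these two datasets.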

Part 2: Your Data Enrichment Process (maximum 25 points)

List all your final data enrichment steps and the main functions used to achieve your output. Your explanations of the steps should (at minimum) include:

How you selected the corresponding parking lots for each hotel.

Which columns you kept for the output dataset.

Data formatting and data preprocessing tasks.

Optional: Draw a process diagram.
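For the data formatting and preprocessing item above, a typical pass standardizes names, converts coordinate text to numbers, and sets aside incomplete or duplicate rows rather than silently dropping them. The sketch below assumes hypothetical input keys name, latitude, and longitude; the helpers parse_coord and clean_records are illustrative, and title-casing is a simplification that can mangle some names (for example, ones containing apostrophes).

```python
import math


def parse_coord(value):
    """Return a finite float, or None when the value is blank or non-numeric."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def clean_records(rows, name_key="name", lat_key="latitude", lon_key="longitude"):
    """Standardize names, parse coordinates, and split rows into kept and rejected.

    Duplicates are detected on (cleaned name, coordinates rounded to 5 decimals).
    """
    cleaned, rejected, seen = [], [], set()
    for row in rows:
        # Collapse repeated whitespace and use title case so name variants match.
        name = " ".join((row.get(name_key) or "").split()).title()
        lat = parse_coord(row.get(lat_key))
        lon = parse_coord(row.get(lon_key))
        if not name or lat is None or lon is None:
            rejected.append(row)  # incomplete record: report it rather than guess
            continue
        key = (name, round(lat, 5), round(lon, 5))
        if key in seen:
            rejected.append(row)  # duplicate after standardization
            continue
        seen.add(key)
        cleaned.append({"name": name, "lat": lat, "lon": lon})
    return cleaned, rejected


# Made-up rows: a messy name, its duplicate, and a row missing latitude.
raw = [
    {"name": "  harbor   inn ", "latitude": "42.3601", "longitude": "-71.0589"},
    {"name": "Harbor Inn", "latitude": "42.3601", "longitude": "-71.0589"},
    {"name": "Back Bay Suites", "latitude": "", "longitude": "-71.0800"},
]
cleaned, rejected = clean_records(raw)
print(cleaned)        # [{'name': 'Harbor Inn', 'lat': 42.3601, 'lon': -71.0589}]
print(len(rejected))  # 2
```

Returning the rejected rows makes it easy to report how many records each preprocessing step removed.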

Part 3: Data Validation (maximum 25 points)

Develop at least five validation rules or checks for your data wrangling process. Your explanations of the rules should (at minimum) include:

What is the purpose of the validation rule or check? You should include the corresponding root condition(s) and data quality pattern(s).

What is the validation rule type (e.g., business or system)?

What is the specific process step and task for the rule?

Provide an example of a specific validation for each rule.
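Validation rules like those requested above can be written as small automated checks that return readable failure messages. The sketch below shows five such checks under stated assumptions: the enriched rows use a hypothetical wide layout (hotel, lot_1, lot_1_km, ...), the Boston bounding box is only approximate, the 5 km "nearby" threshold is a hypothetical business choice, and the business/system label in each comment is a suggested classification rather than an official one.

```python
# Approximate bounding box around Boston (an assumption; refine it if needed).
BOSTON_LAT = (42.2, 42.45)
BOSTON_LON = (-71.2, -70.8)
MAX_KM = 5.0  # hypothetical business threshold for a "nearby" lot


def validate(enriched, hotels, lot_names, k=3):
    """Run five checks on the enriched output and return a list of failure messages."""
    issues = []

    # Rule 1 (system): exactly one output row per filtered hotel, no duplicates.
    out_names = [row["hotel"] for row in enriched]
    if len(out_names) != len(set(out_names)):
        issues.append("duplicate hotel rows in output")
    if set(out_names) != {h["name"] for h in hotels}:
        issues.append("output hotels do not match the filtered hotel list")

    # Rule 2 (system): hotel coordinates fall inside the approximate Boston box.
    for h in hotels:
        in_lat = BOSTON_LAT[0] <= h["lat"] <= BOSTON_LAT[1]
        in_lon = BOSTON_LON[0] <= h["lon"] <= BOSTON_LON[1]
        if not (in_lat and in_lon):
            issues.append(f"{h['name']}: coordinates outside Boston box")

    for row in enriched:
        slots = [i for i in range(1, k + 1) if f"lot_{i}" in row]
        lots = [row[f"lot_{i}"] for i in slots]
        dists = [row[f"lot_{i}_km"] for i in slots]

        # Rule 3 (business): every hotel is assigned at least one parking lot.
        if not lots:
            issues.append(f"{row['hotel']}: no parking lot assigned")

        # Rule 4 (system): every assigned lot exists in the parking-lot source.
        for lot in lots:
            if lot not in lot_names:
                issues.append(f"{row['hotel']}: unknown lot {lot!r}")

        # Rule 5 (business): distances lie within 0..MAX_KM and are listed nearest first.
        if any(d < 0 or d > MAX_KM for d in dists):
            issues.append(f"{row['hotel']}: distance outside 0-{MAX_KM} km")
        if dists != sorted(dists):
            issues.append(f"{row['hotel']}: lots not ordered by distance")

    return issues


# Made-up example: one valid hotel and one that fails two rules.
hotels = [
    {"name": "Harbor Inn", "lat": 42.36, "lon": -71.06},
    {"name": "Far Lodge", "lat": 41.0, "lon": -71.06},
]
lot_names = {"Lot A", "Lot B"}
enriched = [
    {"hotel": "Harbor Inn", "lot_1": "Lot A", "lot_1_km": 0.0, "lot_2": "Lot B", "lot_2_km": 1.11},
    {"hotel": "Far Lodge"},
]
for message in validate(enriched, hotels, lot_names):
    print(message)
```

Each failure message doubles as the "specific validation example" for its rule; for instance, Far Lodge fails Rule 2 (its latitude lies south of the box) and Rule 3 (no lot was assigned).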
