Question

This assignment tests your data cleaning skills. You are provided with a dataset in .xlsx format from Returnorama Outfitters (available here). The data includes 7 columns that are described below:
Order ID: A label provided to each order placed on the Returnorama Outfitters website.
Number of units ordered: The number of items in the basket for a single order (must be a positive quantity).
Total value of items in the order: The total $ amount for a single order (must be a positive quantity).
Shipping option: The shipping option chosen by the customer (Standard, Expedited, Premium).
Destination: Is the order to be shipped to a domestic or international address?
Number of items returned: The number of items that the customer returned to Returnorama Outfitters. This number must be less than or equal to the number of units ordered and non-negative.
Past orders: The number of previous orders placed by the same customer (must be a non-negative quantity).
Returnorama Outfitters has had many different people working on data entry and used two different database management systems over the period the data covers. This is cause for concern, since the dataset that they want to analyze may contain erroneous entries.
The following questions are of two types: 1) carry out a step of data cleaning and provide an answer based on what you did, and 2) carry out some summarization of the data. Please follow the instructions associated with each question and pick the correct answer.
You have two attempts on this assignment. On your second attempt, you only need to re-answer the questions you got wrong.
Question 1 (10 points)
Ensure that you convert all columns to their appropriate data types. Which of these columns should be converted to numeric type? Select all that apply.
Question 1 options:
Past orders
Destination
Number of items returned
Total value of items in the order
Shipping option
Number of units ordered
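One way to approach the type conversion is sketched below in plain Python. The raw cell values and the helper name to_number are hypothetical, invented for illustration; the real spreadsheet may contain different kinds of bad entries. The idea is that columns described as counts or dollar amounts should parse cleanly as numbers, and anything that fails to parse is flagged rather than silently kept as text.

```python
def to_number(value):
    """Return value as a float, or None if it cannot be parsed as a number."""
    if value is None:
        return None
    # Strip whitespace, thousands separators, and a currency sign (assumed formats).
    text = str(value).strip().replace(",", "").replace("$", "")
    try:
        return float(text)
    except ValueError:
        return None

# Hypothetical raw cells, as they might come out of a spreadsheet export.
raw_units = ["3", " 12 ", "two", "", None, "1,000"]
parsed_units = [to_number(v) for v in raw_units]

# Positions where conversion failed; these cells need a manual look.
unparsed_positions = [i for i, v in enumerate(parsed_units) if v is None]
```

Cells that come back as None are candidates for either a typo fix or a missing value, which the later questions deal with.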
Question 2 (10 points)
Look for typos in each column and fix them. How many columns had typos? Only consider columns that are categorical/qualitative for this question.
Question 2 options:
2
3
0
1
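A common way to spot typos in a categorical column is to list its distinct values with their counts and compare them against the allowed labels. The sketch below assumes hypothetical entries and a hypothetical correction table (fixes); the misspellings are invented and are not taken from the actual dataset.

```python
from collections import Counter

# Hypothetical entries; the misspellings are invented for illustration.
shipping = ["Standard", "standard ", "Expedited", "Premum", "Premium", "Standard"]

# Step 1: list distinct values with their counts so odd spellings stand out.
print(Counter(shipping))

# Step 2: normalise whitespace and case, then map known misspellings.
fixes = {"premum": "premium"}  # hypothetical correction table

def clean_label(value):
    key = value.strip().lower()
    return fixes.get(key, key).capitalize()

cleaned_shipping = [clean_label(v) for v in shipping]

# Step 3: anything still outside the allowed set needs a manual fix.
allowed = {"Standard", "Expedited", "Premium"}
leftovers = sorted(set(cleaned_shipping) - allowed)
```

Repeating the same check for each categorical column shows how many of them contained typos.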
Question 3 (10 points)
Which of the following columns have missing values? Select all that apply.
Question 3 options:
Destination
Shipping option
Number of units ordered
Number of items returned
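Counting missing cells column by column can be sketched as follows. The rows are hypothetical, and treating both None and an empty string as "missing" is an assumption about how empty spreadsheet cells are read in.

```python
# Hypothetical rows; None and "" both stand for an empty cell.
rows = [
    {"Destination": "Domestic", "Shipping option": "Standard", "Number of units ordered": 2},
    {"Destination": None, "Shipping option": "Premium", "Number of units ordered": 1},
    {"Destination": "International", "Shipping option": "", "Number of units ordered": 4},
]

def is_missing(value):
    """Treat None and blank strings as missing (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")

# Number of missing cells in each column.
missing_per_column = {
    column: sum(is_missing(row[column]) for row in rows)
    for column in rows[0]
}

# Columns with at least one missing cell.
columns_with_missing = [c for c, n in missing_per_column.items() if n > 0]
```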
Question 4 (10 points)
How many rows have missing values? Delete these rows after you answer the question.
Question 4 options:
3
1
5
0
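The row-wise version of the missing-value check, followed by the deletion step, might look like the sketch below. The rows and the short column names are hypothetical; a row counts as incomplete if any of its cells is empty.

```python
# Hypothetical rows; None and "" both stand for an empty cell.
rows = [
    {"order_id": "A1", "destination": "Domestic", "returned": 0},
    {"order_id": "A2", "destination": None, "returned": 1},
    {"order_id": "A3", "destination": "International", "returned": ""},
    {"order_id": "A4", "destination": "Domestic", "returned": 2},
]

def has_missing(row):
    """True if any cell in the row is None or an empty string."""
    return any(v is None or v == "" for v in row.values())

# Count first (to answer the question), then keep only the complete rows.
incomplete_rows = [r for r in rows if has_missing(r)]
complete_rows = [r for r in rows if not has_missing(r)]
```

Counting before deleting matters here, because the count is the answer and the filtered list is what the later questions operate on.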
Question 5 (10 points)
Which of the following statements is true?
Question 5 options:
There are four rows/observations that have more items returned than number ordered
There are four orders with negative values in the column titled "Total value of items in the order"
There is one row/observation that has more items returned than number ordered
The "Number of units ordered" column has 3 entries with negative values
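Each statement above corresponds to one of the validity rules in the column descriptions, so one way to evaluate them is to count the rows that violate each rule. The sketch below uses hypothetical orders and short, made-up field names; the rules themselves come from the column descriptions.

```python
# Hypothetical orders with deliberately invalid entries.
orders = [
    {"id": "A1", "units": 3, "value": 120.0, "returned": 1},
    {"id": "A2", "units": 2, "value": -45.0, "returned": 0},
    {"id": "A3", "units": 1, "value": 30.0, "returned": 2},
    {"id": "A4", "units": -4, "value": 80.0, "returned": 0},
]

# Units ordered must be a positive quantity.
nonpositive_units = [o["id"] for o in orders if o["units"] <= 0]

# Total value must be a positive quantity.
nonpositive_value = [o["id"] for o in orders if o["value"] <= 0]

# Items returned must not exceed units ordered.
# Note: a row with negative units also trips this check.
over_returned = [o["id"] for o in orders if o["returned"] > o["units"]]
```

Because one bad cell can violate more than one rule, it helps to look at the flagged rows themselves and not only at the counts.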
Question 6 (10 points)
How many duplicate rows are in the dataset at this point? Remove all of them after answering this question.
Question 6 options:
0
1
3
2
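A duplicate check can be sketched as below, with hypothetical rows stored as tuples. This version counts every repeat after the first occurrence of a row as a duplicate, which is an assumption about what the question means by "duplicate rows"; counting all copies, including the first, would give a different number.

```python
# Hypothetical rows as (order id, units ordered, total value) tuples.
rows = [
    ("A1", 3, 120.0),
    ("A2", 1, 30.0),
    ("A1", 3, 120.0),
    ("A3", 2, 60.0),
    ("A1", 3, 120.0),
]

seen = set()
unique_rows = []
duplicate_count = 0
for row in rows:
    if row in seen:
        # An identical row appeared earlier, so this one is a repeat.
        duplicate_count += 1
    else:
        seen.add(row)
        unique_rows.append(row)
```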
For the remaining questions, please use the cleaned data set provided here.
Question 7 (10 points)
What is the average number of items returned for orders with a "Domestic" destination?
Question 7 options:
0.13
0.02
0.5
0.2
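The group average can be computed by filtering on destination and then taking the mean, as in this sketch. The orders and field names are hypothetical, so the resulting number has no bearing on the answer for the real dataset.

```python
from statistics import mean

# Hypothetical cleaned orders.
orders = [
    {"destination": "Domestic", "returned": 0},
    {"destination": "Domestic", "returned": 1},
    {"destination": "International", "returned": 2},
    {"destination": "Domestic", "returned": 0},
    {"destination": "Domestic", "returned": 2},
]

# Keep only domestic orders, then average their returned-item counts.
domestic_returns = [o["returned"] for o in orders if o["destination"] == "Domestic"]
domestic_average = mean(domestic_returns)
```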
Question 8 (10 points)
High-risk orders are defined as international orders whose average basket value (= total value of items in the order / number of units ordered) is greater than 100. How many high-risk orders are in the cleaned dataset?
Question 8 options:
3
19
10
86
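The high-risk definition translates directly into a filter: destination is international and total value divided by units ordered is strictly greater than 100. The sketch below applies it to hypothetical orders with made-up field names; the function name is_high_risk is also an illustration.

```python
# Hypothetical cleaned orders (units are positive after cleaning).
orders = [
    {"id": "B1", "destination": "International", "units": 2, "value": 250.0},
    {"id": "B2", "destination": "International", "units": 5, "value": 300.0},
    {"id": "B3", "destination": "Domestic", "units": 1, "value": 400.0},
    {"id": "B4", "destination": "International", "units": 1, "value": 100.0},
]

def is_high_risk(order):
    """International order whose average basket value exceeds 100."""
    average_basket_value = order["value"] / order["units"]
    return order["destination"] == "International" and average_basket_value > 100

high_risk_ids = [o["id"] for o in orders if is_high_risk(o)]
```

Order B4 sits exactly at 100 and is excluded because the definition says "greater than", and B3 is excluded because it is domestic.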
Question 9 (10 points)
Which shipping method is the most commonly used?
Question 9 options:
Standard
Premium
Expedited
All of them are used with the exact same frequency
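A frequency count answers this directly, and checking for ties covers the last option. The sketch uses hypothetical shipping values, so the winner here says nothing about the real dataset.

```python
from collections import Counter

# Hypothetical shipping options from a cleaned column.
shipping = ["Standard", "Premium", "Standard", "Expedited", "Standard", "Premium"]

counts = Counter(shipping)

# Most frequent method and how often it occurs.
top_method, top_count = counts.most_common(1)[0]

# All methods sharing the top count; more than one entry means a tie.
tied_methods = [m for m, c in counts.items() if c == top_count]
```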
Question 10 (10 points)
For observations with a value of more than 5 in the Past Orders column, what is the highest value in the Number of items returned column?
Question 10 options:
1
2
3
0
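This is a filter followed by a maximum. The sketch below uses hypothetical orders and field names; note that the condition is strictly "more than 5", so a customer with exactly 5 past orders is left out.

```python
# Hypothetical cleaned orders.
orders = [
    {"past_orders": 2, "returned": 3},
    {"past_orders": 6, "returned": 4},
    {"past_orders": 9, "returned": 0},
    {"past_orders": 5, "returned": 5},
]

# Keep only observations with more than 5 past orders.
frequent_customers = [o for o in orders if o["past_orders"] > 5]

# Highest returned-item count in that subset (None if the subset is empty).
max_returned = max((o["returned"] for o in frequent_customers), default=None)
```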
