Question
Consider the following 2-week bike rental dataset:
The data set contains missing values for the Number of Rentals attribute (denoted as ? in the table). Compare the following three approaches for handling the missing values:
Approach 1: Discard the missing values.
Approach 2: Replace the missing value with the global mean (i.e., average number of rentals for all the non-missing days).
Approach 3: Replace the missing value with the stratified mean. For example, if the missing value is on a weekday, replace it by the average number of rentals for all non-missing weekdays.
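The three approaches above can be sketched in a few lines of Python. This is only an illustrative sketch: the records below are hypothetical toy values, not the table from the question, and the function names (discard_missing, impute_global_mean, impute_stratified_mean) are made up for this example.

```python
from statistics import mean

# Hypothetical toy data, NOT the table from the question.
# Each record is (day_type, rentals); None marks a missing value.
records = [
    ("Weekday", 100),
    ("Weekday", None),
    ("Weekday", 140),
    ("Weekend", 200),
    ("Weekend", None),
    ("Weekend", 240),
]


def discard_missing(rows):
    """Approach 1: keep only the rows whose value is present."""
    return [(t, v) for t, v in rows if v is not None]


def impute_global_mean(rows):
    """Approach 2: fill each gap with the mean of all observed values."""
    fill = mean(v for _, v in rows if v is not None)
    return [(t, fill if v is None else v) for t, v in rows]


def impute_stratified_mean(rows):
    """Approach 3: fill each gap with the mean of the observed values
    that share the same day type (Weekday or Weekend)."""
    groups = {}
    for t, v in rows:
        if v is not None:
            groups.setdefault(t, []).append(v)
    # Assumes every day type has at least one observed value;
    # otherwise the lookup below raises KeyError.
    fills = {t: mean(vs) for t, vs in groups.items()}
    return [(t, fills[t] if v is None else v) for t, v in rows]
```

On this toy data the global mean fills both gaps with 170, while the stratified mean fills the weekday gap with 120 and the weekend gap with 220.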
(a) What are the imputed values for day 1 and day 12 using approaches 2 and 3 described above?
Solution:
(b) Suppose we are interested in calculating the average number of rentals for all days (weekdays and weekends). Which approach, 2 or 3, will give the same average number of rentals for all days as approach 1?
Solution:
(c) Which of the three approaches is the best way to deal with the missing value problem shown above? State your reasons clearly.
Solution:
(d) Give a scenario in which approach 1 would be the best way to deal with the missing value problem.
Solution:
Weekday/Weekend: Weekday, Weekday, Weekday, Weekend, Weekend, Weekday, Weekday, Weekday, Weekday, Weekday, Weekend, Weekend
Number of Rentals: 1 130 150 280 200 110 130 120 180 160 240 4 5 6 10 12