Question
Q2 Based on your graphical and numerical analysis in Q1, which method -- 1.5(IQR) or 3(SD) -- is more appropriate to remove outliers from the `departure delay` variable? Remove outliers for `departure delay` with the appropriate method. Store this new dataset as `no_out_dd`. You'll want to use this new dataset without outliers in Q3. What proportion of rows remains following the removal of these outliers? Store this number as Q2. Do not hardcode the answer.

* Note: A boxplot of departure delays in the new dataset will still reveal outliers, based on the new five-number summary. For the purpose of this assignment, we will retain these "new" outliers in our dataframe. To completely remove all outliers, we would need to repeat the outlier removal process multiple times.

- Your answer should be a number assigned to Q2. Do not round.
IQR_delay <- IQR(flights$dep_delay, na.rm = TRUE)
lower_bound <- quantile(flights$dep_delay, 0.25, na.rm = TRUE) - 1.5 * IQR_delay
upper_bound <- quantile(flights$dep_delay, 0.75, na.rm = TRUE) + 1.5 * IQR_delay
sd(flights$dep_delay)
no_out_dd <- flights |>
  filter(dep_delay >= lower_bound & dep_delay <= upper_bound)
Q2 <- nrow(no_out_dd) %/% nrow(flights)
This was the code I came up with but it's not producing the correct answer. Can you help me figure out what the issue is?
Step by Step Solution
Step: 1
Let's break down your problem and identify where the error might be in your procedure. It ...
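One likely culprit in the posted code is the last line: `%/%` is R's integer-division operator, so `nrow(no_out_dd) %/% nrow(flights)` floors the ratio to 0 whenever any row is removed, whereas `/` returns the proportion. The sketch below shows the difference on a small made-up data frame. The names `flights_toy`, `no_out_dd_toy`, `Q2_toy` and `Q2_wrong` are hypothetical stand-ins and the delay values are invented, so this is an illustration under those assumptions rather than the assignment's expected solution.

```r
# Toy stand-in for the flights data; the delay values are made up.
flights_toy <- data.frame(
  dep_delay = c(-5, -3, -2, 0, 1, 2, 4, 6, 10, 300, NA)
)

# 1.5 * IQR fences, ignoring missing delays.
q1 <- quantile(flights_toy$dep_delay, 0.25, na.rm = TRUE)
q3 <- quantile(flights_toy$dep_delay, 0.75, na.rm = TRUE)
iqr_delay <- q3 - q1
lower_bound <- q1 - 1.5 * iqr_delay
upper_bound <- q3 + 1.5 * iqr_delay

# Keep rows inside the fences. Rows with a missing delay are dropped too,
# which mirrors how dplyr::filter() treats NA conditions.
keep <- !is.na(flights_toy$dep_delay) &
  flights_toy$dep_delay >= lower_bound &
  flights_toy$dep_delay <= upper_bound
no_out_dd_toy <- flights_toy[keep, , drop = FALSE]

# `/` gives the proportion of rows kept; `%/%` floors it to a whole number.
Q2_toy <- nrow(no_out_dd_toy) / nrow(flights_toy)
Q2_wrong <- nrow(no_out_dd_toy) %/% nrow(flights_toy)
```

Here 9 of the 11 rows survive, so `Q2_toy` is 9/11 while `Q2_wrong` is 0. Applied to the original code, the corresponding change would be `Q2 <- nrow(no_out_dd) / nrow(flights)`. Two smaller points may also be worth checking: `sd(flights$dep_delay)` returns `NA` if the column has missing values unless `na.rm = TRUE` is passed, and because `filter()` drops rows with a missing `dep_delay`, those rows count as removed in the proportion. Whether that is what the assignment intends is not stated in the question.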