Question
Predicting Delayed Flights (Boosting). The file FlightDelays.csv contains information
on all commercial flights departing the Washington, DC area and arriving at New
York during January 2004. For each flight there is information on the departure and
arrival airports, the distance of the route, the scheduled time and date of the flight, and
so on. The variable that we are trying to predict is whether or not a flight is delayed.
A delay is defined as an arrival that is at least 15 minutes later than scheduled.
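The delay rule is a simple threshold on arrival lateness. Below is a minimal sketch. It assumes scheduled and actual arrival times are available as minutes after midnight. The helper name `is_delayed` and that time encoding are illustrative assumptions, and flights that arrive after midnight are not handled.

```python
# Minimal sketch: a flight counts as delayed when it arrives at least
# 15 minutes after its scheduled arrival time.
# Assumption: both times are minutes after midnight (no overnight wrap-around).
DELAY_THRESHOLD_MIN = 15


def is_delayed(scheduled_arrival_min: int, actual_arrival_min: int) -> bool:
    """Return True if the arrival is at least 15 minutes later than scheduled."""
    return actual_arrival_min - scheduled_arrival_min >= DELAY_THRESHOLD_MIN


print(is_delayed(600, 615))  # True: exactly 15 minutes late
print(is_delayed(600, 614))  # False: only 14 minutes late
```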
Data Preprocessing. Transform the variable day of week into a categorical variable. Bin
the scheduled departure time into eight bins (in R, use the function cut()). Partition the
data into training and validation sets.
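The three preprocessing steps can be sketched in plain Python. This is an illustrative stand-in for the R workflow the exercise mentions, not an exact reproduction of cut(), whose default intervals are right-closed and whose range is slightly widened. The sketch makes several assumptions. Departure times are stored as hhmm integers (e.g. 1455) and are converted to minutes before binning. The bins are equal-width. The split is 60% training and 40% validation. The toy records, function names, split fraction, and seed are all hypothetical.

```python
import random


def hhmm_to_minutes(hhmm: int) -> int:
    """Convert a clock time written as hhmm (e.g. 1455) to minutes after midnight."""
    return (hhmm // 100) * 60 + hhmm % 100


def equal_width_bins(values, n_bins=8):
    """Label each value 0..n_bins-1 by equal-width intervals over [min, max].
    The maximum value is placed in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator columns (one per category)."""
    return {f"day_{c}": int(value == c) for c in categories}


def partition(rows, train_frac=0.6, seed=1):
    """Randomly split rows into a training list and a validation list."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut_point = int(len(shuffled) * train_frac)
    return shuffled[:cut_point], shuffled[cut_point:]


# Hypothetical toy records: (day_of_week 1-7, scheduled_departure_hhmm, delayed 0/1)
rows = [(1, 600, 0), (2, 830, 1), (3, 1015, 0), (4, 1200, 0),
        (5, 1455, 1), (6, 1700, 0), (7, 1930, 1), (1, 2130, 0)]

dep_bins = equal_width_bins([hhmm_to_minutes(r[1]) for r in rows], n_bins=8)

features = []
for (day, _, delayed), dep_bin in zip(rows, dep_bins):
    record = one_hot(day, range(1, 8))  # day of week as 7 indicator columns
    record["dep_bin"] = dep_bin         # binned departure time, labels 0-7
    record["delayed"] = delayed
    features.append(record)

train, valid = partition(features)
print(dep_bins)               # [0, 1, 2, 3, 4, 5, 6, 7]
print(len(train), len(valid))  # 4 4
```

Converting hhmm to minutes first keeps the bins equal in clock time. Binning the raw hhmm numbers would leave gaps (e.g. no values between 1460 and 1499).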
Run a boosted classification tree for delay. Leave the number of weak learners at its
default and select resampling. Set the maximum number of levels to display to 6 and the
minimum number of records in a terminal node to 1.
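Boosting by resampling fits each weak learner to a sample of records drawn with probability proportional to the current record weights. It then raises the weights of the records that learner misclassifies. The sketch below is a simplified AdaBoost.M1 built on that idea, under stated assumptions. It uses depth-1 decision stumps instead of trees with up to 6 levels. It codes delayed as +1 and on time as -1. It sets `n_learners=50` arbitrarily, without trying to match any software's default. It is not the implementation used by the package the exercise was written for, and the toy data are hypothetical.

```python
import math
import random


def fit_stump(X, y):
    """Return the (feature, threshold, polarity) stump with the fewest errors on (X, y)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                errors = sum(1 for row, label in zip(X, y)
                             if (polarity if row[j] <= t else -polarity) != label)
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return j, t, polarity


def stump_predict(stump, row):
    j, t, polarity = stump
    return polarity if row[j] <= t else -polarity


def adaboost_resampling(X, y, n_learners=50, seed=0):
    """AdaBoost.M1 variant: each stump is fit to a weighted resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_learners):
        # Resampling: draw n record indices with probability proportional to weight.
        idx = rng.choices(range(n), weights=w, k=n)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, label in zip(w, preds, y) if p != label)
        if err >= 0.5:
            # Discard a learner no better than chance and reset weights
            # (one common way to handle this case).
            w = [1.0 / n] * n
            continue
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified records, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * label * p) for wi, p, label in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, row):
    """Weighted vote of all stumps: +1 (delayed) or -1 (on time)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: [departure_bin 0-7, is_weekend 0/1]; label +1 = delayed
X = [[0, 0], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 0], [7, 1]]
y = [-1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_resampling(X, y, n_learners=50, seed=0)
train_acc = sum(boosted_predict(model, r) == label for r, label in zip(X, y)) / len(y)
print(f"training accuracy: {train_acc:.2f}")
```

Because later learners concentrate on records that earlier learners got wrong, the ensemble can fit patterns that no single shallow tree captures. Training accuracy alone says nothing about validation performance.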
a. Compared with the single tree, how does the boosted tree behave in terms of overall
accuracy?
b. Compared with the single tree, how does the boosted tree behave in terms of
accuracy in identifying delayed flights?
c. Explain why this model might perform best among the models you fit.
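Questions (a) and (b) ask about two different measures. Overall accuracy is the share of all validation flights classified correctly. Accuracy on delayed flights (sensitivity) is the share of truly delayed flights that the model flags as delayed. The sketch below computes both. The label strings "ontime"/"delayed" and the prediction lists are toy values chosen to show how the two measures can diverge. They are not results from FlightDelays.csv.

```python
def overall_accuracy(actual, predicted):
    """Fraction of all records whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def delayed_accuracy(actual, predicted, positive="delayed"):
    """Sensitivity: fraction of actually delayed records predicted as delayed."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a == positive]
    if not pairs:
        raise ValueError("no actually delayed records to evaluate")
    return sum(p == positive for _, p in pairs) / len(pairs)


# Toy validation labels (hypothetical, for illustration only)
actual = ["ontime", "delayed", "ontime", "delayed", "ontime"]
single_tree = ["ontime", "ontime", "ontime", "ontime", "ontime"]   # never flags a delay
boosted = ["ontime", "delayed", "delayed", "ontime", "ontime"]

print(overall_accuracy(actual, single_tree), delayed_accuracy(actual, single_tree))  # 0.6 0.0
print(overall_accuracy(actual, boosted), delayed_accuracy(actual, boosted))          # 0.6 0.5
```

In this toy case both models have the same overall accuracy, but only one of them identifies any delayed flights. When delays are the minority class, it is worth reporting the two measures separately.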