Question

Data Mining: I've tried this problem a couple of times and keep getting errors, even with help from other Chegg experts. Can you show how to code this in RStudio, start to finish? (Please make sure to test it in R yourself!)
The data are taken from Shmueli et al. (2010). The data set consists of 2201 airplane
flights in January 2004 from the Washington DC area into the NYC area. The
characteristic of interest (the response) is whether or not a flight has been delayed by
more than 15 min (coded as 0 for no delay, and 1 for delay).
The explanatory variables (predictors) include three different arrival airports (Kennedy,
Newark, and LaGuardia); three different departure airports (Reagan, Dulles, and
Baltimore); eight carriers; a categorical variable for schedule time (morning, evening,
night); weather conditions (0 = good, 1 = bad); and day of week (1 for Sunday and Monday,
0 for all other days).
Here the objective is to identify flights that are likely to be delayed.
You will need to do some feature engineering: use the variable "schedtime" to create a
new variable "sched_time" that indicates whether the flight was scheduled in the morning,
in the evening, or at night.
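
The feature-engineering step could be sketched roughly as follows. This assumes "schedtime" is stored as an integer in HHMM form (for example, 1455 for 2:55 pm) and uses hypothetical cutoffs of 12:00 and 18:00, since the assignment does not say where morning, evening, and night begin. The values in the toy data frame are made up for illustration.

```r
# Toy scheduled departure times in HHMM form (made-up values, not the real data)
flights <- data.frame(schedtime = c(600, 955, 1200, 1455, 1759, 1800, 2120))

# Assumed cutoffs: [0, 1200) morning, [1200, 1800) evening, [1800, 2400) night.
# right = FALSE makes each interval closed on the left and open on the right.
flights$sched_time <- cut(
  flights$schedtime,
  breaks = c(0, 1200, 1800, 2400),
  labels = c("morning", "evening", "night"),
  right  = FALSE
)

print(flights)
```

Adjust the breaks if the course defines the three periods differently.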
Do not use flight number as a predictor. Why? Because it is not an informative variable:
it would force the model to memorize the outcome for each flight number, which won't
carry over to the test data.
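
One way to keep the flight number out of the model is to build the formula from an explicit predictor list. This is a minimal sketch on a toy data frame with made-up values. It uses only three of the columns, and the variable names `predictors` and `model_formula` are illustrative.

```r
# Toy rows with made-up values; only three of the data set's columns are shown
flights <- data.frame(
  flightnumber = c(5935, 6155, 7208),
  carrier      = c("OH", "DH", "DH"),
  delay        = c("ontime", "delayed", "ontime")
)

# Every column except the identifier and the response becomes a predictor
predictors    <- setdiff(names(flights), c("flightnumber", "delay"))
model_formula <- reformulate(predictors, response = "delay")

print(model_formula)
```

The resulting formula can be passed to any model-fitting function that accepts one, so the identifier never enters the fit.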
Use the flight delay data to predict the flight delay status (Ontime vs. Delayed).
Use a logistic regression model and one more classification model of your
choosing (decision tree, Naive Bayes, or KNN; pick the easiest one!).
Interpret the coefficients estimated from the logistic regression model.
Provide model performance metrics for both models (logistic vs. the other model).
Provide an interpretation of these model performance metrics.
Which model would you choose, and why?
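
The modeling tasks above can be sketched end to end as follows. The data frame here is simulated with invented effect sizes so that the block runs without the real file; it is a stand-in, not the Shmueli et al. data. It assumes a decision tree (via the `rpart` package) as the second model, a 70/30 train/test split, and a 0.5 probability cutoff, none of which the assignment specifies. The helper `metrics()` is a hypothetical name.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)
n <- 1000
# Simulated stand-in data; values and effect sizes are invented for illustration
sim <- data.frame(
  weather    = rbinom(n, 1, 0.1),
  dayweek    = rbinom(n, 1, 0.3),
  sched_time = factor(sample(c("morning", "evening", "night"), n, replace = TRUE),
                      levels = c("morning", "evening", "night"))
)
log_odds  <- -2 + 2.5 * sim$weather + 0.5 * sim$dayweek +
  0.8 * (sim$sched_time == "evening")
sim$delay <- factor(ifelse(runif(n) < plogis(log_odds), "delayed", "ontime"),
                    levels = c("ontime", "delayed"))

# Assumed 70/30 train/test split
train_idx <- sample(n, size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Logistic regression: models the log-odds of the second factor level ("delayed")
logit_fit   <- glm(delay ~ weather + dayweek + sched_time,
                   data = train, family = binomial)
odds_ratios <- exp(coef(logit_fit))  # multiplicative change in the odds of delay
print(round(odds_ratios, 2))

# Decision tree on the same predictors
tree_fit <- rpart(delay ~ weather + dayweek + sched_time,
                  data = train, method = "class")

# Class predictions on the held-out rows (0.5 cutoff for the logistic model)
logit_prob <- predict(logit_fit, newdata = test, type = "response")
logit_pred <- factor(ifelse(logit_prob > 0.5, "delayed", "ontime"),
                     levels = levels(test$delay))
tree_pred  <- predict(tree_fit, newdata = test, type = "class")

# Accuracy, sensitivity (share of delayed flights caught), and specificity
metrics <- function(pred, actual) {
  cm <- table(predicted = pred, actual = actual)
  c(accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm["delayed", "delayed"] / sum(cm[, "delayed"]),
    specificity = cm["ontime", "ontime"] / sum(cm[, "ontime"]))
}
results <- rbind(logistic = metrics(logit_pred, test$delay),
                 tree     = metrics(tree_pred, test$delay))
print(round(results, 3))
```

When reading the output, an odds ratio above 1 means the predictor is associated with higher odds of delay, holding the others fixed. Because delays are the minority class, accuracy alone can look good for a model that rarely predicts a delay, so sensitivity is worth weighing when choosing between the two models.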
Show the detailed work with all the steps for feature engineering, exploratory data
analysis, model fitting and prediction, and model evaluation. Preferably submit an HTML
file generated using R Markdown.
The columns in the data set that I'm using are as follows: "schedtime", "carrier", "deptime", "dest", "distance", "date", "flightnumber", "origin", "weather", "dayweek", "daymonth", "tailnu", and "delay"
