
Question

Data Mining: I've tried this problem a couple of times and keep getting errors. Can you show how to code this in RStudio, from start to finish?

The data are taken from Shmueli et al. (2010). The data set consists of 2201 airplane
flights in January 2004 from the Washington, DC area into the NYC area. The
characteristic of interest (the response) is whether or not a flight was delayed by
more than 15 minutes (coded as 0 for no delay and 1 for delay).
The explanatory variables (predictors) include three arrival airports (Kennedy,
Newark, and LaGuardia); three departure airports (Reagan, Dulles, and
Baltimore); eight carriers; a categorical variable for scheduled time (morning, evening,
night); weather conditions (0 = good, 1 = bad); and day of week (1 for Sunday and Monday,
0 for all other days).
The objective is to identify flights that are likely to be delayed.
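The coding described above can be sketched in R as follows. This is only an illustration on made-up rows: it assumes the raw file stores "delay" as the strings "ontime"/"delayed" and "dayweek" as 1 to 7 with 1 = Monday and 7 = Sunday, neither of which is stated in the question, so check your own file first.

```r
# Toy rows in an ASSUMED raw layout (values are made up for illustration).
raw <- data.frame(
  delay   = c("ontime", "delayed", "ontime", "delayed"),
  dayweek = c(1, 3, 7, 5),   # assumed coding: 1 = Monday, ..., 7 = Sunday
  weather = c(0, 1, 0, 0)    # already 0 = good, 1 = bad
)

# Response: 1 for a delay of more than 15 minutes, 0 for no delay.
raw$delay <- ifelse(raw$delay == "delayed", 1, 0)

# Day of week: 1 for Sunday and Monday, 0 for all other days.
raw$dayweek <- ifelse(raw$dayweek %in% c(1, 7), 1, 0)

str(raw)
```

If your file already uses 0/1 codes for these columns, this step is unnecessary.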
You will need to do some feature engineering: use the variable "schedtime" to create a
new variable "sched_time" that indicates whether the flight was scheduled in the morning,
evening, or night.
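One way to build "sched_time" is with base R's cut(). The sketch below assumes "schedtime" is an integer in HHMM form (for example 1455 for 2:55 pm), and the cutoffs at 1200 and 1900 are hypothetical, since the question does not say where morning, evening, and night begin.

```r
# Toy schedtime values in ASSUMED HHMM integer form (made up for illustration).
flights <- data.frame(schedtime = c(600, 845, 1200, 1455, 1859, 1900, 2120))

# HYPOTHETICAL cutoffs: [0, 1200) morning, [1200, 1900) evening, [1900, 2400) night.
flights$sched_time <- cut(
  flights$schedtime,
  breaks = c(0, 1200, 1900, 2400),
  labels = c("morning", "evening", "night"),
  right  = FALSE   # intervals are closed on the left, so 1200 counts as evening
)

table(flights$sched_time)
```

Because cut() returns a factor, glm() and most other modeling functions will treat "sched_time" as categorical without further conversion.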
Do not use flight number as a predictor. Why? Because it is not an informative variable:
it would push the model to memorize the outcome for each flight number, and that would
not carry over to the test data.
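A minimal sketch of dropping the identifier before modeling, using a made-up three-row data frame (the flight numbers and other values are invented for illustration):

```r
# Toy data frame; the values are made up.
flights <- data.frame(
  flightnumber = c(5935, 6155, 7208),
  weather      = c(0, 0, 1),
  delay        = c(0, 0, 1)
)

# Keep every column except the flight number identifier.
model_data <- flights[, setdiff(names(flights), "flightnumber"), drop = FALSE]

names(model_data)
```

The same effect can be had by simply leaving "flightnumber" out of the model formula.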
Use the flight delay data to predict the flight delay status (Ontime vs Delayed).
Use a logistic regression model and one more classification model of your choosing (decision tree, naive Bayes, or KNN).
Interpret the coefficients estimated from the logistic regression model.
Provide model performance metrics for both models (logistic regression vs. the other model of your choosing).
Provide an interpretation of these model performance metrics.
Which model would you choose, and why?
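The tasks above could be approached along the following lines. This is a sketch under stated assumptions, not a worked solution: the data are simulated stand-ins (not the real flight data), only three predictors are used, the second model is a decision tree from the rpart package (assumed to be installed), and evaluate() is a hypothetical helper written for this example.

```r
library(rpart)  # decision trees; assumed to be installed

set.seed(1)
n <- 600

# SIMULATED stand-in for the prepared flight data (not the real data set).
flights <- data.frame(
  weather    = rbinom(n, 1, 0.1),
  dayweek    = rbinom(n, 1, 2 / 7),
  sched_time = factor(sample(c("morning", "evening", "night"), n, replace = TRUE),
                      levels = c("morning", "evening", "night"))
)
# Made-up delay mechanism so the models have some signal to find.
log_odds <- -2 + 2.5 * flights$weather + 0.5 * flights$dayweek +
  0.8 * (flights$sched_time == "evening")
flights$delay <- rbinom(n, 1, plogis(log_odds))

# Hold out 40% of the rows for testing.
train_idx <- sample(n, size = 0.6 * n)
train <- flights[train_idx, ]
test  <- flights[-train_idx, ]

# Model 1: logistic regression.
logit_fit <- glm(delay ~ weather + dayweek + sched_time,
                 data = train, family = binomial)
# exp(coefficient) is the multiplicative change in the odds of delay.
odds_ratios <- exp(coef(logit_fit))
print(round(odds_ratios, 2))

# Model 2: classification tree on the same predictors.
tree_fit <- rpart(factor(delay) ~ weather + dayweek + sched_time,
                  data = train, method = "class")

# Class predictions on the test set; the logit uses a 0.5 probability cutoff.
logit_pred <- as.integer(predict(logit_fit, newdata = test, type = "response") > 0.5)
tree_pred  <- as.integer(as.character(predict(tree_fit, newdata = test, type = "class")))

# Hypothetical helper: accuracy, sensitivity and specificity from a confusion matrix.
evaluate <- function(actual, predicted) {
  cm <- table(actual    = factor(actual,    levels = c(0, 1)),
              predicted = factor(predicted, levels = c(0, 1)))
  c(accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm["1", "1"] / sum(cm["1", ]),
    specificity = cm["0", "0"] / sum(cm["0", ]))
}

metrics <- rbind(logistic = evaluate(test$delay, logit_pred),
                 tree     = evaluate(test$delay, tree_pred))
print(round(metrics, 3))
```

When reading the output, an odds ratio above 1 means the predictor raises the odds of delay relative to its baseline level (here "morning" for sched_time), holding the others fixed. Because delays are the minority class, accuracy alone can look good for a model that rarely predicts a delay, so sensitivity (the share of actual delays caught) is worth weighing when choosing between the two models.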
Show detailed work with all the steps for feature engineering, exploratory data
analysis, model fitting and prediction, and model evaluation. Preferably, submit an HTML
file generated using R Markdown.
The data columns I'm using are as follows: "schedtime", "carrier", "deptime", "dest", "distance", "date", "flightnumber", "origin", "weather", "dayweek", "daymonth", "tailnu", and "delay".