Question
The file FlightDelays.csv contains information on all commercial flights that departed the Washington, D.C., area and arrived in New York during January 2004. For each flight there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The variable that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.
Data:
https://docs.google.com/spreadsheets/d/1nhpseMWC8CidxL-7iZ3R2FuWzppHf7Su/edit#gid=140134711
This assignment has three phases:
A. Data Preprocessing
1. Data Reduction: Reduce the number of predictors using appropriate techniques (domain knowledge, correlation matrix, etc.). Store the result of this step in a new file "FlightDelaysTrainingData.csv".
2. Data Exploration: From the "FlightDelaysTrainingData.csv" file, extract four data summaries using four different pivot tables that highlight different facts about the dataset (example: number of delayed flights per CARRIER, etc.).
3. Data Conversion: Some of the algorithms do not work with non-numerical data, so the non-numerical data in the dataset must be converted. Provide a reference table that maps each original value to its transformed value.
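Steps 2 and 3 above can be sketched in plain Python as a rough illustration. The rows below are made up, the column names ORIGIN and STATUS are assumptions (only CARRIER is named in the assignment), and the helper functions `pivot_count` and `encode_column` are hypothetical names, not part of Excel or Weka.

```python
from collections import Counter

# Made-up illustrative rows, not taken from FlightDelays.csv.
# ORIGIN and STATUS are assumed column names.
rows = [
    {"CARRIER": "DH", "ORIGIN": "IAD", "STATUS": "delayed"},
    {"CARRIER": "DH", "ORIGIN": "DCA", "STATUS": "ontime"},
    {"CARRIER": "US", "ORIGIN": "DCA", "STATUS": "ontime"},
    {"CARRIER": "DH", "ORIGIN": "IAD", "STATUS": "delayed"},
]


def pivot_count(rows, row_key, col_key):
    """Count records per (row_key value, col_key value), like a pivot table."""
    table = {}
    for r in rows:
        table.setdefault(r[row_key], Counter())[r[col_key]] += 1
    return table


def encode_column(rows, column):
    """Replace a text column with integer codes and return the reference table."""
    values = sorted({r[column] for r in rows})
    reference = {value: code for code, value in enumerate(values)}
    for r in rows:
        r[column] = reference[r[column]]
    return reference


# Summary first (while CARRIER is still text), then the conversion.
delays_by_carrier = pivot_count(rows, "CARRIER", "STATUS")
carrier_reference = encode_column(rows, "CARRIER")
```

The dictionary returned by `encode_column` plays the role of the reference table requested in step 3: it records which integer code stands for which original value.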
B. Model Building with Weka
Use the "FlightDelaysTrainingData.csv" data file to build models based on:
1. Naive Bayes (NB) Model.
2. Classification and Regression Tree (CART).
3. Logistic Regression.
Ensure that the data is split into training and validation sets using the standard 60%-40% split.
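In the Weka Explorer this split is normally set through the "Percentage split" test option. Outside Weka, the same idea can be sketched in a few lines of Python. The function name `split_60_40` and the seed value are assumptions chosen for illustration.

```python
import random


def split_60_40(records, seed=1):
    """Shuffle a copy of the records and split it 60% training / 40% validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * 0.6)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder record ids stand in for the rows of the training file.
training, validation = split_60_40(range(10))
```

Shuffling before cutting avoids a split that simply follows the file order, which would matter if the file is sorted by date or carrier.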
C. Classify Testing Data with Weka
Make up five new records (instances) of data and store them in a new file "FlightDelaysTestingData.csv". Use the models developed in section B to classify the data.
Submissions:
I. Report:
1. Discuss and explain why each predictor was removed or will be used in model building.
2. Provide a reference table to the transformed data.
3. Compare the results of the models built above and recommend an algorithm to be used for future prediction.
4. Use the best model to classify the data in the "FlightDelaysTestingData.csv" file.
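For the model comparison in item 3, one possible approach is to score each model's validation predictions with the same metrics. The sketch below is only an illustration: the labels and predictions are made up, the class names "delayed" and "ontime" are assumptions, and `evaluate` is a hypothetical helper rather than a Weka function.

```python
def evaluate(actual, predicted, positive="delayed"):
    """Return overall accuracy and recall for the positive (delayed) class."""
    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    actual_pos = sum(a == positive for a, _ in pairs)
    return {
        "accuracy": correct / len(pairs),
        "recall": true_pos / actual_pos if actual_pos else 0.0,
    }


# Made-up validation labels and one model's predictions, for illustration only.
actual = ["delayed", "ontime", "ontime", "delayed", "ontime"]
predicted = ["delayed", "ontime", "delayed", "ontime", "ontime"]
scores = evaluate(actual, predicted)
```

Recall on the delayed class is reported alongside accuracy because a model that labels every flight as on time can still look accurate when delays are the minority class.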
II. Excel files
Submit all the Excel files you have used in this project:
1) FlightDelaysTrainingData.csv
2) FlightDelaysTestingData.csv that shows the classified data
III. Weka-based model files