Question

The file FlightDelays.csv contains information on all commercial flights that departed the Washington, D.C., area and arrived in New York during January 2004. For each flight, there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The variable that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.
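The delay definition above can be written as a small check. This is a minimal sketch in Python, not part of the assignment: the function name `is_delayed` and the use of `datetime` values for the scheduled and actual arrival times are assumptions, since the actual column layout of FlightDelays.csv is not shown here.

```python
from datetime import datetime, timedelta

# Threshold taken from the assignment text: at least 15 minutes late.
DELAY_THRESHOLD = timedelta(minutes=15)


def is_delayed(scheduled_arrival: datetime, actual_arrival: datetime) -> bool:
    """Return True when the arrival is at least 15 minutes later than scheduled."""
    return actual_arrival - scheduled_arrival >= DELAY_THRESHOLD
```

Note that the comparison is inclusive, so an arrival exactly 15 minutes late counts as delayed under this reading of "at least".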

Data:

https://docs.google.com/spreadsheets/d/1nhpseMWC8CidxL-7iZ3R2FuWzppHf7Su/edit#gid=140134711

This assignment has three phases:

A. Data Preprocessing

1. Data Reduction: Reduce the number of predictors using appropriate techniques (domain knowledge, correlation matrix, etc.). Store the result of this step in a new file "FlightDelaysTrainingData.csv".

2. Data Exploration: From the "FlightDelaysTrainingData.csv" file, extract four data summaries using four different pivot tables to highlight different facts about the dataset (example: number of delayed flights per CARRIER, etc.).

3. Data Conversion: Because some of the algorithms cannot handle non-numerical data, the non-numerical data in the dataset must be converted to numerical form. Provide a reference table that maps each original value to its transformed value.
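The exploration and conversion steps above can be sketched in Python as a cross-check on work done in a spreadsheet. This is an illustrative sketch only: the helper names (`count_by`, `build_reference_table`, `encode`) are hypothetical, the column name `STATUS` and its values are assumptions, and the rows passed in would come from the reduced training file.

```python
from collections import Counter


def count_by(rows, key, predicate):
    """Pivot-style summary: count the rows that satisfy predicate, grouped by key."""
    return dict(Counter(row[key] for row in rows if predicate(row)))


def build_reference_table(rows, column):
    """Assign an integer code to each distinct value in column.

    Values are sorted first so the codes are stable across runs.
    """
    distinct_values = sorted({row[column] for row in rows})
    return {value: code for code, value in enumerate(distinct_values)}


def encode(rows, column, reference_table):
    """Return copies of the rows with column replaced by its integer code."""
    return [{**row, column: reference_table[row[column]]} for row in rows]
```

For example, `count_by(rows, "CARRIER", lambda r: r["STATUS"] == "delayed")` would give the number of delayed flights per carrier, and the dictionary returned by `build_reference_table` is the reference table the report asks for.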

B. Model Building with Weka

Use the "FlightDelaysTrainingData.csv" data file to build models based on:

1. Naive Bayes (NB) Model.

2. Classification and Regression Tree (CART).

3. Logistic Regression.

Ensure that the data is split into training and validation sets using the standard 60%-40% split.
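In Weka itself, this split is normally set through the "Percentage split" test option. To make the idea concrete outside Weka, here is a minimal Python sketch of a 60%-40% random split; the function name `split_60_40` and the fixed seed are assumptions chosen for reproducibility, not part of the assignment.

```python
import random


def split_60_40(records, seed=42):
    """Shuffle the records and return (training, validation) as a 60%/40% split."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle gives a repeatable split
    cut = len(shuffled) * 6 // 10          # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting matters when the file is ordered by date or carrier, since an unshuffled cut would put different kinds of flights in the two sets.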

C. Classify Testing Data with Weka

Make up five new records (instances) of data and store them in a new file "FlightDelaysTestingData.csv". Use the models developed in section B to classify the data.

Submissions:

I. Report

1. Discuss and explain why each predictor was removed or retained for model building.

2. Provide a reference table to the transformed data

3. Compare the results of the above built models and recommend an algorithm to be used for future prediction.

4. Use the best model to classify the data in "FlightDelaysTestingData.csv" file.
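One simple way to compare the models is by their accuracy on the validation set. The sketch below assumes that the actual and predicted labels have already been exported from Weka as plain lists; the helper names `accuracy` and `best_model` are hypothetical. Accuracy is only one possible criterion, and if delayed flights are much rarer than on-time flights, per-class measures such as recall on the delayed class may be more informative.

```python
def accuracy(actual, predicted):
    """Fraction of validation records whose predicted label matches the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def best_model(actual, predictions_by_model):
    """Return (name of the most accurate model, dict of accuracy per model)."""
    scores = {
        name: accuracy(actual, predicted)
        for name, predicted in predictions_by_model.items()
    }
    return max(scores, key=scores.get), scores
```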

II. Excel files

Submit all the Excel files you have used in this project:

1) FlightDelaysTrainingData.csv

2) FlightDelaysTestingData.csv that shows the classified data

III. Weka based model files
