Answered step by step
Verified Expert Solution
Link Copied!

Question

00
1 Approved Answer

In this assignment, we will build and evaluate a spam filter using a dataset that contains some columns indicating the most common words in an

In this assignment, we will build and evaluate a spam filter using a dataset that contains some columns indicating the most common words in an email (frequency of given words and characters), and a label column indicating if the email was spam or not. Please answer the following questions based on your implemented code (implementation in Matlab):

a) Draw a bar chart to view of the distribution of spam and non-spam email samples in the dataset. How many emails are in the dataset? How many of the emails are spam?

b) Divide the dataset into training and test sets, since this is a binary classification problem, use a Logistic regression or Random Forest algorithm to build a model that can tell whether an email is spam or not.

c) Build the confusion matrix and calculate precision and recall metrics to evaluate the performance of your model.

d) Take another look at the distribution of sample emails (i.e. part a). Are there any imbalances in the distribution? If yes, oversample the minority class using SMOTE algorithm and retrain your model.

e) Rebuild the confusion matrix and compare it with your initial matrix. What are the differences between these models? Does SMOTE work well? Explain your answer in detail

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access with AI-Powered Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Students also viewed these Databases questions