Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Aug 27, 2024

In this assignment, we will build and evaluate a spam filter using a dataset that contains some columns indicating the most common words in an

In this assignment, we will build and evaluate a spam filter using a dataset that contains some columns indicating the most common words in an email (frequency of given words and characters), and a label column indicating if the email was spam or not. Please answer the following questions based on your implemented code (implementation in Python):

a) Draw a bar chart to view of the distribution of spam and non-spam email samples in the dataset. How many emails are in the dataset? How many of the emails are spam?

b) Divide the dataset into training and test sets, since this is a binary classification problem, use a Logistic regression or Random Forest algorithm to build a model that can tell whether an email is spam or not.

c) Build the confusion matrix and calculate precision and recall metrics to evaluate the performance of your model.

d) Take another look at the distribution of sample emails (i.e. part a). Are there any imbalances in the distribution? If yes, oversample the minority class using SMOTE algorithm and retrain your model.

e) Rebuild the confusion matrix and compare it with your initial matrix. What are the differences between these models? Does SMOTE work well? Explain your answer in detail

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Design Query Formulation And Administration Using Oracle And PostgreSQL

Database Design Query Formulation And Administration Using Oracle And PostgreSQL

Authors: Michael Mannino

8th Edition

1948426951, 978-1948426954

More Books

Students also viewed these Databases questions

Question

★★★★★

The probability density of a one-parameter Weibull distribution is given by (a) Using a random sample of size n, obtain a moment estimator for . (b) Assuming that the following data are from a...

Answered: 1 week ago

Question

★★★★★

From the ninth century BC until the proliferation of gunpowder in the fifteenth century AD , the ultimate weapon of mass destruction was the catapult (Wilford, John Noble, "How Catapults Married...

Answered: 1 week ago

Question

★★★★★

45. Exercise 33 in Section 1.3 presented a sample of 26 escape times for oilworkers in a simulated escape exercise. Calculate and interpret the sample standard deviation. (Hint: xi 9638 and x2 i ...

Answered: 1 week ago

Question

★★★★★

Jay Gatsby categorizes wines into one of three clusters. The centroids of these clusters, describing the average characteristics of a wine in each cluster, are listed in the following table. Jay has...

Answered: 1 week ago

Question

★★★★★

Consider the three stocks in the following table. P t represents price at time t , and Q t represents shares outstanding at time t . Stock C splits two - for - one in the last period ( t = 2 ) . a ....

Answered: 1 week ago

Question

★★★★★

Dan Branson is a farmer in Abbotsford, British Columbia. A major part of Dan's farm is blueberries. Dan is entrepreneurial and ambitious. He is considering opening the processing of blueberries into...

Answered: 1 week ago

Question

★★★★★

D Question 6 Consider the given function: f(x) = sin (2) (2+7) 5x+2 -2 - if x T if 2 Answered: 1 week ago

Answered: 1 week ago

Question

★★★★★

1. Create a disruptive innovation for your healthcare organization. 2. Outline the core competitive dimensions that make your new service or product unique. 3. Discuss long-term sustainable...

Answered: 1 week ago

Question

★★★★★

PLEASE ANSWER PROBLEM 3. IT HAS 15 QUESTIONS Introduction to Cost Accounting and Cost Accounting Cycle 7. Utility costs were incurred in the factory, P450,000. 8. Advertising costs paid, P2,250,000....

Answered: 1 week ago

Question

★★★★★

Identifies process strategies most useful for a window manufacture by using the tools of process analysis. Then Identify recent advances in production technology that would be useful in the window...

Answered: 1 week ago

Question

★★★★★

Mini-Case B (7 marks) Like every Canadian couple, Amira and Owen carry quite a lot of debt. . Three years ago, Amira purchased a Subaru Outback costing $34,750 before sales tax of 15%, with a $5,000...

Answered: 1 week ago

Question

★★★★★

evaluate and draft recruitment and selection policies and procedures

Answered: 1 week ago

Question

★★★★★

understand the role of human resource managers and line managers in the recruitment and selection process

Answered: 1 week ago

Question

★★★★★

2. Why are there so few female engineers compared to male engineers and what can the engineering sector do to encourage females to take up a career in engineering?

Answered: 1 week ago

Previous Question Next Question