Question

Posted on Aug 24, 2024

In this assignment, we will build and evaluate a spam filter using a dataset whose feature columns give the frequency of common words and characters in each email, and whose label column indicates whether the email is spam. Please answer the following questions based on your implemented code (implementation in Matlab):

a) Draw a bar chart to view the distribution of spam and non-spam email samples in the dataset. How many emails are in the dataset? How many of the emails are spam?

b) Divide the dataset into training and test sets. Since this is a binary classification problem, use a logistic regression or random forest algorithm to build a model that predicts whether an email is spam.

c) Build the confusion matrix and calculate precision and recall metrics to evaluate the performance of your model.

d) Take another look at the distribution of sample emails (i.e., part a). Are there any imbalances in the distribution? If yes, oversample the minority class using the SMOTE algorithm and retrain your model.

e) Rebuild the confusion matrix and compare it with your initial matrix. What are the differences between these models? Does SMOTE work well? Explain your answer in detail.
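For part (a), the counts and the bar chart can be sketched in a few lines of Matlab. This is a minimal sketch, not the assignment's solution: the label vector below is a made-up stand-in for the dataset's label column, and the coding 1 = spam, 0 = non-spam is an assumption.

```matlab
% Toy label vector standing in for the dataset's label column
% (hypothetical values; assumed coding: 1 = spam, 0 = non-spam).
labels = [0 1 0 0 1 0 0 0 1 0]';

nTotal   = numel(labels);       % total number of emails
nSpam    = sum(labels == 1);    % emails labeled spam
nNonSpam = sum(labels == 0);    % emails labeled non-spam

% Bar chart of the class distribution.
bar([nNonSpam, nSpam]);
set(gca, 'XTickLabel', {'Non-spam', 'Spam'});
ylabel('Number of emails');
title('Class distribution');

fprintf('Total: %d, spam: %d (%.1f%%)\n', nTotal, nSpam, 100 * nSpam / nTotal);
```

With the real dataset, labels would be read from the label column of the loaded table or matrix instead of being typed in.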
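For part (b), one way to split the data and fit a logistic regression is sketched below. It assumes synthetic two-feature data in place of the word-frequency columns, a 70/30 hold-out split, and a plain gradient-descent fit written with base functions only; in practice a toolbox routine such as fitglm with a binomial distribution (or TreeBagger for a random forest) would likely replace the hand-written loop.

```matlab
% Synthetic two-feature data standing in for the word-frequency columns
% (made up for illustration; assumed coding: 1 = spam, 0 = non-spam).
n = 200;
X = [randn(n/2, 2) + 2; randn(n/2, 2) - 2];   % two separated clusters
y = [ones(n/2, 1); zeros(n/2, 1)];

% Hold out 30% of the rows as a test set using a random permutation.
idx      = randperm(n);
nTest    = round(0.3 * n);
testIdx  = idx(1:nTest);
trainIdx = idx(nTest+1:end);
Xtrain = X(trainIdx, :);  ytrain = y(trainIdx);
Xtest  = X(testIdx, :);   ytest  = y(testIdx);

% Logistic regression fitted by batch gradient descent on the log-loss.
sigmoid = @(z) 1 ./ (1 + exp(-z));
A  = [ones(numel(ytrain), 1), Xtrain];        % prepend an intercept column
w  = zeros(size(A, 2), 1);                    % weights, starting at zero
lr = 0.1;                                     % learning rate (assumed value)
for iter = 1:500
    p    = sigmoid(A * w);                    % predicted spam probabilities
    grad = A' * (p - ytrain) / numel(ytrain); % gradient of the mean log-loss
    w    = w - lr * grad;
end

% Predict on the test set, thresholding the probability at 0.5.
pTest    = sigmoid([ones(numel(ytest), 1), Xtest] * w);
yPred    = double(pTest >= 0.5);
accuracy = mean(yPred == ytest);
fprintf('Test accuracy: %.3f\n', accuracy);
```

The random permutation does not stratify by class; for a strongly imbalanced dataset, a stratified split would keep the spam proportion similar in both sets.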
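For parts (c) and (e), the confusion matrix and the two metrics follow directly from counting the four outcome types. The sketch below uses hypothetical true and predicted label vectors and treats spam (1) as the positive class; precision is TP / (TP + FP) and recall is TP / (TP + FN).

```matlab
% Hypothetical true and predicted labels (assumed coding: 1 = spam, 0 = non-spam).
yTrue = [1 1 1 0 0 0 0 0 1 0]';
yPred = [1 0 1 0 0 1 0 0 1 0]';

TP = sum(yTrue == 1 & yPred == 1);   % spam correctly flagged
FP = sum(yTrue == 0 & yPred == 1);   % non-spam wrongly flagged as spam
FN = sum(yTrue == 1 & yPred == 0);   % spam that was missed
TN = sum(yTrue == 0 & yPred == 0);   % non-spam correctly passed

% Rows are the actual class, columns the predicted class,
% both ordered [non-spam, spam].
C = [TN FP; FN TP];

precision = TP / (TP + FP);          % share of flagged emails that are spam
recall    = TP / (TP + FN);          % share of spam emails that were flagged

disp(C);
fprintf('Precision: %.2f, recall: %.2f\n', precision, recall);
```

Running the same lines on the predictions made before and after oversampling gives the two matrices to compare in part (e). A rise in recall paired with a drop in precision is a common pattern after oversampling, though it is not guaranteed for every dataset.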
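For part (d), the core idea of SMOTE is to create each synthetic minority sample by interpolating between a minority point and one of its k nearest minority neighbors. The sketch below is a simplified version written with base functions under stated assumptions: the data are made up, k = 5 is an assumed setting, and enough samples are generated to balance the two classes exactly. Oversampling is normally applied to the training set only, so that the test set keeps the original distribution.

```matlab
% Toy imbalanced data (made up): 40 non-spam rows, 10 spam rows, 2 features.
Xmaj = randn(40, 2);
Xmin = randn(10, 2) + 3;

k    = 5;                                 % neighbors considered per minority point
nMin = size(Xmin, 1);
nNew = size(Xmaj, 1) - nMin;              % synthetic rows needed to balance classes
Xsyn = zeros(nNew, size(Xmin, 2));

for i = 1:nNew
    a = randi(nMin);                      % pick a random minority sample
    % Squared Euclidean distance from sample a to every minority sample.
    d = sum((Xmin - repmat(Xmin(a, :), nMin, 1)).^2, 2);
    [~, order] = sort(d);
    nbrs = order(2:k+1);                  % k nearest, skipping the sample itself
    b    = nbrs(randi(k));                % pick one of those neighbors at random
    gap  = rand();                        % interpolation factor between 0 and 1
    Xsyn(i, :) = Xmin(a, :) + gap * (Xmin(b, :) - Xmin(a, :));
end

% Balanced training data: original rows plus the synthetic minority rows.
Xbal = [Xmaj; Xmin; Xsyn];
ybal = [zeros(size(Xmaj, 1), 1); ones(nMin + nNew, 1)];
fprintf('Non-spam: %d, spam: %d\n', sum(ybal == 0), sum(ybal == 1));
```

Because every synthetic row lies on a line segment between two existing minority rows, the new samples stay inside the region already occupied by the minority class. The model from part (b) can then be retrained on Xbal and ybal.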
