
Project 1 MCIS6173 Spam and Ham Assignment

As you know, a big problem in today's exchange of messages among people is spam email. It is a major problem and poses a security threat to many users. In this project you will work on filtering email messages into spam or ham using simple metrics. This type of work usually involves machine learning techniques, but you are not asked to go that route.

Objectives:
1- Understand the danger that spam emails pose.
2- Get a glimpse of what spam messages can look like.
3- Identify what constitutes a spam email.
4- Apply some of the techniques used to separate spam from ham.
5- Extract information that supports the decision to consider an email spam or ham.
6- Work with a real dataset that is used in industry and research on this topic.

Guidelines:
You will be given 51 email messages. Some of these are spam and some are ham. The labels for these messages are given in the labels.txt file: 0 is spam and 1 is ham. Your code should perform two tasks:
a. An email message is given to your code. Your code has to report whether the message is ham or spam. In addition, the code should report the evidence for that decision: the words found in the message that made you decide it is ham or spam. The spam words are given in the file spamwords.txt. The file groups the spam words into categories, so for a spam email you also need to report its category, based on the categories in the file. The categories appear under the ----- line.
b. Your code should also report the number of spam messages and the number of ham messages in the dataset, based on analyzing each email message and not on the labels.

Notes:
1- It is preferred that the project be developed under the Linux OS without the need to install any special packages or libraries beyond the default compilers and libraries.

2- The only languages allowed are Python and Java.

3- Name the solution file spam.py or spam.java.

4- Only one code file should be submitted per group.
   a. Do not submit other files, such as snapshots of the program.

5- Your code should start with a comment block.
   a. This comment block contains:
      i. The students' names and IDs.

6- Be careful with path names.
   a. Always assume the current folder/directory.

7- The command to run your code would be similar to:
   python2.6 spam.py dataset/ abc.eml
   or
   java spam dataset/ abc.eml

8- An example of the output:
   abc.eml is commerce spam evidence: buying, order, shipped
   number of spam messages is: 30 and ham is: 21
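Tasks a and b can be sketched in Python as a keyword filter. This is a minimal sketch under stated assumptions, not a reference solution. The layout of spamwords.txt is only partly described ("the categories appear under the ----- line"), so the parser assumes that a dashed line is followed by a category name and then that category's words or phrases. The SPAM_THRESHOLD value of 2 and the helper names (parse_spam_words, message_text, classify, count_dataset) are hypothetical. Only the Python 3 standard library is used, in line with Note 1; the example command in Note 7 names python2.6, and this sketch would need adjusting to run there (email.message_from_bytes and print(..., file=...) are Python 3 forms).

```python
#!/usr/bin/env python3
# spam.py -- sketch of a keyword-based spam/ham filter (MCIS6173 Project 1).
# Students: <names and IDs go here, as Note 5 requires>
# ASSUMPTIONS (not stated in the assignment):
#   * spamwords.txt: a line of dashes is followed by a category name; the
#     lines after it, up to the next dashed line, are that category's phrases.
#   * A message is spam when it contains at least SPAM_THRESHOLD distinct
#     spam phrases; the value 2 is a hypothetical starting point.
import email
import os
import re
import sys

SPAM_THRESHOLD = 2  # hypothetical; tune it by comparing results with labels.txt


def normalize(text):
    """Lower-case text and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def parse_spam_words(text):
    """Return {category: [phrase, ...]} from the contents of spamwords.txt."""
    categories, current, expect_category = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:          # separator: the next line is a category
            expect_category = True
        elif expect_category:
            current, expect_category = line, False
            categories.setdefault(current, [])
        elif current is not None:       # lines before the first ----- are skipped
            categories[current].append(normalize(line))
    return categories


def message_text(raw_bytes):
    """Subject plus the decoded text/* parts of a raw .eml message."""
    msg = email.message_from_bytes(raw_bytes)
    pieces = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue                     # skip multipart containers and attachments
        payload = part.get_payload(decode=True)  # undoes base64/quoted-printable
        if payload is None:
            continue
        try:
            pieces.append(payload.decode(part.get_content_charset() or "latin-1", "ignore"))
        except LookupError:              # unknown charset name
            pieces.append(payload.decode("latin-1"))
    return "\n".join(pieces)


def classify(text, categories, threshold=SPAM_THRESHOLD):
    """Return (is_spam, category, evidence) for one message's text."""
    text = normalize(text)
    hits = {}
    for category, phrases in categories.items():
        for phrase in phrases:
            # Whole-word match, so "order" does not fire inside "border".
            if re.search(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", text):
                hits.setdefault(category, []).append(phrase)
    evidence = sorted({p for found in hits.values() for p in found})
    if len(evidence) < threshold or not hits:
        return False, None, evidence
    # Report the category that contributed the most matches.
    return True, max(hits, key=lambda c: len(hits[c])), evidence


def load_message(path):
    with open(path, "rb") as handle:
        return message_text(handle.read())


def count_dataset(directory, categories):
    """Classify every .eml file in directory; return (n_spam, n_ham)."""
    names = [n for n in sorted(os.listdir(directory)) if n.lower().endswith(".eml")]
    n_spam = sum(classify(load_message(os.path.join(directory, n)), categories)[0] for n in names)
    return n_spam, len(names) - n_spam


def main(argv):
    if len(argv) != 2:
        print("usage: python3 spam.py dataset/ abc.eml", file=sys.stderr)
        return 2
    directory, name = argv
    with open("spamwords.txt", "rb") as handle:  # current directory (Note 6)
        categories = parse_spam_words(handle.read().decode("latin-1"))
    # Assumption: use abc.eml as given if it exists, else look inside the dataset folder.
    path = name if os.path.isfile(name) else os.path.join(directory, name)
    is_spam, category, evidence = classify(load_message(path), categories)
    verdict = category.lower() + " spam" if is_spam else "ham"
    print("%s is %s evidence: %s" % (os.path.basename(name), verdict, ", ".join(evidence) or "none"))
    print("number of spam messages is: %d and ham is: %d" % count_dataset(directory, categories))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python3 spam.py dataset/ abc.eml` from the folder that holds spamwords.txt; it prints two lines in the shape of the example output. Matching is case-insensitive and whole-word, so "order" counts in "Your order" but not in "border". The reported category is the one with the most matches, with ties going to the category listed first in spamwords.txt. For a ham message, the evidence is whatever spam words matched below the threshold, or "none".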

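Because task b must be computed from the messages themselves, labels.txt is only useful while developing, for example to check how often the filter agrees with the given labels before settling on a threshold. The helpers below are a hypothetical sketch: their names are illustrative, and they assume each labels.txt line holds a label and a file name such as "0 abc.eml". The assignment only states that 0 means spam and 1 means ham, so adjust the parser to the real file layout.

```python
# Development aid (hypothetical helpers, not required by the assignment):
# compare the filter's decisions with labels.txt to tune the threshold.
# ASSUMPTION: each labels.txt line looks like "0 abc.eml" (label, then file
# name); the assignment only states that 0 means spam and 1 means ham.


def parse_labels(text):
    """Return {file_name: is_spam} from the contents of labels.txt."""
    labels = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("0", "1"):
            labels[fields[1]] = fields[0] == "0"  # 0 = spam, 1 = ham
    return labels


def agreement(predictions, labels):
    """Fraction of labelled files on which predictions[name] matches the label."""
    shared = [name for name in predictions if name in labels]
    if not shared:
        return 0.0
    return sum(predictions[name] == labels[name] for name in shared) / len(shared)
```

Here `predictions` maps each file name to the spam/ham decision the filter produced. Since Note 4 allows only one submitted file, such helpers would live inside spam.py during tuning rather than in a separate script.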