Question

Machine learning-based SMS Spam Filtering

Project Statements

- Objective
For this project, you are asked to implement a detection program supporting Short Message Service (SMS) spam filtering. The main concern is to design/generate features that differentiate SMS spam messages from legitimate ones, and to run machine learning techniques (i.e., supervised learning) to classify SMS spam messages. Unlike email spam filtering, SMS spam filtering poses its own intrinsic problem because text messages are short (160 characters or fewer). To complete this project successfully, you must devise robust and efficient detection features to solve this problem.

- Dataset
You will explore the real SMS Spam Collection Data Set, a corpus of mobile SMS messages labeled Spam/Legitimate. The SMS Spam collection contains a total of 1324 SMS messages, which is composed of 82 spam and 1002 legitimate messages. Each line holds one SMS message in two columns separated by a comma (,): the second column is the label indicating whether the SMS message is spam or legitimate (ham), and the first one is the content of the message (i.e., raw text). Table 1 shows some samples of SMS Spam/Legitimate messages in the given dataset.

Table 1: SMS Spam/legitimate message samples
Label | Text
spam | "For the most sparkling shopping breaks from 45 per person; call 0121 2025050 or visit www.shortbreaks.org.uk"
spam | December only! Had your mobile 11mths+? You are entitled to update to the latest colour camera mobile for Free! Call The Mobile Update Co FREE on 08002986906
ham | Yup next stop.
ham | No. I dont want to hear anything
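As a rough illustration of reading the dataset, here is a minimal Python sketch. It assumes the layout described in the text (message content first, label last, one comma between them); because the message itself may contain commas, it splits on the last comma only. The function names `parse_sms_line` and `load_sms` are hypothetical, and if the actual file puts the label first (as the column order in Table 1 might suggest), `split(",", 1)` would be the analogous choice.

```python
from typing import Iterable, List, Tuple


def parse_sms_line(line: str) -> Tuple[str, str]:
    """Split one dataset line into (text, label).

    Assumption: text comes first and the label last, separated by a comma.
    The text may contain commas, so only the LAST comma is used as the split.
    """
    text, label = line.rstrip("\n").rsplit(",", 1)
    text = text.strip()
    # Drop optional surrounding double quotes, as seen in the first sample.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    return text, label.strip().lower()


def load_sms(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse all lines, skipping blanks and rows whose label is not spam/ham."""
    rows = []
    for line in lines:
        if not line.strip() or "," not in line:
            continue
        text, label = parse_sms_line(line)
        if label in ("spam", "ham"):
            rows.append((text, label))
    return rows
```

With a real file this could be called as `load_sms(open(path, encoding="utf-8"))`, where `path` is wherever the dataset was saved.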
Project Requirements

- Feature Extraction
Write a Java (or Python, or another programming language, but then you need to give me a demo) program that extracts the detection features from raw text and generates a feature set file. You will be programming FOUR detection features and are responsible for justifying the implementation strategy for each of the features in the report. If you use any references for the implementation, you should cite and explain them concisely. For project 1, the detection features are:
Detection feature | Description
Number of Characters typed in Message | In Table 1 above, SMS spam messages appear to contain relatively more characters than legitimate messages. From the SMS spammers' (e.g., marketers') perspective, they are likely to use as many of the available characters as they can, without exceeding the SMS limit, to send sufficient information to customers for illicit profit.
Number of Currency Symbols | To get mobile users to take the bait, SMS spammers might emphasize the prize (or cash) using a currency symbol (e.g., £) in the SMS message. This is typically different from legitimate messages. Here is an example: Please call our customer service representative on 0800 169 6031 between 10am-9pm as you have WON a guaranteed £1000 cash or £5000 prize!
Number of Numeric Strings | An intrinsic trait of SMS spam is a CONTACT number or PROMOTION code. Because phone numbers are sensitive, they are unlikely to appear frequently in legitimate messages. (Example: PRIVATE! Your 2003 Account Statement for shows 800 un-redeemed S. I. M. points. Call 08719899230 Identifier Code: 41685 Expires 07/11/04)
Frequency of the most popular term/word
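The four features above could be computed along the following lines. This is a sketch under stated assumptions, not the required implementation: the set of currency symbols, the definition of a "numeric string" as a maximal run of digits, and the reading of the fourth feature (how often a message contains the single most frequent word of a reference set of messages, ignoring words shorter than four letters) are all choices made here for illustration. The assignment gives no description for the fourth feature, so that interpretation in particular would need to be justified in the report.

```python
import re
from collections import Counter

# Assumed symbol set: pound, dollar, euro, yen. Extend as needed.
CURRENCY_SYMBOLS = "\u00a3$\u20ac\u00a5"
NUMERIC_RE = re.compile(r"\d+")   # one maximal digit run = one numeric string
WORD_RE = re.compile(r"[a-z']+")  # crude lowercase word tokenizer


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return WORD_RE.findall(text.lower())


def most_popular_term(texts, min_len=4):
    """Most frequent word across the given messages (assumed interpretation)."""
    counts = Counter(w for t in texts for w in tokenize(t) if len(w) >= min_len)
    return counts.most_common(1)[0][0] if counts else ""


def extract_features(text, popular_term):
    """Return the four detection features for one message as a dict."""
    return {
        "num_chars": len(text),
        "num_currency": sum(text.count(c) for c in CURRENCY_SYMBOLS),
        "num_numeric": len(NUMERIC_RE.findall(text)),
        "popular_term_freq": tokenize(text).count(popular_term),
    }
```

Note that under this definition a phone number written with spaces, such as 0800 169 6031, counts as three numeric strings; merging digit groups separated by single spaces would be an alternative design.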
- Binary Classification
Using WEKA, run different binary classifiers to identify SMS spam messages with the feature set you devised. You should run the following FIVE classification algorithms with cross-validation (by default, K = 10) and report the experimental results you analyzed. Specifically, you must report the best classifier with i) Accuracy, ii) True Positive Rate (TPR), and iii) False Positive Rate (FPR).
  • Decision Tree (J48)
  • Multinomial Naive Bayes
  • K-Nearest Neighbors
  • SVM (LibSVM)
  • RandomForest
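For reference, the three reported metrics follow directly from a classifier's confusion matrix. The sketch below treats spam as the positive class (an assumption; the assignment does not say which class is positive) and uses made-up counts purely for illustration. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, TPR and FPR from confusion-matrix counts.

    tp: spam predicted as spam      fn: spam predicted as ham
    fp: ham predicted as spam       tn: ham predicted as ham
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,  # share of spam caught
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,  # share of ham flagged
    }


# Made-up counts for illustration only: 10 spam and 10 ham messages.
metrics = binary_metrics(tp=8, fn=2, fp=1, tn=9)
print(metrics)
```

A good spam filter keeps FPR low in particular, since flagging a legitimate message is usually costlier than letting one spam message through.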
Programming Requirements
We highly recommend that you use Java (or Python) and WEKA for this assignment. You can also use other languages, but you need to clearly write the instructions to configure the compiling and running environment to test your code. You might also be required to give a demo of your code. You need to write the programs to extract detection features and apply machine learning techniques using WEKA. Here is the official Weka documentation (e.g., Weka Wiki, FAQ, Tutorials) link (http://www.cs.waikato.ac.nz/ml/weka/documentation.html). If the program only runs on your own machine, you should show the demo to the TA or the instructor in person. Your program should generate its output in Attribute-Relation File Format (ARFF), which can be directly analyzed by Weka.

Submission Requirements
The assignment must be your own original work. Please submit TWO files: i) A single tar (or zip) file that includes all files below:
  • Source code: your Java (or Python) code and any shell script files (if applicable) that you use for testing. Your code also needs to be well documented, with any major constructs (i.e., functions) clearly commented.
  • README: overall high-level documentation of instructions to run your program. (1 page)
  • Report: well-documented summary that includes the annotation/justification of detection features and the experimental results you analyzed. (at least 5 pages, the more formal the better.)
  • Dataset: the feature set data in ARFF format, to be tested
ii) An MS Word (or Acrobat PDF, plain text) file, including ONLY the source code, for originality checking.
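The ARFF feature set file required above has a simple layout: a `@relation` line, one `@attribute` line per feature plus a nominal class attribute, and a `@data` section with one comma-separated row per message. A minimal sketch of a writer follows; the relation name `sms_spam`, the attribute names, and the function name `to_arff` are assumptions chosen for illustration.

```python
# Hypothetical attribute names for the four detection features.
FEATURE_NAMES = ["num_chars", "num_currency", "num_numeric", "popular_term_freq"]


def to_arff(rows, relation="sms_spam"):
    """Render (feature_dict, label) pairs as ARFF text.

    Each feature_dict must contain every name in FEATURE_NAMES;
    each label must be "spam" or "ham".
    """
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in FEATURE_NAMES]
    lines += ["@attribute class {spam,ham}", "", "@data"]
    for feats, label in rows:
        values = ",".join(str(feats[name]) for name in FEATURE_NAMES)
        lines.append(f"{values},{label}")
    return "\n".join(lines) + "\n"


example = to_arff([
    ({"num_chars": 14, "num_currency": 0, "num_numeric": 0,
      "popular_term_freq": 0}, "ham"),
])
print(example)
```

Writing the returned string to a file with an `.arff` extension should give something Weka can open, with the last attribute selected as the class.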