Question
Sentiment Classification with Naive Bayes

In this assignment, you will build a naive Bayes classifier for sentiment classification. We define sentiment classification as a two-class problem: positive and negative. Our data set consists of airline reviews. The zip directory for the data contains training and test datasets, where each file contains one airline review tweet. You will build the model using the training data and evaluate it with the test data. Both the training data and the test data contain reviews. You must build the system from scratch (e.g., with numpy). Do not use any existing libraries (e.g., scikit-learn).

Build your naive Bayes classifier

Create your vocabulary: Read the complete training data word by word and create the vocabulary V for the corpus. You must not include the test set in this process.
- Remove any markup tags, e.g., HTML tags, from the data.
- Lowercase capitalized words (i.e., words that start with a capital letter), but not all-capital words (e.g., USA).
- Keep all stop words.
- Create two versions of V: with stemming and without stemming. You can use appropriate tools in nltk to stem.
- Tokenize at white space and also at each punctuation mark. In other words, "child's" consists of two tokens, "child" and "'s", and "home." consists of two tokens, "home" and ".".
- Consider emoticons in this process. You can use an emoticon tokenizer if you so choose. If so, specify which one.

Extract features: Convert documents to vectors using the Bag of Words (BoW) representation. Do this in two ways: keeping a frequency count, where each word is represented by its count in each document, and keeping a binary representation that only tracks whether or not a word is present in a document.

Training: Calculate the prior for each class and the likelihood for each (word, class) pair.

Note that:
- Ignore any words that appear in the test set but not the training set.
- If you want to experiment with different stemmers or other aspects of the input features, you must do so on the training set, through cross-validation. You must not do such preliminary evaluations on test data.
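The vocabulary, feature extraction, and training steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the required solution: it uses only the standard library (a numpy version would hold the same counts in arrays), the emoticon pattern is a small hypothetical one, an apostrophe suffix such as 's is kept as a single token, add-one (Laplace) smoothing is assumed because the assignment does not name a smoothing method, and the binary variant is implemented by clipping per-document counts to 1. Stemming is left out; for the stemmed vocabulary, each token could be passed through a stemmer such as nltk's PorterStemmer. All function names are illustrative.

```python
import math
import re
from collections import Counter, defaultdict

TAG_RE = re.compile(r"<[^>]+>")  # markup such as <b> or </p>
# Order matters: emoticon (assumed pattern), word, apostrophe suffix, single punctuation.
TOKEN_RE = re.compile(r"[:;=][-']?[)(DPp/\\|\]\[]|\w+|'\w+|[^\w\s]")


def tokenize(text):
    """Strip tags, split at whitespace and punctuation, lowercase Capitalized words."""
    tokens = TOKEN_RE.findall(TAG_RE.sub(" ", text))
    # "Great" -> "great"; all-capital tokens such as "USA" are left unchanged.
    return [t.lower() if t[:1].isupper() and not t.isupper() else t for t in tokens]


def bag_of_words(text, vocab, binary=False):
    """Count in-vocabulary tokens; binary=True records presence only."""
    counts = Counter(t for t in tokenize(text) if t in vocab)
    return {t: 1 for t in counts} if binary else dict(counts)


def train(docs, labels, binary=False, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from training data only."""
    vocab = {t for doc in docs for t in tokenize(doc)}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(bag_of_words(doc, vocab, binary))
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((word_counts[c][t] + alpha) / total) for t in vocab}
    return vocab, log_prior, log_like


def predict(text, vocab, log_prior, log_like, binary=False):
    """Return the class with the highest log posterior; unseen words are ignored."""
    bow = bag_of_words(text, vocab, binary)
    scores = {
        c: log_prior[c] + sum(n * log_like[c][t] for t, n in bow.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy data for illustration only; real input comes from the training files.
toy_docs = ["Great flight, loved the crew :)", "Terrible delay. Lost my bag :("]
toy_labels = ["positive", "negative"]
toy_model = train(toy_docs, toy_labels)
toy_prediction = predict("loved the great crew", *toy_model)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and restricting `bag_of_words` to the training vocabulary is what makes test-only words drop out of the score.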
When you have finalized your system, features, and parameters, you can evaluate on test data.

Evaluation:
- Compute the most likely class for each document in the test set using each of the combinations of stemming + frequency count, stemming + binary, no-stemming + frequency count, and no-stemming + binary.
- Compute and report accuracy. Accuracy = (number of correctly classified reviews) / (number of all reviews in test).
- Create a confusion matrix for each classifier.
- Save your results in a .txt or .log file.

Bonus points: How would the results change if you used term frequency x inverse document frequency (TF-IDF) instead of the binary representation for naive Bayes?

Originally designed by Dr. Uzuner for AIT. Revised by Dr. Liao for AIT.

Documentation

- Identify your information (group number, name, date, etc.) and describe the problem to be solved well enough that someone not familiar with our class could understand it.
- Give actual examples of program input and output, along with usage instructions.
- Describe the algorithm you used to solve the problem, specified in a stepwise or point-by-point fashion.
- Additional description: Please state whether the bonus credit question is answered or not.

Deliverables

Please submit a zip file named with each student's first-name initial and last name, followed by hw# and the .zip extension. For example, for students Jamie Lee and Kahyun Lee, the file name combines jlee, klee, and hw#.

The zip file should include the following:
- Your code.
- A log file or txt file that contains your output. You can choose whichever is convenient for you. A log can be created using the logging library; a txt file can be created using a simple I/O library.

NOTE: If you use a Jupyter notebook, you still need to save results to a text file; do NOT print all results in the Jupyter notebook. You can have some small outputs in the notebook and then save it as HTML. Zip all notebook files, HTML files, and/or related intermediate datasets, and other files into ONE zip file. Do not zip the original datasets provided for the assignment. Please submit only one zip file.
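The evaluation step described above (accuracy, a confusion matrix per classifier, and results saved to a text file) might be sketched like this. It assumes the gold and predicted labels are already available as parallel lists; the function names, the report layout, and the label order are illustrative choices, and the toy labels are made up purely to show the shape of the output.

```python
from collections import Counter

CLASSES = ("positive", "negative")


def accuracy(gold, predicted):
    """Number of correctly classified reviews / number of all reviews."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_matrix(gold, predicted, classes=CLASSES):
    """Rows are gold classes, columns are predicted classes."""
    pairs = Counter(zip(gold, predicted))
    return [[pairs[(g, p)] for p in classes] for g in classes]


def format_report(name, gold, predicted, classes=CLASSES):
    """Build a plain-text report for one classifier configuration."""
    lines = [
        f"{name}: accuracy = {accuracy(gold, predicted):.4f}",
        "gold/predicted\t" + "\t".join(classes),
    ]
    for label, row in zip(classes, confusion_matrix(gold, predicted, classes)):
        lines.append(label + "\t" + "\t".join(str(n) for n in row))
    return "\n".join(lines)


def save_report(path, text):
    """Append one report to a results file (the path is chosen by the caller)."""
    with open(path, "a", encoding="utf-8") as out:
        out.write(text + "\n\n")


# Toy labels for illustration only.
toy_gold = ["positive", "positive", "negative", "negative"]
toy_predicted = ["positive", "negative", "negative", "negative"]
toy_report = format_report("no-stemming + frequency count", toy_gold, toy_predicted)
```

In a full run, `format_report` would be called once for each of the four stemming and representation combinations, with each result passed to `save_report` and a file name such as results.txt (a hypothetical name).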