
PROBLEM

Documentation for each function:

- CountVectorizer (http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html)

- TfidfTransformer (http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html)

- MultinomialNB (http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html)

----------

"Textmining.zip" file (REQUIRED):

https://drive.google.com/drive/folders/1xULa8boCMsZIcmk8oTfttaKdWcdYy2O_?usp=sharing

Problem Definition

This assignment has three objectives:

- Learn how Text Classification is used in business, per the Research Paper provided for this assignment.
- Apply the concepts you learned around Text Preprocessing and the Naive Bayes Classifier.
- Implement Naive Bayes using the Python code provided for this assignment.

You have been given TextMining.zip. This zip file contains a Research Paper, Data, and Python Code for this assignment. Unzip it in the location where you have been developing your Python code. Once unzipped, the TextMining directory will contain the following files:

- ICIS2015MousaviRaghuFrey.pdf: the Research Paper that applies Naive Bayes to solve a Business Problem. "Assessing Order Effects in Online Community-Based Health Forums", R. Mousavi, T. S. Raghu, Keith Frey, Thirty-Sixth International Conference on Information Systems, 2015.
- Two data sets, one of which you will use for this assignment:
  - Usa bero ce Directory: contains Health data pertaining to the Research Paper. Please be sure to read the cautionary note in the Preamble by the Authors of the Paper below before using this dataset.
  - MoviePosNeg Directory: contains Movie Review data, which you can use if you choose not to use the Health data per the cautionary note below.
- Two Python script files that implement the Naive Bayes Classifier. Select one depending on the type of computer (Windows or Mac) you are using:
  - Most students are using Windows, so the file to use is SKLearnNB-Windows-Only.py.
  - If you are using a Mac, the file to use is SKLearnNB-Mac-Only.py.

Here is the suggested order for working through this assignment:

- Read through all of the instructions in this assignment sheet.
- Read through the Paper (ICIS2015MousaviRaghuFrey.pdf) to understand how Text Classification is used to solve Business problems.
- Decide for yourself whether you want to use the Health dataset or the Movie Reviews dataset, per the cautionary note below.
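Whichever data directory you choose, its text files have to be read into a list of documents and a list of labels. The loading code inside SKLearnNB-Windows-Only.py and SKLearnNB-Mac-Only.py is not shown here, so the following is only one plausible approach: scikit-learn's load_files, which treats each subfolder of a directory as one class label. It assumes a layout with one subfolder per class (the pos/neg folder names and the helper make_demo_corpus are hypothetical) and builds a throwaway corpus in a temporary directory so that it runs without the assignment data.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import load_files


def make_demo_corpus(root: Path) -> None:
    """Write a tiny hypothetical corpus with one subfolder per class label."""
    samples = {
        "neg": ["dull and far too long", "a terrible waste of time"],
        "pos": ["a wonderful, moving film", "great cast and great script"],
    }
    for label, texts in samples.items():
        folder = root / label
        folder.mkdir(parents=True)
        for i, text in enumerate(texts):
            (folder / f"{i}.txt").write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_demo_corpus(root)
    # Each subfolder name becomes a class; file contents are decoded to str.
    # shuffle=False keeps the sorted folder/file order for readability.
    data = load_files(str(root), encoding="utf-8", shuffle=False)

print(data.target_names)  # class names, in sorted folder order
print(data.target)        # integer label for each document
print(data.data[0])       # text of the first document
```

To use real data you would pass the path of the unzipped data directory instead of the temporary one, provided its layout matches the one-subfolder-per-class assumption.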
Once you have chosen a dataset, update the code in either SKLearnNB-Windows-Only.py if you are using a Windows computer, or SKLearnNB-Mac-Only.py if you are using a Mac computer, as called out in the next section.

Requirement for this Assignment

The code in either SKLearnNB-Windows-Only.py (Windows) or SKLearnNB-Mac-Only.py (Mac), as it stands, runs the following steps:

- Reads in the Health (or Movie) data.
- Creates a "Pipeline" of data transformation and classification.
- Uses a Naive Bayes Classifier to classify the Health (or Movie) data into Positive and Negative Sentiment.
- Uses 6-fold cross-validation.

Take the following steps to complete this assignment:

- Run the code as-is and note the Accuracy of the Classification.
- Learn about the functions in lines 65, 66 and 67 of either SKLearnNB-Windows-Only.py or SKLearnNB-Mac-Only.py. Then add, remove, tweak, or update the function parameters to improve the Accuracy of the Classifier. Try different combinations until you settle on your personal best result.

Note: your improved Accuracy must be at least 95 for the Health data set (and at least 75 for the Movie data set).

Lines 64 to 68 of the script define the pipeline:

    64  pipeline = Pipeline([
    65      ('vect', CountVectorizer()),
    66      ('tfidf', TfidfTransformer()),
    67      ('clf', MultinomialNB()),
    68  ])

Documentation for each of these functions is linked under "Documentation for each function" above.
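The baseline run and the parameter tuning described above can be sketched with scikit-learn's Pipeline, cross_val_score, and GridSearchCV. This is a minimal sketch, not the assignment's script: the twelve toy reviews and their labels are made up, the provided code may compute its 6-fold accuracy with its own loop, and the values in param_grid are only examples of settings worth trying (n-grams, stop-word removal, IDF and sublinear tf weighting, and the smoothing strength alpha). The step names 'vect', 'tfidf', and 'clf' match lines 65 to 67, which is what lets keys such as vect__ngram_range reach the right stage.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy reviews standing in for the Health or Movie documents
docs = [
    "great film with a moving story", "wonderful acting and a great script",
    "a delightful and funny movie", "brilliant direction and a great cast",
    "loved the story and the acting", "funny, warm and wonderful",
    "boring plot and weak acting", "a terrible, dull movie",
    "awful script and a boring story", "weak cast and terrible direction",
    "dull, slow and awful", "hated the plot, terrible film",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative

# Baseline: the same three-stage pipeline with every parameter at its default
pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])

# 6-fold cross-validation; for a classifier, an integer cv gives stratified folds
scores = cross_val_score(pipeline, docs, labels, cv=6, scoring="accuracy")
print(f"Baseline mean accuracy: {scores.mean():.3f}")

# Candidate settings; "<step>__<param>" routes each value to one pipeline stage
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],  # unigrams, or unigrams + bigrams
    "vect__stop_words": [None, "english"],  # keep or drop common English words
    "tfidf__use_idf": [True, False],        # IDF reweighting on or off
    "tfidf__sublinear_tf": [False, True],   # raw tf, or 1 + log(tf)
    "clf__alpha": [1.0, 0.1, 0.01],         # additive (Laplace/Lidstone) smoothing
}

# Try every combination, scoring each with the same 6-fold scheme
search = GridSearchCV(pipeline, param_grid, cv=6, scoring="accuracy")
search.fit(docs, labels)
print(f"Best mean accuracy: {search.best_score_:.3f}")
print("Best settings:", search.best_params_)
```

Because the grid includes the all-defaults combination and both calls use the same stratified 6-fold split, the best grid score should not fall below the baseline. On the real data, keep the grid small at first: every combination costs six fits.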