Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

CMPSC 445: Applied Machine Learning in Data Science Project-1: Bag of Words For the chosen domain of interest (movie review, product review, email spam filter,

image text in transcribed

CMPSC 445: Applied Machine Learning in Data Science Project-1: Bag of Words For the chosen domain of interest (movie review, product review, email spam filter, etc.) develop a Bayesian Classifier. The following are the steps to be considered for the project implementation. 1. Data Collection (i.e. documents with known category-make sure that you have enough data for training). 2. Develop the feature data set (.e. Bag of Words) for each known category. 3. Separate 50% of the collection data (i.e. documents) for training and 50% for testing, 4. For the training data document, Create the table with the bag of words and the corresponding Class category. 5. Calculate the prior probability, and conditional independence for each event (i.e. each word in Bag of words). 6. Perform the Bayesian classification for each test data (i.e. each review for classification in the testing data). 7. Record the results. 8. Repeat Step-3 to Step-7 for 80% training and 20% testing. 9. Record the results. 10. Compare Step-7 and Step-9 for difference in prediction. Develop a program to perform Steps-4 to Steps-7. You can use either C++ or python for the programming the application. The program shall perform the following: 1. Read the test data from a flat file (i.e. text file) 2. Have appropriate delimiter to separate the reviews from the classification. For example: The movie was a real treat to the fans. Totally enjoyed it: Positive Here, ?' is the delimiter. 3. Count the frequency of each bag of words in the training document, perform the calculation (i.e. conditional independence and prior probability) and display the results. 4. Read the test data and perform the classification and display the results. 5. Perform the above steps for 80% training and 20% testing. 6. After predicting the class for each of the test document (20% test data), add it back to the training data set and perform the training again. - This is called as the evidential learning. 7. Record the test when there is change in the prediction compared to the Step-7.- 8. Show an example of Bayesian Network for considering conditional dependency. + CMPSC 445: Applied Machine Learning in Data Science Project-1: Bag of Words For the chosen domain of interest (movie review, product review, email spam filter, etc.) develop a Bayesian Classifier. The following are the steps to be considered for the project implementation. 1. Data Collection (i.e. documents with known category-make sure that you have enough data for training). 2. Develop the feature data set (.e. Bag of Words) for each known category. 3. Separate 50% of the collection data (i.e. documents) for training and 50% for testing, 4. For the training data document, Create the table with the bag of words and the corresponding Class category. 5. Calculate the prior probability, and conditional independence for each event (i.e. each word in Bag of words). 6. Perform the Bayesian classification for each test data (i.e. each review for classification in the testing data). 7. Record the results. 8. Repeat Step-3 to Step-7 for 80% training and 20% testing. 9. Record the results. 10. Compare Step-7 and Step-9 for difference in prediction. Develop a program to perform Steps-4 to Steps-7. You can use either C++ or python for the programming the application. The program shall perform the following: 1. Read the test data from a flat file (i.e. text file) 2. Have appropriate delimiter to separate the reviews from the classification. For example: The movie was a real treat to the fans. Totally enjoyed it: Positive Here, ?' is the delimiter. 3. Count the frequency of each bag of words in the training document, perform the calculation (i.e. conditional independence and prior probability) and display the results. 4. Read the test data and perform the classification and display the results. 5. Perform the above steps for 80% training and 20% testing. 6. After predicting the class for each of the test document (20% test data), add it back to the training data set and perform the training again. - This is called as the evidential learning. 7. Record the test when there is change in the prediction compared to the Step-7.- 8. Show an example of Bayesian Network for considering conditional dependency. +

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Data Management Databases And Organizations

Authors: Richard T. Watson

3rd Edition

0471418455, 978-0471418450

More Books

Students also viewed these Databases questions

Question

Physicians have no say in what prices to charge their patients.

Answered: 1 week ago

Question

2. To compare the costs of alternative training programs.

Answered: 1 week ago

Question

1. The evaluation results can be used to change the program.

Answered: 1 week ago