Question
CMPSC 445: Applied Machine Learning in Data Science
Project-1: Bag of Words

For the chosen domain of interest (movie review, product review, email spam filter, etc.), develop a Bayesian classifier. The following steps are to be considered for the project implementation.

1. Data collection (i.e. documents with a known category; make sure that you have enough data for training).
2. Develop the feature data set (i.e. bag of words) for each known category.
3. Separate 50% of the collected data (i.e. documents) for training and 50% for testing.
4. For the training documents, create the table with the bag of words and the corresponding class category.
5. Calculate the prior probability and the conditional probability (assuming conditional independence) for each event (i.e. each word in the bag of words).
6. Perform the Bayesian classification for each test item (i.e. each review to be classified in the testing data).
7. Record the results.
8. Repeat Step-3 to Step-7 for 80% training and 20% testing.
9. Record the results.
10. Compare Step-7 and Step-9 for differences in prediction.

Develop a program to perform Step-4 to Step-7. You can use either C++ or Python to program the application. The program shall perform the following:

1. Read the test data from a flat file (i.e. text file).
2. Have an appropriate delimiter to separate the reviews from the classification. For example:
   The movie was a real treat to the fans. Totally enjoyed it: Positive
   Here, ':' is the delimiter.
3. Count the frequency of each word of the bag of words in the training documents, perform the calculation (i.e. conditional independence and prior probability), and display the results.
4. Read the test data, perform the classification, and display the results.
5. Perform the above steps for 80% training and 20% testing.
6. After predicting the class for each test document (20% test data), add it back to the training data set and perform the training again. This is called evidential learning.
7. Record each test where the prediction changes compared to Step-7.
8. Show an example of a Bayesian network that considers conditional dependency.
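As a starting point, the parsing, counting, and classification steps above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: the function names (`parse_line`, `tokenize`, `train`, `classify`) are hypothetical, add-one (Laplace) smoothing is an assumption the assignment does not specify, and every toy review except the first is invented for illustration.

```python
import math
from collections import Counter, defaultdict

def parse_line(line, delimiter=":"):
    """Split 'review text: Label' on the LAST delimiter, so colons in the review survive."""
    text, label = line.rsplit(delimiter, 1)
    return text.strip(), label.strip()

def tokenize(text):
    """Lowercase and keep alphabetic runs only (a simplifying assumption)."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def train(docs):
    """docs: list of (text, label). Returns (priors, per-class word counts, vocabulary)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Return the class maximizing log P(c) + sum of log P(w | c).

    Uses add-one smoothing (an assumption, not required by the assignment)
    and works in log space to avoid underflow on long reviews.
    """
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data: the first line is the assignment's example, the rest are made up.
lines = [
    "The movie was a real treat to the fans. Totally enjoyed it: Positive",
    "Boring plot and terrible acting: Negative",
    "Great fun, enjoyed every minute: Positive",
    "A terrible waste of time: Negative",
]
model = train([parse_line(line) for line in lines])
print(classify("enjoyed the movie", *model))
```

For the 50/50 and 80/20 experiments, the same `train` and `classify` calls would simply be run on different slices of the parsed lines. For the evidential learning step, each predicted `(text, label)` pair would be appended to the training list before calling `train` again.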