Question
Q1. Naive Bayes: Code [25]
In this question, you will learn to build a Naive Bayes Classifier for the binary classification
task.
Dataset: "Financial Phrasebank" dataset from HuggingFace. To load the data, you
need to install library "datasets" pip install datasets and then use loaddatset
method to load the dataset. You can find the code on the link provided above.
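A minimal loading sketch follows. The configuration name "sentences_allagree" is an assumption here (the dataset ships several agreement-level configurations), so check the dataset card for the one your assignment expects.

from datasets import load_dataset

# Load the Financial Phrasebank corpus; the config name is assumed.
dataset = load_dataset("financial_phrasebank", "sentences_allagree")
print(dataset["train"][0])  # e.g. {'sentence': ..., 'label': ...}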
The dataset contains three class labels: neutral, positive, and negative. Consider
only the positive and negative samples and ignore the neutral samples. Use a randomly
selected portion of the samples to train the model and the remaining samples for the test.
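One possible filtering-and-splitting sketch, continuing from the loading code above. The label ids (0 = negative, 1 = neutral, 2 = positive) are assumed from the dataset card, and the 80/20 ratio is illustrative only, since the split size is not stated here.

import random

# Keep only positive and negative samples (label 1 = neutral is assumed).
samples = [(ex["sentence"], ex["label"])
           for ex in dataset["train"] if ex["label"] != 1]

random.seed(0)
random.shuffle(samples)
split = int(0.8 * len(samples))      # assumed, illustrative 80/20 split
train_data, test_data = samples[:split], samples[split:]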
Clean the dataset with the steps from the previous assignment and build a vocabulary of
all the words.
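The exact cleaning steps come from the previous assignment and are not reproduced here; the sketch below uses simple lowercasing and punctuation stripping as placeholders, continuing from the split above.

import re

def clean(text):
    # Placeholder cleaning: lowercase and keep letters only.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

train_tokens = [(clean(s), y) for s, y in train_data]
test_tokens = [(clean(s), y) for s, y in test_data]
vocab = {w for tokens, _ in train_tokens for w in tokens}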
Compute the prior probability of each class c:

P(c) = count(c) / N

Here, count(c) is the number of samples with class c and N is the total number of
samples in the dataset.
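A sketch of the prior computation, continuing from the tokenised training set above.

from collections import Counter

# Prior P(c) = count(c) / N over the training samples.
class_counts = Counter(y for _, y in train_tokens)
n_train = len(train_tokens)
priors = {c: class_counts[c] / n_train for c in class_counts}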
Compute the likelihood for all words and all classes with the following equation:

P(w | c) = (count(w, c) + 1) / (count(c) + |V|)

Here, count(w, c) is the frequency of the word w in class c, while count(c) is
the frequency of all the words in class c, and |V| is the size of the vocabulary.
Laplace smoothing (the added 1 and |V|) is used to avoid zero probability in the
case of a new word.
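A sketch of the smoothed likelihood, using the counts defined by the equation above.

from collections import Counter, defaultdict

# word_counts[c][w] = count(w, c); built from the training tokens.
word_counts = defaultdict(Counter)
for tokens, y in train_tokens:
    word_counts[y].update(tokens)

total_words = {c: sum(word_counts[c].values()) for c in word_counts}
V = len(vocab)

def likelihood(w, c):
    # Laplace-smoothed P(w|c) = (count(w,c) + 1) / (count(c) + |V|).
    return (word_counts[c][w] + 1) / (total_words[c] + V)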
For each sample in the test set, predict the class ĉ, which is the class with the highest
posterior probability. To avoid underflow and increase speed, use log space to predict
the class as follows:

ĉ = argmax over c of ( log P(c) + Σ_i log P(w_i | c) )
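A sketch of log-space prediction over the test set, continuing from the sketches above; skipping out-of-vocabulary test words is one common convention and is an assumption here.

import math

def predict(tokens):
    # Score each class by log P(c) + sum of log P(w_i|c).
    def score(c):
        s = math.log(priors[c])
        for w in tokens:
            if w in vocab:          # assumed: skip unseen words
                s += math.log(likelihood(w, c))
        return s
    return max(priors, key=score)

y_true = [y for _, y in test_tokens]
y_pred = [predict(tokens) for tokens, _ in test_tokens]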
Using the metrics from the scikit-learn library, calculate the accuracy and the
macro-average precision, recall, and F1 score, and also provide the confusion matrix
on the test set.
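A sketch of the evaluation, using scikit-learn's accuracy_score, precision_recall_fscore_support, and confusion_matrix.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
print(f"accuracy={acc:.3f} precision={prec:.3f} "
      f"recall={rec:.3f} f1={f1:.3f}")
print(confusion_matrix(y_true, y_pred))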