Question
Q1. Naive Bayes: Code [25]
In this question, you will learn to build a Naive Bayes Classifier for the binary classification
task.
Dataset: "Financial Phrasebank" dataset from HuggingFace. To load the data, you
need to install library "datasets" pip install datasets and then use loaddatset
method to load the dataset. You can find the code on the link provided above.
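A minimal loading sketch follows. The configuration name "sentences_allagree" is an assumption here (the dataset ships several agreement-level configurations), so check the dataset card for the one your assignment expects.

from datasets import load_dataset

# Load the Financial Phrasebank corpus; the config name is assumed.
dataset = load_dataset("financial_phrasebank", "sentences_allagree")
print(dataset["train"][0])  # e.g. {'sentence': ..., 'label': ...}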
The dataset contains three class labels: neutral, positive, and negative. Consider
only the positive and negative samples and ignore the neutral samples. Use a randomly
selected portion of the samples to train the model and the remaining samples for the test.
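One possible filtering-and-splitting sketch, continuing from the loading code above. The label ids (0 = negative, 1 = neutral, 2 = positive) are assumed from the dataset card, and the 80/20 ratio is illustrative only, since the split size is not stated here.

import random

# Keep only positive and negative samples (label 1 = neutral is assumed).
samples = [(ex["sentence"], ex["label"])
           for ex in dataset["train"] if ex["label"] != 1]

random.seed(0)
random.shuffle(samples)
split = int(0.8 * len(samples))      # assumed, illustrative 80/20 split
train_data, test_data = samples[:split], samples[split:]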
Clean the dataset with the steps from the previous assignment and build a vocabulary of
all the words.
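The exact cleaning steps come from the previous assignment and are not reproduced here; the sketch below uses simple lowercasing and punctuation stripping as placeholders, continuing from the split above.

import re

def clean(text):
    # Placeholder cleaning: lowercase and keep letters only.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

train_tokens = [(clean(s), y) for s, y in train_data]
test_tokens = [(clean(s), y) for s, y in test_data]
vocab = {w for tokens, _ in train_tokens for w in tokens}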
Compute the prior probability of each class c:

P(c) = count(c) / N

Here, count(c) is the number of samples with class c and N is the total number of
samples in the dataset.
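A sketch of the prior computation, continuing from the tokenised training set above.

from collections import Counter

# Prior P(c) = count(c) / N over the training samples.
class_counts = Counter(y for _, y in train_tokens)
n_train = len(train_tokens)
priors = {c: class_counts[c] / n_train for c in class_counts}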
Compute the likelihood for all words and all classes with the following equation:

P(w | c) = (count(w, c) + 1) / (count(c) + |V|)

Here, count(w, c) is the frequency of the word w in class c, while count(c) is
the frequency of all the words in class c, and |V| is the size of the vocabulary.
Laplace smoothing (the added 1 and |V|) is used to avoid zero probability in the
case of a new word.
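A sketch of the smoothed likelihood, using the counts defined by the equation above.

from collections import Counter, defaultdict

# word_counts[c][w] = count(w, c); built from the training tokens.
word_counts = defaultdict(Counter)
for tokens, y in train_tokens:
    word_counts[y].update(tokens)

total_words = {c: sum(word_counts[c].values()) for c in word_counts}
V = len(vocab)

def likelihood(w, c):
    # Laplace-smoothed P(w|c) = (count(w,c) + 1) / (count(c) + |V|).
    return (word_counts[c][w] + 1) / (total_words[c] + V)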
For each sample in the test set, predict the class ĉ, which is the class with the highest
posterior probability. To avoid underflow and increase speed, use log space to predict
the class as follows:

ĉ = argmax over c of ( log P(c) + Σ_i log P(w_i | c) )
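A sketch of log-space prediction over the test set, continuing from the sketches above; skipping out-of-vocabulary test words is one common convention and is an assumption here.

import math

def predict(tokens):
    # Score each class by log P(c) + sum of log P(w_i|c).
    def score(c):
        s = math.log(priors[c])
        for w in tokens:
            if w in vocab:          # assumed: skip unseen words
                s += math.log(likelihood(w, c))
        return s
    return max(priors, key=score)

y_true = [y for _, y in test_tokens]
y_pred = [predict(tokens) for tokens, _ in test_tokens]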
Using the metrics from the scikit-learn library, calculate the accuracy and the
macro-average precision, recall, and F1 score, and also provide the confusion matrix
on the test set.
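A sketch of the evaluation, using scikit-learn's accuracy_score, precision_recall_fscore_support, and confusion_matrix.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
print(f"accuracy={acc:.3f} precision={prec:.3f} "
      f"recall={rec:.3f} f1={f1:.3f}")
print(confusion_matrix(y_true, y_pred))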