Q1. Naive Bayes: Code
In this question, you will learn to build a Naive Bayes Classifier for the binary classification task.
1. Dataset: the "Financial Phrasebank" dataset from HuggingFace.1 To load the data, you need to install the library "datasets" (pip install datasets) and then use the load_dataset() method to load the dataset. You can find the code at the link provided above.
2. The dataset contains 3 class labels: neutral (1), positive (2), and negative (0). Consider only the positive and negative samples and ignore the neutral ones. Use 80% of the samples, selected randomly, to train the model and the remaining 20% for testing.
3. Clean the dataset with the steps from the previous assignment and build a vocabulary of
all the words.
4. Compute the prior probability of each class:
P(c) = N(c) / N
Here, N(c) is the number of samples with class c and N is the total number of samples in the dataset.
5. Compute the likelihood P(w_i | c) for all words and all classes with the following equation:
P(w_i | c) = (count(w_i, c) + 1) / (|V| + Σ_w count(w, c))
Here, count(w_i, c) is the frequency of the word w_i in class c, Σ_w count(w, c) is the frequency of all the words in class c, and |V| is the size of the vocabulary. Laplace smoothing is used to avoid zero probability in the case of a new word.
6. For each sample in the test set, predict the class ĉ with the highest posterior probability. To avoid underflow and increase speed, use log space to predict the class as follows:
ĉ = argmax_c ( log P(c) + Σ_i log P(w_i | c) )    (3.1)
7. Using the metrics from the scikit-learn library2, calculate the accuracy and macro-average precision, recall, and F1 score, and also provide the confusion matrix on the test set.
Q2. Logistic Regression: Code [25]
In this task, you will learn to build a Logistic Regression Classifier for the same "Financial Phrasebank" dataset. A Bag-of-Words model will be used for this task.
1. Use 60% of the data, selected randomly, for training, 20%, selected randomly, for testing, and the remaining 20% for the validation set. Use the positive and negative classes only. Perform the same cleaning tasks on the text data and build a vocabulary of the words.
1Link to the dataset.
2Link to the scikit-learn documentation for metrics.
2. Using CountVectorizer, fit the cleaned train data. This will create the bag-of-words model for the train data. Transform the test and validation sets using the same CountVectorizer.
3. To implement logistic regression using the following equations,
z = W · x
ŷ = σ(z)
we need the weight vector W. Create an array W of dimension equal to that of each feature vector x from the CountVectorizer.
4. Apply the above equations over the whole training dataset and calculate ŷ and the cross-entropy loss, which can be calculated as
L = −(1/N) Σ [ y log ŷ + (1 − y) log(1 − ŷ) ]
5. Now, update the weights as follows:
W_{t+1} = W_t − η (ŷ − y) · x
Here, (ŷ − y) · x is the gradient of the loss with respect to W, and η = 0.01 is the learning rate.
6. Repeat steps 4 and 5 for 500 iterations, or epochs. For each iteration, calculate the cross-entropy loss on the validation set.
7. Calculate the accuracy and macro-average precision, recall, and F1 score and provide
the confusion matrix on the test set.
8. Experiment with varying values of η in {0.0001, 0.001, 0.01, 0.1}. Report your observations with respect to the performance of the model. You can vary the number of iterations to enhance the performance if necessary.
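Steps 2-6 above can be sketched as follows. This is a minimal illustration with toy sentences in place of the cleaned Financial Phrasebank splits (the validation-loss tracking of step 6 is omitted for brevity); the positive/negative labels are recoded as 1/0 so they match the sigmoid output, and the variable names are my own.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy corpus standing in for the cleaned training split; 1 = positive, 0 = negative.
train_texts = ["profit rose", "sales grew", "profit grew",
               "loss widened", "revenue fell", "sales fell"]
y_train = np.array([1, 1, 1, 0, 0, 0])

# Step 2: bag-of-words model fitted on the train data only.
vec = CountVectorizer()
X_train = vec.fit_transform(train_texts).toarray().astype(float)

# Step 3: one weight per vocabulary word.
W = np.zeros(X_train.shape[1])
eta = 0.01  # learning rate

# Steps 4-6: 500 epochs of full-batch gradient descent.
for epoch in range(500):
    y_hat = sigmoid(X_train @ W)                        # y_hat = sigma(W . x)
    loss = -np.mean(y_train * np.log(y_hat + 1e-12)
                    + (1 - y_train) * np.log(1 - y_hat + 1e-12))
    grad = X_train.T @ (y_hat - y_train) / len(y_train)  # (y_hat - y) . x, averaged
    W -= eta * grad                                      # W_{t+1} = W_t - eta * grad

# Predict on (toy) test sentences, transformed with the same CountVectorizer.
X_test = vec.transform(["profit grew", "loss fell"]).toarray().astype(float)
preds = (sigmoid(X_test @ W) >= 0.5).astype(int)
print(preds)
```

As in Q1, the test-set predictions can then be scored with scikit-learn's accuracy_score, precision_recall_fscore_support (average="macro"), and confusion_matrix for step 7.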
