Question

Posted on Aug 27, 2024

Your task:

Recall that Naive Bayes assumes all $x_i$s are conditionally independent given $y$. However, in this assignment, you will be building a not-so-Naive Bayes model where the $x_i$s are now correlated. In particular, we will only consider two features, $x_1$ and $x_2$, where $x_1$ and $x_2$ are not conditionally independent given $y$.

We assume $f(x_1, x_2 \mid y = k)$ is a bi-variate Gaussian distribution $N(\mu_{1k}, \mu_{2k}, \sigma_1, \sigma_2, \rho)$, where $\mu_{1k}$ and $\mu_{2k}$ are the means of $x_1$ and $x_2$ given $y = k$, $\sigma_1$ and $\sigma_2$ are the standard deviations of $x_1$ and $x_2$, and $\rho$ is the correlation between $x_1$ and $x_2$. Note that $\mu_{1k}$ and $\mu_{2k}$ depend on the value $k$ of $y$, but $\sigma_1$, $\sigma_2$, and $\rho$ do not depend on $y$. Also recall that the density of a bi-variate Gaussian distribution, given $(\mu_{1k}, \mu_{2k}, \sigma_1, \sigma_2, \rho)$, is:

$$
f(x_1, x_2 \mid y = k) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left( -\frac{\sigma_2^2 (x_1 - \mu_{1k})^2 + \sigma_1^2 (x_2 - \mu_{2k})^2 - 2\rho\sigma_1\sigma_2 (x_1 - \mu_{1k})(x_2 - \mu_{2k})}{2(1-\rho^2)\sigma_1^2\sigma_2^2} \right)
$$

The posterior probability of class $k$ follows from Bayes' rule:

$$
P(y = k \mid x_1, x_2) = \frac{f(x_1, x_2 \mid y = k)\, P(y = k)}{\sum_i f(x_1, x_2 \mid y = i)\, P(y = i)}
$$
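The density and Bayes' rule above can be sketched in Python with NumPy. This is a minimal illustration, not the assignment's reference solution: the function names `bivariate_density` and `posterior`, and the dictionary layout for the class means and priors, are assumptions made here for readability.

```python
import numpy as np


def bivariate_density(x1, x2, mu1, mu2, sigma1, sigma2, rho):
    """Bi-variate Gaussian density f(x1, x2 | y = k) for one class k."""
    d1 = x1 - mu1
    d2 = x2 - mu2
    # Numerator of the exponent, written term by term as in the formula.
    quad = (sigma2**2 * d1**2 + sigma1**2 * d2**2
            - 2.0 * rho * sigma1 * sigma2 * d1 * d2)
    exponent = -quad / (2.0 * (1.0 - rho**2) * sigma1**2 * sigma2**2)
    norm = 2.0 * np.pi * sigma1 * sigma2 * np.sqrt(1.0 - rho**2)
    return np.exp(exponent) / norm


def posterior(x1, x2, means, priors, sigma1, sigma2, rho):
    """P(y = k | x1, x2) for every class k via Bayes' rule.

    means:  dict mapping k -> (mu_1k, mu_2k)   (hypothetical layout)
    priors: dict mapping k -> P(y = k)         (hypothetical layout)
    """
    joint = {
        k: bivariate_density(x1, x2, means[k][0], means[k][1],
                             sigma1, sigma2, rho) * priors[k]
        for k in means
    }
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}
```

Because $\sigma_1$, $\sigma_2$, and $\rho$ are shared by both classes, the normalizing constant of the density cancels in the posterior; it is kept here only so that `bivariate_density` matches the stated formula.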

Build a not-so-Naive-Bayes model that uses $x_1$ and $x_2$ as features to predict $y$.

Use train data to build your model and make predictions for the test set. When making predictions use the optimal Bayes classifier idea, i.e., if $P(y = 1 \mid x_1, x_2) \geq 0.5$, classify the test instance as 1, else classify the test instance as 0.

Finally, report the accuracy of your model in the TEST set.

Function Parameters:

train: This is the training data set used to train your model. In particular, use the train set to determine the necessary parameters of the model: $\mu_{1k}$, $\mu_{2k}$, $\sigma_1$, $\sigma_2$, $\rho$, and $P(y = i)$. DO NOT use the training set to make predictions or to compute the accuracy of your model; do that using the test set. train should be a pandas dataframe that has three columns: x1, x2, y. Please note that you should use the entire train data to build your model; in other words, you DON'T need to further split the training data to create another test set. You can find a sample training set in HW 5 traindata.csv.

test: This is the test data set used to make predictions and test the performance of your model. Please note that the test data set should not be used during the training of the model and should only be used to make predictions and compute the accuracy. test is also a pandas dataframe that has three columns: x1, x2, y. A sample test data set can be found in HW 5 testdata.csv.

Output: Your function should return class predictions for the test set and the overall accuracy of your model in the test set. The type of predictions should be a numpy array and the type of the reported accuracy should be a numpy float.
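One possible shape for such a function is sketched below. The task statement does not say how $\sigma_1$, $\sigma_2$, and $\rho$ should be estimated, so this sketch assumes pooled estimates: each training row is centered by its own class mean, and the shared parameters are computed from all centered rows together. The function name `not_so_naive_bayes` is hypothetical, and the code assumes the labels are 0 and 1 and that both classes appear in the training set.

```python
import numpy as np
import pandas as pd


def not_so_naive_bayes(train, test):
    """Fit on `train`, predict on `test`; both have columns x1, x2, y."""
    classes = (0, 1)  # ASSUMPTION: binary labels coded as 0 and 1
    X = train[["x1", "x2"]].to_numpy(dtype=float)
    y = train["y"].to_numpy()

    # Class priors P(y = k) and class-conditional means (mu_1k, mu_2k).
    priors = {k: np.mean(y == k) for k in classes}
    means = {k: X[y == k].mean(axis=0) for k in classes}

    # ASSUMPTION: pooled estimates of sigma_1, sigma_2, rho. Center every
    # row by its own class mean, then use all centered rows together.
    centered = X.copy()
    for k in classes:
        centered[y == k] -= means[k]
    s1, s2 = np.sqrt(np.mean(centered**2, axis=0))
    rho = np.mean(centered[:, 0] * centered[:, 1]) / (s1 * s2)

    Xt = test[["x1", "x2"]].to_numpy(dtype=float)

    def log_joint(k):
        # log of f(x1, x2 | y = k) * P(y = k), dropping the normalizing
        # constant, which is identical for both classes and cancels.
        d1 = Xt[:, 0] - means[k][0]
        d2 = Xt[:, 1] - means[k][1]
        quad = s2**2 * d1**2 + s1**2 * d2**2 - 2.0 * rho * s1 * s2 * d1 * d2
        return -quad / (2.0 * (1.0 - rho**2) * s1**2 * s2**2) + np.log(priors[k])

    # Posterior P(y = 1 | x1, x2), computed stably by subtracting the max.
    lj0, lj1 = log_joint(0), log_joint(1)
    m = np.maximum(lj0, lj1)
    p1 = np.exp(lj1 - m) / (np.exp(lj0 - m) + np.exp(lj1 - m))

    predictions = (p1 >= 0.5).astype(int)  # numpy array of 0/1 labels
    accuracy = np.mean(predictions == test["y"].to_numpy())  # numpy float
    return predictions, accuracy
```

Working in log space avoids a 0/0 posterior when a test point lies far from both class means. The test set is touched only after all parameters are fixed, in line with the instruction not to use it during training.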
