Question

Lab test: need Gaussian naive Bayes Python code for the digits dataset

In this exercise, you are to implement only one of two possible classifiers (your choice). Note that you are not to use modules which provide these functions, since that would be too easy (no sklearn.naive_bayes, for example); create them yourselves instead. Students who implement the logic for both classifiers will be given bonus credit. Your implementation will be evaluated on classifier functionality (whether you correctly implement kNN or Naive Bayes for the data set) rather than on efficiency. The data set to use is the digit recognition data set available from the sklearn module; the demonstration linked here should provide some guidance. You are expected to use Jupyter notebooks and Python on this assignment, but can ask for exceptions. Your goal is to use the first half of the data set to train your model and the last half for prediction.
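The "first half to train, last half to predict" split could be sketched as below. This is only an illustration: `split_halves` is a hypothetical helper name, and the commented-out lines assume scikit-learn is installed and is used purely to load the data, which the assignment allows.

```python
def split_halves(X, y):
    """Return (X_train, y_train, X_test, y_test).

    The first half of the samples is used for training and the last half
    for prediction. With an odd number of samples the extra one goes to
    the prediction half (an arbitrary choice, not stated in the exercise).
    """
    mid = len(X) // 2
    return X[:mid], y[:mid], X[mid:], y[mid:]


# Possible usage with the sklearn digits data (needs scikit-learn):
# from sklearn.datasets import load_digits
# digits = load_digits()
# X_train, y_train, X_test, y_test = split_halves(
#     digits.data.tolist(), digits.target.tolist()
# )
```

The split is positional and does no shuffling, matching the wording of the exercise.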

b) Gaussian Naive Bayes

- $x$ represents the image vector $(x_1, x_2, x_3, \ldots, x_{64})$
- $c_k$ represents class $k$, that is, one of the 10 digits for recognition

Recall, we're looking for the highest $p(c_k \mid x)$ by using this fact:

$$p(c_k \mid x) = \frac{p(x \mid c_k)\, p(c_k)}{p(x)}$$

Let's step through the parts:

- $p(c_k)$ is simply the proportion of that class in the training data. E.g. if there are 20 fives out of 200 digits in the training sample, $p(\text{five}) = 20/200 = 0.1$
- $p(x \mid c_k)$ is more complicated
  - The main assumption of naive Bayes is that the features should be treated independently (which is why it's "naive"). This means $p(x \mid c_k) = p(x_1 \mid c_k)\, p(x_2 \mid c_k) \cdots p(x_{64} \mid c_k)$
  - For each class, $k$, in the training data:
    - Calculate the mean and variance of each pixel location for that class
    - Use that and the formula for a Gaussian probability to calculate $p(x_i \mid c_k)$, where $\mu$ and $\sigma^2$ are that pixel's mean and variance for the class:

      $$g(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
- $p(x)$ is the normalization term. You don't need to calculate this, since you just want to pick the largest $p(c_k \mid x)$, and $p(x)$ is the same denominator in calculating $p(c_k \mid x)$ for every class.
  - However, if you want $p(c_k \mid x)$ to provide a true estimate of the probability, you can use the following formula to calculate $p(x)$:

    $$p(x) = \sum_k p(x, c_k) = \sum_k p(x \mid c_k)\, p(c_k)$$

The predicted class is the largest $p(c_k \mid x)$ for each image.

1. Report the overall accuracy of your prediction.
2. Show the classification matrix.
3. Note which errors are more common. In what way does that match your intuitions?
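One way the steps above might be put together in plain Python is sketched below. It is not a reference solution and makes two choices that are not in the exercise text: it sums log-probabilities instead of multiplying 64 small numbers (the largest class is the same, and it avoids underflow), and it adds a small `var_floor` to every variance, because some pixels are constant within a class and would otherwise give a zero variance. All function names and the `var_floor` value are illustrative assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-2):
    """Estimate p(c_k), per-pixel means and per-pixel variances per class.

    var_floor is an assumption, not part of the exercise: it keeps
    constant pixels from producing a zero variance.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per pixel
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_joint(model, x):
    """Return {class: log p(x | c_k) + log p(c_k)} for one image vector."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            # log of the Gaussian density g(x_i) with mean m and variance v
            total += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        scores[label] = total
    return scores


def predict(model, X):
    """Pick the class with the largest (unnormalized) posterior per image."""
    predictions = []
    for x in X:
        scores = log_joint(model, x)
        predictions.append(max(scores, key=scores.get))
    return predictions


def posterior(model, x):
    """Normalize by p(x) = sum_k p(x | c_k) p(c_k) to get true probabilities."""
    scores = log_joint(model, x)
    top = max(scores.values())  # subtract the max before exp for stability
    unnormalized = {k: math.exp(s - top) for k, s in scores.items()}
    z = sum(unnormalized.values())
    return {k: v / z for k, v in unnormalized.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix
```

For the assignment, the model would be fitted on the first half of the digits data, `predict` run on the last half, and `accuracy` and `confusion_matrix(y_test, y_pred, list(range(10)))` used for items 1 and 2. The off-diagonal cells of the matrix are where to look for item 3.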
