
Question

Use Python or MATLAB.

Do not use the built-in classifier functions of the package you are using; you need to implement the classifiers yourself. Other built-in functions such as cov(), inv(), or mean() are allowed.

You will design a Bayes classifier, a naive Bayes classifier, and a k-nearest neighbor classifier in this project. For the generated data, the training set and the testing set are each a 400-by-3 matrix. Each row in a data set is one observation. Every observation belongs to one of two classes (class 0 and class 1). The first two columns hold the 2-dimensional observations and the third column contains the class IDs. For the zip code datasets, there are 16 inputs (features), and the last column contains the class IDs, which range from 1 to 10.

Task 1: For the generated datasets, scatter plot the training and testing data sets. Show class 0 in red and class 1 in blue.
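As a starting point for Task 1, here is a minimal Python sketch. The data files are not shown in the question, so the `demo` array is a synthetic stand-in, and the function names `split_by_class` and `scatter_classes` are made up for illustration.

```python
import numpy as np

def split_by_class(data):
    """Return the 2-D points of class 0 and of class 1 from an (n, 3) array."""
    X, y = data[:, :2], data[:, 2].astype(int)
    return X[y == 0], X[y == 1]

def scatter_classes(data, title):
    """Scatter plot with class 0 in red and class 1 in blue."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    class0, class1 = split_by_class(data)
    fig, ax = plt.subplots()
    ax.scatter(class0[:, 0], class0[:, 1], c="red", s=10, label="class 0")
    ax.scatter(class1[:, 0], class1[:, 1], c="blue", s=10, label="class 1")
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    ax.set_title(title)
    ax.legend()
    return fig

# Synthetic stand-in for the 400-by-3 training matrix (assumption, not the real data).
rng = np.random.default_rng(0)
demo = np.vstack([
    np.column_stack([rng.normal(0, 1, (200, 2)), np.zeros(200)]),
    np.column_stack([rng.normal(2, 1, (200, 2)), np.ones(200)]),
])
class0, class1 = split_by_class(demo)
```

Calling `scatter_classes(train, "Training data")` and `scatter_classes(test, "Testing data")` on the real matrices would then give the two requested plots.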

Task 2: For both the generated data and the zip code data, design a Bayes classifier assuming that the data follows a Gaussian distribution. Estimate the corresponding parameters from the training data (parametric estimation). Apply your Bayes classifier to the training and testing data sets, respectively. Report the training and testing classification accuracies. (For the zip code dataset, if the inverse of any covariance matrix does not exist, which can be checked by testing whether the determinant of the covariance matrix is zero, add a small positive value to all the diagonal components and report the value you added.)
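One possible shape for the Gaussian Bayes classifier in Task 2 is sketched below, using only cov(), inv(), and mean()-style helpers from NumPy. The function names, the default `eps` value, the tolerance used for the determinant check, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def fit_gaussian_bayes(X, y, eps=1e-3):
    """Estimate prior, mean and covariance for each class.

    eps (assumed value) is added to the diagonal only when the covariance
    looks singular, i.e. its determinant is numerically zero.
    """
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False)
        if np.isclose(np.linalg.det(cov), 0.0):  # tolerance is an assumption
            cov = cov + eps * np.eye(X.shape[1])
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), cov)
    return params

def predict_gaussian_bayes(params, X):
    """Pick the class maximizing log prior + Gaussian log-likelihood."""
    classes = list(params)
    scores = []
    for c in classes:
        prior, mean, cov = params[c]
        diff = X - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = np.sum((diff @ np.linalg.inv(cov)) * diff, axis=1)
        # The -d/2 * log(2*pi) term is the same for every class, so it is dropped.
        scores.append(np.log(prior) - 0.5 * logdet - 0.5 * maha)
    return np.array(classes)[np.argmax(scores, axis=0)]

# Synthetic two-class stand-in for the generated data (assumption).
rng = np.random.default_rng(0)

def make_data(n):
    X = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(3, 1, (n, 2))])
    return X, np.repeat([0, 1], n)

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)
params = fit_gaussian_bayes(X_train, y_train)
train_acc = np.mean(predict_gaussian_bayes(params, X_train) == y_train)
test_acc = np.mean(predict_gaussian_bayes(params, X_test) == y_test)
print(f"train accuracy {train_acc:.3f}, test accuracy {test_acc:.3f}")
```

Working with log-likelihoods rather than raw densities avoids underflow, which matters more for the 16-feature zip code data than for the 2-D generated data.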

Task 3: Repeat Task 2 by designing a naive Bayes classifier. (A simple way to design a naive Bayes classifier is to set the off-diagonal elements of the estimated covariance matrix to zero.)
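The hint in Task 3 can be illustrated with a short sketch. Zeroing the off-diagonal entries makes the Gaussian density factor into a product of one-dimensional Gaussians, one per feature, which is the naive Bayes independence assumption. The helper names and the synthetic sample below are hypothetical.

```python
import numpy as np

def naive_covariance(cov):
    """Keep the variances and zero every off-diagonal covariance."""
    return np.diag(np.diag(cov))

def gaussian_log_density(X, mean, cov):
    """Log of the multivariate Gaussian density at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum((diff @ np.linalg.inv(cov)) * diff, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Correlated synthetic sample (assumption, not the assignment data).
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=300)
mean = X.mean(axis=0)
full_cov = np.cov(X, rowvar=False)
nb_cov = naive_covariance(full_cov)

# With a diagonal covariance, the joint log density equals the sum of
# per-feature 1-D Gaussian log densities.
var = np.diag(nb_cov)
per_feature = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var)
joint = gaussian_log_density(X, mean, nb_cov)
```

In a Task 2 style classifier, applying `naive_covariance` to each class covariance before prediction is all that changes.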

Task 4: Repeat Task 2, but use a nonparametric estimation technique with a Gaussian kernel to estimate the class-conditional distribution p(x|Ci). Try different kernel sizes h and report the best testing accuracies and the corresponding h values.
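For Task 4, a Parzen-window estimate with an isotropic Gaussian kernel is one common reading of "nonparametric estimation with a Gaussian kernel". The sketch below assumes that reading; the candidate `h` values, function names, and synthetic data are illustrative only.

```python
import numpy as np

def parzen_log_density(X_query, X_train, h):
    """Log of the Gaussian-kernel density estimate at each query point."""
    d = X_train.shape[1]
    sq = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    log_k = -sq / (2 * h**2) - 0.5 * d * np.log(2 * np.pi * h**2)
    m = log_k.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    log_sum = m + np.log(np.exp(log_k - m).sum(axis=1, keepdims=True))
    return log_sum.ravel() - np.log(len(X_train))

def parzen_predict(X_query, X_train, y_train, h):
    """Bayes rule with class priors and kernel estimates of p(x|Ci)."""
    classes = np.unique(y_train)
    scores = [
        np.log(np.mean(y_train == c))
        + parzen_log_density(X_query, X_train[y_train == c], h)
        for c in classes
    ]
    return classes[np.argmax(scores, axis=0)]

# Synthetic two-class stand-in for the generated data (assumption).
rng = np.random.default_rng(0)

def make_data(n):
    X = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(3, 1, (n, 2))])
    return X, np.repeat([0, 1], n)

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)

# Sweep a few kernel sizes (values chosen arbitrarily for the demo).
results = {
    h: np.mean(parzen_predict(X_test, X_train, y_train, h) == y_test)
    for h in [0.1, 0.3, 1.0, 3.0]
}
best_h = max(results, key=results.get)
print(f"best h = {best_h}, test accuracy = {results[best_h]:.3f}")
```

Very small h tends to overfit the training points and very large h oversmooths, so the sweep usually shows a peak somewhere in between.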

Task 5: For both the generated data and the zip code data, design a k-nearest neighbor classifier. Try different k values (keep in mind that k must be odd) and report the best testing accuracies and the corresponding k values. Turn in all of your code and results.
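A k-nearest neighbor classifier for Task 5 can be written with plain NumPy, as in the hedged sketch below. Euclidean distance and the tie-breaking rule are assumptions (the task does not specify either), and the tiny data set is made up for illustration.

```python
import numpy as np

def knn_predict(X_query, X_train, y_train, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    sq = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(sq, axis=1)[:, :k]          # indices of the k closest
    votes = y_train[nearest]                          # shape (n_query, k)
    classes = np.unique(y_train)
    counts = (votes[:, :, None] == classes[None, None, :]).sum(axis=1)
    # Ties (possible with more than two classes even for odd k) go to the
    # smallest class label, because argmax returns the first maximum.
    return classes[np.argmax(counts, axis=1)]

# Tiny hand-made example (assumption, not the assignment data).
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.2], [5.5, 5.5]])
pred = knn_predict(X_query, X_train, y_train, k=3)
print(pred)
```

Sweeping k over odd values such as 1, 3, 5, ... and recording the test accuracy for each gives the requested comparison. Note that with k = 1 the training accuracy is trivially 100% (barring duplicate points with conflicting labels), since every training point is its own nearest neighbor.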

Task 6: For the generated data set, display the decision boundaries produced by each of the above classifiers, using the method we have discussed in class. Show them separately.
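The in-class method for Task 6 is not described in the question. One common approach, assumed here, is to classify every point of a dense grid covering the data and draw the contour where the predicted label changes. The `toy_predict` function is a hypothetical placeholder for any of the classifiers from Tasks 2 to 5.

```python
import numpy as np

def decision_grid(predict, x_range, y_range, steps=200):
    """Evaluate a classifier on a regular grid; returns xx, yy and labels zz."""
    xs = np.linspace(x_range[0], x_range[1], steps)
    ys = np.linspace(y_range[0], y_range[1], steps)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    zz = predict(grid).reshape(xx.shape)
    return xx, yy, zz

def plot_boundary(xx, yy, zz, title):
    """Shade the two decision regions and draw the boundary between them."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, zz, levels=[-0.5, 0.5, 1.5],
                colors=["red", "blue"], alpha=0.2)
    ax.contour(xx, yy, zz, levels=[0.5], colors="black")
    ax.set_title(title)
    return fig

def toy_predict(X):
    """Placeholder classifier (assumption): class 1 when x1 + x2 > 0."""
    return (X[:, 0] + X[:, 1] > 0).astype(int)

xx, yy, zz = decision_grid(toy_predict, (-3, 3), (-3, 3), steps=50)
```

Passing a wrapper around each trained classifier as `predict` and calling `plot_boundary` once per classifier would produce the separate figures the task asks for.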

