Question

1 Approved Answer

Posted on Jul 01, 2024


Project 1: Naive Bayes and Logistic Regression

In this project you will code up two of the classification algorithms covered in class: Naive Bayes and Logistic Regression. The framework code for this question can be downloaded from Canvas.

- Programming Language: You must write your code in R.
- Submission Instructions: For each sub-question you will be given a single function signature. You will be asked to write a single R function that satisfies the signature. In the framework code, we have provided you with an R script for the functions you need to complete. Do not change the structure of the file. Complete each of these functions, compress the code and the results file (evaluation.txt) as a .tar file, and submit it to Canvas. You may submit multiple times. Each submission will overwrite the previous submission. Only the last submission before the deadline will be graded.
- Presentation slides: Make slides to summarize your results. You do not need to submit the slides, but I will randomly draw a couple of groups to present their slides in class.
- SUBMISSION CHECKLIST
  - Submission executes in less than 20 minutes.
  - Submission is smaller than 100K.
  - Submission is a .tar file.
  - Submission returns matrices of the exact dimension specified.
- Data: All questions will use the following data structures:
  - $XTrain \in \mathbb{R}^{n \times f}$ is a matrix of training data, where each row is a training point and each column is a feature.
  - $XTest \in \mathbb{R}^{m \times f}$ is a matrix of test data, where each row is a test point and each column is a feature.
  - $yTrain \in \{1, \ldots, c\}^{n \times 1}$ is a vector of training labels.
  - $yTest \in \{1, \ldots, c\}^{m \times 1}$ is a (hidden) vector of test labels.

1 Logspace Arithmetic [10 pts]

When working with very small and very large numbers (such as probabilities), it is useful to work in logspace to avoid numerical precision issues. In logspace, we keep track of the logs of numbers instead of the numbers themselves. (We generally use natural logs for this.)
For example, if $p(x)$ and $p(y)$ are probability values, instead of storing $p(x)$ and $p(y)$ and computing $p(x) \cdot p(y)$, we work in logspace by storing $\log p(x)$, $\log p(y)$, and $\log[p(x) \cdot p(y)]$, where $\log[p(x) \cdot p(y)]$ is computed as $\log p(x) + \log p(y)$. The challenge is to add and multiply these numbers while remaining in logspace, without exponentiating. Note that if we exponentiate our numbers at any point in the calculation, it completely defeats the purpose of working in logspace.

1. Logspace Multiplication [5 pts] Complete logProd=function(x), which takes as input a vector of numbers in logspace (i.e., $x_i = \log p_i$) and returns the product of these numbers in logspace, i.e., $\mathrm{logProd}(x) = \log \prod_i p_i$.

2. Logspace Addition [5 pts] Complete logSum=function(x), which takes as input a vector of numbers in logspace (i.e., $x_i = \log p_i$) and returns the sum of these numbers in logspace, i.e., $\mathrm{logSum}(x) = \log \sum_i p_i$.
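One possible reading of the two sub-questions is sketched below in R. The product follows directly from the identity $\log \prod_i p_i = \sum_i \log p_i$. For the sum, the sketch uses the standard log-sum-exp trick, which is a swapped-in technique, not something the assignment text names: it exponentiates only the shifted values $x_i - \max(x)$, never the raw values. Whether that satisfies the "without exponentiating" requirement is an assumption, as is the handling of all-zero probabilities; the framework code may expect something different.

```r
# Sketch only: function names follow the assignment's signatures.
# The treatment of -Inf inputs is an assumption, not part of the spec.

# log(prod(p_i)) = sum(log p_i), so a product in logspace is a plain sum.
logProd <- function(x) {
  sum(x)
}

# log(sum(p_i)) via the log-sum-exp trick:
#   log sum_i exp(x_i) = m + log sum_i exp(x_i - m),  where m = max(x)
# Only the shifted values x_i - m (all <= 0) are exponentiated, so the
# largest term is exp(0) = 1 and the sum cannot underflow to zero.
logSum <- function(x) {
  m <- max(x)
  # If every p_i is 0 (all x_i are -Inf), the sum is 0 and its log is -Inf.
  if (is.infinite(m) && m < 0) return(-Inf)
  m + log(sum(exp(x - m)))
}

# Example: three log-probabilities whose raw values underflow in doubles.
x <- c(-1000, -1001, -1002)
logProd(x)        # -3003
logSum(x)         # about -999.59
log(sum(exp(x)))  # -Inf: the naive version underflows
```

The last line shows why the shift matters: `exp(-1000)` is rounded to zero in double precision, so the naive computation loses all information, while the shifted version stays finite.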