Question
Description In this project, we will implement KNN using Python. You are given three data sets*: data type have a label training set Yes validation
Description In this project, we will implement KNN using Python.
You are given three data sets*:
data type have a label
training set Yes
validation set Yes
test set No
*A typical division is that the training set accounts for 50% of the total sample, and the others account for 25%, and all of them are randomly selected from the sample.
Dataset: Document number The sentence words Emotion Train 1 I buy an apple phone happy Train 2 I eat the gig apple happy Train 3 The Apple products are too expensive sad Train 4 My friend has an apple ? (1) use KNN for classification problems, here we select Manhattan distance model. When calculating distance (i, j) for a data i, do not include the distance to itself as its always 0. (1.1) You can use one-hot matrix to represent the sentences. Please use all the data sets to create the dictionary. (1.2) Adjust the K value (3 , here N denotes the size of data set), measure the KNN predict accuracy on both training set, and validation set. On the verification set, find the value of K with highest accuracy.
Example: P.S., Distance formula: Manhattan distance: In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|. Euclidean distance: In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is ((# #)$ + ($ $)$. (2) Apply the KNN model obtained in step 1 on the test set, and save the output result as "my_result.csv". Example:
Submission You have to submit the followings to D2L: 1. MS word file - Describe what you have done for the homework assignment. 2. Python source code file(s) - Must be well organized (comments, indentation, ) - You need to upload the original python file (*.py) and also its PDF version. o For the PDF file, you can just convert the source file to PDF. One way is to print the source file and save to PDF. You have to submit the files SEPERATELY. DO NOT compress into a ZIP file.
k = 5 Validation set correct rate: 0.27009646302250806 k. 7 Validation set correct rate: 0.2604501607717042 k 9 Validation set correct rate: 0.26688102893890675 k 11 Validation set correct rate: 0.29260450160771706 = my_result Words (split by space) label joy senator carl Krueger thinks ipods can kill you who is prince frederic von anhalt prestige has magic touch joy joy study female seals picky about mates joy no e book for harry potter vii joy blair apologises over friendly fire inquest fear k = 5 Validation set correct rate: 0.27009646302250806 k. 7 Validation set correct rate: 0.2604501607717042 k 9 Validation set correct rate: 0.26688102893890675 k 11 Validation set correct rate: 0.29260450160771706 = my_result Words (split by space) label joy senator carl Krueger thinks ipods can kill you who is prince frederic von anhalt prestige has magic touch joy joy study female seals picky about mates joy no e book for harry potter vii joy blair apologises over friendly fire inquest fearStep by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started