Question

Description

In this project, we will implement KNN (k-nearest neighbors) using Python.

You are given three data sets*:

Data set          Has a label
Training set      Yes
Validation set    Yes
Test set          No

*Typically, the training set accounts for 50% of the total sample, and the validation and test sets account for 25% each; all three are randomly selected from the sample.
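The footnote's 50/25/25 random split can be sketched as follows. This is only an illustration of the footnote: the assignment already supplies the three sets, and the function name `split_dataset`, its parameters, and the fixed seed are assumptions introduced here.

```python
import random


def split_dataset(samples, train_frac=0.5, val_frac=0.25, seed=0):
    """Shuffle a copy of samples and cut it into train/validation/test parts."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (25% here)
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 50 25 25
```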

Dataset:

Document number   Sentence words                          Emotion
Train 1           I buy an apple phone                    happy
Train 2           I eat the big apple                     happy
Train 3           The Apple products are too expensive    sad
Train 4           My friend has an apple                  ?

(1) Use KNN for the classification problem; here we select the Manhattan distance. When calculating distance(i, j) for a data point i, do not include the distance to itself, as it is always 0.

(1.1) You can use a one-hot matrix to represent the sentences. Please use all the data sets to create the dictionary.

(1.2) Adjust the K value (3 ..., here N denotes the size of the data set) and measure the KNN prediction accuracy on both the training set and the validation set. On the validation set, find the value of K with the highest accuracy.
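A minimal sketch of steps (1) to (1.2) on the toy sentences above, using only the standard library. All function names are hypothetical, and several choices are assumptions not stated in the assignment: words are lowercased and split on whitespace, the one-hot vector marks word presence (not counts), and ties in the vote go to the label seen first among the nearest neighbours.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map every distinct lowercase word to a column index."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    return {w: i for i, w in enumerate(words)}


def one_hot(sentence, vocab):
    """Binary vector: 1 if the word occurs in the sentence, else 0."""
    vec = [0] * len(vocab)
    for w in sentence.lower().split():
        if w in vocab:
            vec[vocab[w]] = 1
    return vec


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(query, train_vecs, train_labels, k, exclude_index=None):
    """Majority vote among the k nearest training vectors.

    exclude_index skips the query's own row, so a training sample
    is never matched against itself (distance 0).
    """
    dists = sorted(
        (manhattan(query, vec), i)
        for i, vec in enumerate(train_vecs)
        if i != exclude_index
    )
    nearest = [train_labels[i] for _, i in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]


def accuracy(vecs, labels, train_vecs, train_labels, k, same_set=False):
    """Fraction of correct predictions; same_set=True excludes each sample itself."""
    correct = 0
    for i, (vec, label) in enumerate(zip(vecs, labels)):
        skip = i if same_set else None
        if knn_predict(vec, train_vecs, train_labels, k, skip) == label:
            correct += 1
    return correct / len(labels)


train_sentences = [
    "I buy an apple phone",
    "I eat the big apple",
    "The Apple products are too expensive",
]
train_labels = ["happy", "happy", "sad"]
query = "My friend has an apple"

# The dictionary is built from all sentences, labelled or not (step 1.1).
vocab = build_vocabulary(train_sentences + [query])
train_vecs = [one_hot(s, vocab) for s in train_sentences]

prediction = knn_predict(one_hot(query, vocab), train_vecs, train_labels, k=3)
print(prediction)  # happy

# Training-set accuracy with each sample's own row excluded.
train_accuracy = accuracy(train_vecs, train_labels, train_vecs, train_labels,
                          k=1, same_set=True)
print(train_accuracy)
```

To pick K as in step (1.2), one option is to call `accuracy` on the validation vectors for each candidate K and keep the K with the highest value.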

P.S. Distance formulas:

Manhattan distance: In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|.

Euclidean distance: In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is sqrt((x1 - x2)^2 + (y1 - y2)^2).

(2) Apply the KNN model obtained in step 1 to the test set, and save the output as "my_result.csv".
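Saving the test-set predictions could look like the sketch below, which uses Python's standard `csv` module. The helper name `save_predictions` is hypothetical, and the two column headers are taken from the example output in this question; the exact file layout expected by the grader is an assumption.

```python
import csv


def save_predictions(sentences, labels, path="my_result.csv"):
    """Write one row per test sentence with its predicted label."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Words (split by space)", "label"])  # header row
        writer.writerows(zip(sentences, labels))


# One sentence/label pair from the example output, for illustration only.
sentences = ["blair apologises over friendly fire inquest"]
labels = ["fear"]
save_predictions(sentences, labels)
```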

Submission

You have to submit the following to D2L:

1. MS Word file - Describe what you have done for the homework assignment.
2. Python source code file(s) - Must be well organized (comments, indentation, ...).
   - You need to upload the original Python file (*.py) and also its PDF version.
   - For the PDF file, you can just convert the source file to PDF. One way is to print the source file and save it as a PDF.

You have to submit the files SEPARATELY. DO NOT compress them into a ZIP file.

Example output:

k = 5 Validation set correct rate: 0.27009646302250806
k = 7 Validation set correct rate: 0.2604501607717042
k = 9 Validation set correct rate: 0.26688102893890675
k = 11 Validation set correct rate: 0.29260450160771706

Example my_result:

Words (split by space)                            label
senator carl Krueger thinks ipods can kill you    joy
who is prince frederic von anhalt                 joy
prestige has magic touch                          joy
study female seals picky about mates              joy
no e book for harry potter vii                    joy
blair apologises over friendly fire inquest       fear
