Question
Hello, I am struggling to call my kNearestNeighbors function on my training and testing matrices. I keep getting the error below. What can I do about it?
IndexError: index 57 is out of bounds for axis 0 with size 57
import numpy as np
import pandas as pd
from math import sqrt
from csv import reader

#euclideanDistance: d(x, y) = sqrt(sum((xi - yi)^2))
def euclideanDistance(x, y): #calculate the distance between two rows in a dataset
    sum=0.0 #initialize the sum to 0
    for j in range(len(x)-1): #iterate through all of the components in the arrays
        sum += (x[j]-y[j])**2 #add the squared difference between the components
    squareRoot = sqrt(sum)
    return squareRoot

def kNearestNeighbors(trainingData, testingDataRow, K):
    euclideanDistances=list()
    for i in trainingData: #for each row in the training data
        distance=euclideanDistance(testingDataRow, i) #get the euclidean distance between the testing data row and the training data row
        print(distance)
        euclideanDistances.append((i, distance)) #add the training data row and its distance to the testing data row to the list
    euclideanDistances.sort(key=lambda x: x[1]) #sort the list by distance
    KNN=list() #create list of k nearest neighbors
    for j in range(K):
        KNN.append(euclideanDistances[j][0]) #add the rows with the K smallest euclidean distances to the list of k nearest neighbors
    return KNN
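For comparison, the same distance computation can be written without the inner Python loop. This is a minimal sketch, not the code from the post: `nearest_indices` is a hypothetical helper, and it assumes the training data is a 2-D NumPy array holding features only (no label column) and that the query is a single 1-D row.

```python
import numpy as np

def nearest_indices(train_features, query_row, k):
    """Return (indices of the k nearest training rows, all distances)."""
    # Broadcasting subtracts the 1-D query from every training row.
    diffs = train_features - query_row
    # Euclidean distance per row: square root of the summed squares.
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    # argsort orders the row indices from nearest to farthest.
    order = np.argsort(distances, kind="stable")
    return order[:k], distances

# Tiny illustrative data set (made up for this sketch).
train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
idx, dist = nearest_indices(train, np.array([0.0, 0.0]), k=2)
print(idx)   # indices of the two nearest rows
print(dist)  # distance from the query to every training row
```

Returning indices instead of the rows themselves makes it easy to look up labels that are stored in a separate array.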
#classification
def classification(trainingData, testingDataRow, K):
    KNN=[row[-1] for row in kNearestNeighbors(trainingData, testingDataRow, K)] #get the last column of each of the K nearest rows and put it in a list
    classPrediction=max(set(KNN), key=KNN.count) #get the class with the highest count among the neighbors
    return classPrediction
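The voting step can also be expressed with `collections.Counter` from the standard library. The sketch below is only an illustration of that alternative: `majority_vote` is a hypothetical name, and it assumes the neighbor labels have already been collected into a list.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common label among the neighbors."""
    # most_common(1) returns a one-element list: [(label, count)].
    return Counter(neighbor_labels).most_common(1)[0][0]

# Two neighbors labelled 1 and one labelled 0 (made-up labels).
print(majority_vote([1, 0, 1]))
```

With `Counter`, a tie goes to the label that was seen first, whereas `max(set(KNN), key=KNN.count)` breaks ties by set iteration order. With two classes, an odd K avoids ties altogether.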
spam_test=pd.read_csv('spam_test.csv') #put the spam_test.csv file in variable spam_test
spam_train=pd.read_csv('spam_train.csv') #put the spam_train.csv file in variable spam_train
print(spam_test)

k_values=[1, 5, 11, 21, 41, 61, 81, 101, 201, 401] #put all of the different K values in a list
index=['f'+str(i) for i in range(1,58)] #build the feature column labels 'f1' to 'f57'

testMatrix = spam_test[index].values
trainMatrix = spam_train[index].values
print(testMatrix)
kNearestNeighbors(trainMatrix, testMatrix, 3)
Here is an example of my testing data CSV file, in case that helps:
[Screenshot of spam_test.csv opened in a spreadsheet: what appears to be an ID column (values such as t10, t11, ...) followed by feature columns f1, f2, f3, ... containing small non-negative decimal values. The individual cell values are not legible in the extracted text.]
Step by Step Solution
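Judging from the code and the traceback, one likely cause is that the final call passes the whole `testMatrix` as `testingDataRow`. Inside `euclideanDistance`, `len(x)` is then the number of test rows rather than the 57 features, so `j` eventually reaches 57 and `y[j]` indexes past the end of a 57-element training row. The sketch below reproduces this with random stand-in data (the array shapes are assumptions, since the real CSV files are not available here) and shows one possible fix: pass a single test row at a time.

```python
import numpy as np
from math import sqrt

def euclideanDistance(x, y):
    # Same loop logic as in the post: the bound comes from len(x).
    total = 0.0
    for j in range(len(x) - 1):
        total += (x[j] - y[j]) ** 2
    return sqrt(total)

rng = np.random.default_rng(0)
trainMatrix = rng.random((100, 57))  # stand-in: 100 rows, 57 features
testMatrix = rng.random((80, 57))    # stand-in: 80 rows, 57 features

# Passing the whole matrix: len(x) is 80, so j runs past index 56
# and y[57] does not exist.
try:
    euclideanDistance(testMatrix, trainMatrix[0])
except IndexError as err:
    message = str(err)
    print(message)

# Passing one row at a time: len(x) is 57 and every index is valid.
distances = [euclideanDistance(row, trainMatrix[0]) for row in testMatrix]
print(len(distances))
```

Applied to the code in the question, this would mean looping over the test rows, for example `for row in testMatrix: kNearestNeighbors(trainMatrix, row, 3)`. Two further points may be worth checking: `len(x) - 1` skips the last column, which is appropriate only if that column holds the class label, and `classification` reads the label from `row[-1]`, while `trainMatrix` as built here contains only the columns f1 to f57.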