Question
I did this assignment to complete a KNN classifier, but I'm having trouble to identify what's wrong with my code, specifically when trying to use
I did this assignment to complete a KNN classifier, but I'm having trouble to identify what's wrong with my code, specifically when trying to use the model with 3 neighbors.
This is the title of the assignment:
This question uses the MNIST 784 dataset which is readily available online. Modify the class above to implement a KNN classifier. There are three methods that you need to complete:
predict: Given an matrix of validation data with examples each with features, return a length- vector of predicted labels by calling the classify function on each example.
classify: Given a single query example with features, return its predicted class label as an integer using KNN by calling the majority function.
majority: Given an array of indices into the training set corresponding to the training examples that are nearest to the query point, return the majority label as an integer. If there is a tie for the majority label using nearest neighbors, reduce by 1 and try again. Continue reducing until there is a winning label.
Notes:
Don't even think about implementing nearest-neighbor search or any distance metrics yourself. Instead, go read the documentation for Scikit-Learn's BallTree object. You will find that its implemented query method can do most of the heavy lifting for you. Do not use Scikit-Learn's KNeighborsClassifier in this problem. We're implementing this ourselves.
Check how well your classifier does Use your KNN class to perform KNN on the validation data with =3.
This is my first attempt code:
class KNN: """ Class to store data for regression problems """ def __init__(self, x_train, y_train, K=5): """ Creates a kNN instance
:param x_train: numpy array with shape (n_rows,1)- e.g. [[1,2],[3,4]] :param y_train: numpy array with shape (n_rows,)- e.g. [1,-1] :param K: The number of nearest points to consider in classification """ # Import and build the BallTree on training features from sklearn.neighbors import BallTree self.balltree = BallTree(x_train) # Cache training labels and parameter K self.y_train = y_train self.K = K def majority(self, neighbor_indices, neighbor_distances=None): """ Given indices of nearest neighbors in training set, return the majority label. Break ties by considering 1 fewer neighbor until a clear winner is found.
:param neighbor_indices: The indices of the K nearest neighbors in self.X_train :param neighbor_distances: Corresponding distances from query point to K nearest neighbors. """ while self.K >= 1: # Count the frequency of each label in the neighbors labels, count = np.unique(self.y_train[neighbor_indices[:,:self.K].astype(int)], return_counts=True)
# Find the label with the highest frequency majority_index = np.argmax(count) majority_label = labels[majority_index]
# If there's a clear winner, return it if count[majority_index] > self.K/2: return majority_label
# Reduce K and try again self.K -= 1 # If there's still a tie after reducing K to 1, return the label of the single nearest neighbor return self.y_train[neighbor_indices[0,0]] def classify(self, x): """ Given a query point, return the predicted label :param x: a query point stored as an ndarray """ # Find the K nearest neighbors in the training set neighbor_indices, neighbor_distances = self.balltree.query(x.reshape(1, -1), k=self.K) # Predict the label using the majority of the labels in the nearest neighbors return self.majority(neighbor_indices, neighbor_distances) def predict(self, X): """ Given an ndarray of query points, return yhat, an ndarray of predictions
:param X: an (m x p) dimension ndarray of points to predict labels for """ # Predict the label for each query point return np.array([self.classify(x) for x in X])
Feedback error: AssertionError: Look at the KNN class. Does it perform as expected with three neighbors?
y_pred seems to be ok
I understand there's probably something wrong with my majority function in the KNN class, so I changed the "marjority" function and kept the rest of the code:
This is my second attempt code:
def majority(self, neighbor_indices, neighbor_distances=None): """ Given indices of nearest neighbors in training set, return the majority label. Break ties by considering 1 fewer neighbor until a clear winner is found.
:param neighbor_indices: The indices of the K nearest neighbors in self.X_train :param neighbor_distances: Corresponding distances from query point to K nearest neighbors. """ max_neighbors = self.K * 2 num_neighbors = self.K while num_neighbors <= max_neighbors: # Count the frequency of each label in the neighbors labels, count = np.unique(self.y_train[neighbor_indices[:,:num_neighbors].astype(int)], return_counts=True)
majority_index = np.argmax(count) majority_label = labels[majority_index]
# If there's a clear winner, return it if count[majority_index] > num_neighbors/2: return majority_label
# Increase the number of neighbors and try again num_neighbors += 1 return self.y_train[neighbor_indices[0,0]]
Feedback error: AssertionError: Look at the KNN class. Does it perform as expected with one neighbors?
y_pred returns 'none'
This is my third attempt code:
def majority(self, neighbor_indices, neighbor_distances=None): """ Given indices of nearest neighbors in training set, return the majority label. Break ties by considering more neighbors until a clear winner is found.
:param neighbor_indices: The indices of the K nearest neighbors in self.X_train :param neighbor_distances: Corresponding distances from query point to K nearest neighbors. """
max_neighbors = len(neighbor_indices.astype(int))
for num_neighbors in range(self.K, max_neighbors + 1): # Count the frequency of each label in the neighbors labels, count = np.unique(self.y_train[neighbor_indices[:num_neighbors].astype(int)], return_counts=True)
# Find the label with the highest frequency majority_index = np.argmax(count) majority_label = labels[majority_index]
# If there's a clear winner, return it if count[majority_index] > num_neighbors / 2: return majority_label
# If there's still a tie after considering all neighbors, return the label of the single nearest neighbor return self.y_train[neighbor_indices[0].astype(int)]
Feedback error: AssertionError: Look at the KNN class. Does it perform as expected with one neighbors?
y_pred returns 'none'
This is my fourth attempt code:
def majority(self, neighbor_indices, neighbor_distances=None): """ Given indices of nearest neighbors in training set, return the majority label. Break ties by considering 1 more neighbor until a clear winner is found.
:param neighbor_indices: The indices of the K nearest neighbors in self.X_train :param neighbor_distances: Corresponding distances from query point to K nearest neighbors. """
max_neighbors = self.K * 2 num_neighbors = self.K
while num_neighbors <= max_neighbors and num_neighbors <= len(neighbor_indices): # Count the frequency of each label in the neighbors labels, count = np.unique(self.y_train[neighbor_indices[:,:num_neighbors].astype(int)], return_counts=True)
# Find the label with the highest frequency majority_index = np.argmax(count) majority_label = labels[majority_index]
# If there's a clear winner, return it if count[majority_index] > num_neighbors/2: return majority_label
# Increase the number of neighbors and try again num_neighbors += 1
# If there's still a tie after trying all possible numbers of neighbors, return the label of the single nearest neighbor return self.y_train[neighbor_indices[0,0]]
Feedback error: AssertionError: Look at the KNN class. Does it perform as expected with one neighbors?
y_pred returns 'none'
my fifth attempt code:
def majority(self, neighbor_indices, neighbor_distances=None): """ Given indices of nearest neighbors in the training set, return the majority label. Break ties by considering more neighbors until a clear winner is found.
:param neighbor_indices: The indices of the k nearest neighbors in self.X_train :param neighbor_distances: Corresponding distances from query point to k nearest neighbors. """
for k in range(self.K, len(neighbor_indices) + 1): labels, counts = np.unique(self.y_train[neighbor_indices[:k]], return_counts=True) if len(labels) == 1: return labels[0] if counts[0] > counts[1]: return labels[0]
# If no clear winner is found, return the label of the nearest neighbor return self.y_train[neighbor_indices[0]]
Feedback error: AssertionError: Look at the KNN class. Does it perform as expected with one neighbors?
y_pred returns 'none'
my sixth attempt code:
def majority(self, neighbor_indices, neighbor_distances=None): """ Given indices of nearest neighbors in training set, return the majority label. Break ties by considering fewer neighbors until a clear winner is found.
:param neighbor_indices: The indices of the K nearest neighbors in self.X_train :param neighbor_distances: Corresponding distances from query point to K nearest neighbors. """ k = self.K while k >= 1: # Get the labels of the k nearest neighbors labels = self.y_train[neighbor_indices[0][:k]]
# Count the frequency of each label in the neighbors unique_labels, counts = np.unique(labels, return_counts=True)
# Find the label with the highest frequency max_count_index = np.argmax(counts) max_count = counts[max_count_index] max_label = unique_labels[max_count_index]
# Check for ties ties = (counts == max_count) if ties.sum() == 1: return max_label else: k -= 1
# If there is still a tie after reducing k to 1, return the label of the single nearest neighbor return self.y_train[neighbor_indices[0][0]]
Feedback error: AssertionError: Look at the KNN class. Does it perform as expected with one neighbors?
y_pred returns 'none'
The question asks to "tie for the majority label using nearest neighbors, reduce by 1 and try again. Continue reducing until there is a winning label.". I believe this is mandatory to pass de code. All these functions are also mandatory in the knn class: __init__, majority, classify and predict. It seems like my code is having a problem when trying to use the model with 3 neighbors. I believe there's some tricky thing about 3 neighbors that I'm not seeing. Can somebody please help me?
TO DO: What's wrong with my code? How can I fix this issue? Can somebody help me to make my model work with 3 neighbors and following the reducing until there is a winning label? I've been posting this same problem here for a few days but it seems like nobody knows how to fix this.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started