Question

This question uses the MNIST 784 dataset, which is readily available online.
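The setup cell under "Given code" reads the data from data/mnist_784.csv. If that file is not already on disk, one way to obtain the same data (an alternative suggestion, not part of the assignment) is scikit-learn's fetch_openml:

```python
from sklearn.datasets import fetch_openml

# Downloads MNIST 784 from OpenML on first use and caches it locally.
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target   # X: (70000, 784) pixel values, y: string labels "0".."9"
```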

TO DO: Modify the KNN class given below to implement a KNN classifier. There are three methods that you need to complete:

predict: Given an m x p matrix of validation data, with m examples each having p features, return a length-m vector of predicted labels by calling the classify function on each example.

classify: Given a single query example with p features, return its predicted class label as an integer using KNN by calling the majority function.

majority: Given an array of indices into the training set corresponding to the K training examples that are nearest to the query point, return the majority label as an integer. If there is a tie for the majority label among the K nearest neighbors, reduce K by 1 and try again. Continue reducing K until there is a winning label.
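As a concrete illustration of the tie-breaking rule (the label values below are made up), with K = 4 and neighbor labels [3, 7, 3, 7] ordered nearest first, labels 3 and 7 tie 2-2, so the vote is retried with only the 3 nearest neighbors, where label 3 wins:

```python
from collections import Counter

neighbor_labels = [3, 7, 3, 7]      # hypothetical labels, ordered nearest first
print(Counter(neighbor_labels))     # Counter({3: 2, 7: 2}) -- a tie, so reduce K to 3
print(Counter(neighbor_labels[:3])) # Counter({3: 2, 7: 1}) -- label 3 now wins
```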

Notes:

Don't even think about implementing nearest-neighbor search or any distance metrics yourself. Instead, read the documentation for Scikit-Learn's BallTree object; its query method can do most of the heavy lifting for you (see the short example after these notes).

Do not use Scikit-Learn's KNeighborsClassifier in this problem. We're implementing this ourselves.
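To illustrate the BallTree interface mentioned in the notes (the toy arrays here are purely for demonstration), query returns the distances and indices of the k nearest training points, ordered nearest first; those indices are exactly what majority needs:

```python
import numpy as np
from sklearn.neighbors import BallTree

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [8.0, 8.0]])
tree = BallTree(X_train)              # build the search structure once

query = np.array([[1.2, 0.9]])        # queries must be 2-D: (n_queries, n_features)
dist, ind = tree.query(query, k=3)    # both arrays have shape (1, 3)
print(ind)                            # indices into X_train, nearest first
print(dist)                           # corresponding Euclidean distances
```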

## Given code

import math
import pickle
import gzip
import numpy as np
import matplotlib.pylab as plt
%matplotlib inline

# importing all the required libraries

from math import exp
import numpy as np
import pandas as pd
import sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline

# This cell sets up the MNIST dataset

class MNIST_import:
    """
    sets up MNIST dataset from OpenML
    """
    def __init__(self):
        df = pd.read_csv("data/mnist_784.csv")
        # Create arrays for the features and the response variable
        # store for use later
        y = df['class'].values
        X = df.drop('class', axis=1).values
        # Convert the labels to numeric labels
        y = np.array(pd.to_numeric(y))
        # create training and validation sets
        self.train_x, self.train_y = X[:5000,:], y[:5000]
        self.val_x, self.val_y = X[5000:6000,:], y[5000:6000]

data = MNIST_import()

class KNN:
    """
    K-nearest-neighbors classifier
    """
    def __init__(self, x_train, y_train, K=5):
        """
        Creates a kNN instance

        :param x_train: numpy array with shape (n_rows, p) - e.g. [[1,2],[3,4]]
        :param y_train: numpy array with shape (n_rows,) - e.g. [1,-1]
        :param K: the number of nearest points to consider in classification
        """
        # Import and build the BallTree on training features
        from sklearn.neighbors import BallTree
        self.balltree = BallTree(x_train)
        # Cache training labels and parameter K
        self.y_train = y_train
        self.K = K

    def majority(self, neighbor_indices, neighbor_distances=None):
        """
        Given indices of nearest neighbors in the training set, return the majority label.
        Break ties by considering 1 fewer neighbor until a clear winner is found.

        :param neighbor_indices: the indices of the K nearest neighbors in the training set
        :param neighbor_distances: corresponding distances from the query point to the K nearest neighbors
        """
        # complete your code here

    def classify(self, x):
        """
        Given a query point, return the predicted label

        :param x: a query point stored as an ndarray
        """
        # complete your code here

    def predict(self, X):
        """
        Given an ndarray of query points, return yhat, an ndarray of predictions

        :param X: an (m x p) dimension ndarray of points to predict labels for
        """
        # complete your code here
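For reference, below is a minimal sketch of how the three placeholder methods could be completed, shrinking the neighborhood by one on ties as the problem statement describes. It is one possible implementation, not the official solution, and it assumes the labels are non-negative integers (as they are for MNIST) so that numpy's bincount can tally the votes:

```python
import numpy as np
from sklearn.neighbors import BallTree

class KNN:
    """Sketch of a completed KNN classifier backed by a BallTree."""

    def __init__(self, x_train, y_train, K=5):
        self.balltree = BallTree(x_train)   # index over training features
        self.y_train = y_train              # cached training labels
        self.K = K                          # neighbors to consult per query

    def majority(self, neighbor_indices, neighbor_distances=None):
        # Labels of the neighbors, ordered nearest first (BallTree.query sorts them).
        labels = self.y_train[neighbor_indices]
        k = len(labels)
        while k > 0:
            counts = np.bincount(labels[:k])                  # votes per label
            winners = np.flatnonzero(counts == counts.max())  # labels tied for the lead
            if len(winners) == 1:
                return int(winners[0])                        # unique majority label
            k -= 1                                            # tie: drop the farthest neighbor and revote

    def classify(self, x):
        # Query the tree with a single row and pass the neighbor indices to majority.
        _, ind = self.balltree.query(x.reshape(1, -1), k=self.K)
        return self.majority(ind[0])

    def predict(self, X):
        # Classify each row of X independently.
        return np.array([self.classify(x) for x in X])
```

Used with the data object from the setup cell, a quick check might look like this (accuracy will vary with K):

```python
knn = KNN(data.train_x, data.train_y, K=5)
yhat = knn.predict(data.val_x)
print("validation accuracy:", np.mean(yhat == data.val_y))
```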
