Question
The first thing is to load the MNIST data into our Machine Learning programs. We will set up the digitClassifier.py to expect input as two
The first thing is to load the MNIST data into our Machine Learning programs. We will set up the digitClassifier.py to expect input as two matrices: (i) the set of training images X and (ii) the set of training labels Y (there will be a corresponding pair of test matrices). Download all 4 files of the MNIST is available at: http://yann.lecun.com/exdb/mnist/
The images in the MNIST data are 28 × 28 pixels, in a proprietory formt For
our work we will flatten out each image to a single row of 28×28+1 = 785 ele- ments in the X matrix. The *+1* is the first element and correspond to the first column of 1's as for any regression X matrix. There are 60,000 example in the train-images-idx3-ubyte.gz file and 60,000 labels in the train-labels-idx1-ubyte.gz file. These are gzipped files. Please read the note (in bold under the data sets) (you
may need to replace the '-' after images with a · if you get a file-not-found error).
The code to load the uncompressed files into your code is given below:
# An MNIST loader.
import numpy as np
#import gzip
import struct
def load_images(filename):
# Open and unzip the file of images:
# with gzip.open(filename, 'rb') as f:
fh = open(filename, 'rb')
# Read the header information into a bunch of variables
_ignored, n_images, columns, rows = struct.unpack('>IIII', fh.read(16))
# Read all the pixels into a NumPy array of bytes:
all_pixels = np.frombuffer(fh.read(), dtype=np.uint8)
# Reshape the pixels into a matrix where each line is an image:
return all_pixels.reshape(n_images, columns * rows)
def stack_ones(X):
c1 = np.ones(len(X)) # can use np.shape(X)[0]
return np.column_stack((c1, X))
def load_labels(filename):
#with gzip.open(filename, 'rb') as f: # uncomment to unzip & open the gzippe
fj = open(filename, 'rb')
# Skip the header bytes:
fj.read(8)
# Read all the labels into a list:
all_labels = fj.read()
# Reshape the list of labels into a one-column matrix:
return np.frombuffer(all_labels, dtype=np.uint8).reshape(-1, 1)
def encode_sevens(Y):
# Convert all 7s to 1, and everything else to 0
return (Y == 7).astype(int)
# The following commands load the data into
# 60000 images, each 785 elements (1 bias + 28 * 28 pixels)
X_train = stack_ones(load_images("train-images.idx3-ubyte"))
#Note "-" replaced wit
# 60K labels, each with value 1 if the digit is a five, and 0 otherwise
Y_train = encode_sevens(load_labels("train-labels.idx1-ubyte"))
# 10000 images, each 785 elements, with the same structure as X_train
X_test = stack_ones(load_images("t10k-images.idx3-ubyte"))
# 10000 labels, with the same encoding as Y_train
Y_test = encode_sevens(load_labels("t10k-labels.idx1-ubyte"))
The above file loads all the data into constants X train etc.. These can be imported into your classification program with import mnist as mn.
Write functions:
- def train( X, Y, numIter, learningRate) This sets up the vector of βs and find the βs by calling the gradient function. (Recall: the beta vector is updated in each iteration by -gradient(. . .)*learningRate)
- def classify(X, beta) which simply rounds and returns the Yˆ (the pre- dicted result for the logistic regression).
- def test(X, Y, beta) that computes the percentage of correct classification from: classify(X, beta). (Note the count of correct results are obtained by summing classify(X, beta) == Y. Recall, Y = 1 for the selected digit and zeros for the rest). The function should print the loss for every tenth iteration.
- def loss(. . .) To compute the log-loss function
- def gradient(X, Y, beta)
- def sigPredict(X, beta)
Run train for 100 iterations with a learningRate = 1.e-5 and then 1000 iterations with a learningRate = 1.e-3 to experiment.
Please screenshot the process & output in Python language.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started