
Question


Data Processing
As you can see in the plot above, the images are grayscale images with pixel values that range from 0 to 255. Also, these images have dimensions of 28 x 28. As a result, you'll need to preprocess the data before you feed it into the model.
As a first step, convert each 28 x 28 image of the train and test set into a matrix of size 28 x 28 x 1, which can be fed into the network.
#Reshape data
train_X = train_X.reshape(-1,28,28,1)
test_X = test_X.reshape(-1,28,28,1)
train_X.shape, test_X.shape
Print the shape of the data to determine its size, and include the output here. [1 mark]
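For reference, assuming the standard Fashion-MNIST split of 60,000 training and 10,000 test images (an assumption, since the output is not shown here), the reshape should report:
# In a notebook, the expression above displays (assuming the standard split):
# ((60000, 28, 28, 1), (10000, 28, 28, 1))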
The data right now is in uint8 format, so before you feed it into the network you need to convert its type to float32, and you also have to rescale the pixel values to the range 0-1 inclusive. So let's do that!
#Normalize data between 0 and 1
train_X = train_X.astype('float32')
test_X = test_X.astype('float32')
train_X = train_X / 255.
test_X = test_X / 255.
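As a quick sanity check (a minimal sketch using the variables defined above), you can confirm the rescaling worked:
# Pixel values should now lie in the range 0.0 to 1.0
print(train_X.min(), train_X.max())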
In one-hot encoding, you convert the categorical data into a vector of numbers. The reason you convert categorical data into one-hot encoding is that machine learning algorithms cannot work with categorical data directly. You generate one boolean column for each category or class. Only one of these columns can take on the value 1 for each sample; hence the term one-hot encoding.
For your problem statement, the one-hot encoding will be a row vector, and for each image it will have a dimension of 1 x 10. The important thing to note here is that the vector consists of all zeros except for the class that it represents, for which it is 1. For example, the ankle boot image that you plotted above has a label of 9, so for all the ankle boot images, the one-hot encoding vector would be [0 0 0 0 0 0 0 0 0 1].
So let's convert the training and testing labels into one-hot encoding vectors:
# Change the labels from categorical to one-hot encoding
from tensorflow.keras.utils import to_categorical
train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)
Please print the output of the commands given above. [1 mark]
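For instance, a minimal sketch that compares an original label with its one-hot encoded form (index 0 is an arbitrary choice):
# Display a label before and after one-hot encoding
print('Original label:', train_Y[0])
print('After conversion to one-hot:', train_Y_one_hot[0])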
Splitting the data into training and validation sets
This next step is a crucial one. In machine learning or any data-specific task, you should partition the data correctly. For the model to generalize well, you split the training data into two parts, one designed for training and the other for validation. In this case, you will train the model on 80% of the training data and validate it on the remaining 20%. This will also help to reduce overfitting, since you will be validating the model on data it has not seen during the training phase, which helps boost the test performance.
#Split the training data into an 80/20 train/validation configuration
from sklearn.model_selection import train_test_split
train_X,valid_X,train_label,valid_label = train_test_split(train_X, train_Y_one_hot, test_size=0.2, random_state=13)
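You can verify the split by printing the resulting shapes. Assuming the standard 60,000-image Fashion-MNIST training set, an 80/20 split gives 48,000 training and 12,000 validation samples:
# Expected: (48000, 28, 28, 1) (12000, 28, 28, 1) (48000, 10) (12000, 10)
print(train_X.shape, valid_X.shape, train_label.shape, valid_label.shape)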
Building the Deep Neural Network
The images are of size 28 x 28. You convert the image matrix to an array, rescale it between 0 and 1, reshape it so that it's of size 28 x 28 x 1, and feed this as an input to the network.
You'll use three convolutional layers:
The first layer will have 32 filters of size 3 x 3,
The second layer will have 64 filters of size 3 x 3, and
The third layer will have 128 filters of size 3 x 3.
In addition, there are three max-pooling layers, each of size 2 x 2. You will use a batch size of 64; a higher batch size of 128 or 256 is also fine, it all depends on the available memory. The batch size contributes massively to determining the learning parameters and affects the prediction accuracy. You will train the network for 20 epochs.
import tensorflow as tf

cnn = tf.keras.models.Sequential()
batch_size = 64
epochs = 20
num_classes = 10
In Keras, you can just stack up layers by adding the desired layers one by one. That's exactly what you'll do here: you'll first add a convolutional layer with Conv2D(). Note that you use this function because you're working with images! Next, you add the Leaky ReLU activation function, which helps the network learn non-linear decision boundaries. Since you have ten different classes, you need a non-linear decision boundary that can separate these ten classes, which are not linearly separable.
More specifically, you add Leaky ReLUs because they attempt to fix the problem of dying Rectified Linear Units (ReLUs). The ReLU activation function is used a lot in neural network architectures, and more specifically in convolutional networks, where it has proven to be more effective than the widely used logistic sigmoid function. As of 2017, this activation function is the most popular one for deep neural networks. The ReLU function allows the activation to be thresholded at zero. However, during training, ReLU units can "die". This can happen when a large gradient flows through a ReLU neuron: it can cause the weights to update in such a way that the neuron will never activate on any data point again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. Leaky ReLUs attempt to fix this: instead of being zero for negative inputs, the function has a small negative slope.
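Putting the pieces together, here is a minimal sketch of the architecture described above. The filter counts, pooling sizes, batch size, and epochs follow the text; the 'same' padding, the 128-unit dense layer before the output, the LeakyReLU slope of 0.1, and the Adam optimizer are illustrative assumptions rather than details given in the original.
# A sketch continuing from the Sequential model created above (not the
# author's exact code; padding, dense size, alpha, and optimizer are assumed)
cnn.add(tf.keras.layers.Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 1)))
cnn.add(tf.keras.layers.LeakyReLU(alpha=0.1))
cnn.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'))
cnn.add(tf.keras.layers.Conv2D(64, (3, 3), padding='same'))
cnn.add(tf.keras.layers.LeakyReLU(alpha=0.1))
cnn.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'))
cnn.add(tf.keras.layers.Conv2D(128, (3, 3), padding='same'))
cnn.add(tf.keras.layers.LeakyReLU(alpha=0.1))
cnn.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'))
# Classification head: flatten, a dense layer, then a 10-way softmax output
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(128))
cnn.add(tf.keras.layers.LeakyReLU(alpha=0.1))
cnn.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
# Compile and train for 20 epochs with a batch size of 64
cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
cnn.fit(train_X, train_label, batch_size=batch_size, epochs=epochs, validation_data=(valid_X, valid_label))
With 'same' padding, the three 2 x 2 poolings reduce the 28 x 28 input to 14 x 14, then 7 x 7, then 4 x 4, so the flatten layer sees 4 x 4 x 128 = 2048 features before classification.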
