
Question

The data are log-transformed Mel spectrograms derived from the GTZAN dataset. The original GTZAN dataset contains 30-second audio files of 1,000 songs spanning 10 genres (100 per genre). We have reduced the original data to 4 genres (400 songs) and transformed it to obtain, for each song, 15 log-transformed Mel spectrograms. Each Mel spectrogram is an image file that describes the time, frequency and intensity of a song segment: the x-axis represents time, the y-axis is the frequency transformed to the so-called mel scale (a roughly logarithmic, perceptually motivated scale), and the colour of a point represents the decibels of that frequency at that time (darker colours indicating lower decibels). Below is an example of a Mel spectrogram (the x and y ticks identify the pixels making up the picture):
[Figure: example of a Mel spectrogram; x-axis = time, y-axis = mel frequency, colour = decibels]
The training data represent approximately 66% of the total number of data points, the validation set 14% and the test set 20%.
The class labels are ordered as follows:
the first class corresponds to classical music
the second to disco music
the third to metal music
the fourth to rock music
P1 - CNN
For this exercise, you must use the CPU runtime.
The goal is to train a CNN-based classifier on the Mel spectrograms to predict the corresponding music genres. Implement the following CNN architecture:
a 2D convolutional layer with 4 output channels of square 5×5 filters, 'same' padding, default stride, ReLU activation function and default weight and bias initialisations.
a 2D max pooling layer with size 2 and stride 2.
a 2D convolutional layer with 8 output channels of square 5×5 filters, 'same' padding, default stride, ReLU activation function and default weight and bias initialisations.
a 2D max pooling layer with size 2 and stride 2.
a 2D convolutional layer with 16 output channels of square 5×5 filters, 'same' padding, default stride, ReLU activation function and default weight and bias initialisations.
a 2D max pooling layer with size 2 and stride 2.
a flattening layer transforming the output feature maps into a 1D vector.
a dense layer of 50 neurons, ReLU activation and L2 regularisation with a penalty of 0.01.
an output layer with the required number of neurons and activation function.
Compile the model with an appropriate evaluation metric and loss function. To optimise, use the mini-batch stochastic gradient descent algorithm with batch size 32, and train the model for 20 epochs. A minimal sketch of one possible implementation is given after the seeding note below.
IMPORTANT: for reproducibility of the results, before training your model you must run the following two lines of code to fix the seeds:
tf.keras.utils.set_random_seed(42)
tf.config.experimental.enable_op_determinism()
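
As a reference, here is a minimal sketch of how this architecture could be written in tf.keras. It is one possible reading of the specification, not the official solution: it assumes the spectrograms in x_train are single-channel image arrays, that "padding" means padding='same', and that y_train/y_val are one-hot encoded (if the *_num variants hold integer labels, use sparse_categorical_crossentropy instead).

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Fix the seeds as required above.
tf.keras.utils.set_random_seed(42)
tf.config.experimental.enable_op_determinism()

# Input shape is taken from the data rather than hard-coded.
model = tf.keras.Sequential([
    tf.keras.Input(shape=x_train.shape[1:]),
    layers.Conv2D(4, 5, padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(8, 5, padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(16, 5, padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Flatten(),
    layers.Dense(50, activation='relu',
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dense(4, activation='softmax'),  # one neuron per genre
])

model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss='categorical_crossentropy',  # assumes one-hot labels
              metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=32, epochs=20,
                    validation_data=(x_val, y_val))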
Answer the following questions:
1. How many parameters does the model train? Before performing the training, do you expect this model to overfit? Which aspects would influence the overfitting (or not) of this model? (A worked parameter count is sketched after this list.)
2. Plot the loss function and the accuracy per epoch for the train and validation sets (see the plotting sketch after this list).
3. Which accuracy do you obtain on the test set?
4. Using the function plot_confusion_matrix, plot the confusion matrices of the classification task on the train set and test set. What do you observe from this metric? Which classes display the most correct predictions? And the most wrong ones?
5. Using the function ind_correct_uncorrect, extract the indexes of the training data that were predicted correctly and incorrectly for each class. For each music genre, perform the following steps:
Using the function plot_spectrograms, plot the 12 Mel spectrograms of the first 6 data points that were predicted correctly and the first 6 that were predicted wrongly. Do you observe differences among music genres?
Using the function print_wrong_prediction, print the predicted classes of the first 6 data points that were predicted wrongly.
Using the Grad-CAM method, implemented in the function plot_gradcam_spectrogram, print the heatmaps of the last pooling layer for the same 12 extracts (6 correct + 6 wrong). Comment on the heatmaps obtained. Do you observe differences among the heatmaps of different music genres? Can you understand why the model got some predictions wrong? (A minimal Grad-CAM sketch is given after this list.)
6. Commenting on the previous question: what are your thoughts about the applicability of the Grad-CAM tool to these data?
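
For question 1, the counts can be read directly off model.summary(); as a hand check, each 2D convolution trains (kernel_height × kernel_width × input_channels + 1) × output_channels parameters, the +1 being the bias. Assuming single-channel input spectrograms:

conv1: (5·5·1 + 1)·4  = 104
conv2: (5·5·4 + 1)·8  = 808
conv3: (5·5·8 + 1)·16 = 3,216

The pooling and flattening layers have no parameters. The dense layers dominate: for an input of height H and width W, three 2×2 poolings leave roughly (H/8)·(W/8)·16 flattened features, so the 50-neuron layer trains ((H/8)·(W/8)·16 + 1)·50 parameters and the output layer (50 + 1)·4 = 204. The exact totals therefore depend on the spectrogram dimensions, which are not stated here.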
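For questions 2 and 3, a minimal sketch using the history object returned by model.fit above (the dictionary keys assume metrics=['accuracy'] at compile time):

import matplotlib.pyplot as plt

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(history.history['loss'], label='train')
ax_loss.plot(history.history['val_loss'], label='validation')
ax_loss.set_xlabel('epoch'); ax_loss.set_ylabel('loss'); ax_loss.legend()
ax_acc.plot(history.history['accuracy'], label='train')
ax_acc.plot(history.history['val_accuracy'], label='validation')
ax_acc.set_xlabel('epoch'); ax_acc.set_ylabel('accuracy'); ax_acc.legend()
plt.show()

# Test-set accuracy for question 3.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy: {test_acc:.3f}')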
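The helper plot_gradcam_spectrogram is provided with the assignment and is not reproduced here. For intuition only, a minimal Grad-CAM computation for a single input looks roughly like the sketch below; layer_name would be the name of the last pooling layer as reported by model.summary().

import numpy as np
import tensorflow as tf

def gradcam_heatmap(model, image, layer_name, class_index=None):
    # Model mapping the input to (chosen feature maps, predictions).
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        feature_maps, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, feature_maps)         # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))          # global-average-pooled gradients
    cam = tf.reduce_sum(feature_maps[0] * weights, axis=-1)  # channel-weighted sum
    cam = tf.nn.relu(cam)                                    # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()       # normalise to [0, 1]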
P2 - Disentangling time and frequency
The images used in this assignment differ from ordinary pictures: the x and y axes carry different meanings. With the tools we are exploring during lectures and seminars, can you propose a CNN architecture that treats the time and frequency components of the spectrograms differently?
Present and describe the architecture you have chosen and justify the rationale behind it. Plot training and validation loss and accuracy over 20 epochs (this time you can use the GPU runtime if the model is slow to train). Print the accuracy on the test set and the confusion matrices on the training and test sets.
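
Many designs are acceptable here. One hedged example, assuming the array layout is (frequency bins, time frames, 1): a two-branch network whose branches use non-square kernels, one elongated along the frequency axis and one along the time axis, merged before classification. The kernel sizes below are illustrative, not prescribed.

import tensorflow as tf
from tensorflow.keras import layers, Model

inp = tf.keras.Input(shape=x_train.shape[1:])

# Frequency branch: tall, narrow kernels look across mel bands at a fixed time.
f = layers.Conv2D(8, (12, 3), padding='same', activation='relu')(inp)
f = layers.MaxPooling2D((2, 2))(f)

# Time branch: short, wide kernels look across time at a fixed band.
t = layers.Conv2D(8, (3, 12), padding='same', activation='relu')(inp)
t = layers.MaxPooling2D((2, 2))(t)

# Merge the two views and classify.
x = layers.Concatenate()([f, t])
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(50, activation='relu')(x)
out = layers.Dense(4, activation='softmax')(x)

model2 = Model(inp, out)
model2.compile(optimizer='sgd', loss='categorical_crossentropy',
               metrics=['accuracy'])

The rationale: frequency-elongated filters can pick up timbral and harmonic structure that spreads across many bands at once, while time-elongated filters can pick up rhythmic and temporal patterns; concatenating the branches lets the later layers combine both kinds of evidence.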
Data:
x_test
x_train
x_val
y_test
y_test_num
y_train
y_train_num
y_val
y_val_num
