Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

In this first task, you will create a deep learning model to classify images of skin lesions into one of seven classes: 1.

In this first task, you will create a deep learning model to classify images of skin lesions into one of seven classes:  

1.   "MEL" = Melanoma
2.   "NV" = Melanocytic nevus
3.   "BCC" = Basal cell carcinoma
4.   "AKIEC" = Actinic keratosis
5.   "BKL" = Benign keratosis
6.   "DF" = Dermatofibroma
7.   "VASC" = Vascular lesion

 

The data for this task is a subset of: https://challenge2018.isic-archive.com/task3/

The data for this task is inside the `/content/data/img` folder. It contains ~3,800 images named like `ISIC_000000.jpg` and the following label files:

*   `/content/data/img/train.csv`
*   `/content/data/img/val.csv`
*   `/content/data/img/train_small.csv`
*   `/content/data/img/val_small.csv`

The `small` versions are the first 200 lines of each partition and are included for debugging purposes. To save time, ensure your code runs on the `small` versions first.

 

## Task 1a. Explore the training set

**INSTRUCTIONS**: Check for data issues, as we have done in the labs. Check the class distribution and at least 1 other potential data issue. Hint: Look in `explore.py` for a function that can plot the class distribution.

 (Please Answer!)**REPORT**: What did you check for? What data issues are present in this dataset?

!Complete the following code based on the issue

 

import pandas as pd

IMG_CLASS_NAMES = ["MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"]

train_df = pd.read_csv('/content/data/img/train.csv')
val_df = pd.read_csv('/content/data/img/val.csv')
train_df.head()
from PIL import Image
# Change the filename to view other examples from the dataset
display(Image.open('/content/data/img/ISIC_0024306.jpg'))
import explore

# TODO - Check for data issues
# Hint: You can convert from one-hot to integers with argmax
#       This way you can convert 1, 0, 0, 0, 0, 0, 0  to class 0
#                                0, 1, 0, 0, 0, 0, 0  to class 1
#                                0, 0, 1, 0, 0, 0, 0  to class 2
# so it should be something like the following:
# train_labels = train_df.values[....].argmax(....)
# val_labels = val_df.values[....].argmax(....)
#     - you need to fill in the ... parts with the correct values.
# You should then print output the contents of train_labels to see if
# it matches the contents of train.csv
#
# Next you can plot the class distributions like the following:
# explore.plot_label_distribution(....)
#    - do the above for both the train and val labels.
#
# Following this look for other potential problems with the data
#   You may also think of any other potential problems with the data.
 

 

## Task 1b. Implement Training loop

 

**INSTRUCTIONS**:

*   Implement LesionDataset in `datasets.py`. Use the cell below to test your implementation.
*   Implement the incomplete functions in `train.py` marked as "Task 1b"
*   Go to the [Model Training Cell](#task-1-model-training) at the end of Task 1 and fill in the required code for "Task 1b".

(please answer!)**REPORT**: Why should you *not use* `random_split` in your code here?

!Complete the following code based on the issue

import datasets

ds = datasets.LesionDataset('/content/data/img',
                            '/content/data/img/train.csv')
input, label = ds[0]
print(input)
print(label)
 

## Task 1c. Implement a baseline convolutional neural network

You will implement a baseline convolutional neural network which you can compare results to. This allows you to evaluate any improvements made by hyperparameter tuning or transfer learning.

**INSTRUCTIONS**:

*   Implement a `SimpleBNConv` in `models.py` with:
   *   5 `nn.Conv2d` layers, with 8, 16, 32, 64, 128 output channels respectively, with the following between each convolution layer:
       *   `nn.ReLU()` for the activation function, and
       *   `nn.BatchNorm2d`, and
       *   finally a `nn.MaxPool2d` to downsample by a factor of 2.
*   Use a normalised confusion matrix on the model's validation predictions in `train.py`.
*  Go to the [Model Training Cell](#task-1-model-training) at the end of Task 1 and fill in the required code to train the model.

Training should take about 1 minute/epoch. Validation accuracy should be 60-70%, but UAR should be around 20-40%.

(Please answer!!)

**REPORT**: As training sets get larger, the length of time per epoch also gets larger. Some datasets take over an hour per epoch. This makes it impractical to debug typos in your code since it can take hours after starting for the program to reach new code. Name two ways to significantly reduce how long **each** epoch takes - for debugging purposes - while still using real data and using the real training code.

**REPORT**: Show the confusion matrix and plots of the validation accuracy and UAR in your report, and explain what is going wrong.
 !Complete the following code based on the issue

 

## Task 1d. Account for data issues

**MARKS**: 12 (Code 8, Reports 4)

**INSTRUCTIONS**: Account for the data issues in Task 1a and retrain your model.

(Please Answer!)**REPORT**: How did you account for the data issues? Was it effective? How can you tell? Show another confusion matrix.

**IMPORTANT NOTE**: One of the techniques from the lab will cause a warning in the metric calculation on `train_small.csv`, but will work fine on `train.csv`.

!Complete the following code based on the issue

 

 


## Model Training Cell

Based on what you have implemented in the above sections, you can try to complete the whole training process here.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.utils.data.sampler import WeightedRandomSampler

import datasets
import models
import train

torch.manual_seed(42)

NUM_EPOCHS = 5
BATCH_SIZE = 64

# Create datasets/loaders
# TODO Task 1b - Create the data loaders from LesionDatasets
# TODO Task 1d - Account for data issues, if applicable
# train_dataset = ...
# val_dataset = ...
# train_loader = ...
# val_loader = ...

# Instantiate model, optimizer and criterion
# TODO Task 1c - Make an instance of your model
# TODO Task 1d - Account for data issues, if applicable
# model = ...


optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Train model
# TODO Task 1c: Set ident_str to a string that identifies this particular
#               training run. Note this line in the training code
#                     exp_name = f"{model.__class__.__name__}_{ident_str}"
#               So it means the the model class name is already included in the
#               exp_name string. You can consider adding other information
#               particular to this training run, e.g. learning rate (lr) used,
#               augmentation (aug) used or not, etc.

train.train_model(model, train_loader, val_loader, optimizer, criterion,
                  IMG_CLASS_NAMES, NUM_EPOCHS, project_name="CSE3001 Assignment Task 1",
                  ident_str= "fill me in here")

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

An Introduction to Programming with C++

Authors: Diane Zak

8th edition

978-1285860114

More Books

Students also viewed these Programming questions

Question

The cout Answered: 1 week ago

Answered: 1 week ago