Transfer Learning with PyTorch
This blog post teaches how to use pretrained models and perform different kinds of transfer learning. The post is divided into six sections: an introduction, setup and loading data, creating datasets and DataLoaders, getting and customising a pretrained model, training the model, and evaluating the model.
The post is compatible with Google Colaboratory with PyTorch version 1.12.1+cu113 and can be accessed through this link:
Transfer Learning with PyTorch
- Developed by Armin Norouzi
- Compatible with Google Colaboratory
- Objective: Use pretrained models and perform different kinds of transfer learning
Table of contents:
- Introduction
- Setup and Loading Data
- Create Datasets and DataLoaders
- Get and Customise a Pretrained Model
- Train Model
- Evaluate Model
Introduction
There are two main benefits to using transfer learning:
- Can leverage an existing neural network architecture proven to work on problems similar to our own.
- Can leverage a working neural network architecture which has already learned patterns on similar data to our own. This often results in achieving great results with less custom data.
In other words, instead of training our own models from scratch on our own datasets, we can take the patterns a model has learned from datasets such as ImageNet and use them as the foundation of our own. Doing this often leads to getting great results with less data.
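As a minimal sketch of this workflow (we'll walk through each of these steps in detail below; the 3-class output layer is just for illustration):
import torchvision
from torch import nn
# 1. Start from ImageNet-pretrained weights (torchvision v0.13+ API)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights)
# 2. Freeze the pretrained feature extractor so its learned patterns are kept
for param in model.features.parameters():
    param.requires_grad = False
# 3. Replace the output layer to match our own number of classes (e.g. 3)
model.classifier[1] = nn.Linear(in_features=1280, out_features=3)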
Using a GPU is highly recommended for transfer learning.
# Are we using a GPU?
!nvidia-smi
Thu Sep 8 15:59:37 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P0 35W / 250W | 1431MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Where to find pretrained models
The world of deep learning is an amazing place.
So amazing that many people around the world share their work.
Often, code and pretrained models for the latest state-of-the-art research are released within a few days of publishing.
And there are several places you can find pretrained models to use for your own problems.
Location | What’s there? | Link(s) |
---|---|---|
PyTorch domain libraries | Each of the PyTorch domain libraries (torchvision, torchtext) comes with pretrained models of some form. The models there work right within PyTorch. | torchvision.models, torchtext.models, torchaudio.models, torchrec.models |
HuggingFace Hub | A series of pretrained models on many different domains (vision, text, audio and more) from organizations around the world. There’s plenty of different datasets too. | https://huggingface.co/models, https://huggingface.co/datasets |
timm (PyTorch Image Models) library | Almost all of the latest and greatest computer vision models in PyTorch code as well as plenty of other helpful computer vision features. | https://github.com/rwightman/pytorch-image-models |
Paperswithcode | A collection of the latest state-of-the-art machine learning papers with code implementations attached. You can also find benchmarks here of model performance on different tasks. | https://paperswithcode.com/ |
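For example, loading a pretrained model from the first and third sources above might look like this (the torchvision call uses the v0.13+ API we rely on throughout this post; the timm call is a sketch based on timm's documented create_model API):
import torchvision
# torchvision: pretrained weights ship with the domain library
tv_model = torchvision.models.efficientnet_b0(
    weights=torchvision.models.EfficientNet_B0_Weights.DEFAULT)
# timm: pip install timm, then create a model by name with pretrained weights
import timm
timm_model = timm.create_model("efficientnet_b0", pretrained=True)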
Food101: we'll use a smaller subset of this dataset with three classes (pizza, steak and sushi).
Setup and Loading Data
Importing necessary libraries
# For this notebook to run with updated APIs, we need torch 1.12+ and torchvision 0.13+
try:
import torch
import torchvision
assert int(torch.__version__.split(".")[1]) >= 12, "torch version should be 1.12+"
assert int(torchvision.__version__.split(".")[1]) >= 13, "torchvision version should be 0.13+"
print(f"torch version: {torch.__version__}")
print(f"torchvision version: {torchvision.__version__}")
except:
print(f"[INFO] torch/torchvision versions not as required, installing nightly versions.")
!pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
import torch
import torchvision
print(f"torch version: {torch.__version__}")
print(f"torchvision version: {torchvision.__version__}")
torch version: 1.12.1+cu113
torchvision version: 0.13.1+cu113
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision
from torch import nn
from torchvision import datasets, transforms
# Try to get torchinfo, install it if it doesn't work
try:
from torchinfo import summary
except:
print("[INFO] Couldn't find torchinfo... installing it.")
!pip install -q torchinfo
from torchinfo import summary
Helper functions
"""
Contains functionality for creating PyTorch DataLoaders for
image classification data.
"""
import os
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
NUM_WORKERS = os.cpu_count()
def create_dataloaders(
train_dir: str,
test_dir: str,
transform: transforms.Compose,
batch_size: int,
num_workers: int=NUM_WORKERS
):
"""Creates training and testing DataLoaders.
Takes in a training directory and testing directory path and turns
them into PyTorch Datasets and then into PyTorch DataLoaders.
Args:
train_dir: Path to training directory.
test_dir: Path to testing directory.
transform: torchvision transforms to perform on training and testing data.
batch_size: Number of samples per batch in each of the DataLoaders.
num_workers: An integer for number of workers per DataLoader.
Returns:
A tuple of (train_dataloader, test_dataloader, class_names).
Where class_names is a list of the target classes.
Example usage:
train_dataloader, test_dataloader, class_names = \
  create_dataloaders(train_dir=path/to/train_dir,
test_dir=path/to/test_dir,
transform=some_transform,
batch_size=32,
num_workers=4)
"""
# Use ImageFolder to create dataset(s)
train_data = datasets.ImageFolder(train_dir, transform=transform)
test_data = datasets.ImageFolder(test_dir, transform=transform)
# Get class names
class_names = train_data.classes
# Turn images into data loaders
train_dataloader = DataLoader(
train_data,
batch_size=batch_size,
shuffle=True,
num_workers=num_workers,
pin_memory=True,
)
test_dataloader = DataLoader(
test_data,
batch_size=batch_size,
shuffle=False,
num_workers=num_workers,
pin_memory=True,
)
return train_dataloader, test_dataloader, class_names
"""
Contains functions for training and testing a PyTorch model.
"""
import torch
from tqdm.auto import tqdm
from typing import Dict, List, Tuple
def train_step(model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
optimizer: torch.optim.Optimizer,
device: torch.device) -> Tuple[float, float]:
"""Trains a PyTorch model for a single epoch.
Turns a target PyTorch model to training mode and then
runs through all of the required training steps (forward
pass, loss calculation, optimizer step).
Args:
model: A PyTorch model to be trained.
dataloader: A DataLoader instance for the model to be trained on.
loss_fn: A PyTorch loss function to minimize.
optimizer: A PyTorch optimizer to help minimize the loss function.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A tuple of training loss and training accuracy metrics.
In the form (train_loss, train_accuracy). For example:
(0.1112, 0.8743)
"""
# Put model in train mode
model.train()
# Setup train loss and train accuracy values
train_loss, train_acc = 0, 0
# Loop through data loader data batches
for batch, (X, y) in enumerate(dataloader):
# Send data to target device
X, y = X.to(device), y.to(device)
# 1. Forward pass
y_pred = model(X)
# 2. Calculate and accumulate loss
loss = loss_fn(y_pred, y)
train_loss += loss.item()
# 3. Optimizer zero grad
optimizer.zero_grad()
# 4. Loss backward
loss.backward()
# 5. Optimizer step
optimizer.step()
# Calculate and accumulate accuracy metric across all batches
y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
train_acc += (y_pred_class == y).sum().item()/len(y_pred)
# Adjust metrics to get average loss and accuracy per batch
train_loss = train_loss / len(dataloader)
train_acc = train_acc / len(dataloader)
return train_loss, train_acc
def test_step(model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
device: torch.device) -> Tuple[float, float]:
"""Tests a PyTorch model for a single epoch.
Turns a target PyTorch model to "eval" mode and then performs
a forward pass on a testing dataset.
Args:
model: A PyTorch model to be tested.
dataloader: A DataLoader instance for the model to be tested on.
loss_fn: A PyTorch loss function to calculate loss on the test data.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A tuple of testing loss and testing accuracy metrics.
In the form (test_loss, test_accuracy). For example:
(0.0223, 0.8985)
"""
# Put model in eval mode
model.eval()
# Setup test loss and test accuracy values
test_loss, test_acc = 0, 0
# Turn on inference context manager
with torch.inference_mode():
# Loop through DataLoader batches
for batch, (X, y) in enumerate(dataloader):
# Send data to target device
X, y = X.to(device), y.to(device)
# 1. Forward pass
test_pred_logits = model(X)
# 2. Calculate and accumulate loss
loss = loss_fn(test_pred_logits, y)
test_loss += loss.item()
# Calculate and accumulate accuracy
test_pred_labels = test_pred_logits.argmax(dim=1)
test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))
# Adjust metrics to get average loss and accuracy per batch
test_loss = test_loss / len(dataloader)
test_acc = test_acc / len(dataloader)
return test_loss, test_acc
def train(model: torch.nn.Module,
train_dataloader: torch.utils.data.DataLoader,
test_dataloader: torch.utils.data.DataLoader,
optimizer: torch.optim.Optimizer,
loss_fn: torch.nn.Module,
epochs: int,
device: torch.device) -> Dict[str, List]:
"""Trains and tests a PyTorch model.
Passes a target PyTorch model through train_step() and test_step()
functions for a number of epochs, training and testing the model
in the same epoch loop.
Calculates, prints and stores evaluation metrics throughout.
Args:
model: A PyTorch model to be trained and tested.
train_dataloader: A DataLoader instance for the model to be trained on.
test_dataloader: A DataLoader instance for the model to be tested on.
optimizer: A PyTorch optimizer to help minimize the loss function.
loss_fn: A PyTorch loss function to calculate loss on both datasets.
epochs: An integer indicating how many epochs to train for.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A dictionary of training and testing loss as well as training and
testing accuracy metrics. Each metric has a value in a list for
each epoch.
In the form: {train_loss: [...],
train_acc: [...],
test_loss: [...],
test_acc: [...]}
For example if training for epochs=2:
{train_loss: [2.0616, 1.0537],
train_acc: [0.3945, 0.3945],
test_loss: [1.2641, 1.5706],
test_acc: [0.3400, 0.2973]}
"""
# Create empty results dictionary
results = {"train_loss": [],
"train_acc": [],
"test_loss": [],
"test_acc": []
}
# Make sure model on target device
model.to(device)
# Loop through training and testing steps for a number of epochs
for epoch in tqdm(range(epochs)):
train_loss, train_acc = train_step(model=model,
dataloader=train_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
device=device)
test_loss, test_acc = test_step(model=model,
dataloader=test_dataloader,
loss_fn=loss_fn,
device=device)
# Print out what's happening
print(
f"Epoch: {epoch+1} | "
f"train_loss: {train_loss:.4f} | "
f"train_acc: {train_acc:.4f} | "
f"test_loss: {test_loss:.4f} | "
f"test_acc: {test_acc:.4f}"
)
# Update results dictionary
results["train_loss"].append(train_loss)
results["train_acc"].append(train_acc)
results["test_loss"].append(test_loss)
results["test_acc"].append(test_acc)
# Return the filled results at the end of the epochs
return results
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device
'cuda'
# Create function to unzip a zipfile into current working directory
# (since we're going to be downloading and unzipping a few files)
import zipfile
def unzip_data(filename):
"""
Unzips filename into the current working directory.
Args:
filename (str): a filepath to a target zip folder to be unzipped.
"""
    with zipfile.ZipFile(filename, "r") as zip_ref:
        zip_ref.extractall()
# Walk through an image classification directory and find out how many files (images)
# are in each subdirectory.
import os
def walk_through_dir(dir_path):
"""
Walks through dir_path returning its contents.
Args:
dir_path (str): target directory
Returns:
A print out of:
number of subdirectories in dir_path
number of images (files) in each subdirectory
name of each subdirectory
"""
for dirpath, dirnames, filenames in os.walk(dir_path):
print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")
Downloading data:
A series of images of pizza, steak and sushi in standard image classification format.
# Download and unzip 3 classes of data (pizza, steak and sushi images)
!wget https://gitlab.com/arminny/ml_course_datasets/-/raw/main/pizza_steak_sushi.zip
unzip_data("pizza_steak_sushi.zip")
# Setup data directories
train_dir = "train/"
test_dir = "test/"
--2022-09-08 15:59:37-- https://gitlab.com/arminny/ml_course_datasets/-/raw/main/pizza_steak_sushi.zip
Resolving gitlab.com (gitlab.com)... 172.65.251.78, 2606:4700:90:0:f22e:fbec:5bed:a9b9
Connecting to gitlab.com (gitlab.com)|172.65.251.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15737296 (15M) [application/octet-stream]
Saving to: ‘pizza_steak_sushi.zip.3’
pizza_steak_sushi.z 100%[===================>] 15.01M 77.7MB/s in 0.2s
2022-09-08 15:59:37 (77.7 MB/s) - ‘pizza_steak_sushi.zip.3’ saved [15737296/15737296]
# How many images are we working with now?
walk_through_dir("train")
There are 3 directories and 0 images in 'train'.
There are 0 directories and 78 images in 'train/pizza'.
There are 0 directories and 75 images in 'train/steak'.
There are 0 directories and 72 images in 'train/sushi'.
# How many images are we working with now?
walk_through_dir("test")
There are 3 directories and 0 images in 'test'.
There are 0 directories and 25 images in 'test/pizza'.
There are 0 directories and 19 images in 'test/steak'.
There are 0 directories and 31 images in 'test/sushi'.
Create Datasets and DataLoaders
# Get a set of pretrained model weights
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # .DEFAULT = best available weights from pretraining on ImageNet
weights
EfficientNet_B0_Weights.IMAGENET1K_V1
And now to access the transforms associated with our weights, we can use the transforms() method.
This is essentially saying “get the data transforms that were used to train the EfficientNet_B0_Weights on ImageNet”.
# Get the transforms used to create our pretrained weights
auto_transforms = weights.transforms()
auto_transforms
ImageClassification(
crop_size=[224]
resize_size=[256]
mean=[0.485, 0.456, 0.406]
std=[0.229, 0.224, 0.225]
interpolation=InterpolationMode.BICUBIC
)
The benefit of automatically creating a transform through weights.transforms() is that you ensure you’re using the same data transformations the pretrained model used when it was trained.
However, the tradeoff of using automatically created transforms is a lack of customization.
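If you did want that customization, a hand-written equivalent of the automatic transform printed above might look like this (a sketch built from the values shown: resize to 256, centre-crop to 224, bicubic interpolation, ImageNet mean/std):
from torchvision import transforms
# Manual equivalent of auto_transforms (values copied from the printout above)
manual_transforms = transforms.Compose([
    transforms.Resize(256, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])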
We can use auto_transforms to create DataLoaders with create_dataloaders() just as before.
# Create training and testing DataLoaders as well as get a list of class names
train_dataloader, test_dataloader, class_names = create_dataloaders(train_dir=train_dir,
test_dir=test_dir,
transform=auto_transforms, # perform same data transforms on our own data as the pretrained model
batch_size=32) # set mini-batch size to 32
train_dataloader, test_dataloader, class_names
(<torch.utils.data.dataloader.DataLoader at 0x7f3e08919450>,
<torch.utils.data.dataloader.DataLoader at 0x7f3e088b2750>,
['pizza', 'steak', 'sushi'])
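As an optional sanity check, we can pull a single batch from the train DataLoader and confirm its shapes match what the pretrained model expects:
# Inspect one batch: images should be [batch_size, color_channels, height, width]
img_batch, label_batch = next(iter(train_dataloader))
print(img_batch.shape, label_batch.shape) # torch.Size([32, 3, 224, 224]) torch.Size([32])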
Get and Customise a Pretrained Model
Exploring the documentation, you’ll find plenty of common computer vision architecture backbones such as:
Architecture backbone | Code |
---|---|
ResNets | torchvision.models.resnet18(), torchvision.models.resnet50()… |
VGG (similar to what we used for TinyVGG) | torchvision.models.vgg16() |
EfficientNets | torchvision.models.efficientnet_b0(), torchvision.models.efficientnet_b1()… |
Vision Transformers (ViTs) | torchvision.models.vit_b_16(), torchvision.models.vit_b_32()… |
ConvNeXt | torchvision.models.convnext_tiny(), torchvision.models.convnext_small()… |
More available in torchvision.models | torchvision.models... |
The pretrained model we’re going to be using is torchvision.models.efficientnet_b0().
We can set up the EfficientNet_B0 pretrained ImageNet weights using the same code we used to create the transforms.
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # .DEFAULT = best available weights for ImageNet
This means the model has already been trained on millions of images and has a good base representation of image data.
The PyTorch version of this pretrained model is capable of achieving ~77.7% accuracy across ImageNet’s 1000 classes.
We’ll also send it to the target device.
# Setup the model with pretrained weights and send it to the target device (torchvision v0.13+)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # .DEFAULT = best available weights
model = torchvision.models.efficientnet_b0(weights=weights).to(device)
# Print a summary using torchinfo
summary(model=model,
input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
# col_names=["input_size"], # uncomment for smaller output
col_names=["input_size", "output_size", "num_params", "trainable"],
col_width=20,
row_settings=["var_names"]
)
============================================================================================================================================
Layer (type (var_name)) Input Shape Output Shape Param # Trainable
============================================================================================================================================
EfficientNet (EfficientNet) [32, 3, 224, 224] [32, 1000] -- True
├─Sequential (features) [32, 3, 224, 224] [32, 1280, 7, 7] -- True
│ └─Conv2dNormActivation (0) [32, 3, 224, 224] [32, 32, 112, 112] -- True
│ │ └─Conv2d (0) [32, 3, 224, 224] [32, 32, 112, 112] 864 True
│ │ └─BatchNorm2d (1) [32, 32, 112, 112] [32, 32, 112, 112] 64 True
│ │ └─SiLU (2) [32, 32, 112, 112] [32, 32, 112, 112] -- --
│ └─Sequential (1) [32, 32, 112, 112] [32, 16, 112, 112] -- True
│ │ └─MBConv (0) [32, 32, 112, 112] [32, 16, 112, 112] 1,448 True
│ └─Sequential (2) [32, 16, 112, 112] [32, 24, 56, 56] -- True
│ │ └─MBConv (0) [32, 16, 112, 112] [32, 24, 56, 56] 6,004 True
│ │ └─MBConv (1) [32, 24, 56, 56] [32, 24, 56, 56] 10,710 True
│ └─Sequential (3) [32, 24, 56, 56] [32, 40, 28, 28] -- True
│ │ └─MBConv (0) [32, 24, 56, 56] [32, 40, 28, 28] 15,350 True
│ │ └─MBConv (1) [32, 40, 28, 28] [32, 40, 28, 28] 31,290 True
│ └─Sequential (4) [32, 40, 28, 28] [32, 80, 14, 14] -- True
│ │ └─MBConv (0) [32, 40, 28, 28] [32, 80, 14, 14] 37,130 True
│ │ └─MBConv (1) [32, 80, 14, 14] [32, 80, 14, 14] 102,900 True
│ │ └─MBConv (2) [32, 80, 14, 14] [32, 80, 14, 14] 102,900 True
│ └─Sequential (5) [32, 80, 14, 14] [32, 112, 14, 14] -- True
│ │ └─MBConv (0) [32, 80, 14, 14] [32, 112, 14, 14] 126,004 True
│ │ └─MBConv (1) [32, 112, 14, 14] [32, 112, 14, 14] 208,572 True
│ │ └─MBConv (2) [32, 112, 14, 14] [32, 112, 14, 14] 208,572 True
│ └─Sequential (6) [32, 112, 14, 14] [32, 192, 7, 7] -- True
│ │ └─MBConv (0) [32, 112, 14, 14] [32, 192, 7, 7] 262,492 True
│ │ └─MBConv (1) [32, 192, 7, 7] [32, 192, 7, 7] 587,952 True
│ │ └─MBConv (2) [32, 192, 7, 7] [32, 192, 7, 7] 587,952 True
│ │ └─MBConv (3) [32, 192, 7, 7] [32, 192, 7, 7] 587,952 True
│ └─Sequential (7) [32, 192, 7, 7] [32, 320, 7, 7] -- True
│ │ └─MBConv (0) [32, 192, 7, 7] [32, 320, 7, 7] 717,232 True
│ └─Conv2dNormActivation (8) [32, 320, 7, 7] [32, 1280, 7, 7] -- True
│ │ └─Conv2d (0) [32, 320, 7, 7] [32, 1280, 7, 7] 409,600 True
│ │ └─BatchNorm2d (1) [32, 1280, 7, 7] [32, 1280, 7, 7] 2,560 True
│ │ └─SiLU (2) [32, 1280, 7, 7] [32, 1280, 7, 7] -- --
├─AdaptiveAvgPool2d (avgpool) [32, 1280, 7, 7] [32, 1280, 1, 1] -- --
├─Sequential (classifier) [32, 1280] [32, 1000] -- True
│ └─Dropout (0) [32, 1280] [32, 1280] -- --
│ └─Linear (1) [32, 1280] [32, 1000] 1,281,000 True
============================================================================================================================================
Total params: 5,288,548
Trainable params: 5,288,548
Non-trainable params: 0
Total mult-adds (G): 12.35
============================================================================================================================================
Input size (MB): 19.27
Forward/backward pass size (MB): 3452.35
Params size (MB): 21.15
Estimated Total Size (MB): 3492.77
============================================================================================================================================
Our efficientnet_b0 comes in three main parts:
- features - A collection of convolutional layers and other various activation layers to learn a base representation of vision data (this base representation/collection of layers is often referred to as features or the feature extractor: “the base layers of the model learn the different features of images”).
- avgpool - Takes the average of the output of the features layer(s) and turns it into a feature vector.
- classifier - Turns the feature vector into a vector with the same dimensionality as the number of required output classes (since efficientnet_b0 is pretrained on ImageNet, and ImageNet has 1000 classes, out_features=1000 is the default).
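Each of these parts is an attribute on the model, so we can inspect (and later replace) them directly:
# The three main parts are attributes of the model object
print(model.features) # convolutional feature extractor
print(model.avgpool) # AdaptiveAvgPool2d -> 1280-dim feature vector
print(model.classifier) # Dropout + Linear(1280 -> 1000)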
Freezing the base model and changing the output layer to suit our needs
The process of transfer learning usually goes: freeze some base layers of a pretrained model (typically the features section) and then adjust the output layers (also called the head/classifier layers) to suit your needs.
# Freeze all base layers in the "features" section of the model (the feature extractor) by setting requires_grad=False
for param in model.features.parameters():
param.requires_grad = False
Feature extractor layers frozen!
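To double-check the freeze worked, a quick sketch is to count trainable versus total parameters; at this point only the classifier's parameters should still require gradients:
# Count trainable vs. total parameters to verify the freeze
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable:,} / {total:,}")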
Let’s now adjust the output layer (the classifier portion) of our pretrained model to our needs.
Right now our pretrained model has out_features=1000 because there are 1000 classes in ImageNet. However, we don’t have 1000 classes; we only have three: pizza, steak and sushi.
We can change the classifier portion of our model by creating a new series of layers.
The current classifier consists of:
(classifier): Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)
We’ll keep in_features=1280 for our Linear output layer, but we’ll change the out_features value to the length of our class_names (len(['pizza', 'steak', 'sushi']) = 3).
Our new classifier layer should be on the same device as our model.
# Set the manual seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)
# Get the length of class_names (one output unit for each class)
output_shape = len(class_names)
# Recreate the classifier layer and send it to the target device
model.classifier = torch.nn.Sequential(
torch.nn.Dropout(p=0.2, inplace=True),
torch.nn.Linear(in_features=1280,
out_features=output_shape, # same number of output units as our number of classes
bias=True)).to(device)
# Do a summary *after* freezing the features and changing the output classifier layer
summary(model,
input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape" (batch_size, color_channels, height, width)
verbose=0,
col_names=["input_size", "output_size", "num_params", "trainable"],
col_width=20,
row_settings=["var_names"]
)
============================================================================================================================================
Layer (type (var_name)) Input Shape Output Shape Param # Trainable
============================================================================================================================================
EfficientNet (EfficientNet) [32, 3, 224, 224] [32, 3] -- Partial
├─Sequential (features) [32, 3, 224, 224] [32, 1280, 7, 7] -- False
│ └─Conv2dNormActivation (0) [32, 3, 224, 224] [32, 32, 112, 112] -- False
│ │ └─Conv2d (0) [32, 3, 224, 224] [32, 32, 112, 112] (864) False
│ │ └─BatchNorm2d (1) [32, 32, 112, 112] [32, 32, 112, 112] (64) False
│ │ └─SiLU (2) [32, 32, 112, 112] [32, 32, 112, 112] -- --
│ └─Sequential (1) [32, 32, 112, 112] [32, 16, 112, 112] -- False
│ │ └─MBConv (0) [32, 32, 112, 112] [32, 16, 112, 112] (1,448) False
│ └─Sequential (2) [32, 16, 112, 112] [32, 24, 56, 56] -- False
│ │ └─MBConv (0) [32, 16, 112, 112] [32, 24, 56, 56] (6,004) False
│ │ └─MBConv (1) [32, 24, 56, 56] [32, 24, 56, 56] (10,710) False
│ └─Sequential (3) [32, 24, 56, 56] [32, 40, 28, 28] -- False
│ │ └─MBConv (0) [32, 24, 56, 56] [32, 40, 28, 28] (15,350) False
│ │ └─MBConv (1) [32, 40, 28, 28] [32, 40, 28, 28] (31,290) False
│ └─Sequential (4) [32, 40, 28, 28] [32, 80, 14, 14] -- False
│ │ └─MBConv (0) [32, 40, 28, 28] [32, 80, 14, 14] (37,130) False
│ │ └─MBConv (1) [32, 80, 14, 14] [32, 80, 14, 14] (102,900) False
│ │ └─MBConv (2) [32, 80, 14, 14] [32, 80, 14, 14] (102,900) False
│ └─Sequential (5) [32, 80, 14, 14] [32, 112, 14, 14] -- False
│ │ └─MBConv (0) [32, 80, 14, 14] [32, 112, 14, 14] (126,004) False
│ │ └─MBConv (1) [32, 112, 14, 14] [32, 112, 14, 14] (208,572) False
│ │ └─MBConv (2) [32, 112, 14, 14] [32, 112, 14, 14] (208,572) False
│ └─Sequential (6) [32, 112, 14, 14] [32, 192, 7, 7] -- False
│ │ └─MBConv (0) [32, 112, 14, 14] [32, 192, 7, 7] (262,492) False
│ │ └─MBConv (1) [32, 192, 7, 7] [32, 192, 7, 7] (587,952) False
│ │ └─MBConv (2) [32, 192, 7, 7] [32, 192, 7, 7] (587,952) False
│ │ └─MBConv (3) [32, 192, 7, 7] [32, 192, 7, 7] (587,952) False
│ └─Sequential (7) [32, 192, 7, 7] [32, 320, 7, 7] -- False
│ │ └─MBConv (0) [32, 192, 7, 7] [32, 320, 7, 7] (717,232) False
│ └─Conv2dNormActivation (8) [32, 320, 7, 7] [32, 1280, 7, 7] -- False
│ │ └─Conv2d (0) [32, 320, 7, 7] [32, 1280, 7, 7] (409,600) False
│ │ └─BatchNorm2d (1) [32, 1280, 7, 7] [32, 1280, 7, 7] (2,560) False
│ │ └─SiLU (2) [32, 1280, 7, 7] [32, 1280, 7, 7] -- --
├─AdaptiveAvgPool2d (avgpool) [32, 1280, 7, 7] [32, 1280, 1, 1] -- --
├─Sequential (classifier) [32, 1280] [32, 3] -- True
│ └─Dropout (0) [32, 1280] [32, 1280] -- --
│ └─Linear (1) [32, 1280] [32, 3] 3,843 True
============================================================================================================================================
Total params: 4,011,391
Trainable params: 3,843
Non-trainable params: 4,007,548
Total mult-adds (G): 12.31
============================================================================================================================================
Input size (MB): 19.27
Forward/backward pass size (MB): 3452.09
Params size (MB): 16.05
Estimated Total Size (MB): 3487.41
============================================================================================================================================
Train Model
Now we’ve got a pretrained model that’s semi-frozen and has a customised classifier, how about we see transfer learning in action?
To begin training, let’s create a loss function and an optimizer.
Because we’re still working with multi-class classification, we’ll use nn.CrossEntropyLoss() for the loss function.
And we’ll stick with torch.optim.Adam() as our optimizer with lr=0.001.
# Define loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Set the random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)
# Start the timer
from timeit import default_timer as timer
start_time = timer()
# Setup training and save the results
results = train(model=model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
optimizer=optimizer,
loss_fn=loss_fn,
epochs=15,
device=device)
# End the timer and print out how long it took
end_time = timer()
print(f"[INFO] Total training time: {end_time-start_time:.3f} seconds")
Epoch: 1 | train_loss: 1.0929 | train_acc: 0.4023 | test_loss: 0.9125 | test_acc: 0.5502
Epoch: 2 | train_loss: 0.8703 | train_acc: 0.7773 | test_loss: 0.7900 | test_acc: 0.8153
Epoch: 3 | train_loss: 0.7648 | train_acc: 0.8008 | test_loss: 0.7433 | test_acc: 0.8561
Epoch: 4 | train_loss: 0.7114 | train_acc: 0.7578 | test_loss: 0.6344 | test_acc: 0.8655
Epoch: 5 | train_loss: 0.6252 | train_acc: 0.7930 | test_loss: 0.6238 | test_acc: 0.8864
Epoch: 6 | train_loss: 0.5770 | train_acc: 0.8984 | test_loss: 0.5715 | test_acc: 0.8759
Epoch: 7 | train_loss: 0.5259 | train_acc: 0.9141 | test_loss: 0.5457 | test_acc: 0.8759
Epoch: 8 | train_loss: 0.5250 | train_acc: 0.8047 | test_loss: 0.5300 | test_acc: 0.8769
Epoch: 9 | train_loss: 0.5667 | train_acc: 0.8125 | test_loss: 0.5080 | test_acc: 0.8456
Epoch: 10 | train_loss: 0.4847 | train_acc: 0.8164 | test_loss: 0.4390 | test_acc: 0.9062
Epoch: 11 | train_loss: 0.4315 | train_acc: 0.9336 | test_loss: 0.4617 | test_acc: 0.8759
Epoch: 12 | train_loss: 0.4594 | train_acc: 0.8242 | test_loss: 0.4923 | test_acc: 0.8352
Epoch: 13 | train_loss: 0.4249 | train_acc: 0.9297 | test_loss: 0.4129 | test_acc: 0.8561
Epoch: 14 | train_loss: 0.3555 | train_acc: 0.9453 | test_loss: 0.4275 | test_acc: 0.8561
Epoch: 15 | train_loss: 0.3434 | train_acc: 0.9648 | test_loss: 0.4260 | test_acc: 0.8352
[INFO] Total training time: 39.095 seconds
# Plot loss curves of a model
def plot_loss_curves(results):
"""Plots training curves of a results dictionary.
Args:
results (dict): dictionary containing list of values, e.g.
{"train_loss": [...],
"train_acc": [...],
"test_loss": [...],
"test_acc": [...]}
"""
loss = results["train_loss"]
test_loss = results["test_loss"]
accuracy = results["train_acc"]
test_accuracy = results["test_acc"]
epochs = range(len(results["train_loss"]))
plt.figure(figsize=(15, 7))
# Plot loss
plt.subplot(1, 2, 1)
plt.plot(epochs, loss, label="train_loss")
plt.plot(epochs, test_loss, label="test_loss")
plt.title("Loss")
plt.xlabel("Epochs")
plt.legend()
# Plot accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, accuracy, label="train_accuracy")
plt.plot(epochs, test_accuracy, label="test_accuracy")
plt.title("Accuracy")
plt.xlabel("Epochs")
plt.legend()
# Plot the loss curves of our model
plot_loss_curves(results)
Evaluate Model
from typing import List, Tuple
from PIL import Image
# 1. Take in a trained model, class names, image path, image size, a transform and target device
def pred_and_plot_image(model: torch.nn.Module,
image_path: str,
class_names: List[str],
image_size: Tuple[int, int] = (224, 224),
transform: transforms.Compose = None,
device: torch.device=device):
# 2. Open image
img = Image.open(image_path)
# 3. Create transformation for image (if one doesn't exist)
if transform is not None:
image_transform = transform
else:
image_transform = transforms.Compose([
transforms.Resize(image_size),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
### Predict on image ###
# 4. Make sure the model is on the target device
model.to(device)
# 5. Turn on model evaluation mode and inference mode
model.eval()
with torch.inference_mode():
# 6. Transform and add an extra dimension to image (model requires samples in [batch_size, color_channels, height, width])
transformed_image = image_transform(img).unsqueeze(dim=0)
# 7. Make a prediction on image with an extra dimension and send it to the target device
target_image_pred = model(transformed_image.to(device))
# 8. Convert logits -> prediction probabilities (using torch.softmax() for multi-class classification)
target_image_pred_probs = torch.softmax(target_image_pred, dim=1)
# 9. Convert prediction probabilities -> prediction labels
target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)
# 10. Plot image with predicted label and probability
plt.figure()
plt.imshow(img)
plt.title(f"Pred: {class_names[target_image_pred_label]} | Prob: {target_image_pred_probs.max():.3f}")
plt.axis(False);
# Get a random list of image paths from test set
import random
from pathlib import Path
num_images_to_plot = 3
test_image_path_list = list(Path(test_dir).glob("*/*.jpg")) # get a list of all image paths from the test data
test_image_path_sample = random.sample(population=test_image_path_list, # go through all of the test image paths
k=num_images_to_plot) # randomly select 'k' image paths to pred and plot
# Make predictions on and plot the images
for image_path in test_image_path_sample:
pred_and_plot_image(model=model,
image_path=image_path,
class_names=class_names,
# transform=weights.transforms(), # optionally pass in a specified transform from our pretrained model weights
image_size=(224, 224))