Convolutional Neural Networks with Pytorch

24 minute read

Published:

This post introduces the reader to creating and training Convolutional Neural Networks models for image classification. This post covers two datasets, MNIST and CIFAR, with step-by-step instructions on building and training the models.

The post is compatible with Google Colaboratory with Pytorch version 1.12.1+cu113 and can be accessed through this link:

Open In Colab

Convolutional Neural Networks with Pytorch

  • Developed by Armin Norouzi
  • Compatible with Google Colaboratory- Pytorch version 1.12.1+cu113

  • Objective: Create full Convolutional Neural Networks and train it for image classification

Table of content:

  1. CNN model - MNIST dataset
  2. CNN model - CIFAR dataset

CNN model - MNIST dataset

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.utils import make_grid

import numpy as np
import pandas as pd
import seaborn as sn  # for heatmaps
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
%matplotlib inline

Load the MNIST dataset

PyTorch makes the MNIST dataset available through torchvision. The first time it’s called, the dataset will be downloaded onto your computer to the path specified. From that point, torchvision will always look for a local copy before attempting another download.

Define transform

As part of the loading process, we can apply multiple transformations (reshape, convert to tensor, normalize, etc.) to the incoming data.
For this exercise we only need to convert images to tensors.

transform = transforms.ToTensor()

Load the training and testing set

train_data = datasets.MNIST(root='../Data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='../Data', train=False, download=True, transform=transform)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../Data/MNIST/raw/train-images-idx3-ubyte.gz



  0%|          | 0/9912422 [00:00<?, ?it/s]


Extracting ../Data/MNIST/raw/train-images-idx3-ubyte.gz to ../Data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../Data/MNIST/raw/train-labels-idx1-ubyte.gz



  0%|          | 0/28881 [00:00<?, ?it/s]


Extracting ../Data/MNIST/raw/train-labels-idx1-ubyte.gz to ../Data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../Data/MNIST/raw/t10k-images-idx3-ubyte.gz



  0%|          | 0/1648877 [00:00<?, ?it/s]


Extracting ../Data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../Data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../Data/MNIST/raw/t10k-labels-idx1-ubyte.gz



  0%|          | 0/4542 [00:00<?, ?it/s]


Extracting ../Data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../Data/MNIST/raw
train_data
Dataset MNIST
    Number of datapoints: 60000
    Root location: ../Data
    Split: Train
    StandardTransform
Transform: ToTensor()
test_data
Dataset MNIST
    Number of datapoints: 10000
    Root location: ../Data
    Split: Test
    StandardTransform
Transform: ToTensor()

Calling the first record from train_data returns a two-item tuple. The first item is our 28x28 tensor representing the image. The second is a label, in this case the number “5”.

image, label = train_data[0]
print('Shape:', image.shape, '\nLabel:', label)
Shape: torch.Size([1, 28, 28])
Label: 5

View the image

Matplotlib can interpret pixel values through a variety of colormaps.

plt.imshow(train_data[0][0].reshape((28,28)), cmap="gray");

png

plt.imshow(train_data[0][0].reshape((28,28)), cmap="gist_yarg");

png

Batch loading with DataLoader

Our training set contains 60,000 records. Lets load training data in batches using DataLoader. When working with images, we want relatively small batches; a batch size of 4 is not uncommon.

torch.manual_seed(101)  # for consistent results

train_loader = DataLoader(train_data, batch_size=16, shuffle=True)

test_loader = DataLoader(test_data, batch_size=16, shuffle=False)

In the cell above, train_data is a PyTorch Dataset object (an object that supports data loading and sampling).
The batch_size is the number of records to be processed at a time. If it’s not evenly divisible into the dataset, then the final batch contains the remainder.
Setting shuffle to True means that the dataset will be shuffled after each epoch.

NOTE: DataLoader takes an optional num_workers parameter that sets up how many subprocesses to use for data loading. This behaves differently with different operating systems so we've omitted it here. See the docs for more information.

View a batch of images

Once we’ve defined a DataLoader, we can create a grid of images using torchvision.utils.make_grid

from torchvision.utils import make_grid

# Grab the first batch of images
for images,labels in train_loader:
    break

# Print the first 12 labels
print('Labels: ', labels.numpy())

# Print the first 12 images
im = make_grid(images, nrow=12)  # the default nrow is 8
plt.figure(figsize=(10,4))
# We need to transpose the images from CWH to WHC
plt.imshow(np.transpose(im.numpy(), (1, 2, 0)));
Labels:  [7 2 3 5 8 5 3 6 9 9 1 3 5 5 4 5]

png

Define a convolutional model

How Conv2d layers

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • stride (int or tuple, optional) – Stride of the convolution. Default: 1

  • padding (int, tuple or str, optional) – Padding added to all four sides of the input. Default: 0

  • padding_mode (string, optional)'zeros', 'reflect', 'replicate' or 'circular'. Default: ‘zeros’

  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

# Define layers
conv1 = nn.Conv2d(1, 6, 3, 1)
# 1 -- > As we have gray scale images, we have only one input chanel
# 6 -- > This is number of filters, we have 6 filters
# 3 -- > we have 3 by 3 filter
# 1 -- > 1 is stride
conv2 = nn.Conv2d(6, 16, 3, 1)
# 6 -- > as we had 6 filter in conv1, input is 6
# 16 -- > This is number of filters, we have 16 filters
# 3 -- > we have 3 by 3 filter
# 1 -- > 1 is stride
# Grab the first MNIST record
for i, (X_train, y_train) in enumerate(train_data):
    break
# Create a rank-4 tensor to be passed into the model
# (train_loader will have done this already)
x = X_train.view(1,1,28,28) # x is one image
print(x.shape)
torch.Size([1, 1, 28, 28])
# Perform the first convolution/activation
x = F.relu(conv1(x))
print(x.shape)
torch.Size([1, 6, 26, 26])
  • 1 = as we have one single images
  • 6 = as we had 6 filter
  • 26 * 26 = because we used zero padding so we are loosing soem information from boarder
# Run the first pooling layer
x = F.max_pool2d(x, 2, 2)
print(x.shape)
torch.Size([1, 6, 13, 13])
  • Maxpolling layer 2 mean we are reducing size 26 * 26 by 2 * 2 filter
# Perform the second convolution/activation
x = F.relu(conv2(x))
print(x.shape)
torch.Size([1, 16, 11, 11])
  • 1 = as we have one single images
  • 16 = as we had 16 filter
  • 11 * 11 = because we used zero padding so we are loosing soem information from boarder
# Run the second pooling layer
x = F.max_pool2d(x, 2, 2)
print(x.shape)
torch.Size([1, 16, 5, 5])
  • Maxpolling layer 2 mean we are reducing size 11 * 11 by 5 * 5 filter
  • ` 11 / 2 is 5.5 but maxpolling round down it --> 11 // 2 = 5 `
# Flatten the data
x = x.view(-1, 5*5*16)
print(x.shape)
torch.Size([1, 400])
  • -1–> means keep the firs dimention of images, in our case is 1 as it is for one signle images
  • 16, 5, 5 comes from last layer before flatten

Building model

let’s but all togheter

class ConvolutionalNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Set up the convolutional layers with torch.nn.Conv2d()
        # The first layer has one input channel (the grayscale color channel).
        # We'll assign 6 output channels for feature extraction. We'll set our
        # kernel size to 3 to make a 3x3 filter, and set the step size to 1.
        self.conv1 = nn.Conv2d(1, 6, 3, 1)

        # The second layer will take our 6 input channels and deliver 16 output channels.
        self.conv2 = nn.Conv2d(6, 16, 3, 1)

        # Set up the fully connected layers with torch.nn.Linear()

        # The input size of (5x5x16) is determined by the effect of our kernels on the input image size.
        # A 3x3 filter applied to a 28x28 image leaves a 1-pixel edge on all four sides. In one layer
        # the size changes from 28x28 to 26x26. We could address this with zero-padding, but since an
        # MNIST image is mostly black at the edges, we should be safe ignoring these pixels. We'll apply
        # the kernel twice, and apply pooling layers twice, so our resulting output will be
        # (((28−2)/2)−2)/2=5.5  which rounds down to 5 pixels per side.
        self.fc1 = nn.Linear(5*5*16, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84,10)

    def forward(self, X):

        # Activations can be applied to the convolutions in one line using F.relu() and pooling is done
        # using F.max_pool2d()
        # PyTorch handle pooling as activation rather than layer, that's the reason we've added here
        X = F.relu(self.conv1(X))
        X = F.max_pool2d(X, 2, 2)
        X = F.relu(self.conv2(X))
        X = F.max_pool2d(X, 2, 2)
        # Flatten the data for the fully connected layers:
        X = X.view(-1, 5*5*16)
        X = F.relu(self.fc1(X))
        X = F.relu(self.fc2(X))
        X = self.fc3(X)
        return F.log_softmax(X, dim=1)
torch.manual_seed(42)
model = ConvolutionalNetwork()
model
ConvolutionalNetwork(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Including the bias terms for each layer, the total number of parameters being trained is:

$\quad\begin{split}(1\times6\times3\times3)+6+(6\times16\times3\times3)+16+(400\times120)+120+(120\times84)+84+(84\times10)+10 &=
54+6+864+16+48000+120+10080+84+840+10 &= 60,074\end{split}$

def count_parameters(model):
    ''' This function used for counting number of parameters

    Arg:
      Model (PyTorch object)

    Return:
      print number of paramteres in each layer and sum of laerning parameter number
    '''
    params = [p.numel() for p in model.parameters() if p.requires_grad]
    for item in params:
        print(f'{item:>6}')
    print(f'______\n{sum(params):>6}')
count_parameters(model)
    54
     6
   864
    16
 48000
   120
 10080
    84
   840
    10
______
 60074

Define loss function & optimizer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Train the model

This time we’ll feed the data directly into the model without flattening it first.

import time
start_time = time.time()

epochs = 10
train_losses = []
test_losses = []
train_correct = []
test_correct = []

for i in range(epochs):
    trn_corr = 0
    tst_corr = 0

    # Run the training batches
    for b, (X_train, y_train) in enumerate(train_loader):
        b += 1

        # Apply the model
        y_pred = model(X_train)  # we don't flatten X-train here
        loss = criterion(y_pred, y_train)

        # Tally the number of correct predictions
        predicted = torch.max(y_pred.data, 1)[1]
        batch_corr = (predicted == y_train).sum()
        trn_corr += batch_corr

        # Update parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Print interim results
        if b%1000 == 0:
            print(f'epoch: {i:2}  batch: {b:4} [{10*b:6}/60000]  loss: {loss.item():10.8f}')

    train_losses.append(loss.detach().numpy())
    train_correct.append(trn_corr.detach().numpy())

    # Run the testing batches
    with torch.no_grad():
        for b, (X_test, y_test) in enumerate(test_loader):

            # Apply the model
            y_val = model(X_test)

            # Tally the number of correct predictions
            predicted = torch.max(y_val.data, 1)[1]
            tst_corr += (predicted == y_test).sum()

    loss = criterion(y_val, y_test)
    test_losses.append(loss.detach().numpy())
    test_correct.append(tst_corr.detach().numpy())

print(f'\nDuration: {time.time() - start_time:.0f} seconds') # print the time elapsed
epoch:  0  batch: 1000 [ 10000/60000]  loss: 0.66229165
epoch:  0  batch: 2000 [ 20000/60000]  loss: 0.01290415
epoch:  0  batch: 3000 [ 30000/60000]  loss: 0.09723970
epoch:  1  batch: 1000 [ 10000/60000]  loss: 0.05072124
epoch:  1  batch: 2000 [ 20000/60000]  loss: 0.01193699
epoch:  1  batch: 3000 [ 30000/60000]  loss: 0.00154264
epoch:  2  batch: 1000 [ 10000/60000]  loss: 0.01364101
epoch:  2  batch: 2000 [ 20000/60000]  loss: 0.01508053
epoch:  2  batch: 3000 [ 30000/60000]  loss: 0.01658336
epoch:  3  batch: 1000 [ 10000/60000]  loss: 0.00120270
epoch:  3  batch: 2000 [ 20000/60000]  loss: 0.00438918
epoch:  3  batch: 3000 [ 30000/60000]  loss: 0.00234172
epoch:  4  batch: 1000 [ 10000/60000]  loss: 0.00397162
epoch:  4  batch: 2000 [ 20000/60000]  loss: 0.07158402
epoch:  4  batch: 3000 [ 30000/60000]  loss: 0.00043703
epoch:  5  batch: 1000 [ 10000/60000]  loss: 0.13094078
epoch:  5  batch: 2000 [ 20000/60000]  loss: 0.08714615
epoch:  5  batch: 3000 [ 30000/60000]  loss: 0.00052878
epoch:  6  batch: 1000 [ 10000/60000]  loss: 0.00039457
epoch:  6  batch: 2000 [ 20000/60000]  loss: 0.00076895
epoch:  6  batch: 3000 [ 30000/60000]  loss: 0.01859490
epoch:  7  batch: 1000 [ 10000/60000]  loss: 0.00468324
epoch:  7  batch: 2000 [ 20000/60000]  loss: 0.00004987
epoch:  7  batch: 3000 [ 30000/60000]  loss: 0.08571053
epoch:  8  batch: 1000 [ 10000/60000]  loss: 0.00032430
epoch:  8  batch: 2000 [ 20000/60000]  loss: 0.00018516
epoch:  8  batch: 3000 [ 30000/60000]  loss: 0.00064786
epoch:  9  batch: 1000 [ 10000/60000]  loss: 0.00034374
epoch:  9  batch: 2000 [ 20000/60000]  loss: 0.00575564
epoch:  9  batch: 3000 [ 30000/60000]  loss: 0.00000851

Duration: 275 seconds

Plot the loss and accuracy comparisons

plt.plot(train_losses, label='training loss')
plt.plot(test_losses, label='validation loss')
plt.title('Loss at the end of each epoch')
plt.legend()
plt.show();

plt.plot([t/600 for t in train_correct], label='training accuracy')
plt.plot([t/100 for t in test_correct], label='validation accuracy')
plt.title('Accuracy at the end of each epoch')
plt.legend()
plt.show();

png

png

Evaluate Test Data

# Extract the data all at once, not in batches
test_load_all = DataLoader(test_data, batch_size=10000, shuffle=False)

with torch.no_grad():
    correct = 0
    for X_test, y_test in test_load_all:
        y_val = model(X_test)  # we don't flatten the data this time
        predicted = torch.max(y_val,1)[1]
        correct += (predicted == y_test).sum()
print(f'Test accuracy: {correct.item()}/{len(test_data)} = {correct.item()*100/(len(test_data)):7.3f}%')
Test accuracy: 9899/10000 =  98.990%

Display the confusion matrix

# print a row of values for reference
np.set_printoptions(formatter=dict(int=lambda x: f'{x:4}'))
print(np.arange(10).reshape(1,10))
print()

# print the confusion matrix
print(confusion_matrix(predicted.view(-1), y_test.view(-1)))
[[   0    1    2    3    4    5    6    7    8    9]]

[[ 974    0    2    0    0    0    3    1    2    0]
 [   0 1132    1    1    2    0    1    5    0    0]
 [   0    1 1019    2    1    0    1    2    2    0]
 [   0    0    1  994    0    3    0    0    1    1]
 [   0    0    1    0  971    0    1    0    0    4]
 [   0    0    0    4    0  887    2    0    0    3]
 [   2    1    0    0    1    1  948    0    0    0]
 [   2    0    7    2    0    1    0 1016    1    6]
 [   2    1    1    5    0    0    2    1  965    2]
 [   0    0    0    2    7    0    0    3    3  993]]

Examine the misses

We can track the index positions of “missed” predictions, and extract the corresponding image and label. We’ll do this in batches to save screen space.

misses = np.array([])
for i in range(len(predicted.view(-1))):
    if predicted[i] != y_test[i]:
        misses = np.append(misses,i).astype('int64')

# Display the number of misses
len(misses)
101
# Set up an iterator to feed batched rows
r = 12   # row size
row = iter(np.array_split(misses,len(misses)//r+1))
nextrow = next(row)
print("Index:", nextrow)
print("Label:", y_test.index_select(0,torch.tensor(nextrow)).numpy())
print("Guess:", predicted.index_select(0,torch.tensor(nextrow)).numpy())

images = X_test.index_select(0,torch.tensor(nextrow))
im = make_grid(images, nrow=r)
plt.figure(figsize=(10,4))
plt.imshow(np.transpose(im.numpy(), (1, 2, 0)));
Index: [  18  247  321  340  381  445  448  495  582  659  740  924]
Label: [   3    4    2    5    3    6    9    8    8    2    4    2]
Guess: [   8    2    7    3    2    0    8    0    2    7    9    7]

png

Run a new image through the model

We can also pass a single image through the model to obtain a prediction. Pick a number from 0 to 9999, assign it to “x”, and we’ll use that value to select a number from the MNIST test set.

x = 2019
plt.figure(figsize=(1,1))
plt.imshow(test_data[x][0].reshape((28,28)), cmap="gist_yarg");

png

model.eval()
with torch.no_grad():
    new_pred = model(test_data[x][0].view(1,1,28,28)).argmax()
print("Predicted value:",new_pred.item())
Predicted value: 9

CNN model - CIFAR dataset

Load the CIFAR dataset

The CIFAR-10 dataset is similar to MNIST, except that instead of one color channel (grayscale) there are three channels (RGB).
Where an MNIST image has a size of (1,28,28), CIFAR images are (3,32,32). There are 10 categories an image may fall under: 0. airplane

  1. automobile
  2. bird
  3. cat
  4. deer
  5. dog
  6. frog
  7. horse
  8. ship
  9. truck

As with the previous code along, make sure to watch the theory lectures! You’ll want to be comfortable with:

  • convolutional layers
  • filters/kernels
  • pooling
  • depth, stride and zero-padding

PyTorch makes the CIFAR-10 train and test datasets available through torchvision. The first time they’re called, the datasets will be downloaded onto your computer to the path specified. From that point, torchvision will always look for a local copy before attempting another download.
The set contains 50,000 train and 10,000 test images.

Refer to the previous section for explanations of transformations, batch sizes and DataLoader.

transform = transforms.ToTensor()

train_data = datasets.CIFAR10(root='../Data', train=True, download=True, transform=transform)
test_data = datasets.CIFAR10(root='../Data', train=False, download=True, transform=transform)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../Data/cifar-10-python.tar.gz



  0%|          | 0/170498071 [00:00<?, ?it/s]


Extracting ../Data/cifar-10-python.tar.gz to ../Data
Files already downloaded and verified
train_data
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ../Data
    Split: Train
    StandardTransform
Transform: ToTensor()
test_data
Dataset CIFAR10
    Number of datapoints: 10000
    Root location: ../Data
    Split: Test
    StandardTransform
Transform: ToTensor()
torch.manual_seed(101)  # for reproducible results

train_loader = DataLoader(train_data, batch_size=10, shuffle=True)
test_loader = DataLoader(test_data, batch_size=10, shuffle=False)

Define strings for labels

We can call the labels whatever we want, so long as they appear in the order of ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. Here we’re using 5-character labels padded with spaces so that our reports line up later.

class_names = ['plane', '  car', ' bird', '  cat', ' deer', '  dog', ' frog', 'horse', ' ship', 'truck']

View the image

Matplotlib can interpret pixel values through a variety of colormaps.

np.set_printoptions(formatter=dict(int=lambda x: f'{x:5}')) # to widen the printed array

# Grab the first batch of 10 images
for images,labels in train_loader:
    break

# Print the labels
print('Label:', labels.numpy())
print('Class: ', *np.array([class_names[i] for i in labels]))

# Print the images
im = make_grid(images, nrow=5)  # the default nrow is 8
plt.figure(figsize=(10,4))
plt.imshow(np.transpose(im.numpy(), (1, 2, 0)));
Label: [    1     5     8     1     6     1     6     3     7     9]
Class:    car   dog  ship   car  frog   car  frog   cat horse truck

png

Define a convolutional model

Building model

  • take in 3-channel images instead of 1-channel
  • adjust the size of the fully connected input

Our first convolutional layer will have 3 input channels, 6 output channels, a kernel size of 3 (resulting in a 3x3 filter), and a stride length of 1 pixel.
These are passed in as nn.Conv2d(3,6,3,1)</tt

class ConvolutionalNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, 1)  # changed from (1, 6, 5, 1)
        self.conv2 = nn.Conv2d(6, 16, 3, 1)
        self.fc1 = nn.Linear(6*6*16, 120)   # changed from (4*4*16) to fit 32x32 images with 3x3 filters
        self.fc2 = nn.Linear(120,84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, X):
        X = F.relu(self.conv1(X))
        X = F.max_pool2d(X, 2, 2)
        X = F.relu(self.conv2(X))
        X = F.max_pool2d(X, 2, 2)
        X = X.view(-1, 6*6*16)
        X = F.relu(self.fc1(X))
        X = F.relu(self.fc2(X))
        X = self.fc3(X)
        return F.log_softmax(X, dim=1)
Why (6x6x16) instead of (5x5x16)?
With MNIST the kernels and pooling layers resulted in $\;(((28−2)/2)−2)/2=5.5 \;$ which rounds down to 5 pixels per side.
With CIFAR the result is $\;(((32-2)/2)-2)/2 = 6.5\;$ which rounds down to 6 pixels per side.
torch.manual_seed(101)
model = ConvolutionalNetwork()
model
ConvolutionalNetwork(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
count_parameters(model)
   162
     6
   864
    16
 69120
   120
 10080
    84
   840
    10
______
 81302

Define loss function & optimizer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Train the model

This time we’ll feed the data directly into the model without flattening it first.

import time
start_time = time.time()

epochs = 10
train_losses = []
test_losses = []
train_correct = []
test_correct = []

for i in range(epochs):
    trn_corr = 0
    tst_corr = 0

    # Run the training batches
    for b, (X_train, y_train) in enumerate(train_loader):
        b += 1

        # Apply the model
        y_pred = model(X_train)  # we don't flatten X-train here
        loss = criterion(y_pred, y_train)

        # Tally the number of correct predictions
        predicted = torch.max(y_pred.data, 1)[1]
        batch_corr = (predicted == y_train).sum()
        trn_corr += batch_corr

        # Update parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Print interim results
        if b%1000 == 0:
            print(f'epoch: {i:2}  batch: {b:4} [{10*b:6}/60000]  loss: {loss.item():10.8f}')

    train_losses.append(loss.detach().numpy())
    train_correct.append(trn_corr.detach().numpy())

    # Run the testing batches
    with torch.no_grad():
        for b, (X_test, y_test) in enumerate(test_loader):

            # Apply the model
            y_val = model(X_test)

            # Tally the number of correct predictions
            predicted = torch.max(y_val.data, 1)[1]
            tst_corr += (predicted == y_test).sum()

    loss = criterion(y_val, y_test)
    test_losses.append(loss.detach().numpy())
    test_correct.append(tst_corr.detach().numpy())

print(f'\nDuration: {time.time() - start_time:.0f} seconds') # print the time elapsed
epoch:  0  batch: 1000 [ 10000/60000]  loss: 1.74091792
epoch:  0  batch: 2000 [ 20000/60000]  loss: 2.28505325
epoch:  0  batch: 3000 [ 30000/60000]  loss: 1.41110420
epoch:  0  batch: 4000 [ 40000/60000]  loss: 1.77556062
epoch:  0  batch: 5000 [ 50000/60000]  loss: 0.87544328
epoch:  1  batch: 1000 [ 10000/60000]  loss: 1.40881562
epoch:  1  batch: 2000 [ 20000/60000]  loss: 1.13322639
epoch:  1  batch: 3000 [ 30000/60000]  loss: 1.45552599
epoch:  1  batch: 4000 [ 40000/60000]  loss: 1.10780263
epoch:  1  batch: 5000 [ 50000/60000]  loss: 1.01513064
epoch:  2  batch: 1000 [ 10000/60000]  loss: 1.21744430
epoch:  2  batch: 2000 [ 20000/60000]  loss: 0.94388628
epoch:  2  batch: 3000 [ 30000/60000]  loss: 1.29108834
epoch:  2  batch: 4000 [ 40000/60000]  loss: 1.36637890
epoch:  2  batch: 5000 [ 50000/60000]  loss: 1.00823355
epoch:  3  batch: 1000 [ 10000/60000]  loss: 1.27289844
epoch:  3  batch: 2000 [ 20000/60000]  loss: 0.99568701
epoch:  3  batch: 3000 [ 30000/60000]  loss: 0.67097992
epoch:  3  batch: 4000 [ 40000/60000]  loss: 1.37756491
epoch:  3  batch: 5000 [ 50000/60000]  loss: 1.33198154
epoch:  4  batch: 1000 [ 10000/60000]  loss: 0.77316147
epoch:  4  batch: 2000 [ 20000/60000]  loss: 0.96285629
epoch:  4  batch: 3000 [ 30000/60000]  loss: 0.86326140
epoch:  4  batch: 4000 [ 40000/60000]  loss: 1.16941047
epoch:  4  batch: 5000 [ 50000/60000]  loss: 2.18257213
epoch:  5  batch: 1000 [ 10000/60000]  loss: 0.83931684
epoch:  5  batch: 2000 [ 20000/60000]  loss: 0.45692676
epoch:  5  batch: 3000 [ 30000/60000]  loss: 1.55596709
epoch:  5  batch: 4000 [ 40000/60000]  loss: 1.52249289
epoch:  5  batch: 5000 [ 50000/60000]  loss: 0.77832973
epoch:  6  batch: 1000 [ 10000/60000]  loss: 0.58171165
epoch:  6  batch: 2000 [ 20000/60000]  loss: 1.27864909
epoch:  6  batch: 3000 [ 30000/60000]  loss: 1.22215343
epoch:  6  batch: 4000 [ 40000/60000]  loss: 0.99015683
epoch:  6  batch: 5000 [ 50000/60000]  loss: 0.85941613
epoch:  7  batch: 1000 [ 10000/60000]  loss: 0.88422716
epoch:  7  batch: 2000 [ 20000/60000]  loss: 1.56418681
epoch:  7  batch: 3000 [ 30000/60000]  loss: 1.14219129
epoch:  7  batch: 4000 [ 40000/60000]  loss: 0.67993647
epoch:  7  batch: 5000 [ 50000/60000]  loss: 0.89945662
epoch:  8  batch: 1000 [ 10000/60000]  loss: 0.80203950
epoch:  8  batch: 2000 [ 20000/60000]  loss: 0.97332305
epoch:  8  batch: 3000 [ 30000/60000]  loss: 1.23693681
epoch:  8  batch: 4000 [ 40000/60000]  loss: 0.73578262
epoch:  8  batch: 5000 [ 50000/60000]  loss: 0.46964207
epoch:  9  batch: 1000 [ 10000/60000]  loss: 2.03283572
epoch:  9  batch: 2000 [ 20000/60000]  loss: 0.26118031
epoch:  9  batch: 3000 [ 30000/60000]  loss: 1.27609515
epoch:  9  batch: 4000 [ 40000/60000]  loss: 0.99766606
epoch:  9  batch: 5000 [ 50000/60000]  loss: 0.72483909

Duration: 379 seconds

Plot the loss and accuracy comparisons

plt.plot(train_losses, label='training loss')
plt.plot(test_losses, label='validation loss')
plt.title('Loss at the end of each epoch')
plt.legend()
plt.show();

plt.plot([t/600 for t in train_correct], label='training accuracy')
plt.plot([t/100 for t in test_correct], label='validation accuracy')
plt.title('Accuracy at the end of each epoch')
plt.legend()
plt.show();

png

png

Evaluate Test Data

print(test_correct) # contains the results of all 10 epochs
print()
print(f'Test accuracy: {test_correct[-1].item()*100/10000:.3f}%') # print the most recent result as a percent
[array( 4893), array( 5480), array( 5807), array( 5921), array( 5954), array( 6157), array( 6112), array( 6304), array( 6182), array( 6249)]

Test accuracy: 62.490%

Display the confusion matrix

In order to map predictions against ground truth, we need to run the entire test set through the model.
Also, since our model was not as accurate as with MNIST, we’ll use a heatmap to better display the results.

# Create a loader for the entire the test set
test_load_all = DataLoader(test_data, batch_size=10000, shuffle=False)

with torch.no_grad():
    correct = 0
    for X_test, y_test in test_load_all:
        y_val = model(X_test)
        predicted = torch.max(y_val,1)[1]
        correct += (predicted == y_test).sum()

arr = confusion_matrix(y_test.view(-1), predicted.view(-1))
df_cm = pd.DataFrame(arr, class_names, class_names)
plt.figure(figsize = (9,6))
sn.heatmap(df_cm, annot=True, fmt="d", cmap='BuGn')
plt.xlabel("prediction")
plt.ylabel("label (ground truth)")
plt.show();

png

Examine the misses

We can track the index positions of “missed” predictions, and extract the corresponding image and label. We’ll do this in batches to save screen space.

misses = np.array([])
for i in range(len(predicted.view(-1))):
    if predicted[i] != y_test[i]:
        misses = np.append(misses,i).astype('int64')

# Display the number of misses
len(misses)
3751
# Set up an iterator to feed batched rows
r = 12   # row size
row = iter(np.array_split(misses,len(misses)//r+1))
nextrow = next(row)
print("Index:", nextrow)
print("Label:", y_test.index_select(0,torch.tensor(nextrow)).numpy())
print("Guess:", predicted.index_select(0,torch.tensor(nextrow)).numpy())

images = X_test.index_select(0,torch.tensor(nextrow))
im = make_grid(images, nrow=r)
plt.figure(figsize=(10,4))
plt.imshow(np.transpose(im.numpy(), (1, 2, 0)));
Index: [    4    12    16    17    22    24    25    28    30    31    32    33]
Label: [    6     5     5     7     4     5     2     9     6     5     4     5]
Guess: [    4     1     3     3     2     4     4     1     3     3     2     3]

png

References: