Transfer Learning with TensorFlow

57 minute read

This blog post focuses on using pre-trained models and different types of transfer learning. It is divided into three sections: an introduction to transfer learning, transfer learning using feature extraction, and transfer learning using fine-tuning.

The post is compatible with Google Colaboratory (TensorFlow 2.8.2) and can be accessed through this link:

Open In Colab

Transfer Learning with TensorFlow

  • Developed by Armin Norouzi
  • Compatible with Google Colaboratory - TensorFlow 2.8.2

  • Objective: Use pretrained models and perform different kinds of transfer learning

Table of contents:

  1. Introduction to Transfer Learning
  2. Transfer Learning using Feature Extraction
  3. Transfer Learning using Fine-tuning

Introduction to Transfer Learning

There are two main benefits to using transfer learning:

  1. You can leverage an existing neural network architecture that is proven to work on problems similar to your own.
  2. You can leverage a working neural network architecture which has already learned patterns on data similar to your own. This often results in achieving great results with less custom data.

In other words, instead of training our own models from scratch on our own datasets, we can take the patterns a model has learned from datasets such as ImageNet and use them as the foundation of our own. Doing this often leads to getting great results with less data.

Using a GPU is highly recommended for transfer learning.

# Are we using a GPU?
!nvidia-smi
Thu Sep  8 15:32:36 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
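
You can also check for a GPU from Python itself; here's a minimal sketch using TensorFlow's device API:

import tensorflow as tf

# List the GPUs visible to TensorFlow (an empty list means a CPU-only runtime)
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs available: {gpus}")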

Helper functions


import datetime
import tensorflow as tf

def create_tensorboard_callback(dir_name, experiment_name):
  """
  Creates a TensorBoard callback instance to store log files.
  Stores log files with the filepath:
    "dir_name/experiment_name/current_datetime/"
  Args:
    dir_name: target directory to store TensorBoard log files
    experiment_name: name of experiment directory (e.g. efficientnet_model_1)
  """
  log_dir = dir_name + "/" + experiment_name + "/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
  tensorboard_callback = tf.keras.callbacks.TensorBoard(
      log_dir=log_dir
  )
  print(f"Saving TensorBoard log files to: {log_dir}")
  return tensorboard_callback


# Plot the validation and training data separately
import matplotlib.pyplot as plt

def plot_loss_curves(history):
  """
  Returns separate loss curves for training and validation metrics.
  Args:
    history: TensorFlow model History object (see: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History)
  """
  loss = history.history['loss']
  val_loss = history.history['val_loss']

  accuracy = history.history['accuracy']
  val_accuracy = history.history['val_accuracy']

  epochs = range(len(history.history['loss']))

  # Plot loss
  plt.plot(epochs, loss, label='training_loss')
  plt.plot(epochs, val_loss, label='val_loss')
  plt.title('Loss')
  plt.xlabel('Epochs')
  plt.legend()

  # Plot accuracy
  plt.figure()
  plt.plot(epochs, accuracy, label='training_accuracy')
  plt.plot(epochs, val_accuracy, label='val_accuracy')
  plt.title('Accuracy')
  plt.xlabel('Epochs')
  plt.legend();

def compare_historys(original_history, new_history, initial_epochs=5):
    """
    Compares two TensorFlow model History objects.

    Args:
      original_history: History object from original model (before new_history)
      new_history: History object from continued model training (after original_history)
      initial_epochs: Number of epochs in original_history (new_history plot starts from here)
    """

    # Get original history measurements
    acc = original_history.history["accuracy"]
    loss = original_history.history["loss"]

    val_acc = original_history.history["val_accuracy"]
    val_loss = original_history.history["val_loss"]

    # Combine original history with new history
    total_acc = acc + new_history.history["accuracy"]
    total_loss = loss + new_history.history["loss"]

    total_val_acc = val_acc + new_history.history["val_accuracy"]
    total_val_loss = val_loss + new_history.history["val_loss"]

    # Make plots
    plt.figure(figsize=(8, 8))
    plt.subplot(2, 1, 1)
    plt.plot(total_acc, label='Training Accuracy')
    plt.plot(total_val_acc, label='Validation Accuracy')
    plt.plot([initial_epochs-1, initial_epochs-1],
              plt.ylim(), label='Start Fine Tuning') # reshift plot around epochs
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

    plt.subplot(2, 1, 2)
    plt.plot(total_loss, label='Training Loss')
    plt.plot(total_val_loss, label='Validation Loss')
    plt.plot([initial_epochs-1, initial_epochs-1],
              plt.ylim(), label='Start Fine Tuning') # reshift plot around epochs
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.xlabel('epoch')
    plt.show()


# Create function to unzip a zipfile into current working directory
# (since we're going to be downloading and unzipping a few files)
import zipfile

def unzip_data(filename):
  """
  Unzips filename into the current working directory.
  Args:
    filename (str): a filepath to a target zip folder to be unzipped.
  """
  with zipfile.ZipFile(filename, "r") as zip_ref:
    zip_ref.extractall()

# Walk through an image classification directory and find out how many files (images)
# are in each subdirectory.
import os


def walk_through_dir(dir_path):
  """
  Walks through dir_path returning its contents.
  Args:
    dir_path (str): target directory

  Returns:
    A print out of:
      number of subdirectories in dir_path
      number of images (files) in each subdirectory
      name of each subdirectory
  """
  for dirpath, dirnames, filenames in os.walk(dir_path):
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

Transfer Learning using Feature Extraction

Loading data

# Get data (10% of labels)
import zipfile

# Download data
!wget https://gitlab.com/arminny/ml_course_datasets/-/raw/main/10_food_classes_10_percent.zip

# Unzip the downloaded file
unzip_data("10_food_classes_10_percent.zip")

--2022-09-08 15:32:37--  https://gitlab.com/arminny/ml_course_datasets/-/raw/main/10_food_classes_10_percent.zip
Resolving gitlab.com (gitlab.com)... 172.65.251.78, 2606:4700:90:0:f22e:fbec:5bed:a9b9
Connecting to gitlab.com (gitlab.com)|172.65.251.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 168546183 (161M) [application/octet-stream]
Saving to: ‘10_food_classes_10_percent.zip’

10_food_classes_10_ 100%[===================>] 160.74M  48.7MB/s    in 3.3s

2022-09-08 15:32:41 (49.0 MB/s) - ‘10_food_classes_10_percent.zip’ saved [168546183/168546183]
# How many images are we working with now?
walk_through_dir("10_food_classes_10_percent")

There are 2 directories and 0 images in '10_food_classes_10_percent'.
There are 10 directories and 0 images in '10_food_classes_10_percent/test'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/pizza'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/chicken_wings'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/grilled_salmon'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/hamburger'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/chicken_curry'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/fried_rice'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/steak'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/ramen'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/sushi'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/ice_cream'.
There are 10 directories and 0 images in '10_food_classes_10_percent/train'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/pizza'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/chicken_wings'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/grilled_salmon'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/hamburger'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/chicken_curry'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/fried_rice'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/steak'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/ramen'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/sushi'.
There are 0 directories and 75 images in '10_food_classes_10_percent/train/ice_cream'.

Creating data loaders

Now that we’ve downloaded the data, let’s use the ImageDataGenerator class along with the flow_from_directory method to load in our images.

# Setup data inputs
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMAGE_SHAPE = (224, 224)
BATCH_SIZE = 32

train_dir = "10_food_classes_10_percent/train/"
test_dir = "10_food_classes_10_percent/test/"

train_datagen = ImageDataGenerator(rescale=1/255.)
test_datagen = ImageDataGenerator(rescale=1/255.)

print("Training images:")
train_data_10_percent = train_datagen.flow_from_directory(train_dir,
                                               target_size=IMAGE_SHAPE,
                                               batch_size=BATCH_SIZE,
                                               class_mode="categorical")

print("Testing images:")
test_data = test_datagen.flow_from_directory(test_dir,
                                              target_size=IMAGE_SHAPE,
                                              batch_size=BATCH_SIZE,
                                              class_mode="categorical")
Training images:
Found 750 images belonging to 10 classes.
Testing images:
Found 2500 images belonging to 10 classes.

Setting up callbacks

Callbacks are extra functionality you can add to your models to be performed during or after training. Some of the most popular callbacks include:

  • Experiment tracking with TensorBoard - log the performance of multiple models and then view and compare these models in a visual way on TensorBoard (a dashboard for inspecting neural network parameters). Helpful to compare the results of different models on your data.
  • Model checkpointing - save your model as it trains so you can stop training if needed and come back to continue where you left off. Helpful if training takes a long time and can’t be done in one sitting.
  • Early stopping - leave your model training for an arbitrary amount of time and have it stop training automatically when it ceases to improve. Helpful when you’ve got a large dataset and don’t know how long training will take. (Both of these are sketched right after this list.)
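
For reference, here’s a minimal sketch of the checkpointing and early stopping callbacks named above (the checkpoints/model.ckpt filepath and the patience value are illustrative choices; we won’t use these in the experiments below):

import tensorflow as tf

# Stop training if validation loss hasn't improved for 3 epochs
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=3)

# Save only the best-performing weights seen during training
checkpoint = tf.keras.callbacks.ModelCheckpoint("checkpoints/model.ckpt",
                                                monitor="val_loss",
                                                save_best_only=True,
                                                save_weights_only=True)

# Both get passed to fit(), e.g. model.fit(..., callbacks=[early_stopping, checkpoint])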

The TensorBoard callback can be accessed using tf.keras.callbacks.TensorBoard().

Its main functionality is saving a model’s training performance metrics to a specified log_dir.

By default, logs are recorded every epoch using the update_freq='epoch' parameter. This is a good default since tracking model performance too often can slow down model training.

To track our modelling experiments using TensorBoard, let’s create a function which creates a TensorBoard callback for us.

# Create tensorboard callback (functionized because need to create a new one for each model)
import datetime
import tensorflow as tf

def create_tensorboard_callback(dir_name, experiment_name):
  ''' This function is used to create a TensorBoard callback.

  Args:
      dir_name: overall logs directory
      experiment_name: name of this particular experiment

  The current timestamp (from datetime.datetime.now()) is appended to the log path.
  '''
  log_dir = dir_name + "/" + experiment_name + "/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
  tensorboard_callback = tf.keras.callbacks.TensorBoard(
      log_dir=log_dir
  )
  print(f"Saving TensorBoard log files to: {log_dir}")
  return tensorboard_callback

Creating models using TensorFlow Hub

In this section, pretrained models will be imported from TensorFlow Hub:

  1. ResNetV2 - a state-of-the-art computer vision model architecture from 2016.
  2. EfficientNet - a state-of-the-art computer vision architecture from 2019.

To find a model for your application on TensorFlow Hub:

  1. Go to tfhub.dev.
  2. Choose your problem domain, e.g. “Image” (we’re using food images).
  3. Select your TF version, which in our case is TF2.
  4. Remove all “Problem domain” filters except for the problem you’re working on.
  5. The models listed are all models which could potentially be used for your problem.
  6. Select the Architecture tab on TensorFlow Hub and you’ll see a dropdown menu of architecture names appear.
  • The rule of thumb here is that, generally, names with larger numbers mean better-performing models. For example, EfficientNetB4 performs better than EfficientNetB0.
  • However, the tradeoff with larger numbers is that they can take longer to compute.
  7. Select EfficientNetB0.
  8. Clicking the one titled “efficientnet/b0/feature-vector” brings us to a page with a button that says “Copy URL”. That URL is what we can use to harness the power of EfficientNetB0.
  • Copying the URL should give you something like this: https://tfhub.dev/tensorflow/efficientnet/b0/feature-vector/1

Different types of transfer learning:

  1. “As is” transfer learning is when you take a pretrained model as it is and apply it to your task without any changes.

  2. Feature extraction transfer learning is when you take the underlying patterns (also called weights) a pretrained model has learned and adjust its outputs to be more suited to your problem.

  • For example, say the pretrained model you were using had 236 different layers (EfficientNetB0 has 236 layers), but the top layer outputs 1000 classes because it was pretrained on ImageNet. To adjust this to your own problem, you might remove the original activation layer and replace it with your own but with the right number of output classes. The important part here is that only the top few layers become trainable, the rest remain frozen.
  3. Fine-tuning transfer learning is when you take the underlying patterns (also called weights) of a pretrained model and adjust (fine-tune) them to your own problem.

    • This usually means training some, many or all of the layers in the pretrained model. This is useful when you’ve got a large dataset (e.g. 100+ images per class) where your data is slightly different to the data the original model was trained on.

A common workflow is to “freeze” all of the learned patterns in the bottom layers of a pretrained model so they’re untrainable, and then train the top 2-3 layers so the pretrained model can adjust its outputs to your custom data (feature extraction).

Question: Why train only the top 2-3 layers in feature extraction?

The lower a layer is in a computer vision model (as in, the closer it is to the input layer), the larger the features it learns. For example, a bottom layer in a computer vision model built to identify images of cats or dogs might learn the outline of legs, whereas layers closer to the output might learn the shape of teeth. Often, you’ll want the larger features (learned patterns are also called features) to remain, since these are similar for both animals, whereas the differences remain in the more fine-grained features.
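
As a minimal sketch of this freezing workflow (using EfficientNetB0 purely as an example base):

import tensorflow as tf

# Download a pretrained base model (ImageNet weights) without its classifier head
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# Freeze all of the base model's learned patterns
base_model.trainable = False

# Nothing in the base model will be updated during training now;
# only layers we add on top (e.g. a Dense output layer) will train
print(len(base_model.trainable_variables))  # 0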

import tensorflow_hub as hub
from tensorflow.keras import layers

ResNet50V2

# Resnet 50 V2 feature vector
resnet_url = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4"

# New: EfficientNetB0 feature vector (version 2)
efficientnet_url = "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_b0/feature_vector/2"
def create_model(model_url, num_classes=10):
  """Takes a TensorFlow Hub URL and creates a Keras Sequential model with it.

  Args:
    model_url (str): A TensorFlow Hub feature extraction URL.
    num_classes (int): Number of output neurons in output layer,
      should be equal to number of target classes, default 10.

  Returns:
    An uncompiled Keras Sequential model with model_url as feature
    extractor layer and Dense output layer with num_classes outputs.
  """
  # Download the pretrained model and save it as a Keras layer
  feature_extractor_layer = hub.KerasLayer(model_url,
                                           trainable=False, # freeze the underlying patterns
                                           name='feature_extraction_layer',
                                           input_shape=IMAGE_SHAPE+(3,)) # define the input image shape

  # Create our own model
  model = tf.keras.Sequential([
    feature_extractor_layer, # use the feature extraction layer as the base
    layers.Dense(num_classes, activation='softmax', name='output_layer') # create our own output layer
  ])

  return model
# Create model
resnet_model = create_model(resnet_url, num_classes=train_data_10_percent.num_classes)

# Compile
resnet_model.compile(loss='categorical_crossentropy',
                     optimizer=tf.keras.optimizers.Adam(),
                     metrics=['accuracy'])

We’ve got the training data ready in train_data_10_percent as well as the test data saved as test_data.

But before we call the fit function, there’s one more thing we’re going to add, a callback. More specifically, a TensorBoard callback so we can track the performance of our model on TensorBoard.

We can add a callback to our model by using the callbacks parameter in the fit function.

In our case, we’ll pass the callbacks parameter the create_tensorboard_callback() we created earlier with some specific inputs so we know what experiments we’re running.

Let’s keep this experiment short and train for 5 epochs.

# Fit the model
resnet_history = resnet_model.fit(train_data_10_percent,
                                  epochs = 5,
                                  steps_per_epoch = len(train_data_10_percent),
                                  validation_data = test_data,
                                  validation_steps = len(test_data),
                                  # Add TensorBoard callback to model (callbacks parameter takes a list)
                                  callbacks=[create_tensorboard_callback(dir_name="tensorflow_hub", # save experiment logs here
                                                                         experiment_name="resnet50V2")]) # name of log files
Saving TensorBoard log files to: tensorflow_hub/resnet50V2/20220908-153304
Epoch 1/5
24/24 [==============================] - 33s 775ms/step - loss: 1.8411 - accuracy: 0.3893 - val_loss: 1.1455 - val_accuracy: 0.6440
Epoch 2/5
24/24 [==============================] - 25s 1s/step - loss: 0.8830 - accuracy: 0.7453 - val_loss: 0.8329 - val_accuracy: 0.7400
Epoch 3/5
24/24 [==============================] - 18s 778ms/step - loss: 0.6080 - accuracy: 0.8333 - val_loss: 0.7413 - val_accuracy: 0.7580
Epoch 4/5
24/24 [==============================] - 17s 739ms/step - loss: 0.4850 - accuracy: 0.8720 - val_loss: 0.7182 - val_accuracy: 0.7624
Epoch 5/5
24/24 [==============================] - 17s 737ms/step - loss: 0.3731 - accuracy: 0.9160 - val_loss: 0.6747 - val_accuracy: 0.7784

It seems that after only 5 epochs, the ResNet50V2 feature extraction model was able to blow away any of the architectures we made in L04, achieving around 90% accuracy on the training set and nearly 80% accuracy on the test set with only 10 percent of the training images!

That goes to show the power of transfer learning. And it’s one of the main reasons whenever you’re trying to model your own datasets, you should look into what pretrained models already exist.

Let’s check out our model’s training curves using our plot_loss_curves function.

plot_loss_curves(resnet_history)

[Training and validation loss and accuracy curves for the ResNet50V2 feature extraction model]

# Resnet summary
resnet_model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 feature_extraction_layer (K  (None, 2048)             23564800
 erasLayer)

 output_layer (Dense)        (None, 10)                20490

=================================================================
Total params: 23,585,290
Trainable params: 20,490
Non-trainable params: 23,564,800
_________________________________________________________________
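
Note where the trainable parameters come from: the Dense output layer connects the 2048 ResNet feature outputs to 10 classes, giving 2048 × 10 weights + 10 biases = 20,490 trainable parameters, while the 23.5M parameters of the frozen feature extractor stay untouched.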

EfficientNetB0

# Create model
efficientnet_model = create_model(model_url=efficientnet_url, # use EfficientNetB0 TensorFlow Hub URL
                                  num_classes=train_data_10_percent.num_classes)

# Compile EfficientNet model
efficientnet_model.compile(loss='categorical_crossentropy',
                           optimizer=tf.keras.optimizers.Adam(),
                           metrics=['accuracy'])

# Fit EfficientNet model
efficientnet_history = efficientnet_model.fit(train_data_10_percent, # only use 10% of training data
                                              epochs=5, # train for 5 epochs
                                              steps_per_epoch=len(train_data_10_percent),
                                              validation_data=test_data,
                                              validation_steps=len(test_data),
                                              callbacks=[create_tensorboard_callback(dir_name="tensorflow_hub",
                                                                                     # Track logs under different experiment name
                                                                                     experiment_name="efficientnetB0")])
Saving TensorBoard log files to: tensorflow_hub/efficientnetB0/20220908-153508
Epoch 1/5
24/24 [==============================] - 26s 743ms/step - loss: 1.9594 - accuracy: 0.3787 - val_loss: 1.4887 - val_accuracy: 0.6248
Epoch 2/5
24/24 [==============================] - 16s 696ms/step - loss: 1.3003 - accuracy: 0.6893 - val_loss: 1.1166 - val_accuracy: 0.7064
Epoch 3/5
24/24 [==============================] - 16s 702ms/step - loss: 1.0095 - accuracy: 0.7427 - val_loss: 0.9440 - val_accuracy: 0.7436
Epoch 4/5
24/24 [==============================] - 17s 714ms/step - loss: 0.8457 - accuracy: 0.7947 - val_loss: 0.8553 - val_accuracy: 0.7584
Epoch 5/5
24/24 [==============================] - 18s 756ms/step - loss: 0.7393 - accuracy: 0.8267 - val_loss: 0.7977 - val_accuracy: 0.7676
plot_loss_curves(efficientnet_history)

[Training and validation loss and accuracy curves for the EfficientNetB0 feature extraction model]

efficientnet_model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 feature_extraction_layer (K  (None, 1280)             5919312
 erasLayer)

 output_layer (Dense)        (None, 10)                12810

=================================================================
Total params: 5,932,122
Trainable params: 12,810
Non-trainable params: 5,919,312
_________________________________________________________________

Comparing models using TensorBoard

We’ve already compared the performance of our two models by looking at their accuracy scores. But what if you had more than two models?

That’s where an experiment tracking tool like TensorBoard (preinstalled in Google Colab) comes in.

The good thing is, since we set up a TensorBoard callback, all of our model’s training logs have been saved automatically. To visualize them, we can upload the results to TensorBoard.dev.

Uploading your results to TensorBoard.dev enables you to track and share multiple different modelling experiments. So if you needed to show someone your results, you could send them a link to your TensorBoard.dev as well as the accompanying Colab notebook.

Uploading experiments to TensorBoard

To upload a series of TensorFlow logs to TensorBoard, we can use the following command:

# Upload TensorBoard dev records
!tensorboard dev upload --logdir ./tensorflow_hub/ \
  --name "EfficientNetB0 vs. ResNet50V2" \
  --description "Comparing two different TF Hub feature extraction models architectures using 10% of training images" \
  --one_shot
New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/2jpEI1v1SbW3IlqhMfDdTA/

[2022-09-08T15:43:39] Started scanning logdir.
[2022-09-08T15:43:41] Total uploaded: 60 scalars, 0 tensors, 2 binary objects (4.3 MB)
[2022-09-08T15:43:41] Done scanning logdir.


Done. View your TensorBoard at https://tensorboard.dev/experiment/2jpEI1v1SbW3IlqhMfDdTA/

Where:

  • --logdir is the target upload directory
  • --name is the name of the experiment
  • --description is a brief description of the experiment
  • --one_shot exits the TensorBoard uploader once uploading is finished

Every time you upload something to TensorBoard.dev you’ll get a new experiment ID. The experiment ID will look something like this: https://tensorboard.dev/experiment/73taSKxXQeGPQsNBcVvY3g/

If you upload the same directory again, you’ll get a new experiment ID to go along with it.

This means to track your experiments, you may want to look into how you name your uploads. That way when you find them on TensorBoard.dev you can tell what happened during each experiment (e.g. “efficientnet0_10_percent_data”).

Listing experiments you’ve saved to TensorBoard

To see all of the experiments you’ve uploaded you can use the command:

tensorboard dev list

# Check out experiments
!tensorboard dev list
https://tensorboard.dev/experiment/2jpEI1v1SbW3IlqhMfDdTA/
	Name                 EfficientNetB0 vs. ResNet50V2
	Description          Comparing two different TF Hub feature extraction models architectures using 10% of training images
	Id                   2jpEI1v1SbW3IlqhMfDdTA
	Created              2022-09-08 15:43:39 (15 seconds ago)
	Updated              2022-09-08 15:43:41 (13 seconds ago)
	Runs                 4
	Tags                 5
	Scalars              60
	Tensor bytes         0
	Binary object bytes  4498181
Total: 1 experiment(s)

Deleting experiments from TensorBoard

Remember, all uploads to TensorBoard.dev are public, so to delete an experiment you can use the command:

# Delete an experiment
!tensorboard dev delete --experiment_id 2jpEI1v1SbW3IlqhMfDdTA
Deleted experiment 2jpEI1v1SbW3IlqhMfDdTA.

# Check to see if experiments still exist
!tensorboard dev list
No experiments. Use `tensorboard dev upload` to get started.

Fine-tuning

In fine-tuning transfer learning, the pre-trained weights from another model are unfrozen and tweaked during training to better suit your own data.

In feature extraction transfer learning, you may only train the top 1-3 layers of a pre-trained model with your own data; in fine-tuning transfer learning, you might train 1-3+ layers of a pre-trained model (where the ‘+’ indicates that many or all of the layers could be trained).
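
As a minimal sketch of what fine-tuning looks like in Keras (the small head, the choice of 10 layers, and the 1e-4 learning rate are illustrative assumptions, not the exact models built below):

import tensorflow as tf

# Assumed starting point: a frozen EfficientNetB0 base under a small classifier head,
# similar to the feature extraction models in this section
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False
inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base_model(inputs))
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Fine-tuning: unfreeze the base model, then re-freeze all but its last 10 layers
base_model.trainable = True
for layer in base_model.layers[:-10]:
  layer.trainable = False

# Recompile with a lower learning rate (fine-tuning usually wants smaller updates)
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              metrics=["accuracy"])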

This section includes:

  • Using the Keras Functional API
  • Data augmentation
  • Running a series of modelling experiments on Food Vision data
    • Model 0: a transfer learning model using the Keras Functional API
    • Model 1: a feature extraction transfer learning model on 1% of the data with data augmentation
    • Model 2: a feature extraction transfer learning model on 10% of the data with data augmentation
    • Model 3: a fine-tuned transfer learning model on 10% of the data
    • Model 4: a fine-tuned transfer learning model on 100% of the data
  • Introduce the ModelCheckpoint callback to save intermediate training results
  • Compare model experiments results using TensorBoard

Loading data using Keras preprocessing

One of the main benefits of using tf.keras.preprocessing.image_dataset_from_directory() rather than ImageDataGenerator is that it creates a tf.data.Dataset object rather than a generator. The main advantage of this is that the tf.data.Dataset API is much more efficient (faster) than the ImageDataGenerator API, which is paramount for larger datasets.

# Create data inputs
import tensorflow as tf
IMG_SIZE = (224, 224) # define image size
train_data_10_percent_keras = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir,
                                                                            image_size=IMG_SIZE,
                                                                            label_mode="categorical", # what type are the labels?
                                                                            batch_size=32) # batch_size is 32 by default, this is generally a good number
test_data_10_percent_keras = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                                           image_size=IMG_SIZE,
                                                                           label_mode="categorical")
Found 750 files belonging to 10 classes.
Found 2500 files belonging to 10 classes.
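
One such efficiency gain is that performance tweaks become one-liners on a tf.data.Dataset; here's a minimal sketch of prefetching (the _prefetched names are just for illustration and aren't used in the experiments below):

# Optional performance tweak: prefetch batches so data loading overlaps training
AUTOTUNE = tf.data.AUTOTUNE
train_data_prefetched = train_data_10_percent_keras.prefetch(buffer_size=AUTOTUNE)
test_data_prefetched = test_data_10_percent_keras.prefetch(buffer_size=AUTOTUNE)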

Models

Model 0:

Building a transfer learning model using the Keras Functional API

We are going to use the tf.keras.applications module, as it contains a series of computer vision models already trained on ImageNet, along with the Keras Functional API to construct our model.

We’re going to go through the following steps:

  1. Instantiate a pre-trained base model object by choosing a target model such as EfficientNetB0 from tf.keras.applications, setting the include_top parameter to False (we do this because we’re going to create our own top, which are the output layers for the model).
  2. Set the base model’s trainable attribute to False to freeze all of the weights in the pre-trained model.
  3. Define an input layer for our model, for example, what shape of data should our model expect?
  4. [Optional] Normalize the inputs to our model if required. Some computer vision models such as ResNet50V2 require their inputs to be between 0 & 1. (The EfficientNet models in the tf.keras.applications module do not require images to be normalized.)
  5. Pass the inputs to the base model.
  6. Pool the outputs of the base model into a shape compatible with the output activation layer (turn base model output tensors into same shape as label tensors). This can be done using tf.keras.layers.GlobalAveragePooling2D() or tf.keras.layers.GlobalMaxPooling2D() though the former is more common in practice.
  7. Create an output activation layer using tf.keras.layers.Dense() with the appropriate activation function and number of neurons.
  8. Combine the inputs and outputs layer into a model using tf.keras.Model().
  9. Compile the model using the appropriate loss function and choice of optimizer.
  10. Fit the model for the desired number of epochs and with the necessary callbacks (in our case, we’ll start off with the TensorBoard callback).
# 1. Create base model with tf.keras.applications
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# 2. Freeze the base model (so the pre-learned patterns remain)
base_model.trainable = False

# 3. Create inputs into the base model
inputs = tf.keras.layers.Input(shape=(224, 224, 3), name="input_layer")

# 4. If using ResNet50V2, add this to speed up convergence, remove for EfficientNet
# x = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputs)

# 5. Pass the inputs to the base_model (note: using tf.keras.applications, EfficientNet inputs don't have to be normalized)
x = base_model(inputs)
# Check data shape after passing it to base_model
print(f"Shape after base_model: {x.shape}")

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
16711680/16705208 [==============================] - 0s 0us/step
16719872/16705208 [==============================] - 0s 0us/step
Shape after base_model: (None, 7, 7, 1280)
# 6. Average pool the outputs of the base model (aggregate all the most important information, reduce number of computations)
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
print(f"After GlobalAveragePooling2D(): {x.shape}")
After GlobalAveragePooling2D(): (None, 1280)
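
To see what GlobalAveragePooling2D is doing, here’s a minimal sketch on a small random tensor; it’s equivalent to averaging over the spatial axes with tf.reduce_mean:

# Simulate a batch of one 2x2 feature map with 3 channels
input_tensor = tf.random.normal(shape=(1, 2, 2, 3))

# GlobalAveragePooling2D collapses the spatial dimensions, leaving (batch, channels)
pooled = tf.keras.layers.GlobalAveragePooling2D()(input_tensor)
print(pooled.shape)  # (1, 3)

# Equivalent manual computation: average over the height and width axes
print(tf.reduce_mean(input_tensor, axis=[1, 2]).shape)  # (1, 3)
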
# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(x)

# 8. Combine the inputs with the outputs into a model
model_0 = tf.keras.Model(inputs, outputs)

# 9. Compile the model
model_0.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

# 10. Fit the model (we use less steps for validation so it's faster)
history_10_percent = model_0.fit(train_data_10_percent_keras,
                                 epochs=5,
                                 steps_per_epoch=len(train_data_10_percent_keras),
                                 validation_data=test_data_10_percent_keras,
                                 # Go through less of the validation data so epochs are faster (we want faster experiments!)
                                 validation_steps=int(0.25 * len(test_data_10_percent_keras)),
                                 # Track our model's training logs for visualization later
                                 callbacks=[create_tensorboard_callback("transfer_learning", "10_percent_feature_extract")])
Saving TensorBoard log files to: transfer_learning/10_percent_feature_extract/20220908-154439
Epoch 1/5
24/24 [==============================] - 15s 264ms/step - loss: 1.9712 - accuracy: 0.3533 - val_loss: 1.4307 - val_accuracy: 0.6513
Epoch 2/5
24/24 [==============================] - 4s 162ms/step - loss: 1.1733 - accuracy: 0.7440 - val_loss: 0.9911 - val_accuracy: 0.7845
Epoch 3/5
24/24 [==============================] - 5s 183ms/step - loss: 0.8505 - accuracy: 0.8080 - val_loss: 0.8082 - val_accuracy: 0.7977
Epoch 4/5
24/24 [==============================] - 5s 185ms/step - loss: 0.6894 - accuracy: 0.8347 - val_loss: 0.7080 - val_accuracy: 0.8158
Epoch 5/5
24/24 [==============================] - 5s 183ms/step - loss: 0.5875 - accuracy: 0.8667 - val_loss: 0.6411 - val_accuracy: 0.8273
base_model.summary()
Model: "efficientnetb0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
==================================================================================================
 input_1 (InputLayer)           [(None, None, None,  0           []
                                 3)]

 rescaling (Rescaling)          (None, None, None,   0           ['input_1[0][0]']
                                3)

 normalization (Normalization)  (None, None, None,   7           ['rescaling[0][0]']
                                3)

 stem_conv_pad (ZeroPadding2D)  (None, None, None,   0           ['normalization[0][0]']
                                3)

 stem_conv (Conv2D)             (None, None, None,   864         ['stem_conv_pad[0][0]']
                                32)

 stem_bn (BatchNormalization)   (None, None, None,   128         ['stem_conv[0][0]']
                                32)

 stem_activation (Activation)   (None, None, None,   0           ['stem_bn[0][0]']
                                32)

 block1a_dwconv (DepthwiseConv2  (None, None, None,   288        ['stem_activation[0][0]']
 D)                             32)

 block1a_bn (BatchNormalization  (None, None, None,   128        ['block1a_dwconv[0][0]']
 )                              32)

 block1a_activation (Activation  (None, None, None,   0          ['block1a_bn[0][0]']
 )                              32)

 block1a_se_squeeze (GlobalAver  (None, 32)          0           ['block1a_activation[0][0]']
 agePooling2D)

 block1a_se_reshape (Reshape)   (None, 1, 1, 32)     0           ['block1a_se_squeeze[0][0]']

 block1a_se_reduce (Conv2D)     (None, 1, 1, 8)      264         ['block1a_se_reshape[0][0]']

 block1a_se_expand (Conv2D)     (None, 1, 1, 32)     288         ['block1a_se_reduce[0][0]']

 block1a_se_excite (Multiply)   (None, None, None,   0           ['block1a_activation[0][0]',
                                32)                               'block1a_se_expand[0][0]']

 block1a_project_conv (Conv2D)  (None, None, None,   512         ['block1a_se_excite[0][0]']
                                16)

 block1a_project_bn (BatchNorma  (None, None, None,   64         ['block1a_project_conv[0][0]']
 lization)                      16)

 block2a_expand_conv (Conv2D)   (None, None, None,   1536        ['block1a_project_bn[0][0]']
                                96)

 block2a_expand_bn (BatchNormal  (None, None, None,   384        ['block2a_expand_conv[0][0]']
 ization)                       96)

 block2a_expand_activation (Act  (None, None, None,   0          ['block2a_expand_bn[0][0]']
 ivation)                       96)

 block2a_dwconv_pad (ZeroPaddin  (None, None, None,   0          ['block2a_expand_activation[0][0]
 g2D)                           96)                              ']

 block2a_dwconv (DepthwiseConv2  (None, None, None,   864        ['block2a_dwconv_pad[0][0]']
 D)                             96)

 block2a_bn (BatchNormalization  (None, None, None,   384        ['block2a_dwconv[0][0]']
 )                              96)

 block2a_activation (Activation  (None, None, None,   0          ['block2a_bn[0][0]']
 )                              96)

 block2a_se_squeeze (GlobalAver  (None, 96)          0           ['block2a_activation[0][0]']
 agePooling2D)

 block2a_se_reshape (Reshape)   (None, 1, 1, 96)     0           ['block2a_se_squeeze[0][0]']

 block2a_se_reduce (Conv2D)     (None, 1, 1, 4)      388         ['block2a_se_reshape[0][0]']

 block2a_se_expand (Conv2D)     (None, 1, 1, 96)     480         ['block2a_se_reduce[0][0]']

 block2a_se_excite (Multiply)   (None, None, None,   0           ['block2a_activation[0][0]',
                                96)                               'block2a_se_expand[0][0]']

 block2a_project_conv (Conv2D)  (None, None, None,   2304        ['block2a_se_excite[0][0]']
                                24)

 block2a_project_bn (BatchNorma  (None, None, None,   96         ['block2a_project_conv[0][0]']
 lization)                      24)

 block2b_expand_conv (Conv2D)   (None, None, None,   3456        ['block2a_project_bn[0][0]']
                                144)

 block2b_expand_bn (BatchNormal  (None, None, None,   576        ['block2b_expand_conv[0][0]']
 ization)                       144)

 block2b_expand_activation (Act  (None, None, None,   0          ['block2b_expand_bn[0][0]']
 ivation)                       144)

 block2b_dwconv (DepthwiseConv2  (None, None, None,   1296       ['block2b_expand_activation[0][0]
 D)                             144)                             ']

 block2b_bn (BatchNormalization  (None, None, None,   576        ['block2b_dwconv[0][0]']
 )                              144)

 block2b_activation (Activation  (None, None, None,   0          ['block2b_bn[0][0]']
 )                              144)

 block2b_se_squeeze (GlobalAver  (None, 144)         0           ['block2b_activation[0][0]']
 agePooling2D)

 block2b_se_reshape (Reshape)   (None, 1, 1, 144)    0           ['block2b_se_squeeze[0][0]']

 block2b_se_reduce (Conv2D)     (None, 1, 1, 6)      870         ['block2b_se_reshape[0][0]']

 block2b_se_expand (Conv2D)     (None, 1, 1, 144)    1008        ['block2b_se_reduce[0][0]']

 block2b_se_excite (Multiply)   (None, None, None,   0           ['block2b_activation[0][0]',
                                144)                              'block2b_se_expand[0][0]']

 block2b_project_conv (Conv2D)  (None, None, None,   3456        ['block2b_se_excite[0][0]']
                                24)

 block2b_project_bn (BatchNorma  (None, None, None,   96         ['block2b_project_conv[0][0]']
 lization)                      24)

 block2b_drop (Dropout)         (None, None, None,   0           ['block2b_project_bn[0][0]']
                                24)

 block2b_add (Add)              (None, None, None,   0           ['block2b_drop[0][0]',
                                24)                               'block2a_project_bn[0][0]']

 block3a_expand_conv (Conv2D)   (None, None, None,   3456        ['block2b_add[0][0]']
                                144)

 block3a_expand_bn (BatchNormal  (None, None, None,   576        ['block3a_expand_conv[0][0]']
 ization)                       144)

 block3a_expand_activation (Act  (None, None, None,   0          ['block3a_expand_bn[0][0]']
 ivation)                       144)

 block3a_dwconv_pad (ZeroPaddin  (None, None, None,   0          ['block3a_expand_activation[0][0]
 g2D)                           144)                             ']

 block3a_dwconv (DepthwiseConv2  (None, None, None,   3600       ['block3a_dwconv_pad[0][0]']
 D)                             144)

 block3a_bn (BatchNormalization  (None, None, None,   576        ['block3a_dwconv[0][0]']
 )                              144)

 block3a_activation (Activation  (None, None, None,   0          ['block3a_bn[0][0]']
 )                              144)

 block3a_se_squeeze (GlobalAver  (None, 144)         0           ['block3a_activation[0][0]']
 agePooling2D)

 block3a_se_reshape (Reshape)   (None, 1, 1, 144)    0           ['block3a_se_squeeze[0][0]']

 block3a_se_reduce (Conv2D)     (None, 1, 1, 6)      870         ['block3a_se_reshape[0][0]']

 block3a_se_expand (Conv2D)     (None, 1, 1, 144)    1008        ['block3a_se_reduce[0][0]']

 block3a_se_excite (Multiply)   (None, None, None,   0           ['block3a_activation[0][0]',
                                144)                              'block3a_se_expand[0][0]']

 block3a_project_conv (Conv2D)  (None, None, None,   5760        ['block3a_se_excite[0][0]']
                                40)

 block3a_project_bn (BatchNorma  (None, None, None,   160        ['block3a_project_conv[0][0]']
 lization)                      40)

 block3b_expand_conv (Conv2D)   (None, None, None,   9600        ['block3a_project_bn[0][0]']
                                240)

 block3b_expand_bn (BatchNormal  (None, None, None,   960        ['block3b_expand_conv[0][0]']
 ization)                       240)

 block3b_expand_activation (Act  (None, None, None,   0          ['block3b_expand_bn[0][0]']
 ivation)                       240)

 block3b_dwconv (DepthwiseConv2  (None, None, None,   6000       ['block3b_expand_activation[0][0]
 D)                             240)                             ']

 block3b_bn (BatchNormalization  (None, None, None,   960        ['block3b_dwconv[0][0]']
 )                              240)

 block3b_activation (Activation  (None, None, None,   0          ['block3b_bn[0][0]']
 )                              240)

 block3b_se_squeeze (GlobalAver  (None, 240)         0           ['block3b_activation[0][0]']
 agePooling2D)

 block3b_se_reshape (Reshape)   (None, 1, 1, 240)    0           ['block3b_se_squeeze[0][0]']

 block3b_se_reduce (Conv2D)     (None, 1, 1, 10)     2410        ['block3b_se_reshape[0][0]']

 block3b_se_expand (Conv2D)     (None, 1, 1, 240)    2640        ['block3b_se_reduce[0][0]']

 block3b_se_excite (Multiply)   (None, None, None,   0           ['block3b_activation[0][0]',
                                240)                              'block3b_se_expand[0][0]']

 block3b_project_conv (Conv2D)  (None, None, None,   9600        ['block3b_se_excite[0][0]']
                                40)

 block3b_project_bn (BatchNorma  (None, None, None,   160        ['block3b_project_conv[0][0]']
 lization)                      40)

 block3b_drop (Dropout)         (None, None, None,   0           ['block3b_project_bn[0][0]']
                                40)

 block3b_add (Add)              (None, None, None,   0           ['block3b_drop[0][0]',
                                40)                               'block3a_project_bn[0][0]']

 block4a_expand_conv (Conv2D)   (None, None, None,   9600        ['block3b_add[0][0]']
                                240)

 block4a_expand_bn (BatchNormal  (None, None, None,   960        ['block4a_expand_conv[0][0]']
 ization)                       240)

 block4a_expand_activation (Act  (None, None, None,   0          ['block4a_expand_bn[0][0]']
 ivation)                       240)

 block4a_dwconv_pad (ZeroPaddin  (None, None, None,   0          ['block4a_expand_activation[0][0]
 g2D)                           240)                             ']

 block4a_dwconv (DepthwiseConv2  (None, None, None,   2160       ['block4a_dwconv_pad[0][0]']
 D)                             240)

 block4a_bn (BatchNormalization  (None, None, None,   960        ['block4a_dwconv[0][0]']
 )                              240)

 block4a_activation (Activation  (None, None, None,   0          ['block4a_bn[0][0]']
 )                              240)

 block4a_se_squeeze (GlobalAver  (None, 240)         0           ['block4a_activation[0][0]']
 agePooling2D)

 block4a_se_reshape (Reshape)   (None, 1, 1, 240)    0           ['block4a_se_squeeze[0][0]']

 block4a_se_reduce (Conv2D)     (None, 1, 1, 10)     2410        ['block4a_se_reshape[0][0]']

 block4a_se_expand (Conv2D)     (None, 1, 1, 240)    2640        ['block4a_se_reduce[0][0]']

 block4a_se_excite (Multiply)   (None, None, None,   0           ['block4a_activation[0][0]',
                                240)                              'block4a_se_expand[0][0]']

 block4a_project_conv (Conv2D)  (None, None, None,   19200       ['block4a_se_excite[0][0]']
                                80)

 block4a_project_bn (BatchNorma  (None, None, None,   320        ['block4a_project_conv[0][0]']
 lization)                      80)

 block4b_expand_conv (Conv2D)   (None, None, None,   38400       ['block4a_project_bn[0][0]']
                                480)

 block4b_expand_bn (BatchNormal  (None, None, None,   1920       ['block4b_expand_conv[0][0]']
 ization)                       480)

 block4b_expand_activation (Act  (None, None, None,   0          ['block4b_expand_bn[0][0]']
 ivation)                       480)

 block4b_dwconv (DepthwiseConv2  (None, None, None,   4320       ['block4b_expand_activation[0][0]
 D)                             480)                             ']

 block4b_bn (BatchNormalization  (None, None, None,   1920       ['block4b_dwconv[0][0]']
 )                              480)

 block4b_activation (Activation  (None, None, None,   0          ['block4b_bn[0][0]']
 )                              480)

 block4b_se_squeeze (GlobalAver  (None, 480)         0           ['block4b_activation[0][0]']
 agePooling2D)

 block4b_se_reshape (Reshape)   (None, 1, 1, 480)    0           ['block4b_se_squeeze[0][0]']

 block4b_se_reduce (Conv2D)     (None, 1, 1, 20)     9620        ['block4b_se_reshape[0][0]']

 block4b_se_expand (Conv2D)     (None, 1, 1, 480)    10080       ['block4b_se_reduce[0][0]']

 block4b_se_excite (Multiply)   (None, None, None,   0           ['block4b_activation[0][0]',
                                480)                              'block4b_se_expand[0][0]']

 block4b_project_conv (Conv2D)  (None, None, None,   38400       ['block4b_se_excite[0][0]']
                                80)

 block4b_project_bn (BatchNorma  (None, None, None,   320        ['block4b_project_conv[0][0]']
 lization)                      80)

 block4b_drop (Dropout)         (None, None, None,   0           ['block4b_project_bn[0][0]']
                                80)

 block4b_add (Add)              (None, None, None,   0           ['block4b_drop[0][0]',
                                80)                               'block4a_project_bn[0][0]']

 block4c_expand_conv (Conv2D)   (None, None, None,   38400       ['block4b_add[0][0]']
                                480)

 block4c_expand_bn (BatchNormal  (None, None, None,   1920       ['block4c_expand_conv[0][0]']
 ization)                       480)

 block4c_expand_activation (Act  (None, None, None,   0          ['block4c_expand_bn[0][0]']
 ivation)                       480)

 block4c_dwconv (DepthwiseConv2  (None, None, None,   4320       ['block4c_expand_activation[0][0]
 D)                             480)                             ']

 block4c_bn (BatchNormalization  (None, None, None,   1920       ['block4c_dwconv[0][0]']
 )                              480)

 block4c_activation (Activation  (None, None, None,   0          ['block4c_bn[0][0]']
 )                              480)

 block4c_se_squeeze (GlobalAver  (None, 480)         0           ['block4c_activation[0][0]']
 agePooling2D)

 block4c_se_reshape (Reshape)   (None, 1, 1, 480)    0           ['block4c_se_squeeze[0][0]']

 block4c_se_reduce (Conv2D)     (None, 1, 1, 20)     9620        ['block4c_se_reshape[0][0]']

 block4c_se_expand (Conv2D)     (None, 1, 1, 480)    10080       ['block4c_se_reduce[0][0]']

 block4c_se_excite (Multiply)   (None, None, None,   0           ['block4c_activation[0][0]',
                                480)                              'block4c_se_expand[0][0]']

 block4c_project_conv (Conv2D)  (None, None, None,   38400       ['block4c_se_excite[0][0]']
                                80)

 block4c_project_bn (BatchNorma  (None, None, None,   320        ['block4c_project_conv[0][0]']
 lization)                      80)

 block4c_drop (Dropout)         (None, None, None,   0           ['block4c_project_bn[0][0]']
                                80)

 block4c_add (Add)              (None, None, None,   0           ['block4c_drop[0][0]',
                                80)                               'block4b_add[0][0]']

 block5a_expand_conv (Conv2D)   (None, None, None,   38400       ['block4c_add[0][0]']
                                480)

 block5a_expand_bn (BatchNormal  (None, None, None,   1920       ['block5a_expand_conv[0][0]']
 ization)                       480)

 block5a_expand_activation (Act  (None, None, None,   0          ['block5a_expand_bn[0][0]']
 ivation)                       480)

 block5a_dwconv (DepthwiseConv2  (None, None, None,   12000      ['block5a_expand_activation[0][0]
 D)                             480)                             ']

 block5a_bn (BatchNormalization  (None, None, None,   1920       ['block5a_dwconv[0][0]']
 )                              480)

 block5a_activation (Activation  (None, None, None,   0          ['block5a_bn[0][0]']
 )                              480)

 block5a_se_squeeze (GlobalAver  (None, 480)         0           ['block5a_activation[0][0]']
 agePooling2D)

 block5a_se_reshape (Reshape)   (None, 1, 1, 480)    0           ['block5a_se_squeeze[0][0]']

 block5a_se_reduce (Conv2D)     (None, 1, 1, 20)     9620        ['block5a_se_reshape[0][0]']

 block5a_se_expand (Conv2D)     (None, 1, 1, 480)    10080       ['block5a_se_reduce[0][0]']

 block5a_se_excite (Multiply)   (None, None, None,   0           ['block5a_activation[0][0]',
                                480)                              'block5a_se_expand[0][0]']

 block5a_project_conv (Conv2D)  (None, None, None,   53760       ['block5a_se_excite[0][0]']
                                112)

 block5a_project_bn (BatchNorma  (None, None, None,   448        ['block5a_project_conv[0][0]']
 lization)                      112)

 block5b_expand_conv (Conv2D)   (None, None, None,   75264       ['block5a_project_bn[0][0]']
                                672)

 block5b_expand_bn (BatchNormal  (None, None, None,   2688       ['block5b_expand_conv[0][0]']
 ization)                       672)

 block5b_expand_activation (Act  (None, None, None,   0          ['block5b_expand_bn[0][0]']
 ivation)                       672)

 block5b_dwconv (DepthwiseConv2  (None, None, None,   16800      ['block5b_expand_activation[0][0]
 D)                             672)                             ']

 block5b_bn (BatchNormalization  (None, None, None,   2688       ['block5b_dwconv[0][0]']
 )                              672)

 block5b_activation (Activation  (None, None, None,   0          ['block5b_bn[0][0]']
 )                              672)

 block5b_se_squeeze (GlobalAver  (None, 672)         0           ['block5b_activation[0][0]']
 agePooling2D)

 block5b_se_reshape (Reshape)   (None, 1, 1, 672)    0           ['block5b_se_squeeze[0][0]']

 block5b_se_reduce (Conv2D)     (None, 1, 1, 28)     18844       ['block5b_se_reshape[0][0]']

 block5b_se_expand (Conv2D)     (None, 1, 1, 672)    19488       ['block5b_se_reduce[0][0]']

 block5b_se_excite (Multiply)   (None, None, None,   0           ['block5b_activation[0][0]',
                                672)                              'block5b_se_expand[0][0]']

 block5b_project_conv (Conv2D)  (None, None, None,   75264       ['block5b_se_excite[0][0]']
                                112)

 block5b_project_bn (BatchNorma  (None, None, None,   448        ['block5b_project_conv[0][0]']
 lization)                      112)

 block5b_drop (Dropout)         (None, None, None,   0           ['block5b_project_bn[0][0]']
                                112)

 block5b_add (Add)              (None, None, None,   0           ['block5b_drop[0][0]',
                                112)                              'block5a_project_bn[0][0]']

 block5c_expand_conv (Conv2D)   (None, None, None,   75264       ['block5b_add[0][0]']
                                672)

 block5c_expand_bn (BatchNormal  (None, None, None,   2688       ['block5c_expand_conv[0][0]']
 ization)                       672)

 block5c_expand_activation (Act  (None, None, None,   0          ['block5c_expand_bn[0][0]']
 ivation)                       672)

 block5c_dwconv (DepthwiseConv2  (None, None, None,   16800      ['block5c_expand_activation[0][0]
 D)                             672)                             ']

 block5c_bn (BatchNormalization  (None, None, None,   2688       ['block5c_dwconv[0][0]']
 )                              672)

 block5c_activation (Activation  (None, None, None,   0          ['block5c_bn[0][0]']
 )                              672)

 block5c_se_squeeze (GlobalAver  (None, 672)         0           ['block5c_activation[0][0]']
 agePooling2D)

 block5c_se_reshape (Reshape)   (None, 1, 1, 672)    0           ['block5c_se_squeeze[0][0]']

 block5c_se_reduce (Conv2D)     (None, 1, 1, 28)     18844       ['block5c_se_reshape[0][0]']

 block5c_se_expand (Conv2D)     (None, 1, 1, 672)    19488       ['block5c_se_reduce[0][0]']

 block5c_se_excite (Multiply)   (None, None, None,   0           ['block5c_activation[0][0]',
                                672)                              'block5c_se_expand[0][0]']

 block5c_project_conv (Conv2D)  (None, None, None,   75264       ['block5c_se_excite[0][0]']
                                112)

 block5c_project_bn (BatchNorma  (None, None, None,   448        ['block5c_project_conv[0][0]']
 lization)                      112)

 block5c_drop (Dropout)         (None, None, None,   0           ['block5c_project_bn[0][0]']
                                112)

 block5c_add (Add)              (None, None, None,   0           ['block5c_drop[0][0]',
                                112)                              'block5b_add[0][0]']

 block6a_expand_conv (Conv2D)   (None, None, None,   75264       ['block5c_add[0][0]']
                                672)

 block6a_expand_bn (BatchNormal  (None, None, None,   2688       ['block6a_expand_conv[0][0]']
 ization)                       672)

 block6a_expand_activation (Act  (None, None, None,   0          ['block6a_expand_bn[0][0]']
 ivation)                       672)

 block6a_dwconv_pad (ZeroPaddin  (None, None, None,   0          ['block6a_expand_activation[0][0]
 g2D)                           672)                             ']

 block6a_dwconv (DepthwiseConv2  (None, None, None,   16800      ['block6a_dwconv_pad[0][0]']
 D)                             672)

 block6a_bn (BatchNormalization  (None, None, None,   2688       ['block6a_dwconv[0][0]']
 )                              672)

 block6a_activation (Activation  (None, None, None,   0          ['block6a_bn[0][0]']
 )                              672)

 block6a_se_squeeze (GlobalAver  (None, 672)         0           ['block6a_activation[0][0]']
 agePooling2D)

 block6a_se_reshape (Reshape)   (None, 1, 1, 672)    0           ['block6a_se_squeeze[0][0]']

 block6a_se_reduce (Conv2D)     (None, 1, 1, 28)     18844       ['block6a_se_reshape[0][0]']

 block6a_se_expand (Conv2D)     (None, 1, 1, 672)    19488       ['block6a_se_reduce[0][0]']

 block6a_se_excite (Multiply)   (None, None, None,   0           ['block6a_activation[0][0]',
                                672)                              'block6a_se_expand[0][0]']

 block6a_project_conv (Conv2D)  (None, None, None,   129024      ['block6a_se_excite[0][0]']
                                192)

 block6a_project_bn (BatchNorma  (None, None, None,   768        ['block6a_project_conv[0][0]']
 lization)                      192)

 block6b_expand_conv (Conv2D)   (None, None, None,   221184      ['block6a_project_bn[0][0]']
                                1152)

 block6b_expand_bn (BatchNormal  (None, None, None,   4608       ['block6b_expand_conv[0][0]']
 ization)                       1152)

 block6b_expand_activation (Act  (None, None, None,   0          ['block6b_expand_bn[0][0]']
 ivation)                       1152)

 block6b_dwconv (DepthwiseConv2  (None, None, None,   28800      ['block6b_expand_activation[0][0]
 D)                             1152)                            ']

 block6b_bn (BatchNormalization  (None, None, None,   4608       ['block6b_dwconv[0][0]']
 )                              1152)

 block6b_activation (Activation  (None, None, None,   0          ['block6b_bn[0][0]']
 )                              1152)

 block6b_se_squeeze (GlobalAver  (None, 1152)        0           ['block6b_activation[0][0]']
 agePooling2D)

 block6b_se_reshape (Reshape)   (None, 1, 1, 1152)   0           ['block6b_se_squeeze[0][0]']

 block6b_se_reduce (Conv2D)     (None, 1, 1, 48)     55344       ['block6b_se_reshape[0][0]']

 block6b_se_expand (Conv2D)     (None, 1, 1, 1152)   56448       ['block6b_se_reduce[0][0]']

 block6b_se_excite (Multiply)   (None, None, None,   0           ['block6b_activation[0][0]',
                                1152)                             'block6b_se_expand[0][0]']

 block6b_project_conv (Conv2D)  (None, None, None,   221184      ['block6b_se_excite[0][0]']
                                192)

 block6b_project_bn (BatchNorma  (None, None, None,   768        ['block6b_project_conv[0][0]']
 lization)                      192)

 block6b_drop (Dropout)         (None, None, None,   0           ['block6b_project_bn[0][0]']
                                192)

 block6b_add (Add)              (None, None, None,   0           ['block6b_drop[0][0]',
                                192)                              'block6a_project_bn[0][0]']

 block6c_expand_conv (Conv2D)   (None, None, None,   221184      ['block6b_add[0][0]']
                                1152)

 block6c_expand_bn (BatchNormal  (None, None, None,   4608       ['block6c_expand_conv[0][0]']
 ization)                       1152)

 block6c_expand_activation (Act  (None, None, None,   0          ['block6c_expand_bn[0][0]']
 ivation)                       1152)

 block6c_dwconv (DepthwiseConv2  (None, None, None,   28800      ['block6c_expand_activation[0][0]
 D)                             1152)                            ']

 block6c_bn (BatchNormalization  (None, None, None,   4608       ['block6c_dwconv[0][0]']
 )                              1152)

 block6c_activation (Activation  (None, None, None,   0          ['block6c_bn[0][0]']
 )                              1152)

 block6c_se_squeeze (GlobalAver  (None, 1152)        0           ['block6c_activation[0][0]']
 agePooling2D)

 block6c_se_reshape (Reshape)   (None, 1, 1, 1152)   0           ['block6c_se_squeeze[0][0]']

 block6c_se_reduce (Conv2D)     (None, 1, 1, 48)     55344       ['block6c_se_reshape[0][0]']

 block6c_se_expand (Conv2D)     (None, 1, 1, 1152)   56448       ['block6c_se_reduce[0][0]']

 block6c_se_excite (Multiply)   (None, None, None,   0           ['block6c_activation[0][0]',
                                1152)                             'block6c_se_expand[0][0]']

 block6c_project_conv (Conv2D)  (None, None, None,   221184      ['block6c_se_excite[0][0]']
                                192)

 block6c_project_bn (BatchNorma  (None, None, None,   768        ['block6c_project_conv[0][0]']
 lization)                      192)

 block6c_drop (Dropout)         (None, None, None,   0           ['block6c_project_bn[0][0]']
                                192)

 block6c_add (Add)              (None, None, None,   0           ['block6c_drop[0][0]',
                                192)                              'block6b_add[0][0]']

 block6d_expand_conv (Conv2D)   (None, None, None,   221184      ['block6c_add[0][0]']
                                1152)

 block6d_expand_bn (BatchNormal  (None, None, None,   4608       ['block6d_expand_conv[0][0]']
 ization)                       1152)

 block6d_expand_activation (Act  (None, None, None,   0          ['block6d_expand_bn[0][0]']
 ivation)                       1152)

 block6d_dwconv (DepthwiseConv2  (None, None, None,   28800      ['block6d_expand_activation[0][0]
 D)                             1152)                            ']

 block6d_bn (BatchNormalization  (None, None, None,   4608       ['block6d_dwconv[0][0]']
 )                              1152)

 block6d_activation (Activation  (None, None, None,   0          ['block6d_bn[0][0]']
 )                              1152)

 block6d_se_squeeze (GlobalAver  (None, 1152)        0           ['block6d_activation[0][0]']
 agePooling2D)

 block6d_se_reshape (Reshape)   (None, 1, 1, 1152)   0           ['block6d_se_squeeze[0][0]']

 block6d_se_reduce (Conv2D)     (None, 1, 1, 48)     55344       ['block6d_se_reshape[0][0]']

 block6d_se_expand (Conv2D)     (None, 1, 1, 1152)   56448       ['block6d_se_reduce[0][0]']

 block6d_se_excite (Multiply)   (None, None, None,   0           ['block6d_activation[0][0]',
                                1152)                             'block6d_se_expand[0][0]']

 block6d_project_conv (Conv2D)  (None, None, None,   221184      ['block6d_se_excite[0][0]']
                                192)

 block6d_project_bn (BatchNorma  (None, None, None,   768        ['block6d_project_conv[0][0]']
 lization)                      192)

 block6d_drop (Dropout)         (None, None, None,   0           ['block6d_project_bn[0][0]']
                                192)

 block6d_add (Add)              (None, None, None,   0           ['block6d_drop[0][0]',
                                192)                              'block6c_add[0][0]']

 block7a_expand_conv (Conv2D)   (None, None, None,   221184      ['block6d_add[0][0]']
                                1152)

 block7a_expand_bn (BatchNormal  (None, None, None,   4608       ['block7a_expand_conv[0][0]']
 ization)                       1152)

 block7a_expand_activation (Act  (None, None, None,   0          ['block7a_expand_bn[0][0]']
 ivation)                       1152)

 block7a_dwconv (DepthwiseConv2  (None, None, None,   10368      ['block7a_expand_activation[0][0]
 D)                             1152)                            ']

 block7a_bn (BatchNormalization  (None, None, None,   4608       ['block7a_dwconv[0][0]']
 )                              1152)

 block7a_activation (Activation  (None, None, None,   0          ['block7a_bn[0][0]']
 )                              1152)

 block7a_se_squeeze (GlobalAver  (None, 1152)        0           ['block7a_activation[0][0]']
 agePooling2D)

 block7a_se_reshape (Reshape)   (None, 1, 1, 1152)   0           ['block7a_se_squeeze[0][0]']

 block7a_se_reduce (Conv2D)     (None, 1, 1, 48)     55344       ['block7a_se_reshape[0][0]']

 block7a_se_expand (Conv2D)     (None, 1, 1, 1152)   56448       ['block7a_se_reduce[0][0]']

 block7a_se_excite (Multiply)   (None, None, None,   0           ['block7a_activation[0][0]',
                                1152)                             'block7a_se_expand[0][0]']

 block7a_project_conv (Conv2D)  (None, None, None,   368640      ['block7a_se_excite[0][0]']
                                320)

 block7a_project_bn (BatchNorma  (None, None, None,   1280       ['block7a_project_conv[0][0]']
 lization)                      320)

 top_conv (Conv2D)              (None, None, None,   409600      ['block7a_project_bn[0][0]']
                                1280)

 top_bn (BatchNormalization)    (None, None, None,   5120        ['top_conv[0][0]']
                                1280)

 top_activation (Activation)    (None, None, None,   0           ['top_bn[0][0]']
                                1280)

==================================================================================================
Total params: 4,049,571
Trainable params: 0
Non-trainable params: 4,049,571
__________________________________________________________________________________________________

You can see how each of the different layers has a certain number of parameters. Since we are using a pre-trained model, you can think of all of these parameters as patterns the base model has learned on another dataset. And because we set base_model.trainable = False, these patterns remain as they are during training (they’re frozen and don’t get updated).

# Check summary of model constructed with Functional API
model_0.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_layer (InputLayer)    [(None, 224, 224, 3)]     0

 efficientnetb0 (Functional)  (None, None, None, 1280)  4049571

 global_average_pooling_laye  (None, 1280)             0
 r (GlobalAveragePooling2D)

 output_layer (Dense)        (None, 10)                12810

=================================================================
Total params: 4,062,381
Trainable params: 12,810
Non-trainable params: 4,049,571
_________________________________________________________________

You can see how the output shape started out as (None, 224, 224, 3) for the input layer (the shape of our images) but was transformed to be (None, 10) by the output layer (the shape of our labels), where None is the placeholder for the batch size.
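
Here's a minimal sketch to check that, using a hypothetical batch of random values in place of real images:

# Pass a fake batch through the model to inspect the input/output shapes
import tensorflow as tf

dummy_batch = tf.random.normal((32, 224, 224, 3)) # hypothetical batch of 32 "images"
preds = model_0(dummy_batch)
print(preds.shape) # (32, 10) -> the batch size fills in the None dimension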

Notice too, the only trainable parameters in the model are those in the output layer.
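
We can verify this by listing the model's trainable variables; given the summary above, we'd expect only the output layer's kernel and bias (1280 x 10 + 10 = 12,810 parameters):

# List every variable that will be updated during training
for var in model_0.trainable_variables:
  print(var.name, var.shape)
# Expect only output_layer/kernel:0 (1280, 10) and output_layer/bias:0 (10,)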

# Check out our model's training curves
plot_loss_curves(history_10_percent)

[Figure: training and validation loss and accuracy curves from plot_loss_curves(history_10_percent)]

The tf.keras.layers.GlobalAveragePooling2D() layer transforms a 4D tensor into a 2D tensor by averaging the values across the inner axes (height and width).
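
As a quick sketch (with a small random tensor standing in for real feature maps), it's equivalent to averaging over the two spatial axes:

import tensorflow as tf

# Hypothetical 4D tensor: (batch, height, width, channels)
input_tensor = tf.random.normal(shape=(1, 4, 4, 3))

# GlobalAveragePooling2D collapses the spatial axes by averaging
pooled = tf.keras.layers.GlobalAveragePooling2D()(input_tensor)
print(pooled.shape) # (1, 3)

# The same reduction done by hand
print(tf.reduce_mean(input_tensor, axis=[1, 2]).shape) # (1, 3)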

Model 1

Use feature extraction transfer learning on 1% of the training data with data augmentation.

# Download and unzip data
!wget https://gitlab.com/arminny/ml_course_datasets/-/raw/main/10_food_classes_1_percent.zip
unzip_data("10_food_classes_1_percent.zip")

# Create training and test dirs
train_dir_1_percent = "10_food_classes_1_percent/train/"
test_dir = "10_food_classes_1_percent/test/"
--2022-09-08 15:45:20--  https://gitlab.com/arminny/ml_course_datasets/-/raw/main/10_food_classes_1_percent.zip
Resolving gitlab.com (gitlab.com)... 172.65.251.78, 2606:4700:90:0:f22e:fbec:5bed:a9b9
Connecting to gitlab.com (gitlab.com)|172.65.251.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 133612354 (127M) [application/octet-stream]
Saving to: ‘10_food_classes_1_percent.zip’

10_food_classes_1_p 100%[===================>] 127.42M  63.7MB/s    in 2.0s

2022-09-08 15:45:22 (63.7 MB/s) - ‘10_food_classes_1_percent.zip’ saved [133612354/133612354]
# Walk through 1 percent data directory and list number of files
walk_through_dir("10_food_classes_1_percent")
There are 2 directories and 0 images in '10_food_classes_1_percent'.
There are 10 directories and 0 images in '10_food_classes_1_percent/test'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/pizza'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/chicken_wings'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/grilled_salmon'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/hamburger'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/chicken_curry'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/fried_rice'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/steak'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/ramen'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/sushi'.
There are 0 directories and 250 images in '10_food_classes_1_percent/test/ice_cream'.
There are 10 directories and 0 images in '10_food_classes_1_percent/train'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/pizza'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/chicken_wings'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/grilled_salmon'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/hamburger'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/chicken_curry'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/fried_rice'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/steak'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/ramen'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/sushi'.
There are 0 directories and 7 images in '10_food_classes_1_percent/train/ice_cream'.
import tensorflow as tf
IMG_SIZE = (224, 224)
train_data_1_percent = tf.keras.preprocessing.image_dataset_from_directory(train_dir_1_percent,
                                                                           label_mode="categorical",
                                                                           batch_size=32, # default
                                                                           image_size=IMG_SIZE)
test_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir,
                                                                label_mode="categorical",
                                                                image_size=IMG_SIZE)
Found 70 files belonging to 10 classes.
Found 2500 files belonging to 10 classes.

Adding data augmentation right into the model

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

# Create a data augmentation stage with horizontal flipping, rotations, zooms
data_augmentation = keras.Sequential([
  preprocessing.RandomFlip("horizontal"),
  preprocessing.RandomRotation(0.2),
  preprocessing.RandomZoom(0.2),
  preprocessing.RandomHeight(0.2),
  preprocessing.RandomWidth(0.2),
  # preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNetB0
], name ="data_augmentation")
# Setup input shape and base model, freezing the base model layers
input_shape = (224, 224, 3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

# Create input layer
inputs = layers.Input(shape=input_shape, name="input_layer")

# Add in data augmentation Sequential model as a layer
x = data_augmentation(inputs)

# Give base_model inputs (after augmentation) and don't train it
x = base_model(x, training=False)

# Pool output features of base model
x = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)

# Put a dense layer on as the output
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)

# Make a model with inputs and outputs
model_1 = keras.Model(inputs, outputs)

# Compile the model
model_1.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

# Fit the model
history_1_percent = model_1.fit(train_data_1_percent,
                    epochs=5,
                    steps_per_epoch=len(train_data_1_percent),
                    validation_data=test_data,
                    validation_steps=int(0.25 * len(test_data)), # validate on fewer steps (quicker)
                    # Track model training logs
                    callbacks=[create_tensorboard_callback("transfer_learning", "1_percent_data_aug")])

# Check out model summary
model_1.summary()

# How does the model go with a data augmentation layer with 1% of data
plot_loss_curves(history_1_percent)
Saving TensorBoard log files to: transfer_learning/1_percent_data_aug/20220908-154806
Epoch 1/5
3/3 [==============================] - 13s 2s/step - loss: 2.4299 - accuracy: 0.1000 - val_loss: 2.2001 - val_accuracy: 0.2039
Epoch 2/5
3/3 [==============================] - 3s 1s/step - loss: 2.2319 - accuracy: 0.1429 - val_loss: 2.0808 - val_accuracy: 0.2664
Epoch 3/5
3/3 [==============================] - 3s 1s/step - loss: 2.0314 - accuracy: 0.3143 - val_loss: 1.9581 - val_accuracy: 0.3586
Epoch 4/5
3/3 [==============================] - 3s 1s/step - loss: 1.7995 - accuracy: 0.4714 - val_loss: 1.8675 - val_accuracy: 0.4293
Epoch 5/5
3/3 [==============================] - 3s 1s/step - loss: 1.6567 - accuracy: 0.6286 - val_loss: 1.7885 - val_accuracy: 0.4753
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_layer (InputLayer)    [(None, 224, 224, 3)]     0

 data_augmentation (Sequenti  (None, 224, 224, 3)      0
 al)

 efficientnetb0 (Functional)  (None, None, None, 1280)  4049571

 global_average_pooling_laye  (None, 1280)             0
 r (GlobalAveragePooling2D)

 output_layer (Dense)        (None, 10)                12810

=================================================================
Total params: 4,062,381
Trainable params: 12,810
Non-trainable params: 4,049,571
_________________________________________________________________

[Figure: training and validation loss and accuracy curves from plot_loss_curves(history_1_percent)]

It looks like the metrics on both datasets would improve if we kept training for more epochs. But we’ll leave that for now; we’ve got more experiments to do!
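
If we did want to pick up where this model left off, a minimal sketch (not run here) would be to call fit() again with initial_epoch taken from the previous history, the same pattern we'll use later for fine-tuning:

# Hypothetical continuation: train model_1 up to 10 total epochs
history_1_percent_continued = model_1.fit(train_data_1_percent,
                                          epochs=10, # total target epochs (5 already done)
                                          initial_epoch=history_1_percent.epoch[-1], # resume from the last epoch
                                          validation_data=test_data,
                                          validation_steps=int(0.25 * len(test_data)))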

Model 2

Use feature extraction transfer learning on 10% of the training data with data augmentation.

# Get 10% of the data of the 10 classes (uncomment if you haven't gotten "10_food_classes_10_percent.zip" already)
# !wget https://gitlab.com/arminny/ml_course_datasets/-/raw/main/10_food_classes_10_percent.zip
# unzip_data("10_food_classes_10_percent.zip")

train_dir_10_percent = "10_food_classes_10_percent/train/"
test_dir = "10_food_classes_10_percent/test/"
# Setup data inputs
import tensorflow as tf
IMG_SIZE = (224, 224)
train_data_10_percent = tf.keras.preprocessing.image_dataset_from_directory(train_dir_10_percent,
                                                                            label_mode="categorical",
                                                                            image_size=IMG_SIZE)
# Note: the test data is the same as in the previous experiment; we could
# skip creating it, but we'll leave it here for practice.
test_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir,
                                                                label_mode="categorical",
                                                                image_size=IMG_SIZE)
Found 750 files belonging to 10 classes.
Found 2500 files belonging to 10 classes.
# Create a functional model with data augmentation
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.models import Sequential

# Build data augmentation layer
data_augmentation = Sequential([
  preprocessing.RandomFlip('horizontal'),
  preprocessing.RandomHeight(0.2),
  preprocessing.RandomWidth(0.2),
  preprocessing.RandomZoom(0.2),
  preprocessing.RandomRotation(0.2),
  # preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNet
], name="data_augmentation")

# Setup the input shape to our model
input_shape = (224, 224, 3)

# Create a frozen base model
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

# Create input and output layers
inputs = layers.Input(shape=input_shape, name="input_layer") # create input layer
x = data_augmentation(inputs) # augment our training images
x = base_model(x, training=False) # pass augmented images to base model but keep it in inference mode, so batchnorm layers don't get updated: https://keras.io/guides/transfer_learning/#build-a-model
x = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)
model_2 = tf.keras.Model(inputs, outputs)

# Compile
model_2.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # use Adam optimizer with its default learning rate
                metrics=["accuracy"])
# Setup checkpoint path
checkpoint_path = "ten_percent_model_checkpoints_weights/checkpoint.ckpt" # note: remember saving directly to Colab is temporary

# Create a ModelCheckpoint callback that saves the model's weights only
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                         save_weights_only=True, # set to False to save the entire model
                                                         save_best_only=False, # set to True to save only the best model instead of a model every epoch
                                                         save_freq="epoch", # save every epoch
                                                         verbose=1)
# Fit the model saving checkpoints every epoch
initial_epochs = 5
history_10_percent_data_aug = model_2.fit(train_data_10_percent,
                                          epochs=initial_epochs,
                                          validation_data=test_data,
                                          validation_steps=int(0.25 * len(test_data)), # do fewer steps per validation (quicker)
                                          callbacks=[create_tensorboard_callback("transfer_learning", "10_percent_data_aug"),
                                                     checkpoint_callback])
Saving TensorBoard log files to: transfer_learning/10_percent_data_aug/20220908-154838
Epoch 1/5
24/24 [==============================] - ETA: 0s - loss: 1.9781 - accuracy: 0.3520
Epoch 1: saving model to ten_percent_model_checkpoints_weights/checkpoint.ckpt
24/24 [==============================] - 16s 340ms/step - loss: 1.9781 - accuracy: 0.3520 - val_loss: 1.4744 - val_accuracy: 0.6579
Epoch 2/5
24/24 [==============================] - ETA: 0s - loss: 1.3687 - accuracy: 0.6640
Epoch 2: saving model to ten_percent_model_checkpoints_weights/checkpoint.ckpt
24/24 [==============================] - 6s 247ms/step - loss: 1.3687 - accuracy: 0.6640 - val_loss: 1.0692 - val_accuracy: 0.7681
Epoch 3/5
24/24 [==============================] - ETA: 0s - loss: 1.0699 - accuracy: 0.7453
Epoch 3: saving model to ten_percent_model_checkpoints_weights/checkpoint.ckpt
24/24 [==============================] - 6s 250ms/step - loss: 1.0699 - accuracy: 0.7453 - val_loss: 0.8675 - val_accuracy: 0.7829
Epoch 4/5
24/24 [==============================] - ETA: 0s - loss: 0.9157 - accuracy: 0.7840
Epoch 4: saving model to ten_percent_model_checkpoints_weights/checkpoint.ckpt
24/24 [==============================] - 6s 252ms/step - loss: 0.9157 - accuracy: 0.7840 - val_loss: 0.7320 - val_accuracy: 0.8240
Epoch 5/5
24/24 [==============================] - ETA: 0s - loss: 0.8032 - accuracy: 0.7973
Epoch 5: saving model to ten_percent_model_checkpoints_weights/checkpoint.ckpt
24/24 [==============================] - 6s 246ms/step - loss: 0.8032 - accuracy: 0.7973 - val_loss: 0.6505 - val_accuracy: 0.8388
# Plot model loss curves
plot_loss_curves(history_10_percent_data_aug)

[Figure: training and validation loss and accuracy curves from plot_loss_curves(history_10_percent_data_aug)]

Looking at these, our model’s performance with 10% of the data and data augmentation isn’t as good as the model trained on 10% of the data without data augmentation (see the model_0 results above). However, the curves are trending in the right direction, so if we trained for longer, the metrics would likely improve.

Model 3

Use fine-tuning transfer learning on 10% of the training data with data augmentation.

In the previous experiments, all of the layers in the base model (EfficientNetB0) were frozen during training.

For our next experiment we’re going to switch to fine-tuning transfer learning. This means we’ll be using the same base model except we’ll be unfreezing some of its layers (ones closest to the top) and running the model for a few more epochs.

The idea with fine-tuning is to start customizing the pre-trained model more to our own data.

# Layers in loaded model
model_2.layers
[<keras.engine.input_layer.InputLayer at 0x7f2d42ecbe50>,
 <keras.engine.sequential.Sequential at 0x7f2fb42d4610>,
 <keras.engine.functional.Functional at 0x7f2fb3ff0c10>,
 <keras.layers.pooling.GlobalAveragePooling2D at 0x7f2fb436ac50>,
 <keras.layers.core.dense.Dense at 0x7f2fb3d8bed0>]
for layer in model_2.layers:
  print(layer.trainable)
True
True
False
True
True
model_2.summary()
Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_layer (InputLayer)    [(None, 224, 224, 3)]     0

 data_augmentation (Sequenti  (None, 224, 224, 3)      0
 al)

 efficientnetb0 (Functional)  (None, None, None, 1280)  4049571

 global_average_pooling_laye  (None, 1280)             0
 r (GlobalAveragePooling2D)

 output_layer (Dense)        (None, 10)                12810

=================================================================
Total params: 4,062,381
Trainable params: 12,810
Non-trainable params: 4,049,571
_________________________________________________________________
# How many layers are trainable in our base model?
print(len(model_2.layers[2].trainable_variables)) # layer at index 2 is the EfficientNetB0 layer (the base model)
0
# Check which layers are tuneable (trainable)
for layer_number, layer in enumerate(base_model.layers):
  if layer.trainable:
    print(layer_number, layer.name, layer.trainable)

Now to fine-tune the base model to our own data, we’re going to unfreeze the top 10 layers and continue training our model for another 5 epochs.

This means all of the base model’s layers except for the last 10 will remain frozen and untrainable. And the weights in the remaining unfrozen layers will be updated during training.

Ideally, we should see the model’s performance improve.

base_model.trainable = True

# Freeze all layers except for the last 10
for layer in base_model.layers[:-10]:
  layer.trainable = False

# Recompile the model (always recompile after any adjustments to a model)
model_2.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # learning rate is 10x lower than before for fine-tuning
              metrics=["accuracy"])
# Check which layers are tuneable (trainable)
for layer_number, layer in enumerate(base_model.layers):
  if layer.trainable:
    print(layer_number, layer.name, layer.trainable)
227 block7a_se_squeeze True
228 block7a_se_reshape True
229 block7a_se_reduce True
230 block7a_se_expand True
231 block7a_se_excite True
232 block7a_project_conv True
233 block7a_project_bn True
234 top_conv True
235 top_bn True
236 top_activation True

It seems all layers except for the last 10 are frozen and untrainable. This means only the last 10 layers of the base model along with the output layer will have their weights updated during training.

Every time you make a change to your models, you need to recompile them.

In our case, we’re using the exact same loss, optimizer and metrics as before, except this time the learning rate for our optimizer will be 10x smaller than before (0.0001 instead of Adam’s default of 0.001).

We do this so the model doesn’t try to overwrite the existing weights in the pretrained model too fast. In other words, we want learning to be more gradual.
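
As a toy illustration of why the lower learning rate makes updates smaller (using plain SGD rather than our Adam optimizer, purely for clarity):

# A single gradient step moves a weight by learning_rate * gradient
import tensorflow as tf

grad = tf.constant(0.5)
w = tf.Variable(1.0)
for lr in [1e-3, 1e-4]:
  w.assign(1.0) # reset the weight
  tf.keras.optimizers.SGD(learning_rate=lr).apply_gradients([(grad, w)])
  print(lr, float(w)) # 0.001 -> 0.9995, 0.0001 -> 0.99995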

print(len(model_2.trainable_variables))
12

We’re going to continue training from where our previous model finished. Since it trained for 5 epochs, our fine-tuning will begin at epoch 5 and continue for another 5 epochs.

To do this, we can use the initial_epoch parameter of the fit() method. We’ll pass it the last epoch of the previous model’s training history (history_10_percent_data_aug.epoch[-1]).

# Fine tune for another 5 epochs
fine_tune_epochs = initial_epochs + 5

# Refit the model (same as model_2 except with more trainable layers)
history_fine_10_percent_data_aug = model_2.fit(train_data_10_percent,
                                               epochs=fine_tune_epochs,
                                               validation_data=test_data,
                                               initial_epoch=history_10_percent_data_aug.epoch[-1], # start from previous last epoch
                                               validation_steps=int(0.25 * len(test_data)),
                                               callbacks=[create_tensorboard_callback("transfer_learning", "10_percent_fine_tune_last_10")]) # name experiment appropriately
Saving TensorBoard log files to: transfer_learning/10_percent_fine_tune_last_10/20220908-154932
Epoch 5/10
24/24 [==============================] - 16s 317ms/step - loss: 0.7021 - accuracy: 0.7853 - val_loss: 0.5748 - val_accuracy: 0.8141
Epoch 6/10
24/24 [==============================] - 9s 349ms/step - loss: 0.5649 - accuracy: 0.8347 - val_loss: 0.5267 - val_accuracy: 0.8289
Epoch 7/10
24/24 [==============================] - 7s 246ms/step - loss: 0.5049 - accuracy: 0.8480 - val_loss: 0.4481 - val_accuracy: 0.8470
Epoch 8/10
24/24 [==============================] - 6s 228ms/step - loss: 0.4680 - accuracy: 0.8600 - val_loss: 0.4741 - val_accuracy: 0.8339
Epoch 9/10
24/24 [==============================] - 6s 218ms/step - loss: 0.4201 - accuracy: 0.8640 - val_loss: 0.4773 - val_accuracy: 0.8322
Epoch 10/10
24/24 [==============================] - 6s 233ms/step - loss: 0.3926 - accuracy: 0.8907 - val_loss: 0.4779 - val_accuracy: 0.8438
compare_historys(original_history=history_10_percent_data_aug,
                 new_history=history_fine_10_percent_data_aug,
                 initial_epochs=5)

[Figure: combined loss and accuracy curves before and after fine-tuning, from compare_historys]

Model 4

Use fine-tuning transfer learning on 100% of the training data with data augmentation.

# Download and unzip 10 classes of data with all images
!wget https://gitlab.com/arminny/ml_course_datasets/-/raw/main/10_food_classes_all_data.zip
unzip_data("10_food_classes_all_data.zip")

# Setup data directories
train_dir = "10_food_classes_all_data/train/"
test_dir = "10_food_classes_all_data/test/"
--2022-09-08 15:50:30--  https://gitlab.com/arminny/ml_course_datasets/-/raw/main/10_food_classes_all_data.zip
Resolving gitlab.com (gitlab.com)... 172.65.251.78, 2606:4700:90:0:f22e:fbec:5bed:a9b9
Connecting to gitlab.com (gitlab.com)|172.65.251.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 519183241 (495M) [application/octet-stream]
Saving to: ‘10_food_classes_all_data.zip’

10_food_classes_all 100%[===================>] 495.13M  65.5MB/s    in 8.7s

2022-09-08 15:50:39 (56.6 MB/s) - ‘10_food_classes_all_data.zip’ saved [519183241/519183241]
# Setup data inputs
import tensorflow as tf
IMG_SIZE = (224, 224)
train_data_10_classes_full = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                                                                 label_mode="categorical",
                                                                                 image_size=IMG_SIZE)

# Note: this is the same test dataset we've been using for the previous modelling experiments
test_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir,
                                                                label_mode="categorical",
                                                                image_size=IMG_SIZE)
Found 7500 files belonging to 10 classes.
Found 2500 files belonging to 10 classes.
# How many images are we working with now?
walk_through_dir("10_food_classes_all_data")
There are 2 directories and 0 images in '10_food_classes_all_data'.
There are 10 directories and 0 images in '10_food_classes_all_data/test'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/pizza'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/chicken_wings'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/grilled_salmon'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/hamburger'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/chicken_curry'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/fried_rice'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/steak'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/ramen'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/sushi'.
There are 0 directories and 250 images in '10_food_classes_all_data/test/ice_cream'.
There are 10 directories and 0 images in '10_food_classes_all_data/train'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/pizza'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/chicken_wings'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/grilled_salmon'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/hamburger'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/chicken_curry'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/fried_rice'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/steak'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/ramen'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/sushi'.
There are 0 directories and 750 images in '10_food_classes_all_data/train/ice_cream'.
# Load weights from the checkpoint so we can fine-tune from the same stage the 10 percent data model was fine-tuned from
model_2.load_weights(checkpoint_path) # revert model back to saved weights
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f2d3fc84310>
# After loading the checkpoint weights (saved before fine-tuning), performance should drop back to feature extraction levels
model_2.evaluate(test_data)
79/79 [==============================] - 7s 84ms/step - loss: 0.6921 - accuracy: 0.8172

[0.6921055316925049, 0.8172000050544739]
# Check which layers are tuneable in the whole model
for layer_number, layer in enumerate(model_2.layers):
  print(layer_number, layer.name, layer.trainable)
0 input_layer True
1 data_augmentation True
2 efficientnetb0 True
3 global_average_pooling_layer True
4 output_layer True
# Check which layers are tuneable in the base model
for layer_number, layer in enumerate(base_model.layers):
  if layer.trainable:
    print(layer_number, layer.name, layer.trainable)
227 block7a_se_squeeze True
228 block7a_se_reshape True
229 block7a_se_reduce True
230 block7a_se_expand True
231 block7a_se_excite True
232 block7a_project_conv True
233 block7a_project_bn True
234 top_conv True
235 top_bn True
236 top_activation True
# Compile
model_2.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # divide learning rate by 10 for fine-tuning
                metrics=["accuracy"])
# Continue to train and fine-tune the model to our data
fine_tune_epochs = initial_epochs + 5

history_fine_10_classes_full = model_2.fit(train_data_10_classes_full,
                                           epochs=fine_tune_epochs,
                                           initial_epoch=history_10_percent_data_aug.epoch[-1],
                                           validation_data=test_data,
                                           validation_steps=int(0.25 * len(test_data)),
                                           callbacks=[create_tensorboard_callback("transfer_learning", "full_10_classes_fine_tune_last_10")])
Saving TensorBoard log files to: transfer_learning/full_10_classes_fine_tune_last_10/20220908-155056
Epoch 5/10
235/235 [==============================] - 38s 129ms/step - loss: 0.7312 - accuracy: 0.7664 - val_loss: 0.4120 - val_accuracy: 0.8668
Epoch 6/10
235/235 [==============================] - 28s 117ms/step - loss: 0.5914 - accuracy: 0.8031 - val_loss: 0.4009 - val_accuracy: 0.8684
Epoch 7/10
235/235 [==============================] - 27s 113ms/step - loss: 0.5305 - accuracy: 0.8291 - val_loss: 0.3562 - val_accuracy: 0.8734
Epoch 8/10
235/235 [==============================] - 27s 111ms/step - loss: 0.4836 - accuracy: 0.8399 - val_loss: 0.3527 - val_accuracy: 0.8799
Epoch 9/10
235/235 [==============================] - 26s 109ms/step - loss: 0.4496 - accuracy: 0.8525 - val_loss: 0.3574 - val_accuracy: 0.8849
Epoch 10/10
235/235 [==============================] - 26s 108ms/step - loss: 0.4188 - accuracy: 0.8604 - val_loss: 0.3353 - val_accuracy: 0.8931
# How did fine-tuning go with more data?
compare_historys(original_history=history_10_percent_data_aug,
                 new_history=history_fine_10_classes_full,
                 initial_epochs=5)

[Figure: combined loss and accuracy curves before and after fine-tuning on the full dataset, from compare_historys]

Looks like that extra data helped! Those curves are looking great. And if we trained for longer, they might even keep improving.

Viewing our experiment data on TensorBoard

# View tensorboard logs of transfer learning modelling experiments (should be 4 models)
# Upload TensorBoard dev records
!tensorboard dev upload --logdir ./transfer_learning \
  --name "Transfer learning experiments" \
  --description "A series of different transfer learning experiments with varying amounts of data and fine-tuning" \
  --one_shot # exits the uploader when upload has finished
New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/52ljO1lFRn6S5D1sK1ULTQ/

[2022-09-08T15:55:00] Started scanning logdir.
[2022-09-08T15:55:05] Total uploaded: 162 scalars, 0 tensors, 5 binary objects (4.1 MB)
[2022-09-08T15:55:05] Done scanning logdir.


Done. View your TensorBoard at https://tensorboard.dev/experiment/52ljO1lFRn6S5D1sK1ULTQ/
# View previous experiments
!tensorboard dev list
https://tensorboard.dev/experiment/52ljO1lFRn6S5D1sK1ULTQ/
	Name                 Transfer learning experiments
	Description          A series of different transfer learning experiments with varying amounts of data and fine-tuning
	Id                   52ljO1lFRn6S5D1sK1ULTQ
	Created              2022-09-08 15:55:00 (1 minute ago)
	Updated              2022-09-08 15:55:05 (1 minute ago)
	Runs                 10
	Tags                 5
	Scalars              162
	Tensor bytes         0
	Binary object bytes  4308795
Total: 1 experiment(s)
# Remove previous experiments
!tensorboard dev delete --experiment_id 52ljO1lFRn6S5D1sK1ULTQ
Deleted experiment 52ljO1lFRn6S5D1sK1ULTQ.

References: