In part 1 of this tutorial, we developed some foundational building blocks as classes on our journey to a transfer learning solution in PyTorch. Specifically, we built datasets and DataLoaders for train, validation, and testing using the PyTorch API, and built a fully connected network class on top of PyTorch's core nn.Module. We also trained and tested a fully connected model to classify handwritten digits in the famous MNIST dataset, achieving impressive results. 

Now we are ready to create our transfer learning class derived from our base network class. Implementing transfer learning will turn out to be incredibly simple now with all the required machinery already in place.

Create a Transfer Learning Class Derived from the Base Class

The transfer learning class is based on the torchvision.models module that contains support for downloading and using several pre-trained network architectures for computer vision. We are going to add support for three models:

  • Densenet121, which we simply call DenseNet
  • ResNet34 and ResNet50

We have the option to use the pre-trained versions of the models (by passing pretrained=True, which is the default), where we obtain the architecture plus the trained weights, or just the architectures without weights and train them from scratch. Most of the pre-trained versions available in torchvision.models have been trained on ImageNet with 1,000 output classes. We want to adapt the selected model to our use case. For example, CIFAR-10 has only 10 classes, so our output layer should have 10 outputs instead of 1,000.
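The adaptation itself is a one-line change once we know which attribute holds the output layer. Here is a minimal sketch of the idea, using a small stand-in module instead of a downloaded torchvision model so the example is self-contained:

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained model: a backbone plus a 1,000-class ImageNet head
class ToyPretrained(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(32, 1024)      # pretend CNN backbone
        self.classifier = nn.Linear(1024, 1000)  # ImageNet output layer

    def forward(self, x):
        return self.classifier(self.features(x))

model = ToyPretrained()

# Adapt for CIFAR-10: keep the head's input size, change the outputs to 10
model.classifier = nn.Linear(model.classifier.in_features, 10)

out = model(torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 10])
```

With a real torchvision model the pattern is the same; only the name of the head attribute varies by architecture.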

Each model can be considered as composed of two parts:

  • The convolutional neural network backbone (a CNN architecture with several blocks comprising convolutions with varying numbers of filters, non-linearities, max or average pooling layers, batch normalizations, dropout layers, etc.)
  • A head with a fully connected classifier at the output end

In most cases, the head is just a single output layer with no fully connected hidden layers before it. However, we have the option to replace it with a classifier of our own, adding hidden layers if we wish. We can easily use our own FC class (defined in Part 1 of this tutorial) for this purpose. 

Alternatively, we may choose to just change the number of outputs without adding any extra hidden layers. Either way, we are going to use our own FC class and replace the original model's output layer with an FC object. This gives us the flexibility to pass additional hidden layers if we want.

The code for our new class 'TransferNetworkImg' (derived from our base class 'Network') is quite simple. You just have to pay attention to two functions:

  • set_transfer_model which sets the transfer model from torchvision.models

  • set_model_head which sets the FC layer on the model after removing the original classifier or FC layer.

Setting The Classifier

Note that the classifier at the head is named differently in each torchvision model. There are better ways to handle this, such as storing a predefined dictionary in a file, loading it, and looking up the classifier field for each model type to get the output layer's name. However, in the following code, we just use simple if/else statements. If you'd like, you are welcome to create your own version of this class using such a dictionary.
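For illustration, such a lookup table might look like the following sketch. The attribute names (classifier for DenseNet, fc for the ResNets) are the ones torchvision uses; the helper functions are hypothetical:

```python
# Map each supported model name to the attribute holding its head layer
HEAD_ATTR = {
    'densenet': 'classifier',
    'resnet34': 'fc',
    'resnet50': 'fc',
}

def get_head(model, model_name):
    # Look up the head attribute instead of writing if/else per model
    return getattr(model, HEAD_ATTR[model_name.lower()])

def set_head(model, model_name, new_head):
    # Replace the head layer, whatever the attribute is called
    setattr(model, HEAD_ATTR[model_name.lower()], new_head)
```

Adding a new architecture then only requires a new dictionary entry, not another if/else branch.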

Setting the FC model is done in set_model_head. Since we need to call the FC constructor, we must pass everything required to create an FC object successfully. We do that by passing a dictionary called 'head' to our transfer learning class. To create an FC model, we need to pass a fixed number of inputs to its constructor, since that is a requirement of the nn.Linear layers used in FC networks. Luckily, the nn.Linear class in PyTorch stores its number of inputs in an attribute called 'in_features'. We can grab that from the original classifier layer of the transferred model (DenseNet, ResNet, etc.) and pass it as an argument to our FC constructor.

Freezing And Unfreezing Layers

When using transfer learning models, it is important to decide whether we want to retrain all the layers (convolutional and fully connected) on our dataset. For reasonably large datasets such as CIFAR-10, it makes sense to retrain the whole network. Note, however, that retraining all layers does not mean starting from random weights. We start with each layer's pre-trained weights and continue from there, calculating gradients for all layers and updating all weights. In other words, the model keeps learning without throwing away the knowledge it gained identifying images in the dataset it was originally trained on (ImageNet in most cases). Think of it like a child we have taught one task: we don't want them to discard everything they know when they start looking at new data.

On the other hand, we may want to keep the backbone's weights frozen and retrain only the head. This is a common scenario once we have trained the network on the new data for a while, so that our backbone now knows both ImageNet and our new dataset (CIFAR-10 in our case). Finally, we may want to do only predictions, keeping all weights, backbone and head, frozen. This is good only for prediction and evaluation, not for training: there is no point in training if we never back-propagate or update anything.

We will now write a function to freeze weights while keeping the head unfrozen by default using the PyTorch tensor's requires_grad flag. This flag is available in all tensors and we want to set it as True or False for weight tensors (which can be obtained via parameters() method of any model derived from nn.Module).
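As a quick sanity check of what freezing does, we can count trainable parameters before and after flipping requires_grad. This sketch uses a toy model rather than our Network class:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

def trainable_params(m):
    # Only tensors with requires_grad=True receive gradients and weight updates
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

total = trainable_params(model)  # all weights and biases trainable by default

for p in model.parameters():     # freeze everything
    p.requires_grad = False

print(total, trainable_params(model))  # 178 0
```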

Adding Support For Freeze And Unfreeze In Our Base Class

We need to add support for freeze and unfreeze in our base class.

In [ ]: 

class Network(nn.Module):
    ...
    def freeze(self):
        for param in self.model.parameters():
            param.requires_grad = False

    def unfreeze(self):
        for param in self.model.parameters():
            param.requires_grad = True

Transfer Learning Class

In [ ]: 

from torchvision import models
  • set_model_params() calls the parent's method as before and sets additional attributes specific to this class (head, and model_type set to 'transfer')
  • freeze() first freezes all parameters by calling the base class's freeze() method that we just added, then unfreezes the head's (classifier's) parameters depending on the passed flag. Note that we refer to our head as the classifier; we would need more code if we ever want to handle regression as well in the future.
  • set_model_head() calls the FC constructor using the head dictionary. It grabs in_features from the appropriate attribute of the original model's classifier (FC) layer.
  • We need to check whether the model was loaded from a checkpoint because, in the latter case, the model's head will be our own FC object, which stores its input size in a num_inputs attribute instead of in_features.

In [ ]: 

class TransferNetworkImg(Network):
    def __init__(self,
                 model_name='DenseNet',
                 lr=0.003,
                 criterion_name ='NLLLoss',
                 optimizer_name = 'Adam',
                 dropout_p=0.2,
                 pretrained=True,
                 device=None,
                 best_accuracy=0.,
                 best_accuracy_file ='best_accuracy.pth',
                 chkpoint_file ='chkpoint_file',
                 head={}):

        
        super().__init__(device=device)
        
        self.model_type = 'transfer'
        
        self.set_transfer_model(model_name,pretrained=pretrained)    
        
        if head is not None:
            self.set_model_head(model_name = model_name,
                                 head = head,
                                 optimizer_name = optimizer_name,
                                 criterion_name = criterion_name,
                                 lr = lr,
                                 dropout_p = dropout_p,
                                 device = device
                                )
            
        self.set_model_params(criterion_name,
                              optimizer_name,
                              lr,
                              dropout_p,
                              model_name,
                              best_accuracy,
                              best_accuracy_file,
                              chkpoint_file,
                              head)

    def set_model_params(self,criterion_name,
                         optimizer_name,
                         lr,
                         dropout_p,
                         model_name,
                         best_accuracy,
                         best_accuracy_file,
                         chkpoint_file,
                         head):

        print('Transfer: best accuracy = {:.3f}'.format(best_accuracy))
        
        super(TransferNetworkImg, self).set_model_params(
                                              criterion_name,
                                              optimizer_name,
                                              lr,
                                              dropout_p,
                                              model_name,
                                              best_accuracy,
                                              best_accuracy_file,
                                              chkpoint_file
                                              )
        self.head = head
        self.model_type = 'transfer'
    def forward(self,x):
        return self.model(x)
        
    def get_model_params(self):
        params = super(TransferNetworkImg, self).get_model_params()
        params['head'] = self.head
        params['model_type'] = self.model_type
        params['device'] = self.device
        return params
    def freeze(self,train_classifier=True):
        super(TransferNetworkImg, self).freeze()
        if train_classifier:
            # the head is named 'classifier' in DenseNet and 'fc' in ResNets
            head = self.model.classifier if hasattr(self.model,'classifier') else self.model.fc
            for param in head.parameters():
                param.requires_grad = True
            
                
    def set_transfer_model(self,mname,pretrained=True):   
        self.model = None
        if mname.lower() == 'densenet':
            self.model = models.densenet121(pretrained=pretrained)
            
        elif mname.lower() == 'resnet34':
            self.model = models.resnet34(pretrained=pretrained)
            
        elif mname.lower() == 'resnet50':
            self.model = models.resnet50(pretrained=pretrained)
              
        if self.model is not None:
            print('set_transfer_model: self.Model set to {}'.format(mname))
        else:
            print('set_transfer_model: Model {} not supported'.format(mname))
    def set_model_head(self,
                        model_name = 'DenseNet',
                        head = {'num_inputs':128,
                                'num_outputs':10,
                                'layers':[],
                                'class_names':{}
                               },
                         optimizer_name = 'Adam',
                         criterion_name = 'NLLLoss',
                         lr = 0.003,
                         dropout_p = 0.2,
                         device = None):
        
        self.num_outputs = head['num_outputs']
        
        if model_name.lower() == 'densenet':
            if hasattr(self.model,'classifier'):
                in_features =  self.model.classifier.in_features
            else:
                in_features = self.model.classifier.num_inputs
                
            self.model.classifier = FC(num_inputs=in_features,
                                       num_outputs=head['num_outputs'],
                                       layers = head['layers'],
                                       class_names = head['class_names'],
                                       non_linearity = head['non_linearity'],
                                       model_type = head['model_type'],
                                       model_name = head['model_name'],
                                       dropout_p = dropout_p,
                                       optimizer_name = optimizer_name,
                                       lr = lr,
                                       criterion_name = criterion_name,
                                       device=device
                                      )
            
        elif model_name.lower() == 'resnet50' or model_name.lower() == 'resnet34':
            if hasattr(self.model,'fc'):
                in_features =  self.model.fc.in_features
            else:
                in_features = self.model.fc.num_inputs
                
            self.model.fc = FC(num_inputs=in_features,
                               num_outputs=head['num_outputs'],
                               layers = head['layers'],
                               class_names = head['class_names'],
                               non_linearity = head['non_linearity'],
                               model_type = head['model_type'],
                               model_name = head['model_name'],
                               dropout_p = dropout_p,
                               optimizer_name = optimizer_name,
                               lr = lr,
                               criterion_name = criterion_name,
                               device=device
                              )
         
        self.head = head
        
        print('{}: setting head: inputs: {} hidden:{} outputs: {}'.format(model_name,
                                                                   in_features,
                                                                   head['layers'],
                                                                   head['num_outputs']))
    
    def _get_dropout(self):
        if self.model_name.lower() == 'densenet':
            return self.model.classifier._get_dropout()
        
        elif self.model_name.lower() == 'resnet50' or self.model_name.lower() == 'resnet34':
            return self.model.fc._get_dropout()
        
            
    def _set_dropout(self,p=0.2):
        
        if self.model_name.lower() == 'densenet':
            if self.model.classifier is not None:
                print('DenseNet: setting head (FC) dropout prob to {:.3f}'.format(p))
                self.model.classifier._set_dropout(p=p)
                
        elif self.model_name.lower() == 'resnet50' or self.model_name.lower() == 'resnet34':
            if self.model.fc is not None:
                print('ResNet: setting head (FC) dropout prob to {:.3f}'.format(p))
                self.model.fc._set_dropout(p=p)

Adding Support For Transfer Learning Models To The load_chkpoint Utility

We need to add a case for TransferNetworkImg to the load_chkpoint function. The main addition is the storage and retrieval of head along with the other params, plus passing the retrieved head to the constructor.

In [ ]: 

def load_chkpoint(chkpoint_file):
        
    restored_data = torch.load(chkpoint_file)

    params = restored_data['params']
    print('load_chkpoint: best accuracy = {:.3f}'.format(params['best_accuracy']))  
    
    if params['model_type'].lower() == 'classifier':
        net = FC( num_inputs=params['num_inputs'],
                  num_outputs=params['num_outputs'],
                  layers=params['layers'],
                  device=params['device'],
                  criterion_name = params['criterion_name'],
                  optimizer_name = params['optimizer_name'],
                  model_name = params['model_name'],
                  lr = params['lr'],
                  dropout_p = params['dropout_p'],
                  best_accuracy = params['best_accuracy'],
                  best_accuracy_file = params['best_accuracy_file'],
                  chkpoint_file = params['chkpoint_file'],
                  class_names =  params['class_names']
          )
    elif params['model_type'].lower() == 'transfer':
        net = TransferNetworkImg(criterion_name = params['criterion_name'],
                                 optimizer_name = params['optimizer_name'],
                                 model_name = params['model_name'],
                                 lr = params['lr'],
                                 device=params['device'],
                                 dropout_p = params['dropout_p'],
                                 best_accuracy = params['best_accuracy'],
                                 best_accuracy_file = params['best_accuracy_file'],
                                 chkpoint_file = params['chkpoint_file'],
                                 head = params['head']
                               )
    
        


    net.load_state_dict(torch.load(params['best_accuracy_file']))

    net.to(params['device'])
    
    return net

Train Two Different Pre-trained, Transferred Models on CIFAR-10 Dataset

Before we move on to testing and experimentation, we should move our code into .py files and import them as modules. This is much more convenient: we don't have to rerun all the notebook cells every time we reset the notebook's Python kernel to empty the GPU memory for a fresh run.

I have created four files:

  1. model.py (contains the core Network class)
  2. fc.py (contains the FC class)
  3. cv_model.py (contains the TransferNetworkImg class)
  4. utils.py (contains all the utility functions not belonging to any class)

We create these files in a folder called mylib and import all of them.

We should also use a special Jupyter notebook directive that monitors the imported files and automatically reloads any module whose source changes on disk. This comes in handy if we modify any of the files, e.g. to fix a bug.

In [ ]: 

from mylib.utils import *
from mylib.model import *
from mylib.cv_model import *
from mylib.fc import *
from mylib.chkpoint import *

%load_ext autoreload
%autoreload 2

Testing And Experimentation

In the following cells, we are going to perform the following steps in a sequence:

  • Create our classes dictionary as well as the head dictionary to pass to the transfer learning object's constructor
  • Create a transfer learning object for DenseNet
  • Unfreeze it
  • Fit it to train for 3 epochs
  • Save the checkpoint
  • Load it back into another variable
  • Unfreeze again and repeat with 3 more epochs
  • Save the checkpoint again
  • Reload into another variable
  • Freeze this time and retrain for 3 more epochs
  • Save the model again

In [6]: 

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
class_dict = {k:v for k,v in enumerate(classes)}

head={
       'num_outputs':10,
       'layers':[],
        'class_names':class_dict,
         'non_linearity':'relu',
         'model_type':'classifier',
         'model_name':'FC'
     }

In [ ]: 

transfer_densenet = TransferNetworkImg(model_name='DenseNet',
                   optimizer_name = 'Adadelta',               
                   best_accuracy_file ='densenet_best_accuracy_cifar10.pth',
                   chkpoint_file ='densenet_cifar10_chkpoint_file',
                   head = head
                   )

In [23]: 

transfer_densenet.unfreeze()

In [ ]: 

transfer_densenet.fit(trainloader,validloader,epochs=3,print_every=200)

 updating best accuracy: previous best = 80.490 new best = 85.920

In [25]: 

transfer_densenet.save_chkpoint()

Out [25]: 

get_model_params: best accuracy = 85.920
get_model_params: chkpoint file = densenet_cifar10_chkpoint_file
checkpoint created successfully in densenet_cifar10_chkpoint_file

In [5]: 

transfer_densenet2 = load_chkpoint('densenet_cifar10_chkpoint_file')

Out [5]: 

load_chkpoint: best accuracy = 85.920
/home/farhan/.conda/envs/dreamai/lib/python3.7/site-packages/torchvision-0.2.1-py3.7.egg/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.
set_transfer_model: self.Model set to DenseNet
setting optim Ada Delta
DenseNet: setting head: inputs: 1024 hidden:[] outputs: 10
Transfer: best accuracy = 85.920
setting optim Ada Delta

In [7]: 

transfer_densenet2.unfreeze()

In [ ]: 

transfer_densenet2.fit(trainloader,validloader,epochs=3,print_every=200)

In [9]: 

transfer_densenet2.save_chkpoint()

Out [9]: 

get_model_params: best accuracy = 90.770
get_model_params: chkpoint file = densenet_cifar10_chkpoint_file
checkpoint created successfully in densenet_cifar10_chkpoint_file

This time we have crossed 90% accuracy after unfreezing and training for another 3 epochs.

In [4]: 

transfer_densenet3 = load_chkpoint('densenet_cifar10_chkpoint_file')

Out [4]: 

load_chkpoint: best accuracy = 90.770
/home/farhan/.conda/envs/dreamai/lib/python3.7/site-packages/torchvision-0.2.1-py3.7.egg/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.
set_transfer_model: self.Model set to DenseNet
setting optim Ada Delta
DenseNet: setting head: inputs: 1024 hidden:[] outputs: 10
Transfer: best accuracy = 90.770
setting optim Ada Delta

In [ ]: 

transfer_densenet3.freeze()
transfer_densenet3.fit(trainloader,validloader,epochs=3,print_every=200)

updating best accuracy: previous best = 94.940 new best = 95.080

In [6]: 

transfer_densenet3.save_chkpoint()

Out [6]: 

get_model_params: best accuracy = 95.080
get_model_params: chkpoint file = densenet_cifar10_chkpoint_file
checkpoint created successfully in densenet_cifar10_chkpoint_file

After 9 epochs (6 unfrozen and 3 frozen), we are at 95.08%.

Let's repeat the same steps with ResNet34.

In [7]: 

transfer_resnet = TransferNetworkImg(model_name='ResNet34',
                   optimizer_name = 'Adadelta',               
                   best_accuracy_file ='resnet34_best_accuracy_cifar10.pth',
                   chkpoint_file ='resnet34_cifar10_chkpoint_file',
                   head = head
                   )

Out [6]: 

set_transfer_model: self.Model set to ResNet34
setting optim Ada Delta
ResNet34: setting head: inputs: 512 hidden:[] outputs: 10
Transfer: best accuracy = 0.000
setting optim Ada Delta

In [ ]: 

transfer_resnet.unfreeze()
transfer_resnet.fit(trainloader,validloader,epochs=3,print_every=200)

updating best accuracy: previous best = 82.700 new best = 86.040

In [9]: 

transfer_resnet.save_chkpoint()

Out [9]: 

get_model_params: best accuracy = 86.040
get_model_params: chkpoint file = resnet34_cifar10_chkpoint_file
checkpoint created successfully in resnet34_cifar10_chkpoint_file

In [5]: 

transfer_resnet2 = load_chkpoint('resnet34_cifar10_chkpoint_file')

Out [5]: 

load_chkpoint: best accuracy = 86.040
set_transfer_model: self.Model set to ResNet34
setting optim Ada Delta
ResNet34: setting head: inputs: 512 hidden:[] outputs: 10
Transfer: best accuracy = 86.040
setting optim Ada Delta

In [ ]: 

transfer_resnet2.unfreeze()
transfer_resnet2.fit(trainloader,validloader,epochs=3,print_every=200)

updating best accuracy: previous best = 89.400 new best = 89.640

In [7]: 

transfer_resnet2.save_chkpoint()

Out [7]: 

get_model_params: best accuracy = 89.640
get_model_params: chkpoint file = resnet34_cifar10_chkpoint_file
checkpoint created successfully in resnet34_cifar10_chkpoint_file

In [4]: 

transfer_resnet3 = load_chkpoint('resnet34_cifar10_chkpoint_file')

Out [4]: 

load_chkpoint: best accuracy = 89.640
set_transfer_model: self.Model set to ResNet34
setting optim Ada Delta
ResNet34: setting head: inputs: 512 hidden:[] outputs: 10
Transfer: best accuracy = 89.640
setting optim Ada Delta

In [ ]: 

transfer_resnet3.freeze()
transfer_resnet3.fit(trainloader,validloader,epochs=3,print_every=200)

updating best accuracy: previous best = 94.280 new best = 94.580

In [6]: 

transfer_resnet3.save_chkpoint()

Out [6]: 

get_model_params: best accuracy = 94.580
get_model_params: chkpoint file = resnet34_cifar10_chkpoint_file
checkpoint created successfully in resnet34_cifar10_chkpoint_file

Evaluate and Predict on Test Set with Individual Models and Ensemble

We load both files for the final DenseNet and ResNet models and evaluate on the test set.

In [ ]: 

transfer_densenet = load_chkpoint('densenet_cifar10_chkpoint_file')

Transfer: best accuracy = 95.08

In [7]: 

transfer_densenet.evaluate(testloader)

Out [7]: 

(93.0,
 [('airplane', 94.69999999999999),
  ('automobile', 96.5),
  ('bird', 90.4),
  ('cat', 84.5),
  ('deer', 94.3),
  ('dog', 89.5),
  ('frog', 94.5),
  ('horse', 94.69999999999999),
  ('ship', 94.5),
  ('truck', 96.39999999999999)])

In [ ]: 

transfer_resnet = load_chkpoint('resnet34_cifar10_chkpoint_file')

Transfer: best accuracy = 94.580

In [10]: 

transfer_resnet.evaluate(testloader)

Out [10]: 

(92.52,
 [('airplane', 94.5),
  ('automobile', 96.8),
  ('bird', 89.3),
  ('cat', 82.89999999999999),
  ('deer', 94.0),
  ('dog', 87.2),
  ('frog', 95.6),
  ('horse', 95.0),
  ('ship', 94.19999999999999),
  ('truck', 95.7)])

Ensembling Multiple Models To Improve Accuracy

We have tested and evaluated two different "transferred" models. Both perform almost equally well on this dataset. We might wonder what would happen if we somehow combined the results of both models to make our final prediction. Combining two or more models in this way is called ensemble learning.

You might have heard the term in traditional machine learning approaches using random forests and gradient boosted decision trees. Here, we are talking about using two or more deep learning models to try to achieve better accuracy. For more information, please see Elements of Statistical Learning.

The intuition behind ensembling is that one model might have misclassified a specific example while predicting, but one or more of the others might have predicted correctly. Our final prediction accuracy would likely improve if we somehow combine the predictions.

One simple way to combine the predictions is to give weights to each model's predictions based on heuristics, such as:

  • Simple averaging of the predicted values (e.g. probabilities) of the different ensemble members
  • Assigning different weights to each member of the ensemble based on its performance on the validation set
  • Assigning weights based on our experience with the models across multiple datasets; if one model performs better in the majority of cases, we give its predictions more weight
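For instance, with two models and a fixed weight of 0.5 each, the combination is just a weighted average of the predicted class probabilities. A small sketch with made-up numbers:

```python
import numpy as np

# Per-class probabilities from two models for one image (made-up values)
p_densenet = np.array([0.7, 0.2, 0.1])
p_resnet   = np.array([0.5, 0.4, 0.1])

weights = [0.5, 0.5]  # the weights must sum to 1
final = weights[0] * p_densenet + weights[1] * p_resnet

print(final)           # [0.6 0.3 0.1]
print(final.argmax())  # 0 -> the ensemble's predicted class
```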

A generalized way to create an ensemble is a dedicated ensemble model class derived from our base Network class, just as we did for transfer learning and FC. We don't need fit and train methods, since the members of the ensemble are expected to be trained before being combined. However, implementing the predict and evaluate methods in the ensemble class makes sense.

We pass the model objects to the ensemble, along with their weights, at construction time. Ideally we would like a machine learning model of some sort to learn those weights itself, but to keep things simple (at least in this tutorial) we will assign weights using the heuristics discussed above. We then write the evaluate and predict methods so that they call each member's corresponding method, multiply the result by the model's weight, and add up the weighted predictions to make the final one.

Below is the relevant code of such a class.   

In [ ]: 

class EnsembleModel(Network):
    def __init__(self,models):
        self.criterion = None
        super().__init__()
        self.models = models
        if sum(model[1] for model in models) != 1.0:
            raise ValueError('Weights of Ensemble must sum to 1')

    def evaluate(self,testloader,metric='accuracy'):
        from collections import defaultdict
        class_correct = defaultdict(int)
        class_totals = defaultdict(int)

        class_names = self.models[0][0].class_names
        with torch.no_grad():
            for inputs, labels in testloader:
                ps_list = []
                for model in self.models:
                    model[0].eval()
                    model[0].to(model[0].device)
                    inputs, labels = inputs.to(model[0].device), labels.to(model[0].device)
                    outputs = model[0].forward(inputs)
                    ps = torch.exp(outputs)
                    ps = ps * model[1] # multiply by the model's weight
                    ps_list.append(ps)

                final_ps = ps_list[0]
                for i in range(1,len(ps_list)):
                    final_ps = final_ps + ps_list[i]
                _, final_preds = torch.max(final_ps, 1)
                update_classwise_accuracies(final_preds,labels,class_correct,class_totals)

        return get_accuracies(class_names,class_correct,class_totals)

    def predict(self,inputs,topk=1):
        ps_list = []
        for model in self.models:
            model[0].eval()
            model[0].to(model[0].device)
            with torch.no_grad():
                inputs = inputs.to(model[0].device)
                outputs = model[0].forward(inputs)
                ps_list.append(torch.exp(outputs)*model[1])

        final_ps = ps_list[0]
        for i in range(1,len(ps_list)):
            final_ps = final_ps + ps_list[i]

        _,top = final_ps.topk(topk, dim=1)

        return top

    def forward(self,x):
        outputs = []
        for model in self.models:
            outputs.append(model[0].forward(x))
        return outputs

The constructor expects a list of models, each member a tuple: the first element is the pre-trained model object and the second is the model's weight. The weights must sum to 1, so that our predictions are a weighted sum of all models' predictions for each class. We call each model's forward method inside the loop and multiply the predicted values by the model's weight, building a list of weighted probabilities, which we then sum. Finally, we take the max over the weighted sums, just as we do for our regular models, to get the final prediction for the image.

Evaluating With Ensemble Models

Let's create an ensemble object, give a weight of 0.5 to each of our two models since their performance doesn't differ by much, and observe the improvement in performance (if any).

In [9]: 

ensemble = EnsembleModel([(transfer_densenet,0.5),(transfer_resnet,0.5)])
ensemble.evaluate(testloader)

Out [9]: 

(94.22,
 [('airplane', 95.6),
  ('automobile', 97.8),
  ('bird', 92.0),
  ('cat', 86.6),
  ('deer', 96.0),
  ('dog', 89.8),
  ('frog', 96.2),
  ('horse', 96.5),
  ('ship', 95.3),
  ('truck', 96.39999999999999)])

We observe a significant improvement when using the ensemble method.

Predict on Kaggle's Much Larger Test Set

The original CIFAR-10 dataset has 60,000 images, 50,000 in the train set and 10,000 in the test set. However, Kaggle has provided a huge dataset of 300,000 images to test CIFAR-10. Here is what the Kaggle website says about these images:

"To discourage certain forms of cheating (such as hand labeling) we have added 290,000 junk images in the test set. These images are ignored in the scoring. We have also made trivial modifications to the official 10,000 test images to prevent looking them up by file hash. These modifications should not appreciably affect the scoring. You should predict labels for all 300,000 images."

Unzipping this test dataset once it has downloaded takes an enormous amount of time (several hours on my machine).

Creating Our Own Custom Dataset For Kaggle Test Images

In order to handle this dataset, we have written our own custom dataset class derived from the base dataset class of PyTorch. We then pass this dataset object to the PyTorch DataLoader. This makes handling this large dataset much more convenient. It also gives us good practice in creating our own dataset for images.

Below is the code for our own custom dataset class. The code is pretty straightforward. A few things to note:

  • A typical custom dataset implements an __init__ method, a __getitem__ method so that the dataset can be indexed and iterated over, and a __len__ method to make Python's len() function work on the dataset.

  • Our custom dataset class assumes that information about the dataset image files is contained in a CSV file.

  • The image ids are contained in column 0 of the file, the image filename (path) in column 1, and the image's label (if available in the file) as text, e.g. bird, plane, etc., in column 2.
  • Our Kaggle test set has no labels since Kaggle uses it for scoring the competition and, therefore, does not provide labels for test sets.
  • We use a Pandas DataFrame to handle the CSV file and then create the actual image set.

In [7]: 

import numpy as np
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class ImageDataset(Dataset):

    def __init__(self, csv_path,
                 transforms=None,
                 labels_=False):

        self.labels = None
        self.transforms = None
        self.df = pd.read_csv(csv_path)

        # column 0: image id, column 1: image file path, column 2: label (optional)
        self.ids = np.asarray(self.df.iloc[:, 0])

        self.images = np.asarray(self.df.iloc[:, 1])

        if labels_:
            self.labels = np.asarray(self.df.iloc[:, 2])
        self.data_len = len(self.df.index)
        if transforms is not None:
            self.transforms = transforms

    def __getitem__(self, index):

        image_name = self.images[index]
        id_ = self.ids[index]
        img_ = Image.open(image_name)
        if self.transforms is not None:
            # keep only the first three (RGB) channels, dropping alpha if present
            img_ = self.transforms(img_)[:3, :, :]
        # return 0 as a placeholder label when the dataset has no labels
        label = self.labels[index] if self.labels is not None else 0
        return (id_, img_, label)

    def __len__(self):
        return self.data_len

We read the CSV file into a Pandas DataFrame and extract image ids, image file paths, and labels (if present) from its columns, assuming that they are contained in columns 0, 1, and 2 respectively. In __getitem__ we read a single image from its file according to the requested index. Remember that this method is called by the DataLoader when constructing a batch: it is called once for each iteration of the batch-construction loop, and on each such call we return the image id, the image, and its label.

We also apply the transforms (if any were given in the constructor) to each image. The slice [:3,:,:] keeps only the first three channels (RGB) of the transformed image, dropping the alpha channel that some PNG files contain. To keep the API consistent and always return a three-valued tuple, we return 0 as the label with each image even when there is no label. The assumption is that the caller of this method knows whether a real label is to be expected.
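A quick standalone illustration of that slice (not part of the tutorial's code):

```python
import torch

# A 4-channel tensor standing in for a transformed RGBA image
rgba = torch.rand(4, 32, 32)

# [:3, :, :] keeps only the first three channels (RGB), dropping alpha
rgb = rgba[:3, :, :]
print(rgb.shape)  # torch.Size([3, 32, 32])
```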

Extracting Metadata Of The Test Set Into A CSV File

In order to use our custom dataset class with a PyTorch DataLoader, we need to create the CSV file we want to pass to the dataset constructor. Kaggle hasn't really given us a CSV file; all we have is a folder of images. To create the CSV file, we need to parse the image file names and store the image names (ids) in the first column and the paths in the second column of our CSV file.

Below is a function that uses Python's glob module, along with a Pandas DataFrame and some Python string-searching functions, to create such a CSV file.

In [ ]: 

import glob

def create_csv_from_folder(folder_path, outfile, cols=['id','path']):
    # collect the paths of all files in the folder
    f = glob.glob(folder_path + '/*.*')

    ids = []
    for elem in f:
        # extract the bare file name (without directory or extension) as the id
        t = elem[elem.rfind('/')+1:]
        ids.append(t[:t.rfind('.')])
    data = {cols[0]: ids, cols[1]: f}
    df = pd.DataFrame(data, columns=cols)
    df.to_csv(outfile, index=False)

Using this function, we can create our CSV file, passing it the folder and the desired output CSV filename as arguments. We have placed our test images from Kaggle in the cifar10-test folder.

In [ ]: 

create_csv_from_folder('cifar10-test','cifar10-test.csv')

We can test our code to see a sample of contents from the CSV file.

In [8]: 

df = pd.read_csv('cifar10-test.csv')
df[:10]

Out [8]: 

  id path
0 11798 cifar10-test/11798.png
1 298292 cifar10-test/298292.png
2 222281 cifar10-test/222281.png
3 165990 cifar10-test/165990.png
4 100937 cifar10-test/100937.png
5 197039 cifar10-test/197039.png
6 59773 cifar10-test/59773.png
7 52364 cifar10-test/52364.png
8 240916 cifar10-test/240916.png
9 244203 cifar10-test/244203.png

In [9]: 

len(df.index)

Out [9]: 

300000

Testing and Preparing the Submission File for the Kaggle Test Set

Now all we need to do is create our custom dataset and a DataLoader to perform evaluation on it using our models.

In [15]: 

test_transform_cifar10 = transforms.Compose([transforms.Resize((224,224)),
                                     transforms.ToTensor(),
                                     transforms.Normalize(cifar10_mean,cifar10_std)
                                    ])

cifar10_test_dset = ImageDataset('cifar10-test.csv',transforms=test_transform_cifar10)
len(cifar10_test_dset)

Out [15]: 

300000

As expected, the dataset has 300,000 images.

In [17]: 

cifar10_test_dset.df[:10]

Out [17]: 

  id path
0 11798 cifar10-test/11798.png
1 298292 cifar10-test/298292.png
2 222281 cifar10-test/222281.png
3 165990 cifar10-test/165990.png
4 100937 cifar10-test/100937.png
5 197039 cifar10-test/197039.png
6 59773 cifar10-test/59773.png
7 52364 cifar10-test/52364.png
8 240916 cifar10-test/240916.png
9 244203 cifar10-test/244203.png

In [18]: 

cifar10_test_testloader = DataLoader(cifar10_test_dset, batch_size=50,num_workers=0)

In [19]: 

dataiter = iter(cifar10_test_testloader)
id_,images_,_ = next(dataiter)

In [20]: 

images_.shape

Out [20]: 

torch.Size([50, 3, 224, 224]) 

As expected, a single batch from our DataLoader has the correct dimensions.

To submit to Kaggle, we need to create a CSV file with image-id (name) in the first column and label in the second (see the competition webpage here for the sample submission file).

The easiest way to do that is again to use Pandas DataFrame to prepare the results and the file.

Below we have the standard DataLoader loop:

  • We get the next batch of data. Remember that our dataset object returns a 3-tuple (id, image, label). We ignore the label in this case since the unlabeled test set always yields 0.
  • We first predict using our ensemble, move the predictions tensor back to the CPU, convert it to NumPy, flatten it using the flatten method available on NumPy arrays, and finally convert it to a plain Python list. This gives us the predicted class indices for the whole batch.
  • We keep collecting the corresponding class names in our predictions list (by looking up each predicted index in our class dictionary) and the image ids in another list.
  • We finally create a Pandas DataFrame with the two required columns and write it to disk as a CSV file. To match the exact required format, we set index to False.
  • Finally, we sort the values by id, as the sample file shows, and rewrite the CSV file.

In [ ]: 

predictions = []
image_ids = []
for ids_,images_,_ in cifar10_test_testloader:
    preds_ = ensemble.predict(images_).cpu().numpy().flatten().tolist()
    predictions += [class_dict[pred] for pred in preds_]
    image_ids += ids_.numpy().flatten().tolist()

pd.DataFrame({'id':image_ids,'label':predictions}).to_csv('submission.csv',index=False)

In [ ]: 

df = pd.read_csv('submission.csv')
df = df.sort_values('id')
df.to_csv('submission.csv',index=False)

Conclusion

When I submitted to Kaggle, I got the following results:

  • 0.947 with an ensemble of three models (I added support for another model, not shown in this tutorial).
  • 0.945 with an ensemble of the two models discussed in this tutorial. This would have gotten me to 3rd place on Kaggle and also pretty high on the benchmarks published on several sites for CIFAR-10.
  • It took me less than 75 mins to train all the models and create the ensemble.
  • Of course, if you play around and spend a bit more time, I am sure you can beat the top benchmark.
  • During the course of trying to achieve high accuracy on CIFAR-10, we created a reusable set of classes and utility functions that could be used on any image classification task.

Key Takeaways for High Accuracy

  • Use the mean and std of the image set itself instead of ImageNet to normalize if you have a large enough image set.
  • Use an adaptive learning rate algorithm as the optimizer. Adadelta has worked best for me on CIFAR-10, the flowers dataset, and several others so far.
  • Unfreeze and train for 3 to 5 epochs. Then save, reset the notebook, unfreeze again and retrain for 3 to 5 epochs, then freeze and train for another 2 to 3 epochs, and you should be done.
  • Experiment with additional FC layers and see if it makes any difference.
  • Train multiple models, save them and use them as an ensemble to make final predictions.
  • Perhaps most importantly, consider applied machine learning to be as much a software engineering discipline as a statistical and mathematical one. This means that most software engineering best practices still apply. Thinking about ML problems this way will lead you to design good reusable components like classes and utility functions to create your own API, even if you are given some code to start with.
  • Refactor the code (if given to you) and put it inside your own classes and modify it accordingly to create clean interfaces and abstractions. This will save huge duplication of effort on other datasets and you may also be able to use your classes as components in other larger projects and tasks.
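The freeze/unfreeze cycle recommended above boils down to toggling requires_grad on the backbone parameters while leaving the classifier head trainable. A minimal sketch (the helper name and the TinyNet stand-in model are hypothetical; 'fc' matches the classifier attribute name in torchvision ResNets, so adjust it for other architectures):

```python
import torch.nn as nn

def set_backbone_trainable(model, trainable):
    # Toggle requires_grad on every parameter that is not part of
    # the final classifier (named 'fc' here, as in torchvision ResNets).
    for name, param in model.named_parameters():
        if not name.startswith('fc'):
            param.requires_grad = trainable

# Usage with a tiny stand-in model:
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)   # "backbone"
        self.fc = nn.Linear(8, 10)       # classifier head

model = TinyNet()
set_backbone_trainable(model, False)   # freeze backbone, keep head trainable
```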
Author
Farhan Zaidi

Farhan Zaidi is an artificial intelligence enthusiast and founder of a training and consultancy firm in the machine learning and deep learning space. He has over 25 years of experience as a software architect, designer, and developer. Currently, he is involved in building a recommendation engine for an IPTV system and developing computer-vision-based systems that use deep learning for smart cities, security, and surveillance applications.

Technical Reviewer
Adam Pocock

Adam Pocock is a researcher in the Machine Learning Research Group at Oracle Labs. He has a PhD in Computer Science focusing on feature selection, and since joining Oracle has worked on scalable Bayesian inference, word embeddings, and efficient implementations of feature selection and other ML algorithms.