Survival loss: A neuron death regularizer

Workshop Conference

Date

November 20, 2020

Source / Conference / Filed

Workshop of Physical Agents (WAF)

Authors

Emilio Almazan
Javier Tovar
Alejandro de la Calle

Abstract

We found that combining the L2 regularizer with Adam kills up to 60% of the filters in a ResNet-110 trained on CIFAR-100, whereas combining L2 with Momentum does not. This has no significant impact on accuracy, as both configurations reach similar values. However, it becomes a serious issue when the impaired model is used as a pre-trained model for a more complex dataset (e.g., one with a larger number of categories), a situation that arises naturally in continual learning. In this paper we study the impact of dead filters in continual learning when the dataset grows more difficult over time and more capacity is required from the network. Furthermore, we propose a new regularization term, referred to as the survival loss, which complements L2 to prevent filters from dying when combined with Adam. We show that the survival loss improves accuracy in a simulated continual learning set-up, with the prospect of larger improvements as more iterations arrive.
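The abstract does not spell out the form of the survival loss, so the sketch below is only one plausible reading: a hinge penalty on per-filter L2 norms that pushes convolutional filters away from the all-zero (dead) regime, added on top of the usual L2 weight decay coupled with Adam. All names here (survival_loss, tau, lam, count_dead_filters) are illustrative, not the authors' API, and the threshold and weighting are hypothetical.

```python
import torch
import torch.nn as nn


def survival_loss(model: nn.Module, tau: float = 1e-2) -> torch.Tensor:
    """Hypothetical survival regularizer: hinge penalty on the per-filter
    L2 norms of every Conv2d layer. Filters whose norm drops below tau
    incur a cost, nudging them away from collapsing to zero. The paper's
    exact formulation may differ."""
    penalties = []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # weight: (out_ch, in_ch, kH, kW) -> one L2 norm per output filter
            norms = module.weight.flatten(1).norm(p=2, dim=1)
            penalties.append(torch.clamp(tau - norms, min=0.0).sum())
    if not penalties:
        return torch.zeros((), device=next(model.parameters()).device)
    return torch.stack(penalties).sum()


@torch.no_grad()
def count_dead_filters(model: nn.Module, eps: float = 1e-8):
    """Count conv filters whose weights have numerically collapsed to zero,
    the symptom the paper attributes to combining L2 with Adam."""
    dead, total = 0, 0
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            norms = module.weight.flatten(1).norm(p=2, dim=1)
            dead += int((norms < eps).sum().item())
            total += norms.numel()
    return dead, total


# Minimal usage sketch: L2 is applied via Adam's weight_decay (the setting
# the paper studies), and the survival term is added to the task loss.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
lam = 1e-3  # hypothetical weight for the survival term

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = criterion(model(x), y) + lam * survival_loss(model)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(count_dead_filters(model))
```

One design note: the hinge only activates below the threshold tau, so healthy filters are untouched and the term does not fight the L2 penalty except where a filter is about to die, which matches the abstract's framing of the survival loss as a complement to L2 rather than a replacement.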