PhD course: Deep Learning with Memory, Federico Becattini

Memory Networks are models equipped with a storage component where information can be written and later retrieved for any purpose. Simple forms of memory networks, such as the popular recurrent neural networks (RNNs), LSTMs, or GRUs, are widely used to process sequential data such as text or video frames. However, they have limited storage capabilities and struggle to capture long-term dependencies. In contrast, recent works, starting from Memory Augmented Neural Networks (MANNs), overcome these storage and computational limitations by adding an external, element-wise addressable memory. Unlike RNNs, in MANNs state-to-state transitions are obtained through read/write operations, and a set of independent states is maintained.
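To make the read/write idea concrete, the following is a minimal NumPy sketch of content-based addressing in the style of Neural Turing Machines (an early MANN): a key is compared against every memory slot, the resulting weights select which slots to read, and writing combines an erase and an add step. All names and the toy memory values are illustrative, not from the course materials.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def content_address(memory, key, beta=1.0):
    # Cosine similarity between the query key and each memory row,
    # sharpened by beta and normalized into addressing weights.
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sim)

def read(memory, weights):
    # Read vector: weighted sum of memory rows.
    return weights @ memory

def write(memory, weights, erase, add):
    # NTM-style write: erase then add, per row, scaled by the address weights.
    memory = memory * (1.0 - np.outer(weights, erase))
    return memory + np.outer(weights, add)

# Toy memory with 4 slots of dimension 3.
M = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 0.]])
w = content_address(M, np.array([1., 0., 0.]), beta=5.0)  # focuses on slot 0
r = read(M, w)
M = write(M, w, erase=np.ones(3), add=np.array([0., 2., 0.]))
```

Because every operation is differentiable (the addressing is a soft distribution over slots, not a hard index), the whole memory can be trained end-to-end with gradient descent.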
Closely related to memory is the concept of attention, since it allows a model to select individual pieces of information from a storage. Alternative solutions for modeling sequences, such as Transformers, have in fact recently been developed by exploiting attention.
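The link between attention and memory addressing can be seen in scaled dot-product attention, the core operation of Transformers: queries play the role of read keys, and the key/value matrices act as an addressable store. A minimal NumPy sketch (dimensions and variable names are illustrative):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each query scores all keys,
    # the scores are softmax-normalized, and the output is a
    # weighted mix of the corresponding values.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # 2 queries of dimension 4
K = rng.standard_normal((5, 4))   # 5 keys (the "memory" slots)
V = rng.standard_normal((5, 8))   # 5 values of dimension 8
out, w = attention(Q, K, V)
```

Each row of `w` is a soft address over the five slots, exactly analogous to the content-based read weights of a MANN.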
This course aims to provide an overview of memory-based techniques and their applications. It will cover the basic concepts behind recurrent neural networks and then delve into the advanced details of memory augmented neural networks: their structure and how such models can be trained. A parallel with Transformer networks will also be drawn, stressing differences and limitations.
The course will also include hands-on practical sessions showing how to implement and train memory-based neural networks.