PhDInfo Seminar: Francesco Marchetti, “The Memory in Deep learning: from Recurrent Neural Network to Memory Network”

(the slides of this and other presentations are available in the PhDinfo restricted area)

PhDinfo seminar held on April 16th, 2021 by Francesco Marchetti


The memory module of a Deep Learning architecture is fundamental for solving sequential and
temporal tasks such as speech recognition, language modelling, and sentiment analysis.
Recurrent Neural Networks (RNNs), which process the elements of a sequence one at a time,
store the information seen so far in a compressed hidden state. Because of the
vanishing gradient problem, the memory of an RNN has only short-term capacity.
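The short-term behaviour can be illustrated with a minimal sketch, assuming a scalar RNN with the update h_t = tanh(w * h_{t-1} + u * x_t). The weights w and u, the constant input sequence, and the function names are illustrative assumptions, not taken from the seminar. By the chain rule, the gradient of the last hidden state with respect to the first is a product of per-step factors w * (1 - h_t^2), and with |w| < 1 this product shrinks quickly as the sequence grows.

```python
import math

def rnn_forward(xs, w=0.5, u=1.0, h0=0.0):
    """Scalar RNN: h_t = tanh(w * h_{t-1} + u * x_t). Returns [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def grad_last_wrt_first(hs, w=0.5):
    """d h_T / d h_0 as the product of per-step factors w * (1 - h_t^2)."""
    g = 1.0
    for h in hs[1:]:
        g *= w * (1.0 - h * h)
    return g

xs = [0.3] * 30                              # hypothetical constant input sequence
hs = rnn_forward(xs)
grad_short = grad_last_wrt_first(hs[:6])     # gradient across 5 steps
grad_long = grad_last_wrt_first(hs)          # gradient across 30 steps
print(grad_short, grad_long)
```

Each factor is at most 0.5 here, so the influence of the first state on the last decays at least geometrically with the number of steps.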
To remedy this, a new type of recurrent neural network was developed, the Long Short-Term
Memory (LSTM). In addition to the hidden state, it has a cell state that can retain information
for a longer time.
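As a rough sketch of why the cell state retains information longer, the following implements one step of a scalar LSTM cell with the standard forget, input and output gates. The parameter dictionary, its values, and the function names are hypothetical and chosen only for illustration. The key point is the additive update c_t = f * c_{t-1} + i * g: when the forget gate stays close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell. params maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))         # how much of the old cell state to keep
    i = sigmoid(pre("input"))          # how much of the candidate to write
    o = sigmoid(pre("output"))         # how much of the cell state to expose
    g = math.tanh(pre("candidate"))    # candidate content
    c = f * c_prev + i * g             # additive cell-state update
    h = o * math.tanh(c)               # hidden state read from the cell state
    return h, c

# Hypothetical parameters: forget gate nearly open, input gate nearly closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}
h, c = 0.0, 1.0
for _ in range(50):
    h, c = lstm_step(0.0, h, c, params)
print(c)   # stays close to the initial value 1.0 after 50 steps
```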
In both RNNs and LSTMs, the memory is a single dense vector, and there is no way to address
its individual elements.
To overcome these problems, Memory Networks were introduced to combine inference
components with a long-term memory. The memory has a matrix-shaped structure and is
element-wise addressable.
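One common way to address such a memory is a soft, content-based read, sketched below under stated assumptions: each row of the memory matrix is scored against a query by dot product, the scores are normalised with a softmax, and the read vector is the weighted sum of the rows. This is a generic illustration rather than the exact mechanism of any specific model, and the names read_memory and sharpness are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(memory, query, sharpness=1.0):
    """Soft read: attention weights over memory rows, then a weighted sum of rows."""
    scores = [sharpness * sum(q * v for q, v in zip(query, row)) for row in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(dim)]
    return weights, read

# Hypothetical 3-slot memory; the query matches the second slot most closely.
memory = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weights, read = read_memory(memory, [0.0, 5.0, 0.0])
print(weights, read)
```

Because the weights are a differentiable function of the query, a read of this kind can be trained end to end with the rest of the network, while still concentrating on individual memory slots.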
Neural Turing Machines were the first to use this type of memory to solve algorithmic tasks.
Different Memory Networks have since been developed to solve real-world tasks, such as MemN2N for
Question Answering and MANTRA for trajectory prediction.