Topic 8: What is LSTM and xLSTM?
We review what we know about LSTM networks and explore their promising new development – xLSTM.
There is an opinion that the recent success of ChatGPT and transformer-based Large Language Models (LLMs) has sucked out the majority of resources from other deep learning areas, including Recurrent Neural Networks (RNNs). The impressive achievements and commercial potential of LLMs have heavily influenced research priorities, media coverage, and educational trends, leaving fields like RNNs, computer vision, reinforcement learning, and others with much less attention. That’s why the new paper introducing Extended Long Short-Term Memory (xLSTM) excited the ML community: LSTMs are not dead! RNNs are coming back!
According to Sepp Hochreiter, pioneer of deep learning and one of the authors of LSTM, xLSTM excels in time series prediction: "Our xLSTMTime model demonstrates excellent performance against state-of-the-art transformer-based models as well as other recently proposed time series models." That’s big! Let’s review what we know about LSTM networks and explore their new promising development – xLSTM.
In today’s episode, we will cover:
LSTMs are not dead at all: current use-cases
The story of LSTM
What is the Vanishing Gradient Problem?
Popularization and success
LSTM limitations and their overshadowing by Transformers
Introducing xLSTM: Addressing the shortcomings
The Architecture of xLSTM
Evaluations
Applications across domains
Conclusion: The future of sequence modeling with xLSTM
Bonus: Resources
LSTMs are not dead at all: current use-cases
To give credit where it's due, LSTMs are not dead at all. They may be overshadowed, but they are still in heavy use. Here are a few examples of how LSTMs have been part of our daily lives (for years!):
Traffic prediction in navigation apps: Apps like Google Maps or Waze employ LSTMs to predict traffic patterns. By analyzing historical traffic data, current conditions, and even factors like weather or local events, these models can forecast traffic congestion and suggest the fastest routes in real-time.
Music generation and recommendation: Streaming services like Spotify use LSTMs to analyze your listening history and generate personalized playlists. The LSTM can understand patterns in the types of music you enjoy and predict songs you might like, even accounting for how your tastes change over time.
Predictive text on smartphones: When you're typing a message, LSTMs help predict the next word you're likely to use based on the context of what you've already written. (“Predicting the future of our planet” - that’s the text LSTM just suggested to me).
The story of LSTM
In the early 1990s, researchers were excited about Recurrent Neural Networks (RNNs). These networks were designed to handle sequential data, making them useful for tasks like speech recognition and time-series prediction. But RNNs had a significant flaw: the vanishing gradient problem.
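To get an intuition for why gradients vanish, here is a minimal numerical sketch (not from the original paper; the matrix and its scaling are illustrative assumptions). Backpropagation through time multiplies the gradient by the recurrent Jacobian at every step, so if that Jacobian's largest singular value is below 1, the gradient shrinks exponentially with sequence length:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 16

# A random recurrent weight matrix, rescaled so its spectral norm is 0.9
# (a stand-in for the recurrent Jacobian of a plain RNN).
W = rng.normal(size=(hidden, hidden))
W *= 0.9 / np.linalg.norm(W, ord=2)

grad = np.ones(hidden)  # gradient arriving at the final time step
norms = []
for step in range(50):  # propagate it 50 steps back in time
    grad = W.T @ grad   # one step of backprop through the recurrence
    norms.append(np.linalg.norm(grad))

print(f"gradient norm after 1 step:   {norms[0]:.4f}")
print(f"gradient norm after 50 steps: {norms[-1]:.2e}")
```

In a real RNN the picture is even worse: each backward step also multiplies by the derivative of the activation (for tanh, at most 1), which shrinks the gradient further. After enough steps the signal from early time steps is effectively zero, so the network cannot learn long-range dependencies — exactly the problem LSTM gates were designed to fix.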