Revolutionizing Time Series Forecasting: Interview with TimeGPT's creators
It's not an LLM! Azul Garza and Max Mergenthaler talk innovations, open source, diversity, and the pain of wrangling 100 billion data points
Today, we're excited to welcome Azul Garza and Max Mergenthaler, co-founders of Nixtla and researchers behind TimeGPT, the first foundation model designed specifically for time series forecasting (→read the paper here). Time series are used to analyze trends, identify seasonality, forecast future values, detect unusual patterns, and more. Foundation models are revolutionizing time series forecasting because they are pre-trained on vast amounts of diverse data, enabling them to adapt to new forecasting tasks efficiently. This translates to versatile models that can handle complex data patterns, eliminating the need for custom models for each specific use case. Just recently, giants like Amazon (Chronos), Google (TimesFM), and Salesforce (Moirai) have released their own time series foundation models. More breakthroughs are awaited in 2024!
Great to have you for this interview, Azul and Max. How did you come up with the idea for TimeGPT?
We have been working in the time series space for quite a while. When we started Nixtla three years ago, our goal was to help all practitioners with best-in-class implementations of classical and contemporary algorithms. We began developing various open-source libraries that have now become the Nixtlaverse. This is the most comprehensive open-source time series project to date, including statistical, machine learning, and deep learning models, and we’ve been excited to see them used by Fortune 100 companies and startups alike, and downloaded more than 10 million times. As developers and maintainers of these libraries, we've had the privilege of working with top data science teams across the globe, and we realized that a main obstacle is that forecasting remains an extremely hard and expensive process that requires a highly skilled team. We wanted to change that and democratize access to state-of-the-art time series tools, without the need for a dedicated team of machine learning engineers. Inspired by the revolution of OpenAI and others in text processing, we aimed to bring the whole paradigm shift of generative pre-trained models to time series. It wasn’t yet clear whether an effective foundation model for time series analysis was even possible, so we set out to explore what could be done and how accurate it could be. We didn’t want to do this just for the sake of it being done; we wanted it to provide fast and accurate results for people working in time series. This is how TimeGPT, the first foundation model for time series, was created and released.
How exactly does TimeGPT differ from an LLM? How is time series data transformed into tokens to be fed to the transformer?
We get this question a lot: GPT here stands for generative pre-trained transformer, which is not the same thing as a classical Large Language Model (LLM). The GPT in ChatGPT stands for the same thing, because both use transformer architectures, but we don’t do the language part; we deal with time series data. We essentially built a completely new model that was trained on publicly available time series from different domains, including retail, IoT, manufacturing, healthcare, electricity, and web traffic. It does not understand text; it only understands time series, for forecasting and anomaly detection tasks. Technically speaking, the optimization objective of TimeGPT is very different from the sequence prediction task in natural language contexts. In our case, the model takes as input data with timestamps, values, and exogenous variables, and outputs predictions or the anomalies detected. That means that when we speak about tokens in TimeGPT, we are really referring to timestamps. There is no embedding process in our case.
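To make that input format concrete, here is a minimal sketch of a long-format time series table with timestamps, values, and an exogenous variable. The `unique_id`/`ds`/`y` column names follow the convention used across Nixtla's open-source libraries; the `promo` column and all values are purely illustrative.

```python
import pandas as pd

# A long-format time series table: one row per (series, timestamp) pair.
# 'unique_id' identifies the series, 'ds' holds the timestamp, 'y' the
# observed value. Extra numeric columns (here 'promo', a hypothetical
# promotion flag) act as exogenous variables.
df = pd.DataFrame(
    {
        "unique_id": ["store_1"] * 6,
        "ds": pd.date_range("2024-01-01", periods=6, freq="D"),
        "y": [120.0, 134.0, 128.0, 151.0, 160.0, 155.0],
        "promo": [0, 0, 1, 0, 1, 1],  # exogenous variable (illustrative)
    }
)

print(df)
```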
Despite deep learning's transformative impact on fields like NLP and computer vision, its contributions to time series forecasting have been more measured. From your perspective, what breakthroughs does TimeGPT represent in this context?
That’s a great question. There was a lot of discussion in the field as to whether deep learning approaches would outperform classical models. We did a lot of comparisons and work in this space ourselves, and the classical models do very well in many contexts! Readers familiar with our past contributions will remember that Nixtla has played a pivotal role in demonstrating how classical models outperformed many so-called state-of-the-art models at a fraction of the cost and complexity, and we have published several experiments showcasing this. This evaluation of deep learning models is therefore at the leading edge of the field, and we’ve done research in this space for many years. TimeGPT has been the first demonstration that a deep learning approach can not only outperform existing approaches but also run much, much faster. From our perspective, the breakthroughs that TimeGPT represents in this context are significant.
We believe that times are changing, mainly due to data and compute availability. Currently, deploying pipelines for time series forecasting involves several steps, from data cleaning to model selection, that require a lot of effort and specialized knowledge unavailable to many users and companies. Pre-trained models offer a whole new paradigm in time series forecasting and anomaly detection, given that users don’t have to train and deploy their own models. Simply upload and forecast.
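As a rough illustration of that "upload and forecast" workflow, the sketch below uses the `nixtla` Python client. The class and argument names (`NixtlaClient`, `forecast`, `h`, `time_col`, `target_col`) reflect our understanding of the SDK and should be treated as assumptions; check the official docs for the exact interface.

```python
import pandas as pd
from nixtla import NixtlaClient  # assumed import path for the Nixtla SDK

# Historical data in long format (same convention as the earlier sketch).
df = pd.DataFrame(
    {
        "unique_id": ["store_1"] * 6,
        "ds": pd.date_range("2024-01-01", periods=6, freq="D"),
        "y": [120.0, 134.0, 128.0, 151.0, 160.0, 155.0],
    }
)

# "Upload and forecast": no model training or deployment on the user's side.
client = NixtlaClient(api_key="YOUR_API_KEY")  # hypothetical credential
forecast_df = client.forecast(
    df=df,            # historical observations
    h=7,              # forecast horizon: the next 7 days
    time_col="ds",    # timestamp column
    target_col="y",   # target column
)
print(forecast_df.head())
```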
The main breakthrough of TimeGPT is that it showed, for the first time in the history of the field, that the idea of a general pre-trained model was possible. In other words, TimeGPT is the first large-scale example of the transferability of time series models ready for production. We believe this marks a new chapter in the time series field, and we are extremely happy to see entities like Google (TimesFM), ServiceNow (Lag-Llama), Amazon (Chronos), Salesforce (Moirai), and CMU (Moment) following in our footsteps and contributing to this idea of pre-trained models for time series.
I'm curious about the choice of a Transformer-based model for TimeGPT. What drove this decision, and how has it influenced the model's performance and its ability to scale?
The short answer is: it was empirical. We tried (and are still trying) different deep learning architectures for time series. In our tests, we found transformers to be highly scalable and accurate when using huge and diverse amounts of data.
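For readers curious what "a transformer for time series" looks like in code, here is a toy sketch of the general design family: project each past value into a model dimension, run a transformer encoder over the window, and map the encoded window to a fixed forecast horizon. This is a generic PyTorch illustration, not TimeGPT's actual architecture, and all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class TinyTimeSeriesTransformer(nn.Module):
    """Toy transformer forecaster: encode a window of past values and
    predict the next `horizon` values. A generic illustration only."""

    def __init__(self, input_len=96, horizon=24, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.value_proj = nn.Linear(1, d_model)                 # embed each scalar step
        self.pos_emb = nn.Parameter(torch.zeros(1, input_len, d_model))  # learned positions
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(d_model * input_len, horizon)     # map window -> horizon

    def forward(self, x):
        # x: (batch, input_len) past values
        h = self.value_proj(x.unsqueeze(-1)) + self.pos_emb     # (batch, input_len, d_model)
        h = self.encoder(h)                                     # contextualized window
        return self.head(h.flatten(1))                          # (batch, horizon)

model = TinyTimeSeriesTransformer()
past = torch.randn(8, 96)        # batch of 8 windows of 96 past steps
print(model(past).shape)         # torch.Size([8, 24])
```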