Token 1.4: Foundation Models – The Building Blocks

We touch upon some systematic concepts and also offer a few practical insights from Rishi Bommasani

Oct 13, 2023

This article is a primer for anyone looking to get acquainted with the topic. We touch upon some systematic concepts and offer a few practical insights from Rishi Bommasani, a co-author of one of the best papers on the topic, “On the Opportunities and Risks of Foundation Models.” In this episode, we will explore:

The definition of foundation models (FM)
Key characteristics that have transformed our understanding of ML applicability
Various types of FMs
Current trends in the field
Non-generative FMs worth noting
Unique challenges posed by these models
And we'll wrap up with an interview on how to assess whether your company should adopt a foundation model

In Token 1.1, we touched upon the paradigm shift from task-specific to task-centric machine learning (ML):

“Task-centric ML focuses on using foundation models to perform a wide range of tasks efficiently and with fewer training examples. You're no longer burdened by the never-ending thirst for new data. This eliminates the 'data bottleneck,' allowing you to iterate faster and meet evolving business needs.”

So let’s dive deeper into the foundation models that made this profound shift in AI/ML applications possible.

The definition of foundation models

The term was coined by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) in its seminal August 2021 paper, “On the Opportunities and Risks of Foundation Models,” and has gained traction since:

"A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks."

The name does not mean these models are foundational to ML as a field. Rather, they provide a foundation that doesn’t need to be rebuilt every time you change the task.

One interesting aspect of foundation models is that, technologically, they are not new. They are based on deep neural networks and (most of the time) self-supervised learning, both of which have been around for decades.

"Foundation models are enabled by transfer learning and scale. The idea of transfer learning is to take the 'knowledge' learned from one task (e.g., object recognition in images) and apply it to another task (e.g., activity recognition in videos). Within deep learning, pretraining is the dominant approach to transfer learning: a model is trained on a surrogate task (often just as a means to an end) and then adapted to the downstream task of interest via fine-tuning."
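The pretrain-then-adapt pattern in the quote above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and does not come from the paper: the "backbone" is a single linear feature, the surrogate and downstream tasks are synthetic, and the function names are hypothetical. A real foundation model is a large neural network, usually pretrained with self-supervision, but the division of labor is the same: learn a representation where data is plentiful, then freeze it and fit only a small task-specific head on a handful of examples.

```python
import random

def backbone(w, x):
    """Shared feature extractor: one linear feature of a 2-D input."""
    return w[0] * x[0] + w[1] * x[1]

def pretrain(data, steps=2000, lr=0.05):
    """Learn the backbone weights on the data-rich surrogate task with SGD."""
    w = [0.0, 0.0]
    for i in range(steps):
        x, y = data[i % len(data)]
        err = backbone(w, x) - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
    return w

def fine_tune(w, data, steps=500, lr=0.05):
    """Adapt to the downstream task: the backbone stays frozen,
    only the small head (scale a, bias b) is trained."""
    a, b = 1.0, 0.0
    for i in range(steps):
        x, y = data[i % len(data)]
        f = backbone(w, x)
        err = a * f + b - y
        a -= lr * err * f
        b -= lr * err
    return a, b

# Surrogate task (toy assumption): plenty of data, target is 2*x1 + 3*x2.
rng = random.Random(0)
inputs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
surrogate_data = [(x, 2 * x[0] + 3 * x[1]) for x in inputs]

# Downstream task (toy assumption): only five labeled examples, and the
# target is a different function of the same underlying feature.
downstream_inputs = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.5), (0.5, -1.0), (-0.5, -0.5)]
downstream_data = [(x, 1 - (2 * x[0] + 3 * x[1])) for x in downstream_inputs]

w = pretrain(surrogate_data)          # expensive step, done once
a, b = fine_tune(w, downstream_data)  # cheap step, repeated per task

def predict(x):
    """Downstream prediction: frozen backbone plus fine-tuned head."""
    return a * backbone(w, x) + b

print(f"backbone weights: {w[0]:.3f}, {w[1]:.3f}")
print(f"head after fine-tuning: a={a:.3f}, b={b:.3f}")
print(f"prediction for (0.3, -0.7): {predict((0.3, -0.7)):.3f}")
```

The fine-tuning step touches two parameters and five examples because the backbone already encodes the feature the new task depends on. In this toy setting, that is the sense in which transfer learning removes the "data bottleneck."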

Transfer learning is the mechanism that makes foundation models feasible; scale is what makes them extremely potent. Yet the OpenAI developers were puzzled by the success of ChatGPT, because it was a version of an AI system they’d had for a while. What truly changed the ML world was… user experience. “We made it more aligned with what humans want to do with it. It talks to you in dialogue, it’s easily accessible in a chat interface, and it tries to be helpful. That’s amazing progress, and I think that’s what people are realizing,” explains Jan Leike, the leader of OpenAI’s alignment team.

This recent success in simplifying access to these models has shaken the whole industry, enabling everyone – from research labs and ML startups to enterprises and the average Joe – to start learning about FMs/LLMs and applying them. Countless ML startups have pivoted towards LLM applications. Yet, even as accessibility grows, our collective understanding of these models remains in its early stages, and a systematic approach to them, particularly to categorizing FMs, is still evolving.

What sets foundation models apart?
