FOD#130: Where Is AI Heading in 2026?
Were you right a year ago? Let's see! Compare last year's predictions and make new ones. Plus, we share a very important shift in research.
This Week in Turing Post:
Wednesday / AI 101 series: The state of RL
Friday / AI Interview: Ben Goodger on why OpenAI needs its own browser
Our news digest is always free. Click on the partner’s link to support us, or upgrade to receive our deep dives in full, directly in your inbox. Join Premium members from top companies like Hugging Face, Microsoft, Google, a16z, and Datadog, plus AI labs and institutions such as Ai2, MIT, Berkeley, and .gov, and thousands of others to really understand what’s going on with AI → Upgrade today
Were we – and you – right? Revisiting our early-2025 predictions and making new ones
Making predictions, especially about the future, is famously tricky yet remains a favorite year-end tradition. But, as Antoine de Saint-Exupéry said: “Your task is not to foresee the future but to enable it.”
So let’s enable it! This is our yearly tradition. What do you want AI in 2026 to be? What do you think it will be the year of?
Send us your thoughts – ks@turingpost.com – to be featured in the special Predictions Edition of Turing Post at the end of 2025!
→ OR SIMPLY REPLY TO THIS EMAIL WITH YOUR PREDICTIONS ←
And now let’s see if any of us were right 12 months ago:
Last December, we made a bold bet: 2025 would be the “Year of Inference-Time Search.” We predicted a massive industry pivot from models that talk fast to models that think slow. Looking back, that prediction defined the entire year.
We also said: “We think that Google will start to dominate the scene.” December proved it correct: OpenAI declared an internal “Code Red” as Gemini 3 threatened their lead. We even canceled our Pro subscription because there is literally no reason to keep paying $200 when Google (and other models) are now so good.
Here is how our 2025 scorecard looks in general.
The Big Win: The “Thinking” Shift
François Chollet nailed it. Inference-time search now drives capabilities. The leaderboard has shifted away from parameter counts. Today, the best reasoning chains win. We are finally seeing “System 2” thinking in silicon. However, Chollet’s hope for a solved ARC-AGI benchmark proved too optimistic. We made massive progress, but general intelligence remains an unsolved puzzle.
The Reality Check: Agents Stalled
John K. Thompson correctly identified the macro timeline: AGI is nowhere near ready. His prediction of “millions of active agents,” however, missed the mark. 2025 proved that building an agent is easy. Making it reliable is excruciatingly hard. We remain in the pilot phase rather than the deployment phase.
The Sleeper Hit: Efficiency
While the media chased trillion-parameter giants, real progress often came from the bottom up. Ronen Eldan, Will Schenk, and Maxime Labonne pointed early to the rise of compact, task-specialized models. And 2025 proved them directionally right: some of the most practical tools this year were small, efficient models that ran cheaply, handled math surprisingly well, and outperformed far larger systems in specific workflows. Examples include rStar-Math beating larger LLMs on reasoning tasks, Phi-3 Mini matching older frontier models on-device, and Qwen2.5-Coder outperforming bigger models in developer environments.
The Interface: The Death of Typing
swyx predicted that voice would become the default. He was directionally right. In 2025, we still type a lot, but conversation with a model became the new normal.
The Verdict: The “Big vs. Small” debate turned out to be a false dichotomy. 2025 proved the necessity of both. Massive foundation models provided the broad reasoning substrate. Nimble, inference-heavy search solved the specific, hard problems. The industry figured out how to make them work together instead of declaring a single winner. We didn’t abandon scale. We learned that intelligence requires time as much as it requires data. Perhaps most significantly, this was the year we finally got used to AI.
What’s next? What will 2026 surprise us with? Send us your thoughts – ks@turingpost.com – to be featured in the special Predictions Edition of Turing Post at the end of 2025!
→ OR SIMPLY REPLY TO THIS EMAIL WITH YOUR PREDICTIONS ←
Topic 2: When everyone flew to NeurIPS, I went to Art Basel Miami to see how AI is doing in the wild (art). Why it’s good when machines hallucinate, and how much a robodog with Elon Musk’s head costs → check it out here
We are also watching/reading:
Introducing Anthropic Interviewer: What 1,250 professionals told us about working with AI by Anthropic
State of AI: An Empirical 100 Trillion Token Study by OpenRouter and a16z
AI Engineering Code Summit Report by Will Schenk
From Code Foundation Models to Agents and Applications: A Survey and Practical Guide to Code Intelligence by ByteDance
Curated – Highly recommended list of papers from NeurIPS
Follow us on 🎥 YouTube Twitter Hugging Face 🤗
Survey highlight – Deep Research: A Systematic Survey
Researchers from various institutions present a survey on Deep Research (DR). The paper reviews optimization methods – prompting, supervised fine-tuning, and agentic RL – and outlines evaluation criteria and unresolved challenges to guide future DR systems → read the paper. If Deep Research is of interest, you might also like the paper “How Far Are We from Genuinely Useful Deep Research Agents?” (→ read it here)
Research this week – the center of gravity is shifting
I want to make a few observations about the last week in the research world. If you felt a sudden drop in pure LLM papers, you’re not imagining things. The frontier is in a temporary holding pattern. Or maybe not so temporary – speaking of predictions! – it might be that we’re finally seeing a significant shift in research effort: from LLMs being the star of the show to LLMs becoming the engine under the hood. The spotlight is moving up the stack toward world models, agents, multimodal systems, simulation loops, and the efficiency work that turns frontier models into everyday tools.
And underneath all that, you can sense another shift brewing. The field keeps bumping into the edges of the transformer recipe. Long-context tricks feel like hacks, inference costs refuse to cooperate, and most reasoning gains now come from scaffolding rather than architecture. It’s the kind of pressure that usually precedes a break. We see early hints in video models, optimization-time reasoning, memory modules, and agent papers that stretch beyond text prediction. These are silhouettes of a new blueprint for intelligence, not the blueprint itself. But it might be coming very soon. (As always, 🌟 marks papers we recommend paying attention to.)