FOD#104: AI “Holy Shit!” Moments of the Year – and What’s Still Not There Yet
Short interviews from the AI Engineer World’s Fair, plus our regular curated selection of the most relevant AI news and research papers
This Week in Turing Post:
Wednesday – AI 101 / Concept: Let’s discuss the role of synthetic data
Friday – Interview: Olga Megorskaya on Human-in-the-Loop
Two main topics today!
1) The “Holy Shit” Moments of AI vs “Not There Yet” Hype
It’s June – halfway through the year. The pace of AI development is breakneck, and a lot gets forgotten. What was once impossible is now a commodity.
So, while at the AI Engineer World’s Fair last week in San Francisco, I decided to ask a few builders: what was their “holy shit” moment with AI this year?
It’s funny, but a significant “wow” moment for one person might still feel like a “not there yet” to another. We also talked about which parts of their jobs they’d actually be happy to hand over to AI.
Simon Willison (the independent AI engineer everyone knows, author of the ‘pelican riding a bicycle’ benchmark), swyx (Latent Space podcast and the AI Engineer conferences), Jerry Liu (LlamaIndex), Solomon Hykes (Docker and Dagger), Stefania Druga (AI educator), and a few others shared their views: watch and subscribe →
(I’ve also published a text summary of the survey – you can read it online →)
2) Everyone talks about these papers:
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity by Apple
And How much do language models memorize? – a collaboration between researchers from FAIR, Google DeepMind, Cornell University, and NVIDIA
But what do many people miss about them?
The papers seem to study different phenomena: one investigates the limits of reasoning, the other the limits of memorization. But here is what gets missed: they describe the same underlying breakdown – a model’s coping mechanism when it is pushed past its fundamental capacity.
In the Illusion of Thinking paper, the model’s processing capacity – its effective “CPU” – is overloaded by puzzles that demand deep, multi-step reasoning. The model’s response is to abort the task mid-thought, reducing reasoning effort as complexity increases. This leads to a visible reasoning collapse.
In the Memorization paper, the model’s storage capacity – its “Hard Drive” – is saturated by large training sets. The model can’t memorize everything, so it begins to compress aggressively by generalizing. This triggers the double descent phenomenon and reduces the model’s ability to recall specifics.
Same failure, different modality. Whether the pressure comes from too many steps or too much data, the result is the same: the model simplifies, guesses, or shuts down – all while still outputting something that looks fluent and confident.
Reasoning collapse and forced generalization aren’t separate problems. They’re two faces of the same coin: how finite architectures break under load.
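If you want to see the “compress and generalize” side of this analogy in miniature, here is a toy sketch of double descent – nothing to do with either paper’s actual experiments, just minimum-norm regression on made-up random features, where test error typically spikes right where the number of features matches the number of training samples and then falls again as capacity grows:

```python
# Toy double-descent demo (illustration only, not the papers' setup).
# Minimum-norm least squares on random ReLU features: test error usually
# peaks near n_features == n_train, then descends as capacity keeps growing.
import numpy as np

rng = np.random.default_rng(0)
d, noise = 5, 0.1
w_true = rng.normal(size=d)            # one fixed "true" function

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y

n_train = 40
X_train, y_train = make_data(n_train)
X_test, y_test = make_data(500)

def relu_features(X, W):
    # Fixed random projection followed by ReLU: a simple random-feature map.
    return np.maximum(X @ W, 0.0)

for n_features in (5, 10, 20, 40, 80, 200, 800):
    W = rng.normal(size=(d, n_features))
    Phi_tr = relu_features(X_train, W)
    Phi_te = relu_features(X_test, W)
    # lstsq returns the minimum-norm solution when the system is underdetermined.
    coef, *_ = np.linalg.lstsq(Phi_tr, y_train, rcond=None)
    test_mse = np.mean((Phi_te @ coef - y_test) ** 2)
    print(f"random features: {n_features:4d}   test MSE: {test_mse:8.3f}")
```

The point isn’t the exact numbers – it’s that a finite model pushed past its capacity doesn’t fail loudly; it trades memorized specifics for compressed generalities, which is the trade-off the memorization paper measures at LLM scale.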
Welcome to Monday. Don’t you worry: the models are still just a technology, with plenty of bottlenecks.
Curated Collections
LLM, SLM, VLM, MLLM, LAM… There are a lot of model abbreviations out there. We decided to help you learn them — or at least make them a bit clearer.
Follow us on 🎥 YouTube Twitter Hugging Face 🤗
We are reading/watching
The Perfect Week of Exercise - Rick Rubin talking to Jack Clark (Anthropic)
Disrupting malicious uses of AI: June 2025 by OpenAI (read it like a thriller).
Some thoughts on human-AI relationships from Joanne Jang, OpenAI’s head of model behavior & policy
The last six months in LLMs, illustrated by pelicans on bicycles by Simon Willison
News from The Usual Suspects ©
Apple opens the AI gates – but keeps Siri on mute
At WWDC 2025, Apple finally cracked open its AI vault. The new “Apple Intelligence” suite is now available to third-party developers, promising features like image-aware suggestions and real-time translation. But Siri, expected to headline the show with a reboot, was curiously absent. That evolution is now delayed until 2026. For now, devs get the toys – users still get the same old assistant.
Yoshua Bengio’s LawZero
AI legend Yoshua Bengio has launched LawZero, a nonprofit devoted to building AI that doesn’t go rogue. Based in Montréal and incubated at Mila, the lab rejects agentic designs in favor of “Scientist AI” – models that understand rather than act. Think oversight over ambition. Backers include Open Philanthropy and Jaan Tallinn. The aim? Guardrails for an accelerating world.
Anthropic’s Guide to Claude Code
Anthropic is dogfooding Claude Code across the board – from growth marketers building Figma-integrated ad generators to legal teams prototyping accessibility tools in an afternoon. Whether it’s Kubernetes debugging, React dashboard generation, or Terraform reviews, Claude Code is their new universal colleague.
OpenAI’s Voice
OpenAI has rolled out improvements to ChatGPT’s Advanced Voice Mode for paid users, enhancing naturalness and expressiveness in speech. The updated system now handles subtleties like tone, cadence, and emotional inflection more effectively. It also introduces live translation capabilities across languages – useful for both travel and global collaboration. Some minor audio inconsistencies remain, but overall, voice interactions take another step forward.
Models and datasets to pay attention to:
SmolVLA: A vision-language-action model for affordable and efficient robotics
Researchers from Hugging Face and Sorbonne University developed SmolVLA, a compact VLA model with just 0.45B parameters that rivals 10× larger systems in robotic control tasks. Trained on 22.9K episodes from 481 community datasets, it supports single-GPU training and CPU deployment. SmolVLA uses an asynchronous inference stack to decouple action prediction and execution, enabling 30% faster control. It outperformed larger baselines in real-world and simulation benchmarks while maintaining efficiency and reproducibility → read the paper
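For readers curious what “decoupling action prediction from execution” looks like in practice, here is a minimal, hypothetical sketch of the idea – the chunk size, latencies, and interfaces are invented for illustration and are not SmolVLA’s actual code or API:

```python
# Sketch of asynchronous inference (assumptions, not SmolVLA's real API):
# the policy keeps predicting the next chunk of actions while the robot
# is still executing the previous one, so model latency overlaps execution.
import asyncio
import random

CHUNK_SIZE = 4          # hypothetical number of actions predicted per call
PREDICT_LATENCY = 0.08  # pretend the model needs 80 ms per forward pass
STEP_TIME = 0.02        # pretend each motor command takes 20 ms

async def predict_chunks(queue: asyncio.Queue, n_chunks: int):
    """Stand-in for the VLA policy: pushes chunks of actions into a queue."""
    for i in range(n_chunks):
        await asyncio.sleep(PREDICT_LATENCY)   # simulated model inference
        chunk = [random.random() for _ in range(CHUNK_SIZE)]
        await queue.put((i, chunk))
    await queue.put(None)                      # signal: no more chunks

async def execute_actions(queue: asyncio.Queue):
    """Stand-in for the robot controller: executes whatever is queued."""
    while True:
        item = await queue.get()
        if item is None:
            break
        idx, chunk = item
        for _action in chunk:
            await asyncio.sleep(STEP_TIME)     # simulated motor command
        print(f"executed chunk {idx} ({len(chunk)} actions)")

async def main():
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)  # small buffer of pending chunks
    await asyncio.gather(predict_chunks(queue, 5), execute_actions(queue))

if __name__ == "__main__":
    asyncio.run(main())
```

Because the controller drains a buffer of already-predicted chunks, prediction latency overlaps with motor execution instead of blocking it – which is the intuition behind the faster control loop the authors report.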