Sunday Links: OpenAI Week, TRM, and Reflection.AI
OpenAI makes a slew of announcements, taking on industry giants in every direction.

There's a good case for calling this week "OpenAI week," given all the big announcements the company made. We'll cover a couple of big ones below, but you might want to watch the whole keynote.
- Introducing AgentKit. OpenAI introduces a way to build workflows and automate narrow, repeatable processes with OpenAI models, without needing to write code. It's a big deal that shows OpenAI is not just aiming at the consumer market but also wants to redefine work. A host of startups, many already very large (like CrewAI), are doing the same thing; now they are up against the model builder as well.
- Introducing apps in ChatGPT and the new Apps SDK. Perhaps even more far-reaching in its implications is the new ability for third parties to create "apps" within ChatGPT. A host of well-known brands were on hand as early integration partners, including Zillow and Booking.com. A good way to think about this is OpenAI turning ChatGPT into a "SuperApp" that acts as a foundation for all sorts of functionality you would normally go to other apps for. Much like WeChat in China, if it takes off, it would essentially keep users inside the ChatGPT experience and make offerings via the Google Play Store or the Apple App Store much less relevant. I can't help thinking, though, that while this is a big power play, in the very long run OpenAI probably imagines removing all these apps again and generating UIs on the fly for much of this functionality. Apple and Google risk being disintermediated if this takes off. Internet companies like the early partners are almost forced to build their own ChatGPT app just to stay visible to ChatGPT's large user base.
- Sora 2 -> Sora App Cameos -> Social Network. So as not to let Facebook off the competitor list, OpenAI also launched its new video generation engine Sora 2 with a host of upgraded features (easier creation, sound with every video). More importantly, though, they launched the model inside a social app (iPhone-only initially) that lets you easily create video, including with yourself as a protagonist, and invite and connect with friends to share videos. The app has been solidly at #1 or #2 in the Apple App Store charts since launch and is growing fast. By making creation easy, this new social app is arguably attacking the one Achilles heel of Instagram, Snap, and other existing networks: content there is (mostly) human-generated, and despite a lot of power tools, it still takes time to make. To worry Facebook, Sora doesn't have to get as big as Instagram overnight; if it reaches a scale where there is a constant flow of interesting content from friends, it could pull a significant amount of user attention away from existing networks. Facebook's own AI video offering, Vibes, has gotten mixed press, but Facebook is also in a more difficult position than OpenAI: the latter has no social network to damage if things go wrong, while Facebook's business depends entirely on monetizing its networks through advertising.
OpenAI's launches are impressive both in how boldly they go after new markets and in how well executed they seem to be. One has to wonder, though, whether striking so hard in so many directions might alienate partners they may need down the road. Undercutting a plethora of agent platforms, for example, might turn out to be a bad move if they all switch to Claude/Anthropic instead.
Was there any other news? Yes, a lot, and I'm sure we'll catch up next week, but for now, just two standout links:
- Reflection AI raises $2B to be America’s open frontier AI lab, challenging DeepSeek. Just when you thought the frontier lab landscape might reach some kind of equilibrium, it turns out it is still possible to raise large sums for new efforts. I'm not sure aiming to be a national champion makes sense, but it's good to see open source as a stated objective for Reflection AI.
- Less is More: Recursive Reasoning with Tiny Networks. At the other end of the model-size scale, this paper builds on previous work (the Hierarchical Reasoning Model) on small models that reason by recursively refining an abstract latent state rather than by scaling up parameters. The TRM model described here uses only 7M parameters (truly tiny) but obtains 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., DeepSeek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the parameters. To be clear, 8% is not a great result on ARC-AGI-2 in absolute terms, but that benchmark is particularly important because it measures reasoning that can't be solved from memorized patterns. I'd expect some of these TRM results to be useful in making large models significantly smarter as well; a rough sketch of the core recursion follows below.
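To make the "recursive reasoning" idea concrete, here is a minimal PyTorch sketch of the refinement loop as I understand it from the paper: one tiny shared network alternates between updating a latent "scratchpad" z (thinking) and updating the current answer y (acting). The class name, dimensions, step counts, and MLP layers are my own illustrative assumptions, not the paper's exact architecture or training setup (which also uses deep supervision).

```python
# Hedged sketch of TRM-style recursive refinement. Names, dims, and step
# counts are illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    def __init__(self, dim: int = 128, n_latent_steps: int = 6, n_answer_steps: int = 3):
        super().__init__()
        self.n_latent_steps = n_latent_steps  # inner loop: refine latent z
        self.n_answer_steps = n_answer_steps  # outer loop: refine answer y
        # Two small MLPs keep the parameter count tiny; capacity comes from
        # reusing them recursively, not from depth or width.
        self.update_z = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.update_y = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.zeros_like(x)  # current answer embedding
        z = torch.zeros_like(x)  # latent reasoning state ("scratchpad")
        for _ in range(self.n_answer_steps):
            for _ in range(self.n_latent_steps):
                # Think: refine the latent given input, answer, and itself.
                z = self.update_z(torch.cat([x, y, z], dim=-1))
            # Act: refine the answer from the updated latent.
            y = self.update_y(torch.cat([y, z], dim=-1))
        return y

model = TinyRecursiveModel()
out = model(torch.randn(4, 128))  # batch of 4 embedded puzzles
print(out.shape)                  # torch.Size([4, 128])
```

The point of the sketch is the shape of the computation: extra "depth" comes from iterating a single small network, which is why a 7M-parameter model can spend far more compute per problem than its parameter count suggests.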
Wishing you a great weekend.