Sunday Links: OpenAI safeguards, LLM Introspection, and Bubbles
OpenAI continues to make announcements at an unprecedented rate: a new corporate structure, data on user mental health, and a policy-enforcement model. Anthropic detects signs of introspection in models, and markets keep bubbling along.
This week, Bill Gates reoriented from climate to human wellbeing; small robots can now pull cars; and you can now remix not just music but also sitcoms. On to the deeper dive links for the week:
- A nonprofit on top, billions below: Inside OpenAI’s new corporate balancing act. One of the biggest pieces of news this week was OpenAI finally getting approval to restructure from non-profit control to a for-profit public benefit corporation, with the non-profit OpenAI Foundation owning roughly 26% of the corporate entity. This opens the way for OpenAI to take on more venture funding and potentially IPO down the line. The change probably needed to happen, given the scale of the business OpenAI is now building, and it also creates one of the world's best-funded non-profit entities. Still, a big test will be to what extent the new entity really does live up to a strong public-benefit statement. The original OpenAI mission was: "to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity."
- ChatGPT shares data on how many users exhibit psychosis or suicidal thoughts (up to 1M weekly showing signs of suicidal thoughts). In a widely covered blog post earlier this week, OpenAI revealed figures on how many of its 800M users mention suicidal thoughts in a given week. The percentage is small, but with such a large user base it still adds up to more than 1M people per week. The blog post outlines a number of steps the company is taking to better handle responses to such queries and to guide people towards seeking help. The headlines focus on the large (and shocking) number, but this is a complex issue. Is it better for people to talk (even if it is to a bot) than to risk talking to no one at all? How well adapted are LLMs to addressing these concerns? What is the liability of LLM providers when things do go wrong? It seems obvious that mental health support and guidance will become a major part of personal AI interactions for many people. That is a scary thought, but running away from it won't make it any less true, so models will need to get good at handling these situations.
- Introducing gpt-oss-safeguard (More OpenAI!). Potentially related to the previous story, OpenAI also released new versions of its open-weight models this week, tuned so that developers can use them for guardrails and content-policy enforcement. The models can be run locally or in a developer's own data centre and have been trained specifically for policy enforcement and safety. The company says: "Our gpt-oss-safeguard models and internal Safety Reasoner outperform gpt-5-thinking and the gpt-oss open models on multi-policy accuracy. The gpt-oss-safeguard models outperforming gpt-5-thinking is especially surprising given the former models’ small size." Surprising in a sense, but likely because the smaller model has been tuned for exactly this task. OpenAI has clearly concluded that policing a large general model requires something a step ahead of it in governance capability rather than in raw general power. This is good news: if the only answer were "an even more powerful general model," we'd never catch up with the need for reviewing power. We'll need to see whether these models are indeed powerful enough to act as proper guardrails (a rough sketch of how such a policy check might be wired up follows the list below).
- Signs of introspection in large language models (Anthropic). The title of this research post is perfectly pitched: "signs" of introspection. In the work, the Anthropic team describes experiments where they activate combinations of neurons in an LLM that seem to correlate with certain concepts and check whether subsequent LLM responses reflect the change. The process involves comparing a baseline level of activation with a state where the specific concept vector is also activated (see the second sketch after this list). The results show that in a small percentage of trials, the LLM immediately recognizes that a new concept is present. Whether this represents an "awareness" of one's own mind of the kind humans are presumed to possess is unclear, but it is an interesting result. I'm not sure how different this really is from having something heavily represented in the context window, but at the very least, it's useful to know these phenomena exist.
- Bubble & Build: The 2025 MAD (Machine Learning, AI & Data) Landscape. Zooming out from the individual lab activities (so many this week), this post by Matt Turk gives a high-level whirlwind tour of the AI market today. I don't agree with every one of his 25 market predictions/points, but I think he gets a lot right! I think he is right that capital markets are and will get bumpy, less so that AI is slowing down its spread. Commentators like Andrzej Karpathy, saying that "AI Agents are 10 years away," have an important point (the next steps are harder than the ones we just took), but I think there are still a lot of gains to be wrung out of even current innovations.
Wishing you a great weekend!