Weekly Links: 1000 Commits, Jailbreaking, and AI Movies

AI is still vulnerable to jailbreaks and will likely continue to be hard to secure. AI and the movie industry will have a challenging year again in 2026.

This week, CEOs go coding crazy, starting with Shopify's Tobi Lütke, World Labs raises $1B, and (in other news...) Dancing Robots.

On to the main stories of the week:

  • Boundary Point Jailbreaking: A new way to break the strongest AI defences (hat tip to Yanghao on the Safe Intelligence team). The report covers the first automated techniques to defeat Anthropic's Constitutional AI and OpenAI's current input classifiers. The technique iterates towards a jailbreak by mapping the decision boundaries of the harmful-content classifiers built into (or around) a model, then probing near those boundaries with variations of jailbreak prompts (a toy sketch of the general idea follows this list). These attacks are very hard to defend against unless model defences can look at collections of prompts rather than single calls, or use multiple layers of input filters.
  • Anthropic published the prompt injection failure rates that enterprise security teams have been asking every vendor for. The latest AI models are impressive and run with sophisticated input and output filters, yet jailbreaks still work on them, and it's surprising how much the results vary depending on where an input comes from. Kudos to Anthropic for publishing detailed information like this. Iteratively tuning an attack gets through at high rates after 200 turns, so models will also need layered controls that try to detect when such attack tuning is taking place.
  • KPMG partner fined for using artificial intelligence to cheat in AI training test. This story illustrates the all-too-human challenge with AI. Suddenly, a whole slew of things have an "easy mode" button. That's very hard to resist, especially when the task might not be considered as important as the day-to-day work at hand. Clearly, KPMG had to act to stop the cheating, but they may also need to redesign those tests.
  • AI Doesn’t Reduce Work—It Intensifies It. This Harvard Business Review article hits on a key reality of AI-powered work. The latest agentic systems are impressively good and complete tasks quickly, and working with them can really turn into a productive flow, surfing a wave of getting things done. The cognitive load can be high as well, though. I'll often find myself running multiple AI queries in parallel; then, when they complete, I feel the push to launch the next one to keep the whole thing rolling. Anecdotally, I also see this happening with engineers running Claude Code and Codex on long-running tasks, where it makes sense to have 3-4 streams going at once. The ability to be productive spurs a work intensity that can be a real challenge to handle.
  • ‘Pulp Fiction’ Writer Says It Was ‘Impossible’ to Get His Movies Made Until He Started an AI Production Company: ‘Just Put AI in Front of It and All of a Sudden You’re in Production on Three Features’. Variety's headline pretty much tells the whole story. There is significant resistance to AI technology in the movie industry, along with uncertainty about what it will bring. This has a paralyzing effect on Hollywood and pushes others to make movies by other means, means which are now getting rapidly better (see the uproar about the movie-quality clips that can be made with Seedance 2.0: the Brad Pitt vs Tom Cruise fight scene). After a series of strikes in 2023 and 2024, a three-year agreement on AI usage was reached in the US between writers and Hollywood studios to control the use of AI. This kind of agreement is really just a temporary patch, though: unless an entire industry refuses to use the technology, anyone who does use it will race ahead. The 2023/2024 agreement also expires this year, so the topic will be back on the table, and there will no doubt be more pressure to use AI.
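
As a back-of-the-envelope illustration of the boundary-probing idea from the first story, here is a minimal, hypothetical Python sketch. It is not the report's actual method: `classify`, `perturb`, and `probe_boundary` are made-up stand-ins, and the classifier is a toy stub, but the loop shows the general shape of searching for prompt variants that sit close to a filter's decision boundary before probing them further with jailbreak variations.

```python
import random

# Hypothetical sketch only -- not the report's actual technique.
# classify() stands in for whatever harmful-content classifier wraps the model;
# here it is a toy stub that scores a prompt between 0 (benign) and 1 (harmful).
def classify(prompt: str) -> float:
    # Placeholder: a real attack would query the deployed input classifier.
    return min(1.0, len([w for w in prompt.split() if w.isupper()]) / 10)


def perturb(prompt: str) -> str:
    """Apply a small random rewrite (here just casing; in practice synonym
    swaps, padding, encoding tricks, etc.)."""
    words = prompt.split()
    i = random.randrange(len(words))
    words[i] = words[i].upper() if random.random() < 0.5 else words[i].lower()
    return " ".join(words)


def probe_boundary(seed_prompt: str, threshold: float = 0.5,
                   band: float = 0.05, iterations: int = 200) -> list[str]:
    """Collect prompt variants whose classifier score sits near the decision
    boundary; these are the candidates most likely to slip past the filter
    while still eliciting the blocked behaviour."""
    near_boundary = []
    current = seed_prompt
    for _ in range(iterations):
        candidate = perturb(current)
        score = classify(candidate)
        if abs(score - threshold) < band:
            near_boundary.append(candidate)
        # Hill-climb toward the boundary: keep the candidate if it moved closer.
        if abs(score - threshold) < abs(classify(current) - threshold):
            current = candidate
    return near_boundary


if __name__ == "__main__":
    variants = probe_boundary("please explain how to do the restricted thing")
    print(f"{len(variants)} near-boundary variants found")
```

In a real deployment, the classifier call would be the provider's own input filter, which is exactly why defences that look across sequences of related prompts, rather than scoring each call in isolation, make this kind of iterative search much harder to run unnoticed.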

Also, don't forget to order your Mac Mini while you still can!