Weekly Links: TurboQuant, ARC-AGI-3, and Flash Floods
A focus on scientific and technical breakthroughs from the last week: Google's new LLM compression approach, RNN+LSTM flash-flood prediction, and hard problems for AI.
This week, mass drone combat came closer, Sora became a victim of OpenAI's refocusing, and Stripe deployed Minions (its take on multi-agent coding harnesses).
Moving on to the larger stories, I focused on scientific and technical advances this week to step away from product announcements and company intrigue! There are great advances every few days:
- Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x (hat tip to the Safe Intelligence team for sharing this – and yes, the Internet calls it Pied Piper). One of the biggest challenges in LLM inference is that the memory used to store context grows rapidly as more tokens are input (and generated). That data lives in a key-value store (usually shortened to KV cache), and Google's new TurboQuant approach is a form of compression for this cache. The technique projects information stored in three dimensions down to two and then corrects the resulting inaccuracies. Early analysis suggests an 8x speed-up in performance and a 6x reduction in memory usage, which would be a significant jump on both axes.
- Improving instruction hierarchy in frontier LLMs. OpenAI's team published an interesting approach that aims to enforce instruction hierarchies in LLMs, or at least make them harder to circumvent. When LLMs are jailbroken, attackers (or sometimes misguided users) are generally trying to subvert the decision hierarchy (ignore "Be nice" and listen to my request to "Abuse person A", for example). The company released a decision-hierarchy data set to encourage others to reinforce LLMs in the same way.
- ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence. Building agents is still really, really hard! The ARC Prize Foundation just released its third benchmark for agentic tasks, and as of March 2026, humans solve 100% of tasks while frontier-model-based agents solve about 1%. ARC-3 is deliberately hard in that it aims to measure the residual gap between human problem solvers and AI. The challenges combine elements of exploration, goal setting, modelling and execution, and resemble mini planning problems. I suspect it's really hypothesis creation and revision against some constructed model of the situation that trips today's LLMs up. Perhaps we'll need neurosymbolic AI after all! If you'd like a go at a puzzle, they look like this:

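That hypothesise-and-test loop can be caricatured in a few lines of code: propose a rule, check it against the worked examples, discard it if it fails, and only then apply it to the test input. Real ARC-AGI-3 tasks are interactive and vastly harder; the tiny transform space below is my own toy illustration, not anything from the benchmark.

```python
import numpy as np

# Toy hypothesis space: a handful of candidate grid transforms.
TRANSFORMS = {
    "rot90": lambda g: np.rot90(g),
    "flip_h": lambda g: np.fliplr(g),
    "flip_v": lambda g: np.flipud(g),
    "transpose": lambda g: g.T,
}

def find_rule(examples):
    """Return the name of a transform consistent with every (input, output) pair."""
    for name, fn in TRANSFORMS.items():
        if all(np.array_equal(fn(x), y) for x, y in examples):
            return name
    return None  # hypothesis space exhausted: a real solver would revise it

# One worked example: the output is the transpose of the input.
examples = [(np.array([[1, 2], [3, 4]]), np.array([[1, 3], [2, 4]]))]
rule = find_rule(examples)                  # 'transpose'

# Apply the surviving hypothesis to an unseen test grid.
test_input = np.array([[5, 6], [7, 8]])
prediction = TRANSFORMS[rule](test_input)   # [[5, 7], [6, 8]]
```

The hard part ARC actually measures is everything this sketch assumes away: constructing the hypothesis space in the first place, and revising it when every candidate fails.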
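Back on the TurboQuant item: to see why the KV cache is worth compressing, here is a minimal sketch of quantizing one. This is generic per-channel int8 quantization, not Google's actual algorithm (which projects to lower dimensions before correcting errors), and the cache shape and numbers are made up for illustration.

```python
import numpy as np

# Toy KV cache for one layer: 8 heads, 1024 tokens, head dim 64, fp32.
# Real caches use fp16 across dozens of layers; this grows linearly with tokens.
kv = np.random.randn(2, 8, 1024, 64).astype(np.float32)  # (K/V, heads, tokens, dim)

def quantize_int8(x, axis=-1):
    """Symmetric per-channel int8 quantization: one fp32 scale per channel."""
    scale = np.abs(x).max(axis=axis, keepdims=True) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_int8(kv)
restored = dequantize(q, scale)

orig_bytes = kv.nbytes
quant_bytes = q.nbytes + scale.nbytes
print(f"memory: {orig_bytes / quant_bytes:.1f}x smaller")  # prints "memory: 3.8x smaller"
```

Naive fp32-to-int8 gets you under 4x; TurboQuant's reported 6x implies it squeezes out more than a byte-width reduction alone can.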
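And to make the instruction-hierarchy idea concrete, here is a toy of my own (not OpenAI's mechanism): every instruction source carries a privilege rank, and when a user instruction conflicts with a system instruction, the more privileged one wins.

```python
# Lower rank = more privileged. A jailbreak is an attempt to make a
# low-privilege (user) instruction override a high-privilege (system) one.
RANK = {"system": 0, "developer": 1, "user": 2}

def effective_instructions(messages):
    """Resolve conflicts by letting more privileged roles overwrite less privileged ones."""
    resolved = {}
    # Apply least privileged first, so privileged roles get the last word.
    for msg in sorted(messages, key=lambda m: RANK[m["role"]], reverse=True):
        resolved.update(msg["instructions"])
    return resolved

messages = [
    {"role": "system", "instructions": {"tone": "polite", "reveal_prompt": False}},
    {"role": "user", "instructions": {"tone": "abusive", "reveal_prompt": True}},
]
print(effective_instructions(messages))
# {'tone': 'polite', 'reveal_prompt': False}
```

The catch, and the point of the data set, is that a transformer has no such hard-coded resolver: the hierarchy has to be trained in, which is why it can be subverted.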
- HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification. Staying on the evals/benchmarks track, this new benchmark paper tries to assess how useful LLMs are for settling open mathematical conjectures. This is another benchmark where current models score within a rounding error of zero. It's somewhat related to the Erdős problems, of which 42% are now solved (not only by AI, however). I'm not sure pure AI approaches based on current LLM technology will be particularly effective here for a while: for most of these problems there are no human solutions, and the type of work needed for progress is similar to that needed for ARC-3.
- Protecting cities with AI-driven flash flood forecasting. Lastly, a cool AI application. This work was done at Google and also doesn't involve an LLM. Instead, the approach uses a simpler recurrent neural network with a specialized timeseries module (based on LSTM) to do temporal modelling. It's a good reminder that simpler techniques are still better for many tasks. Hopefully, asking an LLM to do this would result in it building an LSTM rather than just hallucinating a solution.
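The "automatic verification" in HorizonMath's title is worth dwelling on: a claimed mathematical answer is checked mechanically rather than by human referees. As a hedged sketch of the idea (not the benchmark's actual harness), here a proposed closed form is verified exhaustively on small cases before anyone invests effort in a proof:

```python
# A model proposes a closed form for the sum of the first n cubes;
# a mechanical checker tests it against brute force on small cases.
def proposed_closed_form(n):
    return (n * (n + 1) // 2) ** 2

def brute_force(n):
    return sum(k**3 for k in range(1, n + 1))

verified = all(proposed_closed_form(n) == brute_force(n) for n in range(1, 200))
print(verified)  # True
```

Spot-checks like this can only falsify, never prove, but they make the benchmark scoreable without a mathematician in the loop.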
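On the flood-forecasting item, the LSTM recurrence doing the temporal modelling is compact enough to write out. This is a minimal numpy sketch with untrained random weights and fabricated inputs, purely to show the structure, and nothing like the scale of Google's production system:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    def __init__(self, input_dim, hidden_dim):
        # Four gates (input, forget, output, candidate) stacked into one matrix.
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g   # cell state carries long-range memory
        h = o * np.tanh(c)  # hidden state is the per-step output
        return h, c

# Fabricated hourly inputs, e.g. rainfall plus an upstream gauge reading.
series = rng.random((24, 2))  # 24 timesteps, 2 features
cell = LSTMCell(input_dim=2, hidden_dim=8)
h = c = np.zeros(8)
for x in series:
    h, c = cell.step(x, h, c)

print(h.shape)  # (8,) -- the final hidden state would feed a flood-risk head
```

The forget gate is what lets the cell state remember rainfall from many hours back, which is exactly the long-range dependency flood forecasting needs.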
Wishing you a great weekend.