Saturday Links: SmolVLM, connective labor, and government-funded AI
TinyLLMs, the human touch in labor, Occam's razor in learning + a deep dive into serving LLMs.

There was so much news this week that I'm not even going to cover OpenAI's launch of Operator (workflows to get things done – no doubt it'll be important, but we'll cover it and similar frameworks in the future, and so far the demos have not been truly amazing).
Here are this week's standout stories:
- SmolVLM Grows Smaller – Introducing the 250M & 500M Models! It's possible I have an unhealthy obsession with small models, but ... it's my belief that we'll ultimately be able to use combinations of very small models for many tasks, which will mean extremely powerful functionality in devices as small as sensors or earbuds. Hugging Face just announced 256M- and 500M-parameter vision models with surprisingly good performance on image information extraction and multi-modal retrieval (a quick usage sketch follows this list).
- Meta’s Yann LeCun predicts ‘new paradigm of AI architectures’ within 5 years and ‘decade of robotics’. Yann LeCun has been out and about at Davos, as well as baiting Elon Musk on X. I generally like Yann's takes and agree with him that the future is neurosymbolic (we simply need more bits of brain than the "LLM"). Where I think he goes off base is in implying we don't need the LLM (or that it will be entirely replaced). Now, no doubt advances will be made, and terminology will change, but the breakthrough LLMs have made is in bringing together a somewhat coherent stream of consciousness based on stimuli and our learned patterns. This is like a "monkey mind" at times and seems unruly, but we surely all have an aspect to our minds that mirrors this. We build a lot around it, but when you sit to meditate for 10-20 minutes and realize how hard it is to control random thoughts that pop up, it seems clear that there is something like an LLM in there. We may not be "using" LLMs directly in 5 years, but their successors will be part of the system.
- I set out to study which jobs should be done by AI – and found a very human answer. Professor Allison Pugh's article in the Guardian focuses on the need to think about human connection in jobs. She suggests a “connection criterion” to help us decide which AI to encourage – the kind that creates new antibiotics, for instance, or decodes sperm whale language – and which to put the brakes on, that is, the kind that intervenes in human relationships. I fully see the argument that human interaction and fully "seeing" people is critical between individuals (and in society). I'm not a psychologist, but many studies seem to support this. Where I feel the argument goes off track is its focus on "connective labor" – work roles that can support human interaction, such as doctors, nurses, and baristas. Why do the interactions need to be part of labor? They can be, but there are many other opportunities (friends, family, etc.), and in some job roles the irony is that automation might free up more time to interact with patients. There are clearly dystopian scenarios already with us (automated ordering kiosks at fast food restaurants, for example, cut down human interaction). However, it seems unrealistic that we'll keep many fast food workers employed just to facilitate human interaction. It would make more sense to create new, interesting roles that by their nature encourage interaction (like solving scientific problems or creating a piece of art). The transition is hard, but trying to meet our need for connection through existing service jobs seems like the wrong way to think about the problem.
- Alia, la IA del Gobierno, es un desastre: ha costado una millonada y no supera ni a modelos de 2023 ("Alia, the government's AI, is a disaster: it cost a fortune and doesn't even beat 2023 models"). One of Spain's leading AI projects, ALIA, an effort to build an LLM, has drawn heavy criticism for being well behind LLMs released more than a year ago (hat tip to a nameless contributor). The model cost more than €10M to train, and outside observers have criticized the team because its benchmarks fall below Llama 2 models. I don't know much about the project, but what this headline makes me feel is empathy for the team. Maybe mistakes were made, maybe not, but €10M is a tiny budget for producing an LLM (unless you are DeepSeek, who shocked everyone). Tech is moving very fast, and anyone who expected a publicly funded project to beat ChatGPT is not thinking straight. Also, the project started more than a year ago, and very few people could have predicted the rate of improvement in 2024. Some comments suggest the team would have been better off fine-tuning an open-source model. My take: absolutely, since it's highly unlikely public money can produce something as good as what is already out there (a sketch of why fine-tuning is so much cheaper follows this list). However, now mix in the confounding issue of EU regulations, which make it unattractive for companies like Meta to allow their more powerful models to be used in the EU (Llama 3 text models are now available; the multi-modal models are not). This starts to look like a future of falling further behind.
- Researchers find the key to AI's learning power—an inbuilt, special kind of Occam's razor. This is a scientific-sounding article but is fairly high level. I haven't gone deep into the underlying research, but the concept described makes some intuitive sense. When neural networks are trained, they are essentially driven to approximate the functions and patterns in their training data. The paper argues that as networks get bigger (more parameters) they could represent ever more intricate functions (overfitting), yet they often still land on simpler, more general ones. It's unclear why, but the hypothesis presented is that the real world in fact has simple underlying patterns that can be discovered, patterns we often don't see ourselves. This is intuitively true (even obvious); what is less obvious is that the parameters governing learning need to be finely tuned to the prevalence of patterns in the data for this to work (a toy experiment illustrating the effect follows this list).
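
To give a feel for how small these vision models are to run, here's a minimal sketch of image Q&A with the 256M SmolVLM via transformers. The repo id and exact processor usage are my assumptions based on Hugging Face's usual vision-to-text API; check the model card for the canonical example.

```python
# Hypothetical minimal example: image information extraction with SmolVLM.
# The model id below is an assumption; verify it on the Hugging Face Hub.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("receipt.jpg")  # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Extract the total amount."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

At this size the whole thing fits comfortably in CPU memory, which is exactly why the sensor/earbud scenario starts to look plausible.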
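On the fine-tuning point: the reason it's so much cheaper than pretraining is that parameter-efficient methods like LoRA train only a small adapter on top of frozen weights. A rough sketch with the peft library (the model id is illustrative; any open-weights causal LM works):

```python
# Illustrative sketch: LoRA fine-tuning trains a tiny fraction of the weights.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # illustrative
config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM",
                    target_modules=["q_proj", "v_proj"])  # a common choice of attention layers
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

Training well under 1% of the parameters on a modest instruction dataset is a very different budget problem from pretraining from scratch.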
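And as a toy illustration of that simplicity bias (my own experiment, not the paper's method): an MLP with vastly more parameters than data points could interpolate the training set in wildly intricate ways, yet gradient descent tends to find a smooth, simple fit.

```python
# Toy illustration (my experiment, not the paper's): an overparameterized MLP
# trained on 10 points tends to find a smooth, simple interpolation.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 10).unsqueeze(1)  # 10 training points
y = torch.sin(x)                            # simple underlying function

net = nn.Sequential(nn.Linear(1, 512), nn.Tanh(),
                    nn.Linear(512, 512), nn.Tanh(),
                    nn.Linear(512, 1))      # ~264k parameters for 10 points
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

# Off-grid error stays small only if the net picked a function close to
# sin(x) rather than an arbitrary interpolant of the 10 points.
x_test = torch.linspace(-3, 3, 200).unsqueeze(1)
with torch.no_grad():
    test_err = nn.functional.mse_loss(net(x_test), torch.sin(x_test))
print(f"train MSE {loss.item():.5f}, off-grid MSE {test_err.item():.5f}")
```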
As an additional tip for the week, I enjoyed this deep-dive tour of AI model inference on the Latent Space podcast. It definitely gets into detailed territory, but it's high value for understanding what it takes to actually serve up a model: Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang).
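
If you want to poke at the serving side yourself, SGLang (featured in the episode) exposes an OpenAI-compatible endpoint. A minimal sketch, assuming the default port and flags; check the SGLang docs for the current launch command:

```python
# Minimal sketch of querying a locally served model via SGLang's
# OpenAI-compatible API. Launch the server first (command, model path,
# and port are assumptions; see the SGLang docs):
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="default",  # SGLang serves one model; this name is a placeholder
    messages=[{"role": "user", "content": "Summarize KV-cache reuse in one sentence."}],
)
print(resp.choices[0].message.content)
```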
Lastly, for more general reading, Ray Dalio has dropped a series of posts on big-country debt crises that are well worth reading. Whether you agree with his reasoning or not, there's a lot to think about.
Wishing you a great weekend and week!