Easter Links: Extending transformers, Devika and Databricks flexes
Here are this week's standout AI links:

  • AI21 Labs juices up gen AI transformers with Jamba. Another small step in extending the power of transformers. The new model from AI21 Labs focuses on integrating Mamba (originally developed at CMU and Princeton) with Transformers to manage context more efficiently. In a nutshell, transformers (the current leading approach for building LLMs) must keep the whole prompt history in memory to keep functioning, so memory use grows with context length. Mamba and approaches like it are selective about what they retain in active memory. Maarten Grootendorst has a detailed and clear visual guide here.
  • Devin, meet Devika. Where industry goes, open source follows (quickly!). A couple of weeks ago, the big wow event was Cognition.AI's powerful demo of Devin, an agentic AI for software development. Designed to act more as a co-worker than a co-pilot, the system can carry out almost all elements of a software project end-to-end without much human input. Seeing this, one might imagine an army of hired AI software engineers (maybe Cognition.AI would need an HR department for them). This week, open source followed up: Devika is an open-source project aiming to build capabilities similar to Devin's. If Devika reaches the same levels of functionality, the mental model flips again; now, every company just has to decide how many CPUs/GPUs to allocate to software development and how many to runtime execution.
  • Microsoft restructures for AI. Mikhail Parakhin, who used to head Bing search at Microsoft, is now transitioning to a new role after Inflection.AI's Mustafa Suleyman took over AI consumer products at Microsoft (which includes Bing). A personnel change that's interesting because it underlines what Microsoft thinks is the future of search: AI. We'll see if they are right or not, but since they trail Google by so much in search, it probably makes sense to go all in.
  • The problem of AI ethics, and laws about AI. Some good points from Ben Evans on the trouble with regulating AI ethics. I fundamentally agree that the target of regulation really should be the use a technology is put to, not the technology itself. Different dimensions of that use require wildly different regulations (e.g., regulating an industry to prevent it from polluting the environment versus regulating it to prevent it from producing a good with sharp edges that can harm children). The argument for regulating AI as a technology stems from the fact that it's a powerful general technology that makes creating certain types of harmful products easier. However, that same general utility is what makes AI valuable (like chips, motherboards, network switches, and Internet protocols). You could say AI should be regulated just as guns are in many countries, but it would be more like regulating high-grade steel and certain chemicals that could, in certain hands, be used to make a gun, but under most other uses are extremely beneficial.
  • DBRX: Databricks flexes for Enterprise AI. This week, Databricks published its DBRX LLM on Hugging Face and made it available to customers on the Databricks platform. The announcement post is interesting for a couple of reasons. The first is that a key part of using LLMs in the enterprise will be running on your own data, which Databricks is already managing for you. As a result, the stronger Databricks can make its own models, the less the need to call out to other models. The second is that the blog post details the training process and implies training cost only around $10M using the team and infrastructure it acquired from Mosaic, which is the same infrastructure also available to its customers. This won't mean everyone can now train similarly good models (the expertise gap is real), but as a long-term trend, stand-alone LLMs that aren't attached to your data may end up looking less attractive.
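To make the Jamba point above concrete, here is a minimal sketch (not Jamba's or Mamba's actual code, and the `transformer_step`/`ssm_step` names and decay parameter are illustrative assumptions) contrasting the two memory profiles: a transformer layer caches keys/values for every past token, so its memory grows with context length, while a state-space layer folds each token into a fixed-size state.

```python
def transformer_step(kv_cache, token_repr):
    """Attention needs the full history: the cache grows by one entry per token."""
    kv_cache.append(token_repr)
    return kv_cache  # memory = O(sequence length)

def ssm_step(state, token_repr, decay=0.9):
    """A state-space layer updates a fixed-size state in place (toy linear recurrence)."""
    return [decay * s + (1 - decay) * x for s, x in zip(state, token_repr)]
    # memory = O(1) regardless of sequence length

# Process a toy "sequence" of 1000 tokens, each a 4-dim vector.
cache, state = [], [0.0] * 4
for _ in range(1000):
    token = [1.0] * 4
    cache = transformer_step(cache, token)
    state = ssm_step(state, token)

print(len(cache))  # grows with context: 1000 entries
print(len(state))  # constant-size state: 4 values
```

Hybrids like Jamba interleave both layer types, which is how they keep attention's quality while shrinking the memory bill for long contexts.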

In sad news, the Nobel Prize-winning psychologist Daniel Kahneman has died. Together with his long-time collaborator Amos Tversky, he rewrote our understanding of why humans do what they do. The literature on cognitive bias owes them a great deal. A great place to start is Kahneman's book Thinking, Fast and Slow.

Happy Easter!