Smokebox: Copyright, AI Baby Faces and Cloning Yourself

Plus the limits of LLMs and what we'll need to build around them.

The firehose of news never really shuts off, but this week did seem a few notches lower pressure than the previous few. No doubt everyone is recovering from their summer break. Here’s what stood out this week:

  • The Battle over Books3: Wired has a detailed article on the fight over one of the key data sets currently used for AI training. Books3 contains the full text of nearly 200,000 books, some out of copyright, some not. The data set has now been removed from a number of public hosting providers, but the debate over what can be used for training is far from over.
  • AI Startups: Sell Work, Not Software: Sarah Tavel has a thought-provoking post about the choice between selling software tools and selling an actual work product. For example, in accounting, rather than selling an accounting tool, sell the completed accounts. In financial analysis, don’t sell analysis tools but completed reports. This is a compelling argument in many domains where AI tools may just be too complex for buyers. On the other hand, this approach really will accelerate full automation and de-staffing in admin, legal, and other departments. Perhaps start with everything everybody hates to do, like writing software tests and cross-checking expense reports.
  • Ourbaby.AI: Just when you thought the crazy idea stream was slowing down, this service will let you predict what your future offspring might look like. All that’s needed for world domination is an API and an integration into the world’s favorite dating apps…. (Also, you might want to consider the privacy implications before uploading photos of yourself and your beloved.)
  • Delphi, the digital cloning platform, raises $2.7M (announced on X): This is one of the applications of AI that seems likely to get more and more traction over time, and it will also lead to many of the thorniest ethical issues. Quite a few companies have started offering “cloning” of everything from visuals to voice and written content. The idea is that, once trained on your content, your clone can respond “as you would” and potentially even do work on your behalf. In the short term, these are likely to be fun to play with but inconsequential. In the long run, questions will arise about liability, the ability to devolve authority to a “clone,” and rights for re-use. Cloning the dead in silico has also been an idea for several years. As services for this grow, they may come with profound challenges in determining the wishes of the deceased and protecting the well-being of users.
  • Yann LeCun’s Talk on Objective Driven AI (1hr / YouTube from a month ago): Yann is one of the leading scientists in the space and goes into current technology’s strengths and weaknesses. If you are investing in or building something that relies on LLMs, it’s really worth your time to understand what LLMs lack. Some of the concepts are related to my post from a couple of weeks ago on architectures for the mind.

All the talk of copyright and AI training makes me wonder what will happen with scientific papers.

Copyright is often owned by major publishers such as Elsevier, but most scientists find ways to publish versions of their results online (and arxiv.org has become the shortcut to getting results out to the world). We’ll likely want future AI systems to apply the knowledge in many of these papers (and cite them correctly). We’ll also want them to do an excellent job of discerning which papers are valid (roll on AI scientific paper reviewers?). Perhaps paper archives need to be built with AI training in mind.

Lastly, one of the AI greats, Doug Lenat, sadly passed away. I remember studying the Cyc project during my master's and Ph.D. It was an immense effort, way ahead of its time. Gary Marcus has a touching post here about Lenat's life and the final papers he worked on.

Have a great weekend!

PS: No steampunk robot image today. I had fun generating steampunk robot babies, but they were mostly quite creepy or hilariously off-base. Given the news about Doug Lenat, we’ll leave the strange baby pictures out of it.