Why Watermarking AI Content Makes No Sense

Or, the case for Data Provenance over AI Watermarking

Almost as soon as Midjourney, ChatGPT, and others hit the public consciousness, it became clear that their tremendous power could also make it easier to create misleading “fake” content: content that could be passed off as human-created (hello, lawyers who didn’t have time to do their research) or that implies an event happened when it did not. Since then, a loudly recurring theme has been that AI systems capable of creating images, text, video, and other content should be adjusted to “watermark” their outputs. The idea is that by tracking watermarks, it would be possible to tell what was machine-made and what was not.

This is a pretty natural reaction, and the idea has gotten traction to the point where OpenAI and Google have been attempting to carry out such watermarking (here and here). The G7 countries are also considering requesting watermarking (prominent players have apparently agreed), and China already banned unwatermarked AI images earlier this year. There are also numerous efforts that aim to detect AI prose from the content itself, based on the idea that such content will be inherently identifiable.

Unfortunately, the core idea of using watermarks to distinguish AI content from other content is inherently flawed, at two levels:

  1. The task itself is effectively technically impossible in the general case. Even if it could be done for specific content types at the moment of generation, most workflows involve post-processing and manipulation by humans. Very few watermarking techniques are resistant to post-edit manipulation, and in many cases, the manipulation will be heavy. Detecting such watermarks also requires specialist analysis, which is hard to deploy efficiently at scale.
  2. More importantly, though, even if it could be done technically, the idea itself is back-to-front: by trying to identify AI content, you implicitly invite the assumption that “non-watermarked content” was not created by AI, or even that it is somehow “real”. We already know no one can safely make that assumption.

We’ll dig into both of these points, but in essence, mandating watermarking is the equivalent of asking bank robbers to identify themselves clearly as they walk down the street toward the bank they intend to rob. Even if many of the places where would-be robbers get their clothing label it accordingly, there are still plenty of places to buy or wash those clothes. By assuming all bank robbers will identify themselves, we’d be lulling ourselves into a false sense of security about everyone who isn’t holding up a robbery flag.

We’d all like to think that with a system like this, we’ll be able to “trust” that images on the Internet are “real,” but that hasn’t been true for a long time.

So, if watermarking won’t solve the information trust problem, what will? There is a potential solution; it just looks daunting to build: reverse the approach and track content provenance instead. This would take a collaborative effort and a lot of new infrastructure, but in the long run, it’s likely the only way we’ll be able to establish where content comes from and how much to trust it.

Why do we care about Watermarks in the First Place?

Before understanding the potential value of watermarks, it’s important to step back to understand the problem they are meant to solve. At the core of the problem is that the outputs of Generative AI systems (text, images, and increasingly video) have become so realistic that it now feels extremely easy to create images that appear real but are completely fabricated. Following the various threads of arguments, there are really two fears driving the discussion:

  1. That generative AI systems will make it much easier to spread misinformation, and at such a scale that it could defraud individuals or even destabilize democracies.
  2. That such realistic content could be used without disclosure when human-created content is expected (in art contests, in creative work, or in school homework and tests).

The first problem covers cases where deception leads to criminal harm, the second cases of real or perceived economic fraud.

It’s only natural to want to be able to identify such content at a glance and be able to discount it (or at least know what it is).

There is also a third problem:

Over time, an increasing percentage of the content on the Internet may itself be the output of Generative AI systems. Given that the public Internet is the source of much of the training data for AI systems, this creates the risk that new systems ingest more and more generated content. Ultimately, this could lead new AI systems to degenerate (though it’s unclear whether this will in fact happen).

The critical thing to recognize with the first two problems is that the current negative reaction is really to the fact that AI has suddenly made such fraud much easier to commit, not that AI has made it possible. Falsified evidence, faked images, and spurious photo montages have existed for a long time. Schoolchildren have copied answers, most likely since the concept of school was invented. AI has just made these things far more straightforward.

The question we think we are addressing with watermarking is “spotting fakes” or spotting “non-permitted automation.” However, being able to spot output from an AI does not solve the first and only partially addresses the second.

The third problem is AI-specific, but as we’ll see later, mandating watermarks isn’t likely to help that much there, either.

The Technical Challenges of Watermarks

The first issue with watermarking GenAI output is that it is extremely difficult to do. The base version of the problem statement could be written as follows:

Encode, in any output, a marker that (with the appropriate decoding) identifies that the content was created by an AI system, yet that does not interfere with the utility or quality of the content.

In other words, insert a marker in the content such that the content (image/text, etc.) looks pristine to a viewer but that can be identified when specifically sought out.

There are methods such as steganography for encoding information invisibly into images, and, for text, OpenAI is reportedly using statistical encodings similar to those described in this paper. In images, steganographic techniques make tiny adjustments to the image in patterns that are invisible to a normal viewer but detectable if you are looking for them. For text, the statistical encodings use a reference list of words and insert those words at a slightly higher frequency than a human writer would.
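
To make the statistical idea concrete, here is a toy sketch of a “green list” text watermark. It is illustrative only, not OpenAI’s actual scheme: real schemes typically reseed the green list from the preceding tokens rather than using one fixed split, and the key, vocabulary, and 50/50 partition below are all assumptions for demonstration. A secret key deterministically selects a “green” subset of the vocabulary; the generator slightly favors green words, and the detector checks whether the green fraction in a text is suspiciously high.

```python
import hashlib
import random

def green_list(vocab, secret_key, fraction=0.5):
    """Deterministically partition a vocabulary into a 'green' subset
    using a secret key, so generator and detector agree on the split."""
    seed = int.from_bytes(hashlib.sha256(secret_key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    shuffled = sorted(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(text, greens):
    """Fraction of tokens that fall in the green list. A fraction far
    above the expected baseline (here 0.5) suggests a watermark."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in greens for t in tokens) / len(tokens)
```

Note how fragile this is: a human editor replacing even a modest share of words pulls the green fraction back toward the baseline, which is exactly the post-processing problem discussed below.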

These techniques can encode a marker and do so discreetly. However, there are further hurdles to clear. The first is that, in many cases, images and text will be post-processed after being generated. This may happen because a human edits something (an essay, for example) or because algorithms are used to enhance an image. Some of this processing may be light or predictable (color filters, sharpening or softening edges), but some may be extreme: human additions or removals of large parts of an image, or noise-based transformations.

So the problem statement now would read:

Encode, in any output, a marker that (with the appropriate decoding) identifies that the content was created by an AI system, yet that does not interfere with the utility or quality of the content and that is resilient to post-processing.

The marker would still need to be there even after the file has been manipulated. This may be possible for some types of transformation but is extremely hard to do in the general case.

The problem is particularly significant given that many of the best uses of Generative AI will be in cases where humans work closely with AI to create things. For example, the new agreement between Hollywood Writers and Studios explicitly acknowledges that AI might be part of the writing process for shows and movies. The writing and rewriting in cases like this would be very likely to disrupt text watermarks.

Another problematic aspect of this is that since the software for identifying watermarks needs to be relatively widely distributed for the system to be useful, bad actors could run many examples through it to determine just which perturbations negate the current system. This creates an arms race between watermarkers and scammers, which is hard to win.

Image manipulation software today (including open-source products) is also extremely sophisticated, so flushing out watermarks from images may prove to be extremely easy. Open-source tools for flushing watermarks seem a near certainty.

So, while it may not be completely impossible to create a system that does work for some kinds of content, it seems highly unlikely we’ll be able to do it at scale and cover all content.

Scott Aaronson (UT Austin & OpenAI) has a great talk on ChatGPT watermarking and some of the issues with apparent approaches (including key questions on deployment). Researchers also recently assessed all current watermarking solutions and found them all to be breakable.

Lastly, in this section, we need to acknowledge that even if the main Generative AI providers all collaborate to add watermarking, open source Generative AI systems are already widely available and often rival commercial systems in output quality. So even if ordinary users do not actively try to subvert watermarks, plenty of people will, either via open-source models or by creating manipulation tools by back-testing against watermark detectors.

Solving the Wrong Problem

Overcoming the technical challenges in watermarking is extremely hard, if not impossible. However, even if we could, there’s an even bigger reason why it won’t give us the comfort we seek.

The primary driver behind watermarking is to create a signal that identifies AI-generated content. By deploying this technology at scale for the stated purpose, we would essentially be saying:

If you see the “AI content” flag, then the content is AI generated. If you don’t see this flag, it is not.

Even if this statement is never explicitly made, this is the implicit promise of such a system. We would be trying to establish trust in the ability of automated systems to tell us whether AI was involved.

This promise can never really be true, however: 1) there are already plenty of ways to generate AI content that is not watermarked; 2) it’s already possible to create “fake” content without AI at all; and 3) even content that was initially watermarked may have been manipulated enough to remove it. Worse, non-AI content could even be watermarked to make something human-made look AI-generated.

As a result, we’d be training Internet users to trust in a system that at least has some failure rate.

This is actively more harmful than not having the system in the first place.

Just because something is not AI-generated does not mean it represents some ground truth or reality. As already mentioned above, deepfakes have been possible for many years already.

If a system of flagging AI images with 95%+ accuracy (or even 99%+ accuracy) were to get significant traction, we would lull the public into a false sense of security.

So what does solve the Problem?

If watermarks don’t solve our problem, what does? In the short term, there really is no substitute for skepticism on the part of content consumers. It’s essential we don’t create a system that works “most” of the time but trains people to be even less skeptical of the content they see than they are today.

In the long term, though, it would be a dire outcome if we could never establish the ground truth of a piece of evidence or the authorship of a piece of work.

Luckily, there are two types of longer-term solutions:

  1. Reinforced laws and penalties against fraudulent content (or, in the case of schools, rules). This doesn’t prevent bad actors, but it does dissuade them.
  2. Turn the technical problem on its head, and instead of looking to catch “fakes,” identify “real” images by tracking provenance. Then, by definition, anything that does not have a solid provenance trail has a higher risk of being a fake.

The former is essentially what China has done. Rather than mandating a particular technology, it has instituted laws that prohibit the posting of AI content without an explicit declaration that it is AI-created. There are already laws on the books in many countries against the circulation of fraudulent images (though not specific to AI) when they cause harm to others.

Different states may have different thresholds for what constitutes an offense, and many might argue that China’s rules are too draconian. Still, as a general approach, this system is already in place. Commit fraud that causes damages, and you risk being penalized. This is an important principle to uphold.

The problem with the legal approach is that it is a retroactive mechanism. Misinformation can be launched, do a tremendous amount of damage, and either never be detected or only be detected long after the fact. By the time action is taken, it may be irrelevant or the situation irretrievable. Legal processes are also expensive and time-consuming, which makes them unsuitable for small disputes and violations. They also don’t remove the low-level unease of not knowing whether something you are looking at “in the moment” is real, fake, AI-generated, or a mix of the above.

The real solution to the problem of real-time identification, not only of AI but of fakes in general, lies in flipping the problem around: tracking the provenance of content you care about, not only flagging certain types.

Flipping the model: Tracking Provenance for Content

The concept of provenance was originally applied to art and other valuable works to refer to the known chain of ownership for an object. The same concept is now often applied to data:

The term “data provenance”, sometimes called “data lineage,” refers to a documented trail that accounts for the origin of a piece of data and where it has moved from to where it is presently. The purpose of data provenance is to tell researchers the origin, changes to, and details supporting the confidence or validity of research data. The concept of provenance guarantees that data creators are transparent about their work and where it came from and provides a chain of information where data can be tracked as researchers use other researchers’ data and adapt it for their own purposes.

(Definition from NNLM.gov)

In other words, it captures the origin of data and potentially the transformations and manipulations it may have gone through to reach its current state. (Note that some draw a distinction between the similar terms “data lineage” and “data provenance,” but for the purposes of this discussion, they can be used interchangeably.)

While it’s now more common to apply the idea of provenance to data and content, it was new in the early 2000s. This is when I first stumbled across it through the work of people like Luc Moreau, who later went on to become one of the co-chairs of the W3C PROV Standard.

A provenance record is a statement about the origin of some data or (in our case) content. Where does an image, video, or piece of text come from? Who made it? How has it been manipulated on its journey to me? Knowing this would give us most of what we need to decide whether we believe what the content suggests. It would also tell us how something was created: whether AI was involved, and whether tools such as Photoshop were used.

Provenance statements could be made in a multitude of ways, so this is a higher-level concept than a specific technical solution such as watermarking, which tries to embed metadata directly into a piece of content.

Ideally, if we had trustworthy provenance data for a piece of content, then we’d be able to judge whether or not it was acceptable in the context we are using it for. Any content that didn’t have trustworthy provenance data would, by elimination, be dubious in terms of trust.

The question is, how could you do this at scale? Also, how could you be sure the provenance data was not itself fraudulent?

Unfortunately, this is where things get hard. To use provenance for content at scale, there are at least four things needed:

  1. Standard ways of encoding provenance information that can be associated with content in a straightforward way.
  2. A trusted manner for someone to attach provenance statements to a piece of content when they create or transform it.
  3. A trusted mechanism to verify and vouch for the identities of people making provenance statements.
  4. Mechanisms to automatically read provenance statements when content is being viewed or processed.

The first and fourth of these requirements are relatively straightforward. There are already meta-data standards for capturing provenance information. W3C PROV is the most well-known, though simpler models might be possible. Adding software to store, retrieve, and render provenance information is also not technically challenging. The primary challenge would be convergence on a standard, agreement on its use, and implementation.
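
As a sketch of what requirement #1 might look like, here is a simplified, PROV-inspired record. The field names are hypothetical and this is not the actual W3C PROV serialization, which is considerably richer; the point is only that a small, standard structure can link content, an agent, and an activity, and chain versions together via hashes.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ProvenanceRecord:
    """A minimal provenance claim about one piece of content
    (illustrative fields, not the W3C PROV vocabulary)."""
    entity_hash: str        # SHA-256 of the content file
    agent: str              # identifier of the person or tool making the claim
    activity: str           # e.g. "captured", "generated", "edited"
    timestamp: str          # ISO-8601 time of the activity
    derived_from: str = ""  # entity_hash of the prior version, if any

    def to_json(self) -> str:
        # Canonical JSON so the record can be hashed or signed later
        return json.dumps(asdict(self), sort_keys=True)
```

Chaining `derived_from` across edits gives exactly the "journey" described above: a generated image, then a Photoshop edit, each step a new record pointing at the previous one.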

Steps 2 and 3 are much harder.

These are very hard because our trust in a provenance statement will depend on our trust in the person making it. So, in effect, item #2 relies heavily on item #3. In many cases, we also won’t have direct relationships with the people creating a piece of content, so we’ll depend on chains of relationships between people we know (or trust for some particular reason, such as institutions or news organizations) and the source of a specific piece of content.

Which institutions we trust also depends heavily on who we are. Few people on the American liberal left trust Fox News. Few people on the Republican right trust CNN.

Possibly even more challenging, there is no globally agreed-upon online identity system. Many of us have accounts on a range of social media sites. Some of those may reflect the true “us”, and some may not. None of these sites is likely to feel like the right choice as the single source of truth as to who someone is.

There are also cases where one might want to make provenance statements about a piece of content without revealing a real-world identity, such as a valid government ID. Using a “less trusted” identity might weaken a piece of content as evidence, but revealing more might compromise one’s own security.

In solving problem #3 (trusted identity), beyond the technical infrastructure itself, there are two challenges:

  1. Creating an identity network structure (or several) that allows open access to anyone but allows members to build trust/reputation by connections to others (endorsements, referrals, or other validations).
  2. Creating a bootstrapping mechanism that allows for the establishment of some trusted members that spread trust to less-validated members.
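
The bootstrapping idea in the list above can be sketched as trust spreading outward from a few verified seed identities along “vouching” edges. This is a toy model under loud assumptions: the per-hop decay factor and hop limit are arbitrary choices for illustration, and a real network would need Sybil resistance and revocation.

```python
from collections import deque

def propagate_trust(edges, seeds, max_hops=3):
    """Spread trust outward from verified seed identities.

    edges: dict mapping an identity to the identities it vouches for.
    seeds: set of fully verified identities (trust = 1.0).
    Trust halves with each hop (an arbitrary decay assumption)."""
    trust = {s: 1.0 for s in seeds}
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for neighbor in edges.get(node, []):
            score = trust[node] * 0.5
            # Keep the best score seen for each identity
            if score > trust.get(neighbor, 0.0):
                trust[neighbor] = score
                queue.append((neighbor, hops + 1))
    return trust
```

Identities beyond the hop limit, or on disconnected “islands,” simply end up with no trust score, which mirrors the fragmentation risk discussed below.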

Existing social networks and public representations of identity, such as Facebook, X/Twitter, LinkedIn, and others, could serve as bootstrapping points for identity or be a form of identity network.

Any such identity network may end up being fragile and fragmented. There will be many nodes where real-world human identities are not known. Even if there is a mechanism for “vouching” for nodes you are connected to, the islands of connected users may not link up.

However, over time, it may be strong enough to build the crucial connection to step #2 in the process: making statements about provenance. The function of the identity network for provenance is to ground who is making a statement about a particular piece of content. Hence, once I have my identity (be it a Facebook ID or something new), I need to be able to perform a new function:

  • State that I took a specific photograph on a street corner on May 13th, 2023, and connect this to the image itself.
  • State that I created a particular image using Midjourney and then edited it in Photoshop on May 26th, 2023, and connect this to the image itself.

This information would likely be stored separately from the image, though some of it may be encoded in metadata packaged with the image. Ideally, the act of making the provenance statements would:

  • Create a link between the content and the provenance data such that one can be used to look up the other (bi-directional).
  • Record a digital signature declaring my agreement with the provenance statement.
  • Enable either anyone (if fully public) or an approved set of people (if not) to see the provenance data.
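
The three steps above can be sketched in a few lines. This is a simplified illustration: HMAC-SHA256 stands in for a real public-key signature scheme (in practice you would want signatures verifiable without sharing a secret), and the field names are assumptions.

```python
import hashlib
import hmac
import json

def make_provenance_statement(content: bytes, statement: str,
                              identity: str, signing_key: bytes) -> dict:
    """Bind a provenance claim to a piece of content.

    The content hash links statement -> content; publishing the record
    under that hash links content -> statement (bi-directional lookup).
    HMAC-SHA256 stands in for a real public-key signature."""
    record = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "identity": identity,
        "statement": statement,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return record

def verify_provenance_statement(content: bytes, record: dict,
                                signing_key: bytes) -> bool:
    """Check that the record matches the content and the signature is valid."""
    if hashlib.sha256(content).hexdigest() != record["content_sha256"]:
        return False
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```

Any change to the content bytes, or to the statement after signing, makes verification fail, which is what makes such records worth publishing in the first place.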

The process could be implemented via hosted servers using a registry-type architecture. The DNS system is a good example of such a system: a number of neutral, highly trusted servers act as root nodes for the global domain name system. A blockchain-based solution might, however, make even more sense: blockchains already provide the elements needed in the steps above and can store statements in an immutable, public way.

There are technical challenges in building these solutions, but by far, the biggest problem is in getting a groundswell of adoption. Existing technology platforms would need to support the protocols involved, and different bootstrapping campaigns would be needed to get users on board.

Over time, things could be made easier:

  • AI generators could automatically create provenance entries that can be published if desired.
  • The same for image tools such as Adobe Photoshop.
  • Cameras, video recorders, and phones could similarly do the same.
  • In turn, image-viewing tools could pull on the resulting provenance data.
  • News organizations, in particular, could both require and validate the provenance of content before publishing.

Done at scale, there is a risk that this starts to feel like a surveillance state of all content produced. However, there doesn’t need to be an obligation to report provenance data, only the knowledge that if you actively want to show proof of origin, you can use the framework.

Bad actors can, of course, publish false provenance information. However, this is where the trust network comes in. Who is making a statement becomes important.

An image submitted to the New York Times (for example) should pass a high bar before the New York Times chooses to republish it and attach its own provenance information to the chain.

AI Content Feeding AI Systems

The provenance architecture also brings us back around to the third problem mentioned in the introduction: that more and more of the content on the Internet might end up being AI-generated. This will almost certainly happen, and it’s unclear what effect it will have on training algorithms.

What seems obvious, however, is that if this content is wrong or misleading but presented as a correct fact, it very likely will have negative impacts, as do incorrect facts that humans have already posted.

There are already many incorrect statements on the Internet, as well as a lot of firmly held opinions that have little basis in fact. Great fiction that’s close to life is likely also highly confusing to an AI unless identified as such. Take most Daniel Suarez novels, for example (especially Daemon and its follow-up, or Kill Decision): read as pure text, it would be hard to tell that the events described never happened, despite their being science fiction.

As a result, we don’t just need AI watermarking (Daniel Suarez is human as far as we know) but, again, a more general method of identifying the provenance of a particular piece of content. For things on the Internet: Who wrote it? Who published it? When? Is it a factual account? A fiction? Has it been manipulated?

We can’t expect to relabel the entire Internet overnight, but the amount of content being created is accelerating. So the sooner we begin capturing content provenance, the sooner we’ll be able to clean up both AI training and our own use of Internet content.


Maintaining trust in content is an extremely hard problem, and whether AI was involved in creating something is tangential to that trust. This is why AI watermarking is, at best, a stopgap that might give us some information about certain types of content on certain platforms.

Much worse, relying on watermarking will not only give the general public a false sense of security but also miss the more significant point: trust in content depends on all aspects of its origin, not just the involvement of AI.

Building a functioning provenance network for public Internet content is a daunting (and maybe even impossible) task. Still, maybe ways will be found to start the effort off. Perhaps a blockchain-based solution for a narrow use case like digital art could grow into something more general. Perhaps existing social networks can be used to bootstrap a network of identities.

If you’re working on something related, I’d love to chat!

This turned out to be a very long post; thank you for hanging on until the end. Please comment and/or share if you found it useful!

Given that you can now make everyone smile in your photos if you have the latest-gen Google Pixel phone, Provenance tech can’t come soon enough!

Midjourney: A steel-bound, steampunk ledger showing rows of writing entries.
