Copyright on the ins and outs
Protecting artists and ourselves

I believe people need to get paid for the insights, effort, and creativity they put out, especially when it’s their livelihood and a market exists for it. Call me old fashioned.
So where does AI fit in? As usual, everywhere.

Copyright law is not a new subject in the AI discussion, and we’ve recently seen many examples of it in action:
The New York Times struck a deal with Amazon, which will pay roughly $20 million per year to license the Times’s news, Cooking, and The Athletic sections for Amazon’s AI1. This follows examples (more about citing sources than training, but still) from Condé Nast, Time (gotta stay relevant beyond Person of the Year), and others.
Cloudflare2 now blocks AI web crawlers from freely accessing websites in its network unless they pay first.
On the other end of the spectrum, the Trump administration’s AI Action Plan makes it clear it’s okay with AI developers training on copyrighted material (“fair use”), though the details are murky and this is more of a vibe.
These deals raise interesting questions: Should AI companies be allowed to train their models on content they didn’t pay for? What constitutes AI plagiarism? Let’s dive in.
Copyright on the output
I’m not going to get into copyright law and all the nuances therein when discussing AI. That’s way too complicated, and I’m no expert (or even a novice) on the matter. I want to lay out what the discussion is, and offer some thoughts.
The easiest, and least controversial (though not completely uncontroversial) view is that AI should not be allowed to mimic existing content as part of its output. That is, when I ask AI “tell me something interesting about John D. Rockefeller” and the output is exactly, word for word, lifted from Ron Chernow’s biography of him, that’s straight up plagiarism, which we don’t like. Most people that I’ve read, or talked with, agree that we shouldn’t allow this…in principle!
What happens though when the AI learns a ton (based on its training data) and creates, completely independently from any specific input, something that feels / sounds / looks eerily similar to someone’s copyrighted IP? If we’re saying it’s okay for AI to train on the works of Ron Chernow, are we saying it’s okay if it also spits out something that is 99%3 similar to his work? What if the AI never actually trains on his stuff, but there’s enough derivative work, critiques, plays, etc. related to his content, that the AI can reproduce his work? Is that okay because what are the chances? That’s going to be hard to regulate.
But sure, we can agree, in principle, we would prefer if AI doesn’t plagiarize.
Trickier than monitoring the output, though, is monitoring the input: the training data.
Copyright on the input
I believe creators should have the ability to require payment for their content, and that the default should be “pay for training”. Creators can turn it off if they choose - and there are plenty of great reasons why they might choose to give away training content for free. The key, though, is that it’s their choice. This will lead to a marketplace that, as a proponent of markets, I believe will self-regulate (though the government definitely has a place here, as it does in every market).
While I’m not an AI4, I can read a lot of content online for free. I also sometimes choose to pay for content. There’s a whole complex, interconnected system undergirding this stuff5, and it’s not as simple as I’m making it out to be. But in essence, that’s the divide - I directly pay for access to some content, and I don’t for other content. Some of that is up to me, and some is up to the content creator (and their distribution system, etc). I believe AI companies should follow a similar approach - pay when creators require it, and don’t pay when they allow training for free.
I appreciate there are ramifications here:
It creates a two-tier system, where some creators (e.g. NY Times, Reddit) will be compensated for their content, whereas others (e.g. this newsletter) will not.
Similarly, not every AI company can afford to pay creators (at scale), and they’ll lose out when their models are worse than the big players who can afford to pay for training data.
Creators may start to create content specifically for AI models, not people, in order to make more money.6
Lots of regulatory and compliance issues arise, especially across jurisdictions and geographies.
Ultimately, we aren’t going to stop AI companies from getting access to content for training (though, as mentioned before, we’re running out of data), so the best we can do is build in mechanisms to support content creators, such as:
Opt-out clauses
Make it easier for anyone to monetize their content (collective agreements or micro-payments)
A repository of the datasets AI companies use to train their models, so we (e.g. a computer) can map it against a “safe list” of content. If a company trains on your data and you opted out, you should know. Again, I’m simplifying things drastically.
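To make the repository idea concrete, here’s a minimal sketch of what the “map it against a safe list” step could look like. Everything here is hypothetical - the registry, the domains, and the idea of keying opt-outs by domain are illustrative assumptions; a real system would need canonical content identifiers, fuzzy matching, and far more scale.

```python
# Hypothetical sketch: cross-reference an AI company's disclosed training
# sources against a creator opt-out registry. Names and data are invented.
from urllib.parse import urlparse

# Creators who opted out of free training (keyed by domain, for simplicity)
OPT_OUT_REGISTRY = {"example-newspaper.com", "indie-author.net"}

def domain(url: str) -> str:
    """Extract the domain from a source URL (naive: strips only 'www.')."""
    return urlparse(url).netloc.lower().removeprefix("www.")

def flag_optouts(disclosed_sources: list[str]) -> list[str]:
    """Return disclosed sources whose creators opted out of free training."""
    return [u for u in disclosed_sources if domain(u) in OPT_OUT_REGISTRY]

sources = [
    "https://www.example-newspaper.com/article/123",
    "https://open-data.org/corpus.txt",
]
print(flag_optouts(sources))  # -> ['https://www.example-newspaper.com/article/123']
```

Even this toy version shows why disclosure matters: the check is trivial once the training sources are published, and impossible otherwise.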
All of this needs effective regulation - companies won’t do this themselves, though many may participate if there is a governmental initiative.
But China!
AI Zoomers (and Trump’s AI Action Plan) will say that we need less regulation and more freedom for AI models to train and run wild (I’m paraphrasing) because China, America’s nemesis du jour7, surely won’t care about copyright protection, enabling their AI models to train faster, on a broader set of data, and ultimately propel them ahead in the AI race.8
Funny how on the one hand, we need to beat them at AI, and on the other, let’s keep selling them AI chips, and take a cut of the sales. I guess it just takes a lot of lobbying (helps that it’s from the CEO of a $4+ trillion market cap company). Has the US government ever allowed a private company to sell product in foreign markets on the condition of a kickback, when the government wasn’t part of the development of said product? I digress.
Maybe? But also: while it’s important to think strategically about AI, as we do about rare-earth minerals, nuclear power, the military, intelligence, etc., I also appreciate living in a society that values its creators. This is why it’s important to fund the military and other programs that keep us safe, and at the same time to fund healthcare, childcare, education, and other programs that raise our quality of life. Artistic endeavors are absolutely part of that set of quality-of-life raisers. I don’t want to live in a bland, Spartan society (though the mandatory weekly Spartan Races would be a fun twist). The challenge, of course, is balancing the two.
Heuristics help
In general, when I need to grapple with problems this large (copyright law is an entire branch of law in itself) I reach for heuristics9:
Creators should get paid for their work. It’s not only morally right, but it ensures that people continue creating valuable (to someone) work in the future.
AI will be used in positive and negative ways - let’s aim for more positive outcomes instead of throwing our hands up and accepting what’s coming.
Organizations whose mission is to improve the well-being of the public, and whose work often comes at a cost to shareholders (via higher costs or lower prices), should get special treatment. How special? To what end? I don’t know.
I’m pro-innovation. Technological advancements have been net positive for humanity.10
I’m willing to accept some degree of slowdown11 in AI if it helps ensure we don’t leave folks behind and are heading in the right direction.12
There are bad actors out there - both foreign and domestic. Recent history has shown that norms don’t matter.13 So we need to invest in oversight, and not make any assumptions, especially as it relates to institutions operating in good faith.
This clearly leaves us with open questions (sorry - I told you I wasn’t an expert in this stuff), but hopefully it helps to frame what we’re talking about when we talk about copyright, creators, and AI training. It’s a hairy topic, and we all need to educate and equip ourselves to debate it so we aren’t left holding the bag (of words).14
1 If this seems like a small fee for Amazon to safely (from a litigation perspective) and freely use NY Times content, then trust your instincts. Amazon (AMZN) made roughly $20 billion in profit in 2023 and $30 billion in 2024 (albeit with a net loss in 2022). So $20 million is well under 0.1% of its profits. PROFITS! Let alone revenue….
2 Super quick: Cloudflare is one of those companies you’ve never heard of, but it’s part of the backbone of the internet. It provides security and reliability (against malicious attacks) and makes websites load faster. So it’s a big deal when it says “pay our customers (the websites) if you want to crawl their sites”.
3 Super quick - what % likeness would you feel is sufficiently different? Also, equally quick, how can you quantify this? Do we ask AI? Wait, I thought this was the easy one to agree on!
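Since the footnote asks how you would even quantify “% likeness”, here’s one naive way to put a number on it: Jaccard overlap of word trigrams between two texts. This is an illustrative sketch only - real likeness or plagiarism detection uses far more sophisticated techniques (embeddings, fingerprinting, etc.), and the point is precisely that any single percentage is slippery.

```python
# Illustrative sketch: Jaccard similarity over word trigrams as a crude
# "likeness" score. All example strings are invented.

def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams in a text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def likeness(a: str, b: str) -> float:
    """Jaccard similarity of word trigrams, in [0, 1]."""
    ga, gb = ngrams(a), ngrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

print(likeness("the rise of standard oil", "the rise of standard oil"))  # -> 1.0
print(likeness("a b c d", "w x y z"))  # -> 0.0
```

Note how brittle it is: paraphrase a few words and the score collapses, even if the substance is identical - which is exactly why “99% similar” is easy to say and hard to define.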
4 You’ll have to take my word for it. But then again, isn’t that what ChatGPT would say?
5 See: the internet runs on ads, which is why some stuff seems free
6 This is not a new phenomenon - SEO anyone?
7 I’m taking a very liberal (wide!) view of “jour”
8 Let’s put aside the incremental improvement in our ability to stay safe by “winning” the AI race (spoiler: “winning” the AI race is a fabrication).
9 Cousins of the pedestrian “rules of thumb” and the more erudite-sounding “principles”
10 net is playing an important role here.
11 And maybe we should all be okay with slowing down the pace of AI innovation
12 This is where I keep leaning on the concept of velocity. Briefly, velocity is a combination of speed and direction. We want to not simply move fast; we want to make sure we’re heading in the right direction. Sometimes that means slowing down.
13 Very sad.
14 For my NLP and ML nerds. Too much?