The Ongoing Copyright Issue with Generative AI.

It’s a strange time. OpenAI (and Microsoft) are being sued by the New York Times, and they’re claiming ‘Fair Use’ as if they were having some coffee and discussing what they read in the New York Times, or writing a blog post about the entire published NYT archive, on demand.

It’s not just the New York Times, either. More and more authors are filing similar suits, and some started before the NYT did.

IEEE’s article, “Generative AI Has a Visual Plagiarism Problem,” demonstrates issues that back up the copyright claims. This is not mere regurgitation, and this is not fair use: there is line-by-line text from the New York Times, among other things.

As I noted yesterday, OpenAI is now making deals for content, and I only caught this morning that, ‘The New York Times, too, had been in conversations with OpenAI to establish a “high-value” partnership involving “real-time display” of its brand in ChatGPT, OpenAI’s AI-powered chatbot.‘

Clearly, those discussions didn’t work out. I was going to link the New York Times article on it, but it seems I’ve used up my freebies, so I can’t actually read it right now unless I subscribe.1 At this end of things, as a simple human being, I’m subject to paywalls for content; OpenAI hasn’t been. If I can’t read and cite a New York Times article for free, why should they be able to?

On the other hand, when I get content that originated from news sites like the New York Times, there is fair use happening. People transform what they have read and regurgitate it, some more intelligibly than others, much like an artificial intelligence, but there is at least etiquette: linking the source, at the very least. This is not something OpenAI does. It doesn’t give credit. It just inhales large amounts of text, and the algorithms decide on the best ways to spit it out to answer prompts. Like blogging, only faster, and like blogging, it sometimes just makes stuff up.

This is not unlike a well-read person doing the same. Ideas, thoughts, even memes are experiences we draw upon. What makes these generative artificial intelligences different? Speed. They also consume a lot more water, apparently.

The line has to be drawn somewhere, and since OpenAI isn’t living up to the first part of its name and is not being transparent, people are left poking a black box to see whether their copyrighted work has been sucked in without permission, mention, or recompense.

That does seem a bit like unfair use. This is not to say that the copyright system couldn’t use an overhaul, but apparently so could how generative AIs get their content.

What does that ‘Open’ in OpenAI mean, anyway?

  1. They do seem to have a good deal right now; I did try to subscribe, but it failed for some obscure reason. I’ll try again later. $10 for a year of the New York Times is a definite deal, if only they could have processed my payment this morning. ↩︎

How Much AI In Journalism?

Recently, I’ve been active in a Facebook group that is supposed to be a polite space for debating things. News articles fly around, and the news articles we see these days from different sources carry their own biases: rather than just presenting facts, they present narratives, and narratives require framing.

I wondered how much of these articles were generated by what we’re calling artificial intelligence these days. Having researched it, I can tell you I’m still wondering, but I have found some things of interest.

The New York Times Lawsuit.

It’s only fair to get this out of the way since it’s short and sweet.

Of course, in the news now is the lawsuit that the New York Times has brought against Microsoft and OpenAI, with speculation running rampant either way. To their credit, through that link, the New York Times presented things in as unbiased a way as possible. Everyone’s talking about it, but speculation on it only really impacts investors and share prices. It doesn’t help people like me as much, individuals who write their own content.

In an odd twist, though, and not too long after the announcement, OpenAI is offering to pay for licensing news articles (paywalled), which you can read more about here if you’re not paying TheInformation.1

Either way, that lawsuit is likely not going to help keep my content out of a learning model, because I just don’t have the lawyers. Speculating on it doesn’t get me anywhere.

How Much AI is being used?

Statista cites only one statistic on the amount of artificial intelligence used in media and entertainment: 78 percent of U.S. adults think news articles created by AI are not a good thing.

The articles there go on to tell us about the present challenges and so on, but one word should stand out from them: foggy.

So how would it be used, if it is used? With nearly 50 AI-generated ‘news’ websites identified as of May last year, almost a year ago, and with one news site even going so far as to field an AI news anchor late last year, we should have questions.

Well, we don’t really know how many news agencies are using artificial intelligence, or how. One would think disclosure would be the issue, then.

The arguments against disclosure are pretty much summed up below, in an extract from a larger, well-balanced article.

Against disclosure

One concern is that it could stifle innovation. If news organisations are required to disclose every time they use AI, they may be less likely to experiment with the technology.

Another is that disclosure could be confusing for consumers. Not everyone understands how AI works. Some people may be suspicious of AI-generated content. Requiring disclosure could make it more difficult for consumers to get the information they need.

Should the media tell you when they use AI to report the news? What consumers should know, Jo Adetunji, Editor, TheConversationUK, TheConversation.com, November 14, 2023.

On the surface, the arguments make sense.

In my opinion, placing innovation over trust, which is the actual argument being made by some, is abhorrent. To innovate, one needs trust, and if you want trust, it has to be earned. Given the present state of news outlets, in their many shades of truth and bias, that might seem completely alien to some.

I do encourage people to read that entire article, because the framing here doesn’t do it justice; I’ve simply expressed an opinion on one side of the arguments presented.

How Is AI presently used?

Again, we really don’t know, because of the disclosure issue, but in October of last year Twipe published 10 ways that journalists are using artificial intelligence. It points from the outset to Klara Indernach, a tag used by Express.de to note when an article is created with artificial intelligence tools.

Arist von Harpe is cited in the article as saying, “We do not highlight AI-aided articles. We’re only using [AI] as a tool. As with any tool, it’s always the person using it who is responsible for what comes out.” This seems a reasonable position, and it puts the accountability on the humans involved. I have yet to see an artificial intelligence thrown under the bus for an incorrect article, so that’s a landmark to watch for.

The rest of that article is pretty interesting. It mentions fact-checking, which is peculiar given the prevalence of hallucinations and even strategic deception, as well as image generation and more.

We’ll never really know.

In the end, I imagine the use of artificial intelligence in newsrooms is evolving even as I write this and will still be evolving well after you read it. In a few years it may not be such a big deal, but right now we’re finding failures in artificial intelligences all the way up to a court, in a matter that is simply fraught with political consequences. They were quick to throw Google Bard under the bus on that one.

It is still disturbing that we don’t have much insight into the learning models being used, which is a consistent problem. The New York Times lawsuit seems somewhat helpful there.

I honestly tried to find out what I could here, and in doing so came to a conclusion that wasn’t what I expected.

In the end, it is as Arist von Harpe says: we have to judge based on the stories we get, because every newsroom will do things differently. It would have helped if we’d had less room to speculate about biases even before these artificial intelligence tools were created, and whoever screws up should lose some trust. In this day and age, though, feeding cognitive biases seems to trump earning trust.

That’s probably the discussion we should have had some time ago.

  1. These paywalls are super-annoying for us mere mortals who do not have the deep pockets of corporate America. How many subscriptions is a well-informed person supposed to have? It’s gotten ridiculous. We’ve known that business models for news have been in such trouble that a ‘news story’ has a more literal definition these days, but surely we can do better than this? ↩︎

Google and The New York Times: A New Path?

A few days ago I mentioned the normalization of Web 2.0, and yesterday I ended up reading about The New York Times getting around $100 million from Google over a period of three years.

“…The deal gives the Times an additional revenue driver as news publishers are bracing for an advertising-market slowdown. The company posted revenue of $2.31 billion last year, up 11% from a year earlier. It also more than offsets the revenue that the Times is losing after Facebook parent Meta Platforms last year told publishers it wouldn’t renew contracts to feature their content in its Facebook News tab. The Wall Street Journal at the time reported that Meta had paid annual fees of just over $20 million to the Times…”

New York Times to Get Around $100 Million From Google Over Three Years, Wall Street Journal, May 8th, 2023.

That’s a definite shot in the arm for The New York Times, particularly given the state of the ad revenue model that Web 2.0 delivered. Will it lower the paywall to their articles? No idea.

This is a little amusing, because just on May 2nd the New York Times called out Google’s lack of follow-through on defunding advertising that runs alongside climate-denial content.

Still, it may demonstrate a move to a more solid model for actually trusted sources of news, and that could be a good thing for all of us. Maybe.

Personally, if it reduces paywalled content while still allowing the New York Times to be critical of those who hand them money… well. It could be a start.