The Ongoing Copyright Issue with Generative AI.

It’s a strange time. OpenAI (and Microsoft) are being sued by the New York Times, and they’re claiming ‘Fair Use’ as if they were having some coffee and discussing what they read in the New York Times, or were about to write a blog post about the entire published NYT archives, on demand.

It’s not just the New York Times, either. More and more authors are doing the same, and some started before the NYT did.

IEEE’s article, “Generative AI has a Visual Plagiarism problem”, demonstrates issues that back up the copyright claims. This is not regurgitation, this is not fair use – there is line-by-line text from the New York Times, amongst other things.

As I noted yesterday, OpenAI is making deals now for content, and I only caught this morning that ‘The New York Times, too, had been in conversations with OpenAI to establish a “high-value” partnership involving “real-time display” of its brand in ChatGPT, OpenAI’s AI-powered chatbot.’

Clearly, discussions didn’t work out. I was going to link the New York Times article on it, but it seems I’ve used up my freebies so I can’t actually read it right now unless I subscribe.1 At this end of things, as a simple human being, I’m subject to paywalls for content, but OpenAI hasn’t been. If I can’t read and cite an article from the New York Times for free, why should they be able to?

On the other hand, when I get content that originated from news sites like the New York Times, there is fair use happening. People transform what they have read and regurgitate it, some more intelligibly than others, much like an artificial intelligence, but there is at least etiquette – linking the source, at the least. This is not something OpenAI does. It doesn’t give credit. It just inhales large amounts of text, and the algorithms decide on the best ways to spit them out to answer prompts. Like blogging, only faster, and like blogging, sometimes it just makes stuff up.

This is not unlike a well-read person doing the same. Ideas, thoughts, even memes are experiences we draw upon. What makes these generative artificial intelligences different? Speed. They also consume a lot more water, apparently.

The line has to be drawn somewhere, and since OpenAI isn’t living up to the first part of its name and is not being transparent, people are left poking a black box to see if their copyrighted work has been sucked in without permission, mention, or recompense.

That does seem a bit like unfair use. This is not to say that the copyright system couldn’t use an overhaul, but apparently so could how generative AIs get their content.

What does that ‘Open’ in OpenAI mean, anyway?

  1. They do seem to have a good deal right now. I did try to subscribe, but it failed for some obscure reason. I’ll try again later. $10 for a year of the New York Times is a definite deal, if only they could process my payment this morning. ↩︎

How Much AI In Journalism?

Recently, I’ve been active in a group on Facebook that is supposed to be a polite space to debate things. News articles fly around, and the news articles we see these days from different sources carry their own biases because rather than just presenting facts, they present narratives and narratives require framing.

I wondered how many of these articles were generated by what we’re calling artificial intelligence these days. In researching, I can tell you I’m still wondering – but I have found some things that are of interest.

The New York Times Lawsuit.

It’s only fair to get this out of the way since it’s short and sweet.

Of course, in the news now is the lawsuit that the New York Times has brought against Microsoft and OpenAI, where speculation runs rampant either way. To their credit, through that link, the New York Times presented things in as unbiased a way as possible. Everyone’s talking about that, but speculation on that only really impacts investors and share prices. It doesn’t do as much for people like me, who write their own content as individuals.

In an odd twist, though, and not too long after the announcement, OpenAI is offering to pay for licensing news articles (paywalled), which you can read more about here if you’re not paying The Information.1

Either way, that lawsuit is likely not going to help my content stay out of a learning model because I just don’t have the lawyers. Speculating on it doesn’t get me anywhere.

How Much AI is being used?

Statista only has one statistic they cite on the amount of artificial intelligence used in media and entertainment: 78 percent of U.S. adults think news articles created by AI are not a good thing.

The articles there go on and tell us about the present challenges, etc, but one word should stand out from that: foggy.

So how would it be used, if it is used? With nearly 50 ‘news’ websites reportedly generated by AI as of May last year, almost a year ago, and with one news site even going so far as to have an AI news anchor as of late last year, we should have questions.

Well, we don’t really know how many news agencies are using artificial intelligence or how. One would think disclosure would be the issue then.

The arguments against disclosure are pretty much summed up below (an extract from a larger, well-balanced article).

Against disclosure

One concern is that it could stifle innovation. If news organisations are required to disclose every time they use AI, they may be less likely to experiment with the technology.

Another is that disclosure could be confusing for consumers. Not everyone understands how AI works. Some people may be suspicious of AI-generated content. Requiring disclosure could make it more difficult for consumers to get the information they need.

Should the media tell you when they use AI to report the news? What consumers should know, Jo Adetunji, Editor, TheConversationUK, TheConversation.com, November 14, 2023.

On the surface, the arguments make sense.

In my opinion, placing innovation over trust, which is what some are actually arguing for here, is abhorrent. To innovate, one needs that trust, and if you want that trust, it seems to me that the trust has to be earned. This, given the present state of news outlets in their many shades of truth and bias, might seem completely alien to some.

I do encourage people to read that entire article because the framing of it here doesn’t do the article justice and I’ve simply expressed an opinion on one side of the arguments presented.

How Is AI presently used?

Again, we really don’t know because of the disclosure issue, but in October of last year Twipe published 10 ways that journalists are using artificial intelligence. It points from the outset to Klara Indernach, a tag used by Express.de to note when an article is created with artificial intelligence tools.

Arist von Harpe is cited in the article as saying, “We do not highlight AI-aided articles. We’re only using [AI] as a tool. As with any tool, it’s always the person using it who is responsible for what comes out.” This seems a reasonable position, and puts the accountability on the humans related to it. I have yet to see artificial intelligences be thrown under the bus for an incorrect article, so we have that landmark to look for.

The rest of that article is pretty interesting and mentions fact checking, which is peculiar given the prevalence of hallucinations and even strategic deception, as well as image generation, etc.

We’ll never really know.

In the end, I imagine the use of any artificial intelligence in newsrooms is evolving even as I write this and will be evolving well beyond when you read this. In a few years, it may not be as much of a big deal, but now we’re finding failures in artificial intelligences all the way to a court, in a matter that is simply fraught with political consequences. They were quick to throw Google Bard under the bus on that one.

It is still disturbing that we don’t have much insight into the learning models being used, which is a consistent problem. The New York Times lawsuit seems to be somewhat helpful there.

I honestly tried to find out what I could here and in doing so came up with my own conclusion that wasn’t what I would have expected it to be.

In the end, it is as Arist von Harpe is cited as saying. We have to judge based on the stories we get because every newsroom will do things differently. It would have helped if we had less room to speculate on biases before the creation of these artificial intelligence tools, and whoever screws up should lose some trust. In this day and age, though, feeding cognitive biases seems to trump trust.

That’s probably the discussion we should have had some time ago.

  1. These paywalls are super-annoying for us mere mortals who do not have the deep pockets of corporate America. How many subscriptions is a well-informed person supposed to have? It’s gotten ridiculous. We’ve known that business models for news have been in such trouble that a ‘news story’ has a more literal definition these days, but… surely we can do better than this? ↩︎

Social Networks, Privacy, Revenue and AI.

I’ve seen more and more people leaving Facebook because their content just isn’t getting into timelines. The possibilities there are interesting to consider. While some of the complaints about the Facebook algorithms are fun to read, writing those sorts of complaints doesn’t really accomplish much. It’s not as if Facebook is going to change its algorithms over complaints.

As I pointed out to people, people using Facebook aren’t the customers. People using Twitter-X aren’t the customers either. To be a customer, you have to buy something. Who buys things on social networks? Advertisers are one, of course.

That’s something Elon Musk didn’t quite get the memo on. Why would he be this confident? Hubris? Maybe, that always seems a factor, but it’s probably something more sensible.

Billionaires used to be much better spoken, it seems.

There’s something pretty valuable in social networks that people don’t see. It’s the user data, which is strangely what the canceled Westworld was about. The real value is in being able to predict what people want and influence outcomes, much as the television series showed after the first season.1

Many people seem to think that privacy is only about credit card information and personal details. It also includes choices that allow algorithms to predict choices. Humans are black boxes in this regard, and if you have enough computing power you can go around poking and prodding to see the results.

Have you noticed that these social networks are linked somehow to AI initiatives? Facebook is linked to the AI initiatives of its parent company, Meta. Musk, chief twit at X, has his fingers in the AI pie too.

Artificial intelligences need learning models, and if you own a social network, you not only get to poke and prod – you have the potential to influence. Are your future choices something that falls under privacy? Probably not – but your past choices probably should, because that’s how you get to predicting and influencing future choices.

I never really got into Twitter. Facebook was less interruptive. On the surface, these started off as content management systems that provided a service and had paid advertising to support it, yet now one has to wonder at the value of the user data. Back in 2018, Cambridge Analytica harvested data from 50 million Facebook users. Zuckerberg later apologized, and talked about how third-party apps would be limited. To his credit, I think it was handled pretty well.

Still, it also signaled how powerful and useful that data could be, and if you own a social network, that would at least give you pause. After all, Cambridge Analytica influenced politics at the least, and that could have also influenced markets. The butterfly effect reigns supreme in the age of big data and artificial intelligence.

This is why privacy is important in the age of artificial intelligence learning models, algorithms, and so forth. It can impact the responses one gets from any large language model, which is why there are pretty serious questions regarding privacy, copyright, and other things related to training them. Bias leaks into everything, and popular bias on social networks is simply about the most vocal and repetitive – not about what is actually correct. This is also why canceling as a cultural phenomenon can be so damaging. It’s a nuclear option in the world of information, and oddly, large groups of smart or stupid people can use it with impunity.

This is why we see large language models hedge on some questions presently, because of conflicts within the learning model as well as some well-designed algorithms. For that we should be a little grateful.

We should probably be lobbying to find out what is in these learning models that artificial intelligences are given, in much the same way we used2 to grill people who would represent us collectively. Sure, Elon Musk might be taking a financial hit, but what if it’s a gambit to leverage user data for bigger returns later, with his ethics embedded in how he gets his companies to do that?

You don’t have to like or dislike people to question them and how they use this data, but we should all be a bit concerned. Yes, artificial intelligence is pretty cool and interesting, but unleashing it without questioning the integrity of the information it was trained on is at the least foolish.

Be careful what you share, what you say, who you interact with and why. Quizzes that require access to your user profile are definitely questionable, as that information and the information of people you are connected with quickly gets folded into data creating a digital shadow of yourself, part of the larger crowd that can influence the now and the future.

  1. This is not to say it was canceled for this reason. I only recently watched it, and have yet to finish season 3, but it’s very compelling and topical content for the now. Great writing and acting. ↩︎
  2. We don’t seem to be that good at grilling people these days, perhaps because of all of this and more. ↩︎