The Ongoing Copyright Issue with Generative AI.

It’s a strange time. OpenAI (and Microsoft) are being sued by the New York Times, and they’re claiming ‘Fair Use’ – as if they were having coffee and discussing what they read in the New York Times, or were about to write a blog post reproducing the entire published NYT archive, on demand.

It’s not just the New York Times, either. More and more authors are filing similar suits – some of them started before the NYT did.

IEEE Spectrum’s article, “Generative AI Has a Visual Plagiarism Problem,” demonstrates issues that back up the copyright claims. This is not transformation, and it is not fair use – there is line-by-line text from the New York Times, among other things.

As I noted yesterday, OpenAI is making deals for content now; I only caught this morning that ‘The New York Times, too, had been in conversations with OpenAI to establish a “high-value” partnership involving “real-time display” of its brand in ChatGPT, OpenAI’s AI-powered chatbot.’

Clearly, discussions didn’t work out. I was going to link the New York Times article on it, but it seems I’ve used up my freebies so I can’t actually read it right now unless I subscribe.1 At this end of things, as a simple human being, I’m subject to paywalls for content, but OpenAI hasn’t been. If I can’t read and cite an article from the New York Times for free, why should they be able to?

On the other hand, when I get content that originated from news sites like the New York Times, there is fair use happening. People transform what they have read and regurgitate it, some more intelligibly than others, much like an artificial intelligence – but there is at least etiquette: linking the source, at the least. This is not something OpenAI does. It doesn’t give credit. It just inhales large amounts of text, and the algorithms decide on the best ways to spit it out to answer prompts. Like blogging, only faster – and like blogging, sometimes it just makes stuff up.

This is not unlike a well-read person doing the same. Ideas, thoughts, even memes are experiences we draw upon. What makes these generative artificial intelligences different? Speed. They also consume a lot more water, apparently.

The line has to be drawn somewhere, and since OpenAI isn’t living up to the first part of its name and is not being transparent, people are left poking a black box to see if their copyright has been sucked in without permission, mention, or recompense.

That does seem a bit like unfair use. This is not to say that the copyright system couldn’t use an overhaul, but apparently so could how generative AIs get their content.

What does that ‘Open’ in OpenAI mean, anyway?

  1. They do seem to have a good deal right now; I did try to subscribe, but it failed for some obscure reason. I’ll try again later. $10 for a year of the New York Times is a definite deal – if only they could process my payment this morning. ↩︎

Strategic Deception, AI, and Investors.

‘Strategic deception’ in large language models is indeed a thing. It should be unsurprising. After all, people do it all the time when trying to give the answer that is wanted by the person asking the question.

Large Language Models are designed to… give the answer wanted by the person asking the question.

That there had to be a report on this is a little disturbing. It’s the nature of the Large Language Model algorithms.

Strategic deception is at the very least one form of AI hallucination, which potentially reinforces biases that we might want to think twice about. Like Arthur Juliani, I believe the term ‘hallucinate’ is misleading, and I believe we’re seeing a shift away from that. Good.

It’s also something I simply summarize as ‘bullshitting’. It is, after all, just statistics, but it’s statistics toward an end, which makes the statistics pliable enough for strategic deception.
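A toy sketch of that point – the probabilities and prompt here are entirely made up, and this is not how any real model is implemented – showing why ‘statistics toward an end’ tends to return the rewarded answer rather than the true one:

```python
# Toy "language model": next-word probabilities conditioned on a prompt.
# The numbers reflect which answers people rewarded, not which are true.
next_word_probs = {
    "Is this plan good?": {"Yes": 0.7, "No": 0.1, "Maybe": 0.2},
}

def answer(prompt):
    # Greedy decoding: always emit the statistically most likely word.
    # If flattery was rewarded in training, flattery is what comes out.
    probs = next_word_probs[prompt]
    return max(probs, key=probs.get)

print(answer("Is this plan good?"))  # prints "Yes" - the wanted answer, true or not
```

The model isn’t ‘lying’ in any human sense; the statistics were simply bent toward pleasing the asker, which is the whole point.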

It’s sort of like AI investors claiming ‘Fair Use’ when not paying for copyrighted materials in the large language models. If they truly believe that, it’s a strategic deception on themselves. If they wanted to find a way, they could, and they still may.

The Future Of Technology and Society (May 2016)

If you’re one of those who likes tl;dr, skip this post and find a tweet to read.

It has been bothering me. There are a bunch of very positive articles out there that do not touch on the problems we face in technology.

What I mean by this is that, since the early 1980s, I have been voraciously reading up on the future and plotting my own course through it as I go through long, dark tea-times of my career. It allows me to land where things are interesting to me, or where I can make a living for a while as I watch things settle into place. I’ve never been 100% accurate, but I have never starved and have done well enough even in 3rd world countries without advanced infrastructure or policy. Over the course of decades, I have adapted and found myself attempting to affect policies that I found limiting – something most people don’t really care about.

Today, we’re in exciting times. We have the buzz phrases of big data, deep learning and artificial intelligence floating around as if they were all something new rather than things that have advanced and have been re-branded to make them more palatable. Where in the 1990s the joke was that, “We have a pill for that!”, these days the joke is, “We have an app for that!”. As someone who has always striven to provide things of use to the world, I shook my head when flatulence apps went to war for millions of dollars.

Social networks erupted where people willingly give up their privacy to get things for ‘free’. A read of Daniel Solove’s ten-year-old book, The Digital Person: Technology and Privacy in the Information Age, should have woken people up in 2006, but by then everyone was being trained to read 140 characters at a time and ‘tl;dr’ became a thing. I am pleased you made it this far, gentle reader; please continue.

Big Data

All these networks collect the big data. They have predicted pregnancies from shopping habits and been sued for it (Feb 2012). There’s a pretty good list of 10 issues with Big Data and Privacy – here are some highlights (emphasis mine):

1. Privacy breaches and embarrassments.
2. Anonymization could become impossible.
3. Data masking could be defeated to reveal personal information.
4. Unethical actions based on interpretations.
5. Big data analytics are not 100% accurate.
6. Discrimination.
7. Few (if any) legal protections exist for the involved individuals.
8. Big data will probably exist forever.
9. Concerns for e-discovery.
10. Making patents and copyrights irrelevant.

Item 4, to me, is the largest one – coupled with 5 and 7, it gets downright ugly. Do you want people to make judgements about you based on interpretations of the data that aren’t 100% accurate, and where you have no legal protections?

Instead, the legal framework is biased towards those that collect the data – entities known as corporations (you may have heard of them) – through a grouping of disparate ideas known as intellectual property. In fact, in at least one country I know of, a database can be copyrighted (Trinidad and Tobago) even though the information in it isn’t new. Attempts are being made by some to make things better, but in the end they become feeble – if not brittle – under a legal system that is undeniably swayed by whoever has the most money.

If it sounds like I’m griping – 10 years ago I would have been. This is just a statement of fact at this point. I did what I could to inform over the years, as did others, but ultimately the choice was not that of a well informed minority but that of a poorly informed majority.

Deep Learning / Artificial Intelligence

Deep learning allows amazing things to be done with data. There is no question of that; I’ve played with it myself and done my own analyses on things I have been working on in my ‘spare time’ (read: I have no life). There are a lot of hypotheses that can come from big data, but it’s the outliers within the big data that are actually the meat of any hypothesis.

In English: the exceptions create the rules, which further define what needs to be looked at. Outliers in the data can mean that another bit of data needs to be added to the mix.

Artificial Intelligence (AI), on the other hand, can incorporate deep learning and big data. While an AI may not be able to write a news article that can fool an editor, I imagine it could fool the reading public. This is particularly true since, because of the income issues related to the Internet, media outlets have gone to pulpy, opinionated pieces instead of the factual news that used to inform rather than attempt to sway, or they simply chase reads by echoing a demographic’s sentiment. That content is then shared by like-minded people on social media. It’s an epic charlie-foxtrot.

People worry about jobs and careers in all of this with robots and AI, and a lot of white collar folks are thinking it will affect only those in blue collar jobs. No, it will not. There is an evolution taking place (some call it a revolution), and better paid white collar jobs are much juicier for saving money for people who care only about their stock price. 5 white collar jobs are already under the gun.

KFC and McDonald’s have already begun robotizing. More are coming.

And then let’s discuss ethics in the implementation of AI – look at what Microsoft did with their Twitter-bot, Tay. We have a large corporation putting an alleged AI (chatbot, whatever you want to call it) into a live environment without a thought to the consequences. Granted, it seemed like a simple evolution of Eliza (click the link to see what that means), but you don’t just let your dog off its leash or your AI out in an uncontrolled environment. It’s just not done, particularly in an environment where kids need ‘safe places’ and others need trigger warnings. If they didn’t have an army of lawyers – another issue with technology – they probably would have had their pants shaken severely in courts across the world. Ahh, but they do have an army of well paid lawyers – which leads us to intellectual property.

Copyrights, Patents and Trademarks (and Privacy)

If you haven’t read anything about Copyright by Lawrence Lessig in the past decade, or Privacy by Daniel Solove, you’re akin to an unlicensed, blindfolded teenager joy riding in your Mom’s Corvette ZR1. Sure, things might be fun, but it’s a matter of time unless you’re really, really lucky. You shouldn’t be allowed near a computing device without these prerequisites because you’re uninformed. This is not alarmist. This is your reality.

And anyone writing code without this level of familiarity is driving an 18 wheeler in much the same way.

You need a lawyer just to flush a virtual toilet these days. I exaggerate to make the point – but maybe not. It would depend on who owns the virtual toilet.

You can convert any text into a patent application. Really.

Meanwhile, Patent trolls are finally seen as harming innovation. The key point here is that the entire system is biased toward those with more in the bank – which means that small companies are destroyed while the larger companies, such as Google and Oracle, have larger legal battles that impact more people than even know about it. Even writing software tools has become a legal battle between the behemoths.

‘Fair Use’ – the ability to use things you bought in ways that allow you to keep copies of them – has all but been lost in all of this.

Meanwhile, Wounded Warrior – an alleged veterans’ non-profit – has been suing other non-profits over use of the phrase ‘Wounded Warrior’. If you want to take the nice approach, they’re trying to avoid dilution of their trademark… at the cost of veterans themselves, but that doesn’t explain them suing two of their own former employees with PTSD.

And Here I Am, Wondering About The Future.

There are a bunch of very positive articles out there that do not touch on the problems we face in technology. Our technology is presently being held for ransom by legal frameworks that do not fit well; this in turn means our ability to innovate, and by proxy entrepreneurship, are also being held ransom. Meanwhile we have people running around with Stockholm Syndrome waiting for the next iPhone hand built by suicidal workers, or the next application that they can open their private data to (hi, Google, Microsoft!), or…

I can’t predict anything at this point. It used to be much simpler and, by proxy, easily controlled. The question of whether to do something used to be an ethical question, but now we go to lawyers for ethics (a group that is largely not known for ethics – apologies to those who are). The governments institute policies biased by whoever funds the campaigns of politicians, or gives United States congresspeople nice things. It affects the entire world, and every few years I think it won’t last – yet it continues.

Too big to fail.

But out of all of this, I don’t mean to stop trying. I don’t mean to stop innovating, starting new businesses, etc. What I mean is: we have a lot of things to do properly to assure a future that isn’t as dim as I see it now – to assure that the kids hooked on realities someone else created can rediscover what they imagine for themselves. Imagination itself needs to be revisited, cultivated and unleashed against all of this like a cool wind across the desert.

It cannot be done blindly. People need to understand all of this. And if you made it this far – congratulations – I offer that you should, if not share this, share the ideas within it freely rather than simply clicking ‘like’ and hoping for the best.

We cannot change things on our own.

As for myself – just surfing the waves as they come in, but I fully intend to build my house on a distant shore at this point.