One of the things that has bothered me most about ChatGPT is that its data was scraped from the Internet, where a fair amount of writing I have done resides. It would be hubris to think that what I wrote is so awesome that ChatGPT could be ‘stealing’ from me, but it would also be idiotic to think that content ChatGPT produces isn’t derivative in a legal sense. In a world almost critically defined by self-preservation, I think we all should know where the line is. We don’t, really, but we should.
I’m no lawyer, but I’ve had my own ‘fun’ with copyright.
In fact, New Tech Observations from the UK (ntouk) seems to have caught ChatGPT lifting the plot of Alice in Wonderland without any attribution. There are legal issues here that seem to have been ignored in most of the hype, where even reusing content from ChatGPT could be seen as contributing to the infringement.
That hasn’t really stopped anyone since most people don’t seem to take copyright seriously unless they work for an organization that takes copyright seriously, and even when they do take copyright seriously, it’s only within specific contexts. This is why I point out where I have used a large language model such as ChatGPT for anything, since I’m citing it citing nobody – and even then, I don’t use it for generating content other than some interesting images.
Entities with deep pockets are protected by their deep pockets, but the average person writing on the Internet has shallower pockets – and there are more of us. I’ve had content ‘borrowed’ without attribution. It can range from mildly amusing to outrageous, particularly when some schmuck just borrowed to create a popular post without citation so that they could ‘produce’ content that they didn’t actually produce. And copyright is implicit: it attaches to what you write without you having to claim it.
Privacy is a partner to Copyright as well. I’m wondering when publishers that deal mainly with text rather than images will raise the question of text scraped to train these models, because the image lawsuits are already happening.
For now, I suppose, don’t put anything online that you wouldn’t want anyone regurgitating without attribution.
🤔 I would want to believe that most experienced writers and bloggers know what plagiarism is and would not willfully indulge themselves in it.
With regard to ChatGPT, I also wondered if my content was scraped during the time it was being trained.
Thanks for another great post, Taran.
A friend of mine reacted on Facebook to this post and mentioned that he’d like the robots.txt file to be honored by *anything* scraping the web.
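For anyone curious what honoring robots.txt might look like from the scraper's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name, and the example.com URLs are made up for illustration; nothing here reflects how any real crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely,
# and keep everyone else out of /drafts/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A polite scraper asks this before requesting any page."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
blocked_bot = may_fetch(parser, "ExampleBot", "https://example.com/post")
other_bot = may_fetch(parser, "OtherBot", "https://example.com/post")
other_bot_drafts = may_fetch(parser, "OtherBot", "https://example.com/drafts/idea")
```

The catch is that the check is voluntary: robots.txt is a request, and a scraper that never asks the question is not stopped by it.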