The recent news of Stack Overflow selling its content to OpenAI was something I expected. It was only a matter of time. Users of Stack Overflow were surprised, which surprises me, and upset, which doesn’t.
That seems to me a reasonable response. Who wouldn’t be upset? Yet when we contribute for free to a website on the Internet that isn’t ours, it’s always a terrible bargain. You give of yourself for whatever reason (fame, prestige, or simply enjoying helping), and someone else trades it for cash.
But companies don’t want you to get wise. They want you to give them your content for free so that they can tie a bow around it and sell it. You might get a nice “Thank you!” email, or little awards of no value.
No Good Code Goes Unpunished.
The fallout has been disappointing. People have tried logging in and sabotaging their top answers. A few days ago I spoke to one guy on Mastodon who got banned. It seems pretty obvious to me that they had already backed up the database holding all that content, and that they would be watching for exactly this. Software developers should know that. There was also some confusion about the Creative Commons license the site uses versus the rights users grant to the website’s owners, which are two separate things.
Is it slimy? You bet. It’s not new, and the companies training generative AI have been pretty slimy. The problem isn’t generative AI; it’s the way these companies choose to do business, eroding trust with the very market for their product while poisoning wells they can no longer drink from. If you’re contributing answers for free that will be used to train AI to give the same answers for a subscription, you’re a silly person.¹
These days, generative AI companies need to put filters on the front of their learning models to keep small children from getting sucked in.
Remember Huffington Post?
The Huffington Post had this neat little algorithm that swapped headlines around until it found one people liked. It gamed SEO, and it built itself into a powerhouse that almost no one remembers now. It was social, it was quirky, and it was fun. Volunteers put up lots of great content.
When HuffPost sold for $315 million, the volunteers who had provided content for free and built the site up sued, and got nothing. Why? Because they had volunteered their work.
I knew a professional journalist who was building up her portfolio there and adding real value. I met her at a conference in Chicago, probably a few months before the sale, and asked her why she was contributing to HuffPost for free. She said it was a good outlet for getting some things out, and she was right. When it sold, she was angry. She felt betrayed, and rightfully so, I think.
It seems people weren’t paying attention to that. I was.²
You live, you learn, and you don’t do it again. Having learned this firsthand and secondhand, my rule is simple: if I write on a website and I don’t get paid, it’s my website. Don’t trust anyone who says, “Contribute and good things will happen!” Yeah, they might, but it’s unlikely they’ll happen for you.
If your content is good enough for a popular site, it’s good enough to get paid to be there. You in the LinkedIn section – pay attention.
Back To AI’s Intake Manifold.
I’ve written about companies with generative AI models scraping around for content, with works contributed to websites becoming part of their training data. It’s their oil; it’s what keeps them burning through cash as they try to… replace the people whose content they use. In return, the Internet gets slop generated all over it, and you’ll know the slop when you read it. It lacks soul and human connection, though it fakes it from time to time, like the pornographic videos that make the inexperienced think that’s what sex is really like. Nope.
The question we should be asking is whether it’s worth putting anything on the Internet at this point, only to have it folded into a statistical algorithm that chews up our work and spits out something like it. Sure, there are copyright lawsuits happening. But to a sane mind, the transformative-work argument doesn’t hold up well given the sheer, ever-growing volume of content used to build a generative AI model.
So what happens when fewer people contribute their own work? One thing is certain: the social aspect of the Internet will not thrive as well.
Social.
The Stack Overflow website was mainly an annoyance for me over the years, but I understand that many people found a thriving society of sorts there. It was largely a meritocracy, like open source, at least at its core. You’ll note that I’m writing of it in the past tense; I don’t think anyone with any self-worth will contribute there anymore.
The annoyance came from (1) not finding solutions to the quirky problems people hired me to solve,³ and (2) finding code fragments, which I traced back to Stack Overflow, poorly adapted (if at all) to the employer’s or client’s needs. I had also learned not to give away valuable things for free, so I didn’t get involved. Most, if not all, of my work required silence about how things worked, and on a site like Stack Overflow, your keyboard might just get you in trouble. Yet the problem wasn’t the site itself, but those who borrowed code like it was a cup of sugar instead of a recipe.
Beyond us software engineers, developers, or whatever we’re called these days, there are a lot of websites with social interaction whose content is likely to get shoved into an AI learning model at some point. LinkedIn, owned by Microsoft and annoyingly near the top of search results, is ripe for being used that way.
LinkedIn doesn’t pay for content, yet if you manage to get popular, you can make money off sponsored posts: “Hey, say something nice about our company, here’s $x.” That’s not really social, but it’s how ‘influencers’ make money these days. When you get paid to write posts that way, you might be selling your soul unless you keep a good moral compass, and when bills need to get paid, that moral compass sometimes goes out the window. I won’t say everyone is like that, but I will say it’s a danger, and it’s why I don’t care much about ‘influencers’.
In my mind, anyone who is an influencer is trying to sell me something, or has an ego so large that Zaphod Beeblebrox would be insanely jealous.
Regardless, to get popular, you have to contribute content. Who owns LinkedIn? Microsoft. Who is Microsoft partnered with? OpenAI. The dots are there. Maybe they’re not connected. Maybe they are.
Other websites out there are building on user content too. The odds are good that they have more money for lawyers than you do, that their content licensing and user agreements work for them and not for you, and that if someone wants to buy that content for any reason… you’ll find out what Stack Overflow users found out.
All relationships are built on trust. All networks are built on trust. The Internet is built on trust.
The Internet is eating itself.
1. I am being kind.
2. I volunteered some stuff to WorldChanging.com way back when, with the understanding that it would be Creative Commons licensed. I went back and forth with Alex and Jamais, as did a few other contributors, and because of that and some nastiness related to the Alert Retrieval Cache, I walked away from the site. Later, an editor contacted me about their book because they wanted to use some of my work. Nope. I don’t trust futurists, and maybe you shouldn’t either.
3. I always seemed to be the software engineer who could make sense of gobbledygook code, rein it in, take it to water, and convince it to drink.