Paying To Whitewash The Fence of AI.

I suppose a lot of people may not have read Tom Sawyer, since it has been banned here and there. Yet there is a part of the book that seems especially relevant today, and it’s unfortunate that people haven’t read it. It’s a great con.

It’s in Chapter 2 that Tom Sawyer is punished and has to whitewash a fence for his Aunt Polly. When another boy mocks him about his punishment, he claims that whitewashing the fence is fun. It’s so much fun, in fact, that the other boy gives Tom an apple (the initial offer was the apple core, I believe), and Tom goes on to pull the same con on other kids, collecting their treasures while they paint the fence. He gets ‘rich’ and has fun at their expense while they do his penance.

That’s what’s happening with social media like Facebook, LinkedIn, Twitter, etc.

Videos, text, everything we generate on these social networks is being used to train generative AI that you can use for free (at least for now) while others pay and subscribe to get the better-trained versions.

It’s a classic con, and a pretty good one, which I suppose people didn’t read about.

Some people will complain when the AIs start taking over whitewashing the fences, or start whitewashing their children.

Meanwhile, these same companies are selling metaphorical paint and brushes.

I suppose this is why reading is important.

Oddly, the ban on “The Adventures of Tom Sawyer” came about “when librarians said they found Mr. Sawyer to be a ‘questionable’ protagonist in terms of his moral character.”

Happy Painting.

From Inputs to The Big Picture: An AI Roundup

This started off as a baseline post regarding generative artificial intelligence and its aspects, and it grew fairly long because information kept coming out even as I was writing it. It’s my intention to do a ‘roundup’ like this, highlighting different focuses as needed. Every bit of it is connected, but social media postings tend to treat things in silos. I’m attempting to integrate them, since the larger implications are hidden in these details, and I will try to stay on top of it as things progress.

It’s long enough that it could have been several posts, but I wanted it all together at least once.

No AI was used in the writing, though some images have been generated by AI.

The two versions of artificial intelligence on the table right now – the marketed and the reality – have various problems that make it seem like we’re wrestling a mating orgy of cephalopods.

The marketing aspect is a constant distraction, feeding us whatever helps stock prices and goodwill toward those implementing the generative AIs, while the reality of these generative AIs is not being addressed in a cohesive way.

To simplify, this post breaks it down into the Input, the Output, and the impacts on the ecosystem the generative AIs work in.

The Input.

There’s a lot that goes into these systems other than money and water. There’s the information used for the learning models, the hardware needed, and the algorithms used.

The Training Data.

The focus so far has been on what goes into their training data, and that has been an issue, leading to lawsuits and, less obviously, eroding trust in the companies involved.

…The race to lead A.I. has become a desperate hunt for the digital data needed to advance the technology. To obtain that data, tech companies including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law, according to an examination by The New York Times…

“How Tech Giants Cut Corners to Harvest Data for A.I.”, Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson and Nico Grant, New York Times, April 6, 2024. 1

Also of note, Google has been indexing AI-generated books. AI-generated material is what is called ‘synthetic data’, and training on it has been warned against, yet companies are planning for it or are already doing it, consciously and unconsciously.

While some of these actions are of questionable legality, some people see them as clearly unethical, hence the revolt mentioned last year against AI companies using content without permission. The revolt’s effect is questionable because no one seems to have insight into what the training data consists of, and no one seems to be auditing it.

There’s a need for that audit, if only to allow for trust.

…Industry and audit leaders must break from the pack and embrace the emerging skills needed for AI oversight. Those that fail to address AI’s cascading advancements, flaws, and complexities of design will likely find their organizations facing legal, regulatory, and investor scrutiny for a failure to anticipate and address advanced data-driven controls and guidelines.

“Auditing AI: The emerging battlefield of transparency and assessment”, Mark Dangelo, Thomson Reuters, 25 Oct 2023.

While everyone is hunting down data, no one seems to be seriously working on oversight and audits, at least not publicly, though the United States is pushing for global regulations on artificial intelligence at the UN. There doesn’t seem to have been an update on that, even as artificial intelligence is being used to select targets in at least two wars right now (Ukraine and Gaza).

There’s an imbalance here that needs to be addressed. It would be sensible to have external auditing of learning data models and their sources, as well as the algorithms involved, and, to get a little ahead, of the output too. Of course, this sort of oversight should apply to stock market trading as well, though that hasn’t made much headway in all the time trading has been happening either.

Some websites are trying to block AI crawlers, and it is an ongoing process. Blocking them requires knowing who they are, and it doesn’t guarantee that bad actors won’t stop by anyway.
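For sites that serve their own robots.txt, the usual way to ‘block’ an AI crawler is to name its user-agent token and disallow it. Below is a minimal sketch in Python, assuming a hypothetical self-hosted blog at example.com. GPTBot (OpenAI), CCBot (Common Crawl) and Google-Extended (Google’s AI-training control token) are published tokens, but the list here is illustrative rather than complete, and ‘UnlistedScraper’ is a made-up name. The sketch uses Python’s standard urllib.robotparser to check what a crawler that honors robots.txt would conclude, and it shows the limitation above: any agent you haven’t named falls through to the default rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a self-hosted blog (an assumption for illustration).
# GPTBot, CCBot and Google-Extended are published AI-related tokens; the list is
# illustrative, not exhaustive. Google-Extended is a control token honored by
# Google's existing crawlers rather than a separate crawler of its own.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return what a crawler that *chooses* to honor robots.txt would conclude."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    post = "https://example.com/2024/04/some-post/"  # hypothetical URL
    # "UnlistedScraper" is a made-up name: any agent not named above falls
    # through to the "*" rule and is allowed, which is why blocking requires
    # knowing who the crawlers are.
    for agent in ("GPTBot", "CCBot", "Google-Extended", "UnlistedScraper"):
        print(f"{agent}: {'allowed' if may_fetch(rules, agent, post) else 'blocked'}")
```

Nothing here is enforced. robots.txt is a request, so a scraper that ignores it, or reports a different user agent, gets through regardless.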

There is a new bill being pressed in the United States, the Generative AI Copyright Disclosure Act, that is worth keeping an eye on:

“…The California Democratic congressman Adam Schiff introduced the bill, the Generative AI Copyright Disclosure Act, which would require that AI companies submit any copyrighted works in their training datasets to the Register of Copyrights before releasing new generative AI systems, which create text, images, music or video in response to users’ prompts. The bill would need companies to file such documents at least 30 days before publicly debuting their AI tools, or face a financial penalty. Such datasets encompass billions of lines of text and images or millions of hours of music and movies…”

“New bill would force AI companies to reveal use of copyrighted art”, Nick Robins-Early, TheGuardian.com, April 9th, 2024.

Given how much information these companies have already collected from Web 2.0 forward, through social media websites such as Facebook and Instagram (Meta) and Twitter, and even through search engines and advertising tracking, it’s pretty obvious that all of this would be in the training data as well.

The Algorithms.

The algorithms for generative AI are pretty much trade secrets at this point, but one has to wonder why so much data is needed to feed the training models when better algorithms could require less. Consider that a well-read person, even a layperson, could answer some questions with less of a carbon footprint. We have no insight into the algorithms either, which makes it seem as though these companies are simply throwing more hardware and data at the problem rather than being more efficient with the data they already took and the hardware they already have.

There’s not much news about that, and it’s unlikely that we’ll see any. It does seem like fuzzy logic is playing a role, but it’s difficult to say to what extent, and given the nature of fuzzy logic, it’s hard to say whether its implementation is as good as it should be.

The Hardware

Generative AI has brought about an AI chip race between Microsoft, Meta, Google, and Nvidia, which leaves smaller companies that can’t afford to compete in that arena at a disadvantage so great that competing could be seen as impossible, at least at present.

The future holds quantum computing, which could make all of the present efforts obsolete, but no one seems interested in waiting around for that to happen. Instead, it’s full speed ahead, with Nvidia presently dominating the hardware market for these AI companies.

The Output.

One of the larger topics that seems to have faded is what some called ‘hallucinations’ by generative AI. Strategic deception was also very prominent for a short period.

There is criticism that the algorithms are making false information spread faster, and the US Department of Justice is stepping up efforts to go after the misuse of generative AI. This is dangerous ground, since algorithms are being sent out to hunt the products of other algorithms, and the crossfire between them doesn’t care much about civilians.2

As for education, students’ use of generative AI has disrupted education itself. This is being portrayed as an overall good, which may simply be an acceptance that it’s not going away. It’s interesting to consider that the AI companies have taken more content than students could possibly access or afford within the educational system, which is something worth exploring.

Given reports that ChatGPT is presently 82% more persuasive than humans, likely because it has been trained on persuasive works (Input; Training Data), this was predictable, since most content on the internet is marketing products, services, or ideas. While it’s hard to say how much of the content going into training data feeds on our confirmation biases, it’s fair to say that at least some of it does. Then there are the other biases that training data inherits through omission or the selective writing of history.

There are clearly a lot of problems, and many of them can be traced back to the training data. Even on a good day, that data is as imperfect as we are, and it can magnify, distort, or even be consciously influenced by good or bad actors.

And that’s what leads us to the Big Picture.

The Big Picture

…For the past year, a political fight has been raging around the world, mostly in the shadows, over how — and whether — to control AI. This new digital Great Game is a long way from over. Whoever wins will cement their dominance over Western rules for an era-defining technology. Once these rules are set, they will be almost impossible to rewrite…

“Inside the shadowy global battle to tame the world’s most dangerous technology”, Mark Scott, Gian Volpicelli, Mohar Chatterjee, Vincent Manancourt, Clothilde Goujard and Brendan Bordelon, Politico.com, March 26th, 2024

What most people don’t realize is that the ‘game’ includes social media and the information it provides for training models, such as what is happening with TikTok in the United States now. There is a deeper battle, and just perusing content on social networks gives data to those building training models. Even WordPress.com, where this site is presently hosted, is selling data, though there is a way to unvolunteer oneself.

Even the Fediverse is open to data being pulled for training models.

All of this, combined with the persuasiveness of generative AI that has given psychology pause, has democracies concerned about its influence. A recent example: Grok, Twitter X’s AI for paid subscribers, fell victim to what was clearly satire and caused a panic, which should also have us wondering about how we view intelligence.

…The headline available to Grok subscribers on Monday read, “Sun’s Odd Behavior: Experts Baffled.” And it went on to explain that the sun had been, “behaving unusually, sparking widespread concern and confusion among the general public.”…

“Elon Musk’s Grok Creates Bizarre Fake News About the Solar Eclipse Thanks to Jokes on X”, Matt Novak, Gizmodo, 8 April 2024

Of course, there’s some levity in that one, whereas Grok posting that Iran had struck Tel Aviv (Israel) with missiles seems dangerous, particularly when posted to the front page of Twitter X. It shows the dangers of fake news with AI, deepens concerns related to social media and AI, and should make us ask why billionaires involved in artificial intelligence wield the influence that they do. How much of that is generated? We have an idea how much of it is lobbied for.

Meanwhile, Facebook has been spamming users and has been restricting accounts without demonstrating a cause. If there were a videotape about this at Blockbuster, it would be titled “Algorithms Gone Wild!”

Journalism is also impacted by AI, though real journalists tend to be rigorous about their sources. Real newsrooms have rules, and while we don’t have much insight into how AI is being used in newsrooms, it stands to reason that if a newsroom is to be a trusted source, it will go out of its way to make sure that it is: it has a vested interest in getting things right. This has not stopped some websites posing as trusted sources from disseminating untrustworthy information, partly because even in the Web 2.0 era, when the world had an opportunity to discuss such things at the World Summit on the Information Society, the country with the largest web presence did not participate much, if at all, at a government level.

Then we have the thing that concerns the most people: their lives. Jon Stewart even did a Daily Show segment on it, which is worth watching, because people are worried, with good reason, about generative AI taking their jobs. Even as the Davids of AI3 square off for your market share, layoffs have been happening in tech as companies reposition for AI.

Meanwhile, AI is also apparently being used as a cover for some outsourcing:

Your automated cashier isn’t an AI, just someone in India. Amazon made headlines this week for rolling back its “Just Walk Out” checkout system, where customers could simply grab their in-store purchases and leave while a “generative AI” tallied up their receipt. As reported by The Information, however, the system wasn’t as automated as it seemed. Amazon merely relied on Indian workers reviewing store surveillance camera footage to produce an itemized list of purchases. Instead of saving money on cashiers or training better systems, costs escalated and the promise of a fully technical solution was even further away…

“Don’t Be Fooled: Much ‘AI’ is Just Outsourcing, Redux”, Janet Vertesi, Tech Policy Press, Apr 4, 2024

Maybe AI is creating jobs in India by proxy. It’s easy to blame problems on AI, too, which is a larger problem because the world often looks for something to blame and having an automated scapegoat certainly muddies the waters.

And the waters of The Big Picture of AI are muddied indeed, perhaps partly by design. After all, those involved are making money, and they now have even better tools to influence markets, populations, and you.

In a world that seems to be running a deficit when it comes to trust, the tools we’re creating seem to be increasing rather than decreasing that deficit at an exponential pace.

  1. The full article at the New York Times is worth spending one of your free articles on, if you’re not a subscriber. It gets into a lot of specifics and is a real treasure chest: a snapshot of what companies such as Google, Meta and OpenAI have been up to and the plans they have released so far. ↩︎
  2. That’s not just a metaphor, as the Israeli use of Lavender (AI) has been outed recently. ↩︎
  3. Not the Goliaths. David was the one with newer technology: The sling. ↩︎

Facebook’s Algorithms Spamming Users.

If you haven’t left Facebook yet, as I have, you’ve probably noticed a lot of AI spam. I did when I was there and blocked a bunch of it (it was hard to keep up with).

Well, it isn’t just you.

“…What is happening, simply, is that hundreds of AI-generated spam pages are posting dozens of times a day and are being rewarded by Facebook’s recommendation algorithm. Because AI-generated spam works, increasingly outlandish things are going viral and are then being recommended to the people who interact with them. Some of the pages which originally seemed to have no purpose other than to amass a large number of followers have since pivoted to driving traffic to webpages that are uniformly littered with ads and themselves are sometimes AI-generated, or to sites that are selling cheap products or outright scams. Some of the pages have also started buying Facebook ads featuring Jesus or telling people to like the page “If you Respect US Army.”…”

“Facebook’s Algorithm Is Boosting AI Spam That Links to AI-Generated, Ad-Laden Click Farms”, Jason Koebler, 404 Media, March 19, 2024

So not only are the algorithms arbitrarily restricting user accounts, as they did mine, but they’re feeding people spam to such an extent that it wasn’t just one individual noticing.

Meanwhile, Facebook has been buying GPUs to develop ‘next level’ AI, when in fact their algorithms are about as gullible as their GPU purchases are numerous.

Glad I left that platform.

The Battle For Your Habits.

Found floating around today in the wild. As an atheist who doesn’t use Chrome, I know he ain’t talking to me.

There are some funny memes going around about TikTok and… Chinese spyware, or what have you. The New York Times had a few articles on TikTok last week that were interesting and yet… missed a key point that the memes do not.

Being afraid of Chinese spyware while so many companies have been spying on their customers seems more like bias than anything else.

Certainly, India got rid of TikTok and has done better for it. Personally, I don’t like giving my information to anyone if I can help it, but these days it can’t be helped. Why is TikTok an issue in the United States?

It’s not too hard to speculate that it’s about lobbying by American tech companies that lost the magic sauce for this generation. It’s also not hard to consider India’s reasoning about China being able to push its own agenda, particularly with violence on their shared border.

Yet lobbying from the American tech companies is most likely, because they want your data and don’t want you to give it to China. They want to be able to sell you stuff based on what you’ve viewed, liked, posted, etc. So really, it’s not even about us.

It’s about the data that we give away daily when browsing social networks of any sort, websites, or even when you think you’re being anonymous using Google Chrome when in fact you’re still being tracked. The people who are advocating banning TikTok aren’t holding anyone else’s feet to the fire, instead using the ‘they will do stuff with your information’ argument, when in fact a lot of bad stuff has already happened with our information over the years.

Found circulating as a meme, which led me to check out StoneToss.com. Some really great work there.

Since 9/11 in particular, the US government has taken a pretty big interest in electronic trails, all in the interest of national security, with the FBI showing up after the Boston Marathon bombing just because people were looking at pressure cookers.

All of this information may well get poured into learning models for artificial intelligences, too. Even WordPress.com volunteered people’s blogs rather than asking for volunteers.

What value do you get for that? They say you get better advertising, which is something that I boggle at. Have you ever heard anyone wish that they could see better advertising rather than less advertising?

They say you get the stuff you didn’t even know you wanted, and to a degree, that might be true, but the ability to just go browse things has become a lost art. Just about everything you see on the flat screen you’re looking at is because of an algorithm deciding for you what you should see. Thank you for visiting, I didn’t do that!

Even that system gets gamed. This past week I got an ‘account restriction’ from Facebook for reasons that were never explained, beyond instructions to go read the community standards; algorithms are making decisions based on behaviors that Facebook can’t seem to explain. Things really took off with that during Covid, when even people I knew were spreading wrong information because they didn’t know better and, sometimes, willfully didn’t want to know better or understand their own posts in a broader context.

Am I worried about TikTok? Nope. I don’t use it. If you do use TikTok, you should be. But you should worry if you use any social network. It’s not so much about who is selling and reselling information about you as about what they can do with it to control what you see.

Of course, most people on those platforms don’t see them for what they are, instead taking things at face value and not understanding the implications for the choices they will have in the future, which could range from the advertising to the content they view.

China’s not our only problem.

The Supreme Court, Your Social Network, and AI

One of the ongoing issues that people maybe haven’t paid as much attention to is related to the United States Supreme Court and social networks.

Seeing that this has a larger impact than just within the United States takes a little understanding. Still, we’ll start in the United States with what started the ball rolling.

“A majority of the Supreme Court seemed wary on Monday of a bid by two Republican-led states to limit the Biden administration’s interactions with social media companies, with several justices questioning the states’ legal theories and factual assertions.

Most of the justices appeared convinced that government officials should be able to try to persuade private companies, whether news organizations or tech platforms, not to publish information so long as the requests are not backed by coercive threats….”

“Supreme Court Wary of States’ Bid to Limit Federal Contact With Social Media Companies”, Adam Liptak, New York Times, March 18, 2024

This deals with the last United States presidential election, and we’re in an election year. It also had a lot to do with the response to Covid-19 and the false information that was spread, and even there we see arguments about whether the government should be the only one spreading false information.

Now I’ll connect this to the rest of the planet. Social networks, aside from the 800 lb Chinese gorilla (TikTok), are mainly based in the United States: Facebook, and the social network formerly known as Twitter. So the servers all fall under US jurisdiction.

Let’s pull that 800 lb Chinese gorilla back into the ring too, where the political issue of TikTok is really about who is collecting data from whom, since the Great Firewall of China keeps China in China but lets the world’s data flow to its government.

Why is that data important? Because it’s being used to train artificial intelligences. It’s about who trains their artificial intelligences faster, really.

Knock the dust off this old tune.

Even WordPress.com, where this site is presently hosted, got into the game by volunteering its customers before telling them how not to volunteer.

The Supreme Court is supposed to have the last say on all manner of things, and because of that, a certain level of ethics is assumed of its members, which John Oliver dragged under a spotlight. Let’s just say: there are questions.

It’s also worth noting that in 2010, the U.S. Supreme Court decided that money was free speech. This means, since technology companies lobby and support politicians, the social networks you use have more free speech than the users combined based on their income alone – not to mention their ability to choose what you see, what you can say, and who you can say it to by algorithms that they can’t seem to master themselves. In a way that’s heartening, in a way it’s sickening.

So the Supreme Court ruling on whether the United States government’s interference in social networks is allowed is also about who collects the data, and what sort of information will be used to train the artificial intelligences of the present and future.

The dots are all there, but it seems like people don’t really understand that this isn’t as much a fight for individual freedom of speech as it is about deciding what future generations will be told.

Even more disturbing is just how much content on the Internet is AI-generated. It has already been noted to be a significant amount, and some experts estimate that it will reach 90% by 2026.

So who should control what you can post? Should governments decide? Should technology companies?

These days, few trust either. It seems like we need oversight on both, which will never happen on a planet where everybody wants to rule the world. Please fasten your seatbelts.

Social Networks, Privacy, Revenue and AI.

I’ve seen more and more people leaving Facebook because their content just isn’t getting into timelines. The possibilities behind that are interesting to consider. While some of the complaints about the Facebook algorithms are fun to read, writing those sorts of complaints doesn’t accomplish much. It’s not as if Facebook is going to change its algorithms over complaints.

As I pointed out to people, people using Facebook aren’t the customers. People using Twitter-X aren’t the customers either. To be a customer, you have to buy something. Who buys things on social networks? Advertisers are one, of course.

That’s something Elon Musk didn’t quite get the memo on. Why would he be so confident? Hubris? Maybe; that always seems to be a factor, but it’s probably something more sensible.

Billionaires used to be much better spoken, it seems.

There’s something pretty valuable in social networks that people don’t see: the user data, which is, strangely, what the canceled Westworld was about. The real value is in being able to predict what people want and influence outcomes, much as the television series showed after the first season.1

Many people seem to think that privacy is only about credit card information and personal details. It also includes the choices that allow algorithms to predict our future choices. Humans are black boxes in this regard, and if you have enough computing power you can go around poking and prodding to see the results.

Have you noticed that these social networks are somehow linked to AI initiatives? Facebook is part of Meta, which has its own AI initiatives. Musk, chief twit at X, has his fingers in the AI pie too.

Artificial intelligences need learning models, and if you own a social network, you not only get to poke and prod – you have the potential to influence. Are your future choices something that fall under privacy? Probably not – but your past choices probably should be because that’s how you get to predicting and influencing future choices.

I never really got into Twitter. Facebook was less interruptive. On the surface, these started off as content management systems that provided a service supported by paid advertising, yet now one has to wonder at the value of the user data. Back in 2018, it came out that Cambridge Analytica had harvested data from 50 million Facebook users. Zuckerberg later apologized and talked about how third-party apps would be limited. To his credit, I think it was handled pretty well.

Still, it also signaled how powerful and useful that data could be, and if you own a social network, that would at least give you pause. After all, Cambridge Analytica influenced politics at the least, and that could have influenced markets as well. The butterfly effect reigns supreme in the age of big data and artificial intelligence.

This is why privacy is important in the age of artificial intelligence learning models, algorithms, and so forth. It can affect the responses one gets from any large language model, which is why there are pretty serious questions regarding privacy, copyright, and other things related to training them. Bias leaks into everything, and popular bias on social networks is simply about what is most vocal and repetitive, not about what is actually correct. This is also why canceling as a cultural phenomenon can be so damaging. It’s a nuclear option in the world of information, and oddly, large groups of smart or stupid people can use it with impunity.

This is why we presently see large language models hedge on some questions: conflicts within the learning model, as well as some well-designed algorithms. For that we should be a little grateful.

We should probably be lobbying to find out what is in the learning models that artificial intelligences are given, in much the same way we used2 to grill people who would represent us collectively. Sure, Elon Musk might be taking a financial hit, but what if it’s a gambit to leverage user data for bigger returns later, with his ethics embedded in how he gets his companies to do that?

You don’t have to like or dislike people to question them and how they use this data, but we should all be a bit concerned. Yes, artificial intelligence is pretty cool and interesting, but unleashing it without questioning the integrity of the information it is trained on is foolish, at the least.

Be careful what you share, what you say, who you interact with and why. Quizzes that require access to your user profile are definitely questionable, as that information, along with the information of the people you are connected with, quickly gets folded into data that creates a digital shadow of you, part of the larger crowd that can influence the now and the future.

  1. This is not to say it was canceled for this reason. I only recently watched it, and have yet to finish season 3, but it’s very compelling and topical content for the now. Great writing and acting. ↩︎
  2. We don’t seem to be that good at grilling people these days, perhaps because of all of this and more. ↩︎

The Invisible Future.

Joe McKendrick, senior contributor at Forbes.com, predicts that artificial intelligence will fade into the background.

It sort of already has, as even he points out in his article.

That, you see, is the trouble. We don’t know the training models for these artificial intelligences, we don’t know what biases are inherent in them, and we’re at the mercy of whoever is responsible for these artificial intelligences. We’re hoping that they’re thoughtful and considerate and not more concerned with money than people.

That really hasn’t worked out so well for us in the past. Yet the present is here in all its glory, unrepentant. It’s happening more obviously now with the news, since next year we get artificial news anchors. It’s being used to fight misinformation on social media platforms like Facebook without even explaining to Facebook users why posts are removed and what they contained that was worth removing them for. It’s here and has been here for a while.

Weirder still, even Facebook’s algorithms aren’t catching deepfake videos, with consequences in Bangladesh.

Pandora’s box has been opened, and the world will never quite be the same again. Archimedes once talked about having a lever long enough.

Nowadays it’s just a matter of a choice of fulcrum.

Democracy, based on the idea that informed people can make informed choices in their own interest and the common good, could easily become misDemocracy, where the misinformed make misinformed choices that they think are in their own interests and in what they think is the common good.

The Best Way To Avoid Spreading Misinformation

It’s likely that at some point we’ve all spread some misinformation involuntarily. It can have dire consequences, too. The Washington Post has an article on misinformation, but they forgot the most important thing, I think.

Waiting.

‘Trusted sources’ has been a problem I’ve been studying since we were working on the Alert Retrieval Cache. In an actual emergency, knowing which information you can trust, from the ground and elsewhere, is paramount. I remember Andy Carvin asking me how Twitter could be used for the same purpose, and I shook my head, explaining the problem no one seemed to want to listen to: an open network makes it easy for flawed information to be accepted as truth.

Credentialism is a part of the problem. We expect experts to be all-knowing when in fact being an expert carries no certification of its own. Expertise requires having been right before, yet we want answers right now, and unfortunately the truth doesn’t work that way.

We see a story on social media and we share it, sometimes without thinking, which is why bad news travels faster than good news.1

The easiest way to avoid spreading misinformation is to do something we’re not very good at in a society that pulses like a tachycardic heart: we wait and see what happens. We pause, and if we must pass something along to our social networks, we say we’re not sure it’s real. Since headlines are usually algorithm-generated to catch eyes and spread like Covid-19, we have to read the stories and check the facts before we share rather than share off the cuff.

Somewhere along the line, being right now trumped being right, and we see it everywhere. By simply following a story before sharing it, you can stop spreading misinformation and stop the virus of misinformation in its tracks. Let the story develop. See where it goes. Don’t jump in immediately to write about it when you don’t actually know much about it.

Check news sources for the stories. Wait for confirmation. If it’s important enough to post, point out that it’s unconfirmed.

It’s that simple.

  1. There’s a pun or two in there. ↩︎

The Quiet Misery of Content Moderators: Sama.

When I first read, two weeks ago, about content moderators speaking of the psychological trauma of moderating Big Tech’s content for training models, I waited for the other shoe to drop. Instead, aside from a BBC mention related to Facebook, the whole thing seems to have dropped off the media’s radar.

The images pop up in Mophat Okinyi’s mind when he’s alone, or when he’s about to sleep.

Okinyi, a former content moderator for Open AI’s ChatGPT in Nairobi, Kenya, is one of four people in that role who have filed a petition to the Kenyan government calling for an investigation into what they describe as exploitative conditions for contractors reviewing the content that powers artificial intelligence programs.

“It has really damaged my mental health,” said Okinyi.

The 27-year-old said he would view up to 700 text passages a day, many depicting graphic sexual violence. He recalls he started avoiding people after having read texts about rapists and found himself projecting paranoid narratives on to people around him. Then last year, his wife told him he was a changed man, and left. She was pregnant at the time. “I lost my family,” he said.

“‘It’s destroyed me completely’: Kenyan moderators decry toll of training of AI models”, Niamh Rowe, The Guardian, August 2nd, 2023.

I expected more on this because it’s… well, it’s terrible to consider, especially for between $1.46 and $3.74 an hour through Sama. Sama is a data annotation services company headquartered in California that employs content moderators around the world. As their homepage says, “25% of Fortune 50 companies trust Sama to help them deliver industry-leading ML models”.

This should be a bigger story, I think, but since it’s happening outside of the United States and Europe, it probably doesn’t score big with the larger media houses. The BBC differs a little in that regard.

A firm which was contracted to moderate Facebook posts in East Africa has said with hindsight it should not have taken on the job.

Former Kenya-based employees of Sama – an outsourcing company – have said they were traumatised by exposure to graphic posts.

Some are now taking legal cases against the firm through the Kenyan courts.

Chief executive Wendy Gonzalez said Sama would no longer take work involving moderating harmful content.

“Firm regrets taking Facebook moderation work”, Chris Vallance, BBC News, August 15th, 2023.

The CEO of Sama says that they won’t be taking further work related to harmful content. The question then becomes what counts as harmful content, and there’s no doubt in my mind that Sama itself is in a difficult position. She points out that Sama has ‘lifted 65,000 people out of poverty’.

Of course, global poverty is decreasing while economic disparity is increasing – something that keeps being forgotten, and which says much about how the way we measure global poverty has stood still while the rest of the world moves on.

The BBC article also touches on the OpenAI issue raised in The Guardian article quoted above.

We have global poverty, economic disparity, big tech and the dirty underbelly of AI training models and social media moderation…

This is something we should all be following up on, I think. It seems like ‘lifting people out of global poverty’ is big business in its own way, too, and that is just a little bit disturbing.