linguistic bias

_web_studying ourselves Once upon a time as a Navy Corpsman in the former Naval Hospital in Orlando, we lost a patient for a period – we simply couldn’t find them. There was a search of the entire hospital. We eventually did find her but it wasn’t by brute force. It was by recognizing what she had come in for and guessing that she was on LSD. She was in a ladies room, staring into the mirror, studying herself through a sensory filter that she found mesmerizing. What she saw was something only she knows, but it’s safe to say it was a version of herself, distorted in a way only she would be able to explain.

I bring this up because as a species, many of us connected to our artificial nervous system are fiddling with ChatGPT, and what we are seeing are versions of our society in a mirror.

As readers, what we get out of it has a lot of what we bring to it. As we query it, we also get out of it what we ask of it through the filters of how it was trained and it’s algorithms, the reflexes we give it. Is it sentient? Of course not, these are just large language models and are not artificial general intelligences.

With social media companies, we have seen the effect of the social media echo chambers as groups become more and more isolated despite being more and more connected, aggregating to make it easier to sell advertising to. This is not to demonize them, many bloggers were doing it before them, and before bloggers there was the media, and before then as well. It might be amusing if we found out that cave paintings were actually advertising for someone’s spears or some hunting consulting service, or it might be depressing.

All of this cycled through my mind yesterday as I began considering the role of language itself with it’s inherent bias based on an article that stretched it to large language models and artificial intelligence. The actual study was just about English and showed a bias toward addition, but with ChatGPT and other large language models being the current advertising tropism, it’s easy to understand the intention of linking the two in an article.

Regardless of intention, there is a point as we stare into the societal mirror of large language models. The training data will vary, languages and cultures vary, and it’s not hard to imagine that every language, and every dialect, has some form of bias. It might be a good guess that where you see a lot of bureaucracy, there is linguistic bias and that can get into a chicken and egg conversation: Did the bias exist before the language, or did the language create the bias? Regardless, it can reinforce it.

fake hero dog Then I came across this humorous meme. It ends up being a legitimate thing that happened. The dog was rewarded with a steak for saving the life of a child from drowning and quickly came to the conclusion that pulling children out of water got it steak.

Apparently not enough children were falling into water for it to get steaks, so it helped things along. It happened in 1908, and Dr. Pavlov was still alive during this. His famous derived work with dogs was published in 1897, about 11 years prior, but given how slow news traveled then it wasn’t as common knowledge as we who have internet access would expect. It’s possible the New York Times article mentioned him, but I didn’t feel like unlocking their paywall.

If we take this back to society, we have seen the tyranny of fake news propagation. That’s nothing new either. What is interesting is the paywall aspect, where credible news is hidden behind paywalls leaving the majority of the planet to read what is available for free. This is a product of publishing adaptation to the Internet age, which I lived through and which to an extent I gained some insight into when I worked for Linux Journal’s parent company, SSC. The path from print to internet remains a very questionable area because of how advertising differs between the two media.

Are large language models being trained on paywalled information as well? Do they have access to academic papers that are paywalled? What do they have access to?

What parts of ourselves are we seeing through these mirrors? Then we have to ask whether the large language models have access to things that most humans don’t, and based on who is involved, it’s not hard to come to a conclusion where the data being fed to them by these companies isn’t available for consumption for the average person. Whether that is true or not is up for debate.

All of this is important to consider as we deal with these large language models, yet the average person plays with them as a novelty, unaware of the biases. How much should we trust what comes out of them?

As far as disruptive technologies go, this is probably the largest we have seen since the Internet itself. As long as it gives people what they want, and it supports cognitive biases, it’s less likely to be questioned. Completely false articles propagate on the Internet still, there are groups of people who seriously believe that the Earth is flat, and we have people asking ChatGPT things that they believe are important. I even saw someone in a Facebook reel quoting a GPT-4 answer.

We should at the least be concerned, but overall we aren’t. We’re too busy dealing with other things, chasing red dots.

KnowProSE.com

Where one line can make a difference.

Silent Bias