An Example of Bias (ChatGPT, DALL-E)

I was about to write up, over at RealityFragments.com, a history of my interactions with the music industry where ownership is concerned, and I was thinking about how far back my love of music goes in the soundtrack of my life. That always draws me back to “The Entertainer” by Scott Joplin as a starting point.

I could use one of the public domain images of Scott Joplin, someone I have grown to know a bit about, but they didn’t capture the spirit of the music.

I figured that I’d see what DALL-E could put together on it, and gave it a pretty challenging prompt that tested its knowledge of pop culture.

As you can see, it got the spirit of things. But there’s something wrong other than the misspelling of “Entertainer”. A lot of people won’t get this because they don’t know much about Scott Joplin, and if they were to learn from this, they’d get something wrong that might upset a large segment of the world’s population.

I doubled down to see if this was just a meta-level mistake because of a flaw in an algorithm somewhere.

Well, what’s wrong with this? It claims to be reflecting the era and occupation of a ragtime musician, yet ragtime music came from a specific community in the United States in the late 19th century, the community now called African-American.

That would mean that a depiction of a ragtime musician would be more pigmented. Maybe it’s a hiccough, right? Two in a row? Let’s go for three.

Well, that’s 3. I imagined they’d get hip-hop right, and it seems like they did, even with a person of European descent in one.

So where did this bias come from? I’m betting that it’s the learning model. I can’t test that, but I can do a quick check with DeepAI.org.

Sure, it’s not the same starting prompt, but it’s the same general sort of prompt.

Let’s try again.

Well, there’s definitely something different. Something maybe you can figure out.

For some reason, ChatGPT is racebending ragtime musicians, and I have no idea why.

    Public Domain Image of Scott Joplin.

There’s no transparency in any of these learning models or algorithms. The majority of the algorithms wouldn’t make much sense to most people on the planet, but the learning models definitely would.

Even if we had control over the learning models, we don’t have control over what we have collectively recorded over the millennia and turned into some form of digital representation. There are implicit biases in our histories, our cultures, and our Internet because of who has access to what and who shares what, and artificial intelligences built on that information inherit our biases of past and present, which in turn determine the biases of the future.

I’m not sure Scott Joplin would appreciate being whitewashed. As someone respected, of his pigmentation, in his period, and the son of a former slave, I suspect he might have been proud of who he became despite the biases of the period.

Anyway, this is a pretty good example of how artificial intelligence bias can impact the future when kids are doing their homework with large language models. It’s a problem that isn’t going away, and in a world that is increasingly becoming a melting pot beyond the social constructs of yesteryear, this particular example is a little disturbing.

I’m not saying it’s conscious. Most biases aren’t. It’s hard to say it doesn’t exist, though.

I’ll leave you with The Entertainer, complete with clips from 1977, where they got something pretty important right.

From Wikipedia, accessed on February 1st 2024:

Although he was penniless and disappointed at the end of his life, Joplin set the standard for ragtime compositions and played a key role in the development of ragtime music. And as a pioneer composer and performer, he helped pave the way for young black artists to reach American audiences of all races.

It seems like the least we could do is get him right in artificial intelligences.

A Basic Explanation of How AIs Identify Objects.

I’ve been experimenting with uploading images to ChatGPT 4 and seeing what it has to say about them. To me, it’s interesting because I gain some insight into how far things have progressed, as well as how descriptive ChatGPT can be about things.

While having coffee yesterday with a friend, I was showing him the capabilities. He chose this scene.

He, like others I showed here in Trinidad and Tobago, couldn’t believe it. It’s a sort of magic for people. What I like when I use it for this is that it doesn’t look at the picture as a human would, where the subject is pretty obvious. It looks at all of the picture, which is worth exploring in a future post.

He asked me how it could do that, how it could give the details that it did in the next image in this post. I tried explaining it, and I caught that he was thinking of the classic “IF…THEN…ELSE” sequence from the ‘classical’ computer science that we had been exposed to in the 1980s.
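
To make the difference concrete, here is a minimal Python sketch of the two approaches, assuming scikit-learn is available; the features, thresholds, and labels are invented purely for illustration. The first function is the hand-written “IF…THEN…ELSE” rule, while the second lets a model work a rule out from labeled examples on its own.

```python
# A hand-written rule versus a learned one. The features, thresholds, and
# labels are invented purely for illustration.
from sklearn.linear_model import LogisticRegression

def rule_based_is_bat(wingspan_cm: float, has_fur: bool) -> bool:
    # The "classical" approach: a programmer spells out the conditions.
    if wingspan_cm < 50 and has_fur:
        return True
    else:
        return False

# The learned approach: show the system labeled examples and let it find
# its own way of separating them.
examples = [[20, 1], [35, 1], [90, 0], [120, 0]]  # [wingspan_cm, has_fur]
labels = [1, 1, 0, 0]                             # 1 = bat, 0 = not a bat

model = LogisticRegression().fit(examples, labels)

print(rule_based_is_bat(30, True))   # True, because the rule says so
print(model.predict([[30, 1]]))      # [1], because the examples said so
```

The learned model never sees an explicit rule; it only sees examples and what they were called.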

I tried and failed to explain it. I could tell I failed because he was frustrated with my explanation, and when I can’t explain something, it bothers me.

We went our separate ways, and I went to a birthday party for an old friend. I didn’t get home until much later. With people driving as they do here in Trinidad, my mind was focused on avoiding them, so I didn’t get to think on it as I would have liked.

I slept on it.

This morning I remembered something I had drawn up in my teens, and now I think I can explain it better to my friend, and perhaps even people curious about it. Hopefully when I send this to him he’ll understand, and since I’m spending the time doing just that, why not everyone else?

Identifying Objects.

As a teenager, my drawing on a sketch pad page was about getting a computer to identify objects. It included a camera connected to the computer, which wasn’t done commercially yet. The idea was that you would rotate the object through all of its axes, and the computer would be told what the object was at every conceivable angle. It was just an idea from a young man passionate about the future, with the beginnings of a grounding in personal computing.
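
As a rough illustration of that teenage idea, here is a minimal sketch in Python: every labeled view of an object is stored, and a new view is identified by finding the closest stored one. The numeric “views” are invented stand-ins for what a camera would actually capture.

```python
# A minimal sketch of the idea: keep labeled "views" of each object and
# identify a new view by finding the closest stored one. The vectors are
# invented stand-ins for real camera images.
import numpy as np

labeled_views = {
    "cup":  [np.array([0.9, 0.1, 0.3]), np.array([0.8, 0.2, 0.4])],
    "ball": [np.array([0.1, 0.9, 0.5]), np.array([0.2, 0.8, 0.6])],
}

def identify(view: np.ndarray) -> str:
    # Compare the new view against every stored view; return the closest label.
    best_label, best_distance = None, float("inf")
    for label, views in labeled_views.items():
        for stored in views:
            distance = float(np.linalg.norm(view - stored))
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label

print(identify(np.array([0.85, 0.15, 0.35])))  # -> cup
```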

What we’ve all been doing with social media for some time is tagging things. This is how we organized finding things, and the incentive was for people to find our content.

A bat in the bathtub where I was staying in Guyana, circa 2005, while I was doing some volunteer IT stuff. It was noteworthy to me, so I did what I did then – took a picture and posted it to Flickr.

Someone would post something on social media, as I did with Flickr, and tag it appropriately (we would hope). I did have fun with it, tagging things like a bat in a photograph as being naked, which oddly was my most popular photo. Of course it was naked, you perverts.

However, I also tagged it as a bat. And if you search Flickr for a bat, you’ll come up with a lot of images of bats, of all different sorts and from all angles. There are even more specific tags for kinds of bats, but overall we humans pretty much know a bat when we see one, so all those images of bats could then be added to a training model, allowing a computer to come up with its own algorithmic way of identifying bats.
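
As a hedged sketch of how those tagged photos become a training set, here is a toy Python example assuming scikit-learn; random arrays stand in for real images so it runs on its own, but in practice they would be the actual tagged pictures.

```python
# Tagged photos become a labeled training set; a model then works out its own
# way of telling the tags apart. Random arrays stand in for real images so the
# example is self-contained.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend each "image" is a small picture flattened to 64 numbers, gathered
# from photos tagged "bat" and photos tagged with anything else.
tagged_bat = rng.normal(loc=1.0, size=(100, 64))
tagged_other = rng.normal(loc=-1.0, size=(100, 64))

X = np.vstack([tagged_bat, tagged_other])
y = np.array([1] * 100 + [0] * 100)  # 1 = tagged "bat", 0 = other tags

# The classifier finds its own separation from the examples alone.
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict(rng.normal(loc=1.0, size=(1, 64))))  # likely [1]
```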

And it gets better.

The most popular images of bats on Flickr, as an example, will be the ones that the most people liked. So now the images of bats can be given weight based on their popularity, and the most popular could be seen as the best images of bats. Clearly, my picture of the bat in the bathtub shouldn’t rank among the best versions.

It gets even better.

The more popular an image is, the more likely it is to be used around the Internet, regardless of copyright, which means that it will show up in search engine rankings if you search for images of bats. Search engine ranking then becomes another weight.
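
Here is a minimal sketch of how popularity and search ranking could be folded in as training weights; all of the numbers are invented, and scikit-learn’s sample_weight is just one of many ways such weighting might be applied.

```python
# Popularity and search ranking folded into per-image weights, so better-liked,
# better-ranked examples count more during training. All numbers are invented.
from sklearn.linear_model import LogisticRegression

features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels   = [1, 1, 0, 0]              # 1 = tagged "bat", 0 = not

likes       = [500, 3, 20, 400]      # how many people liked each image
search_rank = [1, 40, 25, 2]         # position in image search results (1 = top)

# One possible weighting: popular images near the top of the results get
# larger weights than obscure, poorly ranked ones.
weights = [like / rank for like, rank in zip(likes, search_rank)]

model = LogisticRegression()
model.fit(features, labels, sample_weight=weights)
print(model.predict([[0.85, 0.15]]))  # likely [1]
```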

The more popular images that we collectively have chosen become the training models for bats. The system learns the patterns of the objects much as we do, but differently, because machines have different ways of looking at the same things.

If you take thousands – perhaps millions – of pictures of bats and train a system to identify them, it can go around looking for bats in images, going through all of the images available. It will screw up sometimes, and you tell it, “Not a bat”. It also finds the bats that people haven’t tagged.

Given the amount of tagged images and even text on the Internet, doing this for specific things is a fairly straightforward process because we barely have to do anything. We simply correct mistakes.
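
A toy sketch of that correction loop, with invented data: the model guesses, a human fixes the wrong guesses, and the corrected examples are folded back into the training set.

```python
# The "correct the mistakes" loop: the model tags new images, a human fixes
# the wrong tags, and the corrections go back into the training set.
# All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train = rng.random((200, 8))
y_train = (X_train[:, 0] > 0.5).astype(int)   # pretend column 0 encodes "bat-ness"

model = LogisticRegression().fit(X_train, y_train)

new_images = rng.random((10, 8))
guesses = model.predict(new_images)

# A human reviews the guesses and supplies the right answers ("Not a bat").
corrected = (new_images[:, 0] > 0.5).astype(int)
print("mistakes corrected:", int((guesses != corrected).sum()))

# Fold the corrected examples back in and retrain.
X_train = np.vstack([X_train, new_images])
y_train = np.concatenate([y_train, corrected])
model = LogisticRegression().fit(X_train, y_train)
```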

Now do that with all the tags of different objects. Eventually, you get to the point where multiple objects in a picture can be identified.

That’s basically how it works. I don’t know that they used Flickr.com, or search engines, but if I were doing it, that’s probably how I would, and it’s no mistake that people have been encouraged to do this a lot more over the years preceding artificial intelligence hitting the mainstream. Now look at who is developing artificial intelligences: social networks and search engines.

The same thing applies to text.

Then, when you hand it an image with various objects in it, it identifies what it knows and describes them based on the words commonly associated with the objects, and if objects are commonly grouped together, they become a higher-level object. Milk and cookies is a great example.
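
A minimal sketch of that grouping step: once individual objects are identified, a lookup of known combinations can promote them to a higher-level concept. The grouping table here is invented for illustration.

```python
# Composing recognized objects into a higher-level concept: if a known group
# of objects is all present, the group's name is added as its own label.
# The grouping table is invented for illustration.
COMPOSITE_CONCEPTS = {
    frozenset({"milk", "cookies"}): "milk and cookies",
    frozenset({"plate", "fork", "knife"}): "place setting",
}

def describe(detected_objects):
    detected = set(detected_objects)
    labels = list(detected_objects)
    for group, concept in COMPOSITE_CONCEPTS.items():
        if group <= detected:          # every object in the group was found
            labels.append(concept)
    return labels

print(describe(["milk", "cookies", "table"]))
# -> ['milk', 'cookies', 'table', 'milk and cookies']
```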

And so it goes, stacking higher and higher a capacity to recognize patterns of patterns of patterns…

And this also explains how the revolution of Web x.0 may have carried the seeds of its own destruction.

How Much AI In Journalism?

Recently, I’ve been active in a group on Facebook that is supposed to be a polite space to debate things. News articles fly around, and the news articles we see these days from different sources carry their own biases, because rather than just presenting facts they present narratives, and narratives require framing.

I wondered how much of these articles were generated by what we’re calling artificial intelligence these days. In researching, I can tell you I’m still wondering – but I have found some things that are of interest.

The New York Times Lawsuit.

It’s only fair to get this out of the way since it’s short and sweet.

Of course, in the news now is the lawsuit that the New York Times has brought against Microsoft and OpenAI, where speculation runs rampant either way. To their credit, through that link, the New York Times presented things in as unbiased a way as possible. Everyone’s talking about that, but speculation on it only really impacts investors and share prices. It doesn’t help people like me as much, who write their own content as individuals.

In an odd twist though, and not too long after the announcement, OpenAI is offering to pay for licensing news articles (paywalled), which you can read more about here if you’re not paying TheInformation1.

Either way, that lawsuit is likely not going to help my content stay out of a learning model because I just don’t have the lawyers. Speculating on it doesn’t get me anywhere.

How Much AI is being used?

Statista cites only one statistic on the use of artificial intelligence in media and entertainment: 78 percent of U.S. adults think news articles created by AI are not a good thing.

The articles there go on to tell us about the present challenges, etc., but one word should stand out from all of it: foggy.

So how would it be used, if it is used? With nearly 50 ‘news’ websites apparently generated by AI as of May last year, almost a year ago, and with one news site even going so far as to have an AI news anchor as of late last year, we should have questions.

Well, we don’t really know how many news agencies are using artificial intelligence or how. One would think disclosure would be the issue then.

The arguments against disclosure are pretty much summed up below (an extract from a larger, well-balanced article).

Against disclosure

One concern is that it could stifle innovation. If news organisations are required to disclose every time they use AI, they may be less likely to experiment with the technology.

Another is that disclosure could be confusing for consumers. Not everyone understands how AI works. Some people may be suspicious of AI-generated content. Requiring disclosure could make it more difficult for consumers to get the information they need.

Should the media tell you when they use AI to report the news? What consumers should know, Jo Adetunji, Editor, TheConversationUK, TheConversation.com, November 14, 2023.

On the surface, the arguments make sense.

In my opinion, placing innovation over trust, which is the actual argument being made by some, is abhorrent. To innovate, one needs that trust, and if you want that trust, it seems to me that it has to be earned. This, given the present state of news outlets in their many shades of truth and bias, might seem completely alien to some.

I do encourage people to read that entire article because the framing of it here doesn’t do the article justice and I’ve simply expressed an opinion on one side of the arguments presented.

How Is AI presently used?

Again, we really don’t know because of the disclosure issue, but in October of last year Twipe published 10 ways that journalists are using artificial intelligence. It points from the outset to Klara Indernach, a tag used by Express.de to note when an article is created with artificial intelligence tools.

Arist von Harpe is cited in the article for saying, “We do not highlight AI-aided articles. We’re only using [AI] as a tool. As with any tool, it’s always the person using it who is responsible for what comes out.” This seems a reasonable position, and puts the accountability on the humans related to it. I have yet to see artificial intelligences be thrown under the bus for an incorrect article, so we have that landmark to look for.

The rest of that article is pretty interesting and mentions fact checking, which is peculiar given the prevalence of hallucinations and even strategic deception, as well as image generation, etc.

We’ll never really know.

In the end, I imagine the use of any artificial intelligence in newsrooms is evolving even as I write this and will be evolving well beyond when you read this. In a few years, it may not be as much of a big deal, but now we’re finding failures in artificial intelligences all the way to a court, in a matter that is simply fraught with political consequences. They were quick to throw Google Bard under the bus on that one.

It is still disturbing that we don’t have much insight into the learning models being used, which is a consistent problem. The New York Times lawsuit seems to be somewhat helpful there.

I honestly tried to find out what I could here and in doing so came up with my own conclusion that wasn’t what I would have expected it to be.

In the end, it is as Arist von Harpe says: we have to judge based on the stories we get, because every newsroom will do things differently. It would have helped if we had less room to speculate on biases before the creation of these artificial intelligence tools, and whoever screws up should lose some trust. In this day and age, though, feeding cognitive biases seems to trump trust.

That’s probably the discussion we should have had some time ago.

  1. These paywalls are super-annoying for us mere mortals who do not have the deep pockets of corporate America. How many subscriptions is a well-informed person supposed to have? It’s gotten ridiculous. We’ve known that business models for news have been in such trouble that a ‘news story’ has a more literal definition these days, but… surely we can do better than this? ↩︎

Social Networks, Privacy, Revenue and AI.

I’ve seen more and more people leaving Facebook because their content just isn’t getting into timelines. The possibilities behind that are interesting to consider. While some of the complaints about the Facebook algorithms are fun to read, writing that sort of complaint doesn’t really mean too much. It’s not as if Facebook is going to change its algorithms over complaints.

As I pointed out to people, people using Facebook aren’t the customers. People using Twitter-X aren’t the customers either. To be a customer, you have to buy something. Who buys things on social networks? Advertisers are one, of course.

That’s something Elon Musk didn’t quite get the memo on. Why would he be this confident? Hubris? Maybe; that always seems to be a factor, but it’s probably something more sensible.

Billionaires used to be much better spoken, it seems.

There’s something pretty valuable in social networks that people don’t see. It’s the user data, which is strangely what the canceled Westworld was about. The real value is in being able to predict what people want and influence outcomes, much as the television series showed after the first season.1

Many people seem to think that privacy is only about credit card information and personal details. It also includes the choices that allow algorithms to predict future choices. Humans are black boxes in this regard, and if you have enough computing power, you can go around poking and prodding to see the results.

Have you noticed that these social networks are all linked somehow to AI initiatives? Facebook is linked to the AI initiatives of its parent company, Meta. Musk, chief twit at X, has his fingers in the AI pie too.

Artificial intelligences need learning models, and if you own a social network, you not only get to poke and prod, you have the potential to influence. Are your future choices something that falls under privacy? Probably not, but your past choices probably should be, because that’s how you get to predicting and influencing future choices.
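
Even a crude sketch shows why those past choices are so valuable: with nothing more than a count of what someone chose before, you can guess what to put in front of them next. The history below is invented for illustration.

```python
# Why past choices matter: even a crude count of what someone chose before is
# enough to guess what to put in front of them next. The history is invented.
from collections import Counter

click_history = ["politics", "sports", "politics", "politics", "music"]

def predict_next(history):
    # The most repeated past choice becomes the prediction, and therefore what
    # the feed surfaces next, which in turn shapes the next round of choices.
    return Counter(history).most_common(1)[0][0]

print(predict_next(click_history))  # -> politics
```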

I never really got into Twitter. Facebook was less interruptive. On the surface, these started off as content management systems that provided a service, supported by paid advertising, yet now one has to wonder at the value of the user data. Back in 2018, it came to light that Cambridge Analytica had harvested data from 50 million Facebook users. Zuckerberg later apologized and talked about how third-party apps would be limited. To his credit, I think it was handled pretty well.

Still, it also signaled how powerful and useful that data could be, and if you own a social network, that would at least give you pause. After all, Cambridge Analytica influenced politics at the least, and that could have also influenced markets. The butterfly effect reigns supreme in the age of big data and artificial intelligence.

This is why privacy is important in the age of artificial intelligence learning models, algorithms, and so forth. It can impact the responses one gets from any large language model, which is why there are pretty serious questions regarding privacy, copyright, and other things related to training them. Bias leaks into everything, and popular bias on social networks is simply about what is most vocal and repetitive, not about what is actually correct. This is also why canceling as a cultural phenomenon can be so damaging. It’s a nuclear option in the world of information, and oddly, large groups of smart or stupid people can use it with impunity.

This is why we see large language models hedge on some questions presently: because of conflicts within the learning model, as well as some well-designed algorithms. For that we should be a little grateful.

We should probably be lobbying to find out what is in the learning models that artificial intelligences are given, in much the same way we used2 to grill people who would represent us collectively. Sure, Elon Musk might be taking a financial hit, but what if it’s a gambit to leverage user data for bigger returns later, with his ethics embedded in how he gets his companies to do that?

You don’t have to like or dislike people to question them and how they use this data, but we should all be a bit concerned. Yes, artificial intelligence is pretty cool and interesting, but unleashing it without questioning the integrity of the information it was trained on is at the least foolish.

Be careful what you share, what you say, who you interact with and why. Quizzes that require access to your user profile are definitely questionable, as that information, along with information about the people you are connected with, quickly gets folded into data creating a digital shadow of yourself, part of the larger crowd that can influence the now and the future.

  1. This is not to say it was canceled for this reason. I only recently watched it, and have yet to finish season 3, but it’s very compelling and topical content for the now. Great writing and acting. ↩︎
  2. We don’t seem to be that good at grilling people these days, perhaps because of all of this and more. ↩︎

Exploring Beyond Code 2.0: Into A World of AI.

It’s become a saying on the Internet without many people understanding it: “Code is Law”. This is a reference to one of the works of Lawrence Lessig, revised since its original publication.

Code Version 2.0 dealt with much of the nuances of Law and Code in an era where we are connected by code. The fact that you’re reading this implicitly means that the Code allowed it.

Here’s an example that weaves its way throughout our society.

One of the more disturbing things to consider is that when Alexis de Tocqueville wrote Democracy in America 1, he recognized the jury as a powerful mechanism for democracy itself.

“…If it is your intention to correct the abuses of unlicensed printing and to restore the use of orderly language, you may in the first instance try the offender by a jury; but if the jury acquits him, the opinion which was that of a single individual becomes the opinion of the country at large…”

Alexis de Tocqueville, Volume 1 of Democracy in America, Chapter XI: Liberty of the Press In the United States (direct link to the chapter within Project Gutenberg’s free copy of the book)

In this, he makes the point that public opinion on an issue is summarized by the jury, for better and worse. Implicit in that is the discussion within the Jury itself, as well as the public opinion at the time of the trial. This is indeed a powerful thing, because it allows the people to decide instead of those in authority. Indeed, the jury gives authority to the people.

‘The People’, of course, means the citizens of a nation, and within that there is discourse between members of society regarding whether something is or is not right, or ethical, within the context of that society. In essence, it allows ethics to breathe, and in so doing, it allows Law to be guided by the ethics of a society.

It’s likely no mistake that some of the greatest concerns in society stem from divisions in what people consider to be ethical. Abortion is one of those key issues, where the ethics of the rights of a woman are put into conflict with the rights of an unborn child. On either side of the debate, people have an ethical stance based on their beliefs without compromise. Which is more important? It’s an extreme example, and one that is still playing out in less than complimentary ways for society.

Clearly no large language model will solve it, since large language models are trained with implicitly biased training models and algorithms, which is why they shouldn’t be involved, and the same would likely go for general artificial intelligences of the future. Machine learning, or deep learning, learns from us, and every learning model is developed by its own secret jury, whose stewed biases may not reflect the whole of society.

In fact, they would reflect a subset of society that is as disconnected from society as the companies that make them, since a company hires people based on its own values to move toward its version of success. Companies are about making money. Creating value is a very subjective thing for human society, but money is its currency.

With artificial intelligence being involved in so many things and with them becoming more and more involved, people should at the least be concerned:

  • AI-powered driving systems are trained to identify people, yet have been shown to be less reliable at seeing people with darker skin.
  • AI-powered facial recognition systems are trained on datasets of facial images. The code that governs these systems determines which features of a face are used to identify individuals, and how those features are compared to the data in the dataset. As a result, the code can have a significant impact on the accuracy and fairness of these systems, which has been shown to have an ethnic bias.
  • AI-powered search engines are designed to rank websites and other online content according to their relevance to a user’s query. The code that governs these systems determines how relevance is calculated, and which factors are considered. As a result, the code can have a significant impact on the information that users see, and therefore what they discuss, and how they are influenced (a toy sketch of this follows the list).
  • AI-powered social media platforms are designed to connect users with each other and to share content. The code that governs these platforms determines how users are recommended to each other, and how content is filtered and ranked. As a result, the code can have a significant impact on the experiences of users on these platforms – aggregating into echo chambers.
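
Here is a toy Python sketch of the search-ranking point above: nothing about the documents changes, only the weights the code assigns, and the order people see flips. The documents and weights are invented for illustration.

```python
# The ranking point above in miniature: the documents never change, only the
# weights the code assigns, and the order people see flips.
documents = [
    {"title": "Carefully sourced local report", "relevance": 0.9, "engagement": 0.2},
    {"title": "Outrage-bait hot take", "relevance": 0.4, "engagement": 0.95},
]

def rank(docs, relevance_weight, engagement_weight):
    # Whoever sets these weights effectively decides what surfaces first.
    return sorted(
        docs,
        key=lambda d: relevance_weight * d["relevance"] + engagement_weight * d["engagement"],
        reverse=True,
    )

print([d["title"] for d in rank(documents, relevance_weight=1.0, engagement_weight=0.0)])
# -> ['Carefully sourced local report', 'Outrage-bait hot take']
print([d["title"] for d in rank(documents, relevance_weight=0.2, engagement_weight=1.0)])
# -> ['Outrage-bait hot take', 'Carefully sourced local report']
```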

We were behind before artificial intelligence reared its head recently with the availability of large language models, separating ourselves in ways that polarized us and made compromise impossible.

Maybe it’s time for Code Version 3.0. Maybe it’s time we really got to talking about how our technology will impact society beyond a few smart people.

1 This was covered in Volume 1 of ‘Democracy in America‘, available for free here on Project Gutenberg.