From Inputs to The Big Picture: An AI Roundup

This started off as a baseline post about generative artificial intelligence and its aspects, and it grew fairly long because even as I was writing it, new information kept coming out. It’s my intention to do a ’roundup’ like this highlighting different focuses as needed. Every bit of it is connected, but in social media postings things tend to be written about in silos. I’m attempting to integrate it all, since the larger implications are hidden in these details, and I will try to stay on top of it as things progress.

It’s long enough that it could have been several posts, but I wanted it all together at least once.

No AI was used in the writing, though some images have been generated by AI.

The two versions of artificial intelligence on the table right now – the marketed and the reality – have various problems that make it seem like we’re wrestling a mating orgy of cephalopods.

The marketing aspect is a constant distraction, feeding us what helps stock prices and goodwill toward those implementing the generative AIs, while the reality of these generative AIs is not being addressed in a cohesive way.

To simplify, this post breaks it down into the Input, the Output, and the impacts on the ecosystem generative AIs work in.

The Input.

There’s a lot that goes into these systems other than money and water. There’s the information used for the learning models, the hardware needed, and the algorithms used.

The Training Data.

The focus so far has been on what goes into the training data, and that has already been an issue, including lawsuits and, less obviously, trust in the companies involved.

…The race to lead A.I. has become a desperate hunt for the digital data needed to advance the technology. To obtain that data, tech companies including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law, according to an examination by The New York Times…

“How Tech Giants Cut Corners to Harvest Data for A.I.”, Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson and Nico Grant, New York Times, April 6, 2024 1

Of note, too, is that Google has been indexing AI-generated books. That is what is called ‘synthetic data’, something that has been warned against, yet it is something companies are planning for or even doing already, consciously and unconsciously.

While the legality of some of these actions is questionable, to some people the ethics are not so questionable, hence the revolt mentioned last year against AI companies using content without permission. That revolt is of questionable effect because no one seems to have insight into what the training data consists of, and no one appears to be auditing it.

There’s a need for that audit, if only to allow for trust.

…Industry and audit leaders must break from the pack and embrace the emerging skills needed for AI oversight. Those that fail to address AI’s cascading advancements, flaws, and complexities of design will likely find their organizations facing legal, regulatory, and investor scrutiny for a failure to anticipate and address advanced data-driven controls and guidelines.

“Auditing AI: The emerging battlefield of transparency and assessment”, Mark Dangelo, Thomson Reuters, 25 Oct 2023.

While everyone is hunting down data, no one seems to be seriously working on oversight and audits, at least not publicly, though the United States is pushing for global regulations on artificial intelligence at the UN. The status of that doesn’t seem to have been updated, even as artificial intelligence is being used to select targets in at least two wars right now (Ukraine and Gaza).

There’s an imbalance here that needs to be addressed. It would be sensible to have external auditing of the learning data models and their sources, as well as the algorithms involved – and, to get a little ahead, the output too. Of course, these sorts of things should be done with trading on stock markets as well, though that doesn’t seem to have made much headway in all the time it has been happening either.

Some websites are trying to block AI crawlers, and it is an ongoing process. Blocking them requires knowing who they are, and it doesn’t guarantee that bad actors won’t stop by anyway.

There is a new bill being pressed in the United States, the Generative AI Copyright Disclosure Act, that is worth keeping an eye on:

“…The California Democratic congressman Adam Schiff introduced the bill, the Generative AI Copyright Disclosure Act, which would require that AI companies submit any copyrighted works in their training datasets to the Register of Copyrights before releasing new generative AI systems, which create text, images, music or video in response to users’ prompts. The bill would need companies to file such documents at least 30 days before publicly debuting their AI tools, or face a financial penalty. Such datasets encompass billions of lines of text and images or millions of hours of music and movies…”

“New bill would force AI companies to reveal use of copyrighted art”, Nick Robins-Early, TheGuardian.com, April 9th, 2024.

Given how much information these companies have already used from Web 2.0 forward, through social media websites such as Facebook and Instagram (Meta), Twitter, and even search engines and advertising tracking, it’s pretty obvious that such content would be in the training data as well.

The Algorithms.

The algorithms for generative AI are pretty much trade secrets at this point, but one has to wonder why so much data is needed to feed the training models when better algorithms could require less. Consider that a well-read person could answer some questions, even as a layperson, with less of a carbon footprint. We have no insight into the algorithms either, which makes it seem as though these companies are simply throwing more hardware and data at the problem rather than being more efficient with the data and hardware they have already taken.

There’s not much news about that, and it’s unlikely that we’ll see any. It does seem like fuzzy logic is playing a role, but it’s difficult to say to what extent, and given the nature of fuzzy logic, it’s hard to say whether its implementation is as good as it should be.

The Hardware.

Generative AI has brought about an AI chip race between Microsoft, Meta, Google, and Nvidia, which leaves smaller companies that can’t afford to compete in that arena at a disadvantage so great that it could be seen as insurmountable, at least at present.

The future holds quantum computing, which could make all of the present efforts obsolete, but no one seems interested in waiting around for that to happen. Instead, it’s full speed ahead, with Nvidia presently dominating the market for AI hardware.

The Output.

One of the larger topics that seems to have faded is what some called ‘hallucinations’ by generative AI. Strategic deception was also very prominent for a short period.

There is criticism that the algorithms are making the spread of false information faster, and the US Department of Justice is stepping up efforts to go after the misuse of generative AI. This is dangerous ground, since algorithms are being sent out to hunt the products of other algorithms, and the crossfire between them doesn’t care too much about civilians.2

The impact on education: as students use generative AI, education itself has been disrupted. It is being portrayed as an overall good, which may simply be an acceptance that it’s not going away. It’s interesting to consider that the AI companies have taken more content than students could possibly get or afford through the educational system, which is something worth exploring.

Given that ChatGPT is presently 82% more persuasive than humans, likely because it has been trained on persuasive works (see The Input: The Training Data), and since most content on the internet is marketing products, services or ideas, that was predictable. While it’s hard to say how much of the content being put into training data feeds on our confirmation biases, it’s fair to say that at least some of it does. Then there are the other biases that the training data inherits through omission or the selective writing of history.

There are a lot of problems, clearly, and much of it can be traced back to the training data, which even on a good day is as imperfect as our own imperfections. It can magnify, distort, or even be consciously influenced by good or bad actors.

And that’s what leads us to the Big Picture.

The Big Picture

…For the past year, a political fight has been raging around the world, mostly in the shadows, over how — and whether — to control AI. This new digital Great Game is a long way from over. Whoever wins will cement their dominance over Western rules for an era-defining technology. Once these rules are set, they will be almost impossible to rewrite…

“Inside the shadowy global battle to tame the world’s most dangerous technology”, Mark Scott, Gian Volpicelli, Mohar Chatterjee, Vincent Manancourt, Clothilde Goujard and Brendan Bordelon, Politico.com, March 26th, 2024

What most people don’t realize is that the ‘game’ includes social media and the information it provides for training models, such as what is happening with TikTok in the United States now. There is a deeper battle, and just perusing content on social networks gives data to those building training models. Even WordPress.com, where this site is presently hosted, is selling data, though there is a way to unvolunteer oneself.

Even the Fediverse is open to data being pulled for training models.

All of this, combined with the persuasiveness of generative AI that has given psychology pause, has democracies concerned about the influence. In a recent example, Grok, Twitter X’s AI for paid subscribers, fell victim to what was clearly satire and caused a panic – which should also have us wondering about how we view intelligence.

…The headline available to Grok subscribers on Monday read, “Sun’s Odd Behavior: Experts Baffled.” And it went on to explain that the sun had been, “behaving unusually, sparking widespread concern and confusion among the general public.”…

“Elon Musk’s Grok Creates Bizarre Fake News About the Solar Eclipse Thanks to Jokes on X”, Matt Novak, Gizmodo, 8 April 2024

Of course, some levity is involved in that one, whereas Grok posting that Iran had struck Tel Aviv (Israel) with missiles seems dangerous, particularly when posted to the front page of Twitter X. It shows the dangers of fake news with AI, deepening concerns related to social media and AI, and it should make us ask why billionaires involved in artificial intelligence wield the influence that they do. How much of that influence is generated? We have an idea how much of it is lobbied for.

Meanwhile, Facebook has been spamming users and has been restricting accounts without demonstrating cause. If there were a videotape in a Blockbuster on this, it would be titled, “Algorithms Gone Wild!”.

Journalism is also impacted by AI, though real journalists tend to be rigorous with their sources. Real newsrooms have rules, and while we don’t have much insight into how AI is being used in newsrooms, it stands to reason that if a newsroom is to be a trusted source, it will go out of its way to make sure that it is: it has a vested interest in getting things right. This has not stopped some websites parading as trusted sources from disseminating untrustworthy information because, even in Web 2.0, when the world had an opportunity to discuss such things at the World Summit on the Information Society, the country with the largest web presence did not participate much, if at all, at a government level.

Then we have the thing that concerns the most people: their lives. Jon Stewart even did a Daily Show segment on it, which is worth watching, because people are worried about generative AI taking their jobs, with good reason. Even as the Davids of AI3 square off over your market share, layoffs have been happening in tech as companies reposition for AI.

Meanwhile, AI is also apparently being used as a cover for some outsourcing:

Your automated cashier isn’t an AI, just someone in India. Amazon made headlines this week for rolling back its “Just Walk Out” checkout system, where customers could simply grab their in-store purchases and leave while a “generative AI” tallied up their receipt. As reported by The Information, however, the system wasn’t as automated as it seemed. Amazon merely relied on Indian workers reviewing store surveillance camera footage to produce an itemized list of purchases. Instead of saving money on cashiers or training better systems, costs escalated and the promise of a fully technical solution was even further away…

“Don’t Be Fooled: Much “AI” is Just Outsourcing, Redux”, Janet Vertesi, TechPolicy.com, Apr 4, 2024

Maybe AI is creating jobs in India by proxy. It’s easy to blame problems on AI, too, which is a larger problem because the world often looks for something to blame and having an automated scapegoat certainly muddies the waters.

And the waters of The Big Picture of AI are muddied indeed – perhaps partly by design. After all, those involved are making money, and they now have even better tools to influence markets, populations, and you.

In a world that seems to be running a deficit when it comes to trust, the tools we’re creating seem to be increasing rather than decreasing that deficit at an exponential pace.

  1. The full article at the New York Times is worth expending one of your free articles, if you’re not a subscriber. It gets into a lot of specifics, and is really a treasure chest of a snapshot of what companies such as Google, Meta and OpenAI have been up to and have released as plans so far. ↩︎
  2. That’s not just a metaphor, as the Israeli use of Lavender (AI) has been outed recently. ↩︎
  3. Not the Goliaths. David was the one with newer technology: The sling. ↩︎

Noam Chomsky, Ludditism and AI.

There’s been a lot of discussion about society and artificial intelligence. This includes the resurgence of mocking shares of Noam Chomsky’s opinion piece, “The False Promise of ChatGPT”, where the only real criticism of it seems to have come from people asking ChatGPT what ChatGPT thought about Chomsky’s criticism.

I suppose human thought would be asking too much when it comes to responding to human thought in the age of generative artificial intelligences, and they kind of make Chomsky’s point.

Anyone who has been paying attention to what presently can be done with artificial intelligence should be concerned at some level. The older one is, the more one has at risk, since skills and experience can easily be flattened by these generative AIs.

So, let’s look at the crux of what Chomsky said:

“…Today our supposedly revolutionary advancements in artificial intelligence are indeed cause for both concern and optimism. Optimism because intelligence is the means by which we solve problems. Concern because we fear that the most popular and fashionable strain of A.I. — machine learning — will degrade our science and debase our ethics by incorporating into our technology a fundamentally flawed conception of language and knowledge…”

Noam Chomsky, Ian Roberts and Jeffrey Watumull, “The False Promise of ChatGPT”, New York Times, March 8th, 2023.

Given that generative AIs do their ‘magic’ by statistics, we see biases based on what their training models consist of. I caught DALL-E misrepresenting ragtime musicians recently. A simplified way of looking at it is that the training model is implicitly biased because art and literature on the Internet are also biased by what is available and what is not, as well as what they consist of. This could be a problem for mankind’s scientific endeavors, though there is space to say that it may also allow us to connect things that were previously not as easy to connect. This is how I generally use generative AI.

Debasing our ethics seems to be something we’re pretty good at by ourselves. Consider that there are lawsuits about the use of copyrighted materials to train ChatGPT. When you consider how much people have put into their work over the years, it is at least questionably ethical to use that work without recompense. That’s an ethical judgment, and a subjective one, but the legal side of it has yet to be decided.

A definite issue to consider at this moment in history is how Israel is using AI to select targets and how well that is working out for civilians. It is a hot topic, but regardless of where one stands on the issue, it is commonly agreed, except by the most extreme and thus problematic positions on either side, that civilian casualties should be avoided. Is it the AI’s fault when civilian Palestinians die? Where is the line for accountability drawn? We can’t even seem to get that settled without involving AI, and AI is involved.

It’s no secret that law and ethics don’t seem compatible often enough. Of course, generative AI use in Law so far has been problematic.

So is it really an issue of Noam Chomsky being a Luddite? No, far from it; he’s pointing out that we might have issues with a new technology and specifies what sort of issues he sees. In fact, I even asked ChatGPT, which you can see in the footnote. 1

And yet there are Luddites. In fact, I got to know more about Luddites by reading what a Luddite had to say about humanity’s remaining timeline. One even predicts that humanity will end in two years.

I’d say Noam Chomsky’s piece will stand up to the test of time, and maybe some of his other work will be judged outside of the time he wrote it – the plagiarism aspect seems like it’s tied to that copyright issue.

The Luddites have consistently been wrong, and/or humanity has been consistently lucky.

However you look at it, a bit of skepticism is a good thing. Noam Chomsky gave us that. It’s worth understanding it for what it is. Implicitly, it’s partly calling for us to be better humans without a religious guise.

That gets some agreement from me.

  1. ↩︎

An Example of Bias (ChatGPT, DALL-E)

I was about to write up a history of my interactions with the music industry, as far as ownership goes, over at RealityFragments.com, and I was thinking about how far back my love for music went in my soundtrack of life. This always draws me back to “The Entertainer” by Scott Joplin as a starting point.

I could use one of the public domain images of Scott Joplin, someone I have grown to know a bit about, but they didn’t capture the spirit of the music.

I figured that I’d see what DALL-E could put together on it, and gave it a pretty challenging prompt for its knowledge of pop culture.

As you can see, it got the spirit of things. But there’s something wrong other than the misspelling of “Entertainer”. A lot of people won’t get this because a lot of people don’t know much about Scott Joplin, and if they were to learn from this, they’d get something wrong that might upset a large segment of the world population.

I doubled down to see if this was just a meta-level mistake because of a flaw in the algorithm somewhere.

Well, what’s wrong with this? It claims to be reflecting the era and occupation of a ragtime musician, yet ragtime music came from a specific community in the United States, now called African-Americans, in the late 19th century.

That would mean that a depiction of a ragtime musician would be more pigmented. Maybe it’s a hiccough, right? 2 in a row? Let’s go for 3.

Well, that’s 3. I imagined they’d get hip-hop right, and it seems like they did, even with a person of European descent in one.

So where did this bias come from? I’m betting that it’s the learning model. I can’t test that, but I can do a quick check with DeepAI.org.

Sure, it’s not the same starting prompt, but it’s the same general sort of prompt.

Let’s try again.

Well, there’s definitely something different. Something maybe you can figure out.

For some reason, ChatGPT is racebending ragtime musicians, and I have no idea why.

    Public Domain Image of Scott Joplin.

There’s no transparency in any of these learning models or algorithms. The majority of the algorithms wouldn’t make much sense to most people on the planet, but the learning models definitely would.

Even if we had control over the learning models, we don’t have control over what we collectively recorded over the millennia and made into some form of digital representation. There are implicit biases in our histories, our cultures, and our Internet because of who has access to what and who shares what, and artificial intelligences using information shaped only by our biases of past and present will determine the biases of the future.

I’m not sure Scott Joplin would appreciate being whitewashed. As someone respected, of his pigmentation, in his period, and the son of a former slave, I suspect he might have been proud of who he became despite the biases of the time.

Anyway, this is a pretty good example of how artificial intelligence bias can impact the future when kids are doing their homework with large language models. It’s a problem that isn’t going away, and in a world that is increasingly becoming a mixing pot beyond social constructs of yesteryear, this particular example is a little disturbing.

I’m not saying it’s conscious. Most biases aren’t. It’s hard to say it doesn’t exist, though.

I’ll leave you with The Entertainer, complete with clips from 1977, where they got something pretty important right.

From Wikipedia, accessed on February 1st 2024:

Although he was penniless and disappointed at the end of his life, Joplin set the standard for ragtime compositions and played a key role in the development of ragtime music. And as a pioneer composer and performer, he helped pave the way for young black artists to reach American audiences of all races.

It seems like the least we could do is get him right in artificial intelligences.

A Basic Explanation of how AIs Identify Objects.

I’ve been experimenting with uploading images to ChatGPT 4 and seeing what it has to say about them. To me, it’s interesting because I gain some insight into how far things have progressed, as well as how descriptive ChatGPT can be about things.

While having coffee yesterday with a friend, I was showing him the capabilities. He chose this scene.

He, like others I showed here in Trinidad and Tobago, couldn’t believe it. It’s a sort of magic for people. What I like when I use it for this is that it doesn’t look at the picture as a human would, where the subject is pretty obvious. It looks at all of the picture, which is worth exploring in a future post.

He asked me how it could do that, how it could give the details that it did in the next image in this post. I tried explaining it, and I caught that he was thinking of the classic “IF…THEN…ELSE” sequence that came from the ‘classical’ computer science we had been exposed to in the 1980s.

I tried and failed explaining it. I could tell I failed because he was frustrated with my explanation, and when I can’t explain something it bothers me.

We went our separate ways, and I went to a birthday party for an old friend. I didn’t get home til much later. With people driving as they do here in Trinidad, my mind was focused on avoiding them so I didn’t get to think on it as I would have liked.

I slept on it.

This morning I remembered something I had drawn up in my teens, and now I think I can explain it better to my friend, and perhaps even people curious about it. Hopefully when I send this to him he’ll understand, and since I’m spending the time doing just that, why not everyone else?

Identifying Objects.

As a teenager, my drawing on a sketch pad page was about getting a computer to identify objects. It included a camera connected to the computer, which wasn’t done commercially yet, and what one would do was rotate the object through all the axes and the computer would be told what the object was at every conceivable angle. It was just an idea of a young man passionate about the future with the beginnings of a grounding in personal computing.

What we’ve all been doing with social media for some time is tagging things. This is how we organized finding things, and the incentive was for people to find our content.

A bat in the bathtub where I was staying in Guyana, circa 2005, while I was doing some volunteer IT stuff. It was noteworthy to me, so I did what I did then – took a picture and posted it to Flickr.

Someone would post something on social media, as I did with Flickr, and tag it appropriately (we would hope). I did have fun with it, tagging things like a bat in a photograph as being naked, which oddly was my most popular photo. Of course it was naked, you perverts.

However, I also tagged it as a bat. And if you search Flickr for a bat, you’ll come up with a lot of images of bats. They are of all different sorts of bats, from all angles. There are even more specific tags for kinds of bats, but overall we humans pretty much know a bat when we see one, so all those images of bats could then be added to a training model to allow a computer to come up with its own algorithmic way of identifying bats.

And it gets better.

The most popular versions of bats on Flickr, as an example, will be the ones that the most people liked. So now, the images of bats are given weight based on their popularity, and therefore could be seen as the best images of bats. Clearly, my picture of the bat in the bathtub shouldn’t be as popular a version.

It gets even better.

The more popular an image is, the more likely it is to be used on the Internet regardless of copyright, which means that it will show up in search engine rankings if you search for images of bats. Search Engine ranking then becomes another weight.

The more popular images that we collectively have chosen become the training model for bats. The system learns the patterns of the objects, much as we do but differently, because it has different ways of looking at the same things.

If you take thousands – perhaps millions – of pictures of bats and train a system to identify them, it can go through all of the images available looking for bats. It will screw up sometimes, and you tell it, “Not a bat”. It also finds the bats that people haven’t tagged.

Given the amount of tagged images and even text on the Internet, doing this with specific things is a fairly straightforward process because we do very little. We simply correct the mistakes.

Now do that with all the tags of different objects. Eventually, you’ll get to where multiple objects in a picture can be identified.

That’s basically how it works. I don’t know that they used Flickr.com, or search engines, but if I were doing it, that’s probably how I would – and it’s not a mistake that people have been encouraged to do this a lot more over the last years preceding artificial intelligence hitting the mainstream. Now look at who is developing artificial intelligences. Social networks and search engines.
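To make that concrete, here’s a deliberately tiny sketch in Python of the general idea – not of how Flickr, the search engines, or any AI company actually does it. Tagged images become labeled examples, popularity becomes a weight, and a classifier learns its own way of separating ‘bat’ from ‘not a bat’. The feature vectors are made up so the example stays runnable; real systems learn features from the pixels themselves.

```python
# Hypothetical sketch: tags become labels, popularity becomes a weight.
# Real systems use deep networks over pixels; random feature vectors
# stand in for images here so the example runs on its own.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Pretend each "image" is a small feature vector. Images tagged "bat"
# cluster in one part of feature space, everything else in another.
bat_images = rng.normal(loc=1.0, scale=0.6, size=(200, 8))
other_images = rng.normal(loc=-1.0, scale=0.6, size=(200, 8))

X = np.vstack([bat_images, other_images])
y = np.array([1] * 200 + [0] * 200)   # 1 = tagged "bat", 0 = not a bat

# Likes and search ranking become sample weights: popular images count
# more toward what the model decides a "bat" looks like.
popularity = rng.integers(1, 100, size=len(y)).astype(float)

model = LogisticRegression().fit(X, y, sample_weight=popularity)

# A new, untagged "image": the model guesses whether it contains a bat,
# which is how untagged bats get found (and mistakes get corrected).
new_image = rng.normal(loc=1.0, scale=0.6, size=(1, 8))
print("Probability of bat:", round(model.predict_proba(new_image)[0][1], 3))
```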

The same thing applies to text.

Then, when you hand it an image with various objects in it, it identifies what it knows and describes them based on the words commonly associated with those objects, and if objects are grouped together, they become a higher-level object. Milk and cookies is a great example.

And so it goes, stacking higher and higher a capacity to recognize patterns of patterns of patterns…
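Here is a hypothetical sketch of that stacking, with the object names and combinations invented for illustration: once individual objects are recognized, a second layer only has to notice which labels occur together and give the combination its own name.

```python
# Hypothetical second layer: combinations of recognized objects become
# higher-level objects. The label sets and names are invented examples.

COMPOSITES = {
    frozenset({"milk", "cookies"}): "milk and cookies",
    frozenset({"bat", "bathtub"}): "bat in a bathtub",
}

def describe(detected: set[str]) -> list[str]:
    """Name any composite whose parts were all detected in the image."""
    found = [name for parts, name in COMPOSITES.items() if parts <= detected]
    return found or sorted(detected)

print(describe({"milk", "cookies", "table"}))   # -> ['milk and cookies']
print(describe({"bat", "bathtub"}))             # -> ['bat in a bathtub']
```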

And this also explains how the revolution of Web x.0 may have carried the seeds of its own destruction.

A Tale of Two AIs.

2023 has been the year when artificial intelligence went from science fiction to technological possibility. It’s become so ubiquitous that on Christmas Eve, chatting with acquaintances and friends, people from all walks of life were talking about it.

I found it disappointing, honestly, because it was pretty clear I was talking about one sort of artificial intelligence where others were talking about another sort of artificial intelligence.

One, a lawyer, mentioned that she’d had lunch with an artificial intelligence expert. On listening, and with a few questions, it became clear she was talking about what sounded like a power user of ChatGPT. When I started talking about some of the things I write about here related to artificial intelligence, she said that they had not discussed all of that. Apparently I went a bit too far, because she then asked, “But do you use the latest version of ChatGPT that you have to pay for, like this expert does?”

Well, yes, I do. I don’t use it to write articles, and if I do use ChatGPT to write something, I quote it. I have my own illusions; I don’t need to take credit for any hallucinations ChatGPT has. I also don’t want to incorporate strategic deception into my writing. To me, it’s a novelty and something I often find flaws with. I’m not going to beat up ChatGPT, it has its uses, and the fact that I can use DALL-E to generate some images, like the one above, is helpful.

What disturbed me is that she thought that was what an artificial intelligence expert does. That seems a pretty low bar; I wouldn’t claim to be an artificial intelligence expert because I spend $20/month. I’m exploring it like many others and stepping back to look at problematic consequences, of which there are many. If we don’t acknowledge and deal with those, the rest doesn’t seem to matter as much.

That’s the trouble. Artificial intelligence, when discussed or written about, falls into two main categories that co-exist.

Marketed AI.

The most prominent one is the marketing hype right now, where we get ‘experts’ who for whatever reason are claiming a title for being power users of stabs at artificial intelligence. This is what I believe Cory Doctorow wrote about with respect to the ‘AI bubble’. It’s more about perception than reality, in my mind, and in some ways it can be good because it gets people to spend money so that hopefully those that collect it can do something more about the second category.

Yet it wasn’t long ago that people were selling snake oil. In the last decades, I’ve seen ‘website experts’ become ‘social media experts’, and now suddenly we have ‘artificial intelligence experts’.

Actual Artificial Intelligence.

The second category is actually artificial intelligence itself, which I believe we may be getting closer to. It’s where expert systems, which have been around since the 1970s, have made some quantum leaps. When I look at ChatGPT, as an example, I see an inference engine (the code) and the knowledge base which is processed from a learning model. That’s oversimplified, I know, and one can get into semantic arguments, but conceptually it’s pretty close to reality.
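For anyone who hasn’t met the older idea, the defining trait of an expert system is that the reasoning code (the inference engine) is kept separate from the facts and rules (the knowledge base). Below is a toy, hypothetical Python version of that split; a large language model swaps the hand-written rules for statistics learned from its training data, but the engine/knowledge separation is still a useful way to think about it.

```python
# Toy expert system: a generic forward-chaining inference engine plus a
# knowledge base of rules. The rules and facts are hypothetical examples.

KNOWLEDGE_BASE = [
    # (conditions that must all be present, conclusion to add)
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "recommend_doctor_visit"),
]

def infer(facts, rules):
    """Keep applying rules until no new conclusions appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"has_fever", "has_cough", "short_of_breath"}, KNOWLEDGE_BASE))
# -> includes "possible_flu" and "recommend_doctor_visit"
```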

If you take a large language model like ChatGPT and feed it only medical information, it can diagnose based on the symptoms a patient has. Feed it only information on a programming language like COBOL, and it can probably write COBOL code pretty well. ChatGPT has a learning model that we don’t really know, and it is apparently pretty diverse, which allows us to do a lot of pretty interesting things besides generating silly images for blog posts. I’ve seen some code in JavaScript done this way, and I just generated some C++ code as a quick test with ChatGPT 4 that, yes, works, and it does something better than most programmers do: it documents how it works.

I’d written about software engineers needing to evolve too with respect to artificial intelligence.

It has potential to revolutionize everything, all walks of life, and it’s going to be really messy because it will change jobs and even replace them. It will be something that will have psychological and sociological consequences, impacting governments and the ways we do… everything.

The Mix of Marketed vs. Actual

The argument could be made that without marketing, businesses would not make enough money for the continued expense of pushing the boundaries of artificial intelligence. Personally, I think this is true. The trouble is that marketing takes over what people believe artificial intelligence is. This goes with what Doctorow wrote about the bubble as well as what Joe McKendrick wrote about artificial intelligence fading into the background. When the phrase is over-used and misused in businesses, which seems to already be happening, the novelty wears off and the bubble pops in business.

That’s kind of what happened with social media and ‘social media experts’.

The marketing aspect also causes people to worry about their own jobs, which maybe they don’t want, but they want income because there are bills to pay in modern society. The fear of some is tangible, and with good reason. All the large language models use a very broad brush in answering those fears, as do the CEOs of the companies: we’ll just retrain everyone. There are people getting closer to retirement, and what companies have been doing to save money and improve their stock performance is finding reasons to ‘let people go’, so that comfort is spoken from on high with the same sensitivity as “Let them eat cake”. It’s dismissive and ignores the reality people live in.

Finding the right balance is hard when there’s no control of the environment. People are talking about what bubbles leave behind, but they don’t talk as much about who they leave behind. Harvard Business Review predicted that the companies that get rid of jobs with artificial intelligence will eventually get left behind, but eventually can be a long time and can have some unpredictable economic consequences.

‘Eventually’ can be a long time.

The balance must be struck by the technology leaders in artificial intelligence, and that seems to be about as unlikely as it was with the dot-com boom. Maybe ChatGPT 4 can help them out if they haven’t been feeding it enough of their own claims.

And no, you aren’t an ‘artificial intelligence expert’ if you are a paid user of artificial intelligence of any platform alone, just like buying a subscription to a medical journal doesn’t make you a medical professional.

AI, Ethics, Us.

Most of us live in a lot of different worlds, and we see things differently because of it. Some of us live in more than one world at a time. That’s why sometimes it’s hard for me to consider the promise of artificial intelligence, what we’re getting, and the direction it’s going.

There’s space in this world, and in research, for what we have now, which allows previously isolated knowledge to be regurgitated in a feat of math that makes the digital calculator look mundane. It’s statistics, it gives us what we want when we hit the ‘Enter’ button, and that’s not too bad.

Except it can replace an actual mind. Previously, if you read something, you didn’t guess whether a machine threw the words together or not. You didn’t wonder if the teacher gave you a test generated by a large language model, and the teacher didn’t wonder whether you generated the answers the same way.

Now, we wonder. We wonder when we see an image. We wonder when we watch a video. We wonder enough that the most popular female name for 2023 should be Alice.

So let me tell you where I think we should be heading with AI at this time.

What Could Be.

Everyone who is paying attention to what’s happening can see that the world is fairly volatile right now after the global pandemic, after a lot of economic issues that banks created combined with algorithmic trading… so this is the perfect time to drop some large language models in the world to make things better.

Nope.

No, it isn’t working that way. If we were focused on making the world better rather than worrying about using a good prompt for that term paper or blog post, it maybe could work that way. We could use things like ChatGPT to be consultants, but across mankind we lack the integrity to only use them as consultants.

“If anyone takes an AI system and starts designing speeches or messages, they generate the narrative that people want to hear. And the worst thing is that you don’t know that you are putting the noose around your own neck, all by yourself.” The academic added that the way to counteract this situation is education.

The only way to avoid manipulation is through knowledge. Without this, without information, without education, any human group is vulnerable, he concluded.1

“IA: implicaciones éticas más allá de una herramienta tecnológica”, Miguel Ángel Pérez Álvarez, Wired.com (Spanish), 29 Nov 2023.

There’s the problem. Education needs to adapt to artificial intelligence as well, because this argument, which at heart I believe to be true, does not survive its own recursion: people don’t know when it’s ethically right to use it, or even that there should be ethics involved.

As it happens, I’m pretty sure Miguel Ángel Pérez Álvarez already understands this and simply had his thoughts truncated, as happens in articles. He’s also got me wondering how different languages are handled by these Large Language Models and how different their training models are.

It’s like finding someone using an image you created and telling them, “Hey, you’re using my stuff!” and they say, “But it was on the Internet”. Nevermind the people who believe that the Earth is flat, or who think that vaccinations give you better mobile connections.

AI doesn’t bother me. It’s people, it’s habits, and in a few decades they’ll put a bandaid on it and call it progress. The trouble is we have a stack of bandaids on top of each other at this point and we really need to look at this beyond the pulpits of some billionaires who enjoy more free speech than anyone else.

  1. actual quote: “Si cualquier persona toma un sistema de IA y se pone a diseñar discursos o mensajes, te generan la narrativa que la gente quiere escuchar. Y lo peor es que tú no sabes que te estás poniendo la soga al cuello solito”. El académico añadió que la manera de contrarrestar esta situación es la educación.

    “La única manera de evitar la manipulación es a través del conocimiento. Sin este, sin información, sin educación, cualquier grupo humano es vulnerable”, finalizó. ↩︎

Blocking AI Bots: The Opt Out Issue.

Those of us who create anything – at least without the crutches of a large language model like ChatGPT – are a bit concerned about our works being used to train large language models. We get no attribution and no pay, and the companies that run the models can basically just grab our work, train their models, and turn around and charge customers for access to responses that our work helped create.

No single one of us is likely that important. But combined, it’s a bit of a rip off. One friend suggested being able to block the bots, which is an insurmountable task because it depends on the bots obeying what is in the robots.txt file. There’s no real reason that they have to.
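For anyone who wants to try anyway, the usual mechanism is a robots.txt file listing the crawlers’ advertised user-agent names. The sketch below writes one in Python; the bot names are commonly published ones (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google’s AI training), but the list changes, so treat it as an illustration to check against each vendor’s current documentation – and remember it only works if the crawler chooses to honor it.

```python
# A minimal sketch: generate a robots.txt that asks known AI crawlers
# to stay away. The user-agent tokens below are commonly published ones,
# but verify them against each vendor's documentation, and note that
# compliance is entirely voluntary on the crawler's part.

AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]

def build_robots_txt(blocked_agents):
    lines = []
    for agent in blocked_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")                # blank line between records
    lines.append("User-agent: *")       # everyone else: allowed
    lines.append("Disallow:")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    with open("robots.txt", "w") as f:
        f.write(build_robots_txt(AI_CRAWLERS))
    print(build_robots_txt(AI_CRAWLERS))
```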

“How to Block AI Chatbots From Scraping Your Website’s Content” is a worthwhile guide to attempting to block the bots. It also makes the point that maybe it doesn’t matter.

I think that it does, at least in principle, because I’m of the firm opinion that websites should not have to opt out of being used by these AI bots – but rather, that websites should opt in as they wish. Nobody’s asked for anything, have they? Why should these companies use your work, or my work, without recompense and then turn around and charge for access to these things?

Somehow, we got stuck with ‘opting out’ when what these companies running the AI Bots should have done is allow people to opt in with a revenue model.

TANSTAAFL. Except if you’re a large tech company, apparently.

On the flip side, Zoom says that they’re not using data from users for their training models. Taken at face value, that’s great, but the real problem is that we wouldn’t know if they did.

Wikipedia, AI, Oh My.

One of the most disruptive things that happened during Web 2.0 is Wikipedia – displacing the Encyclopedia Britannica as an online resource, forging strategic partnerships, and – for better and worse – building an editorial community.

It has become one of the more dependable sources of information on the Internet, and while imperfect, the editors have collectively been a part of an evolution of verification and quality control that has made Wikipedia a staple.

It apparently has also been part of the training models of the large language models that we have grown to know over the past months, such as ChatGPT and Google’s Bard, which is interesting given how much volunteer work went into creating Wikipedia – something that makes me wonder if Wikimedia could be a part of the lawsuit.

This is pure speculation on my part, but given how much collective effort has gone into the many projects of Wikimedia, and given that its mission is pretty clear about bringing free educational content to the world, large language models charging subscribers for access built on that content is something that might be worth a bit of thought.

On a conference call in March that focused on A.I.’s threats to Wikipedia, as well as the potential benefits, the editors’ hopes contended with anxiety. While some participants seemed confident that generative A.I. tools would soon help expand Wikipedia’s articles and global reach, others worried about whether users would increasingly choose ChatGPT — fast, fluent, seemingly oracular — over a wonky entry from Wikipedia. A main concern among the editors was how Wikipedians could defend themselves from such a threatening technological interloper. And some worried about whether the digital realm had reached a point where their own organization — especially in its striving for accuracy and truthfulness — was being threatened by a type of intelligence that was both factually unreliable and hard to contain.

One conclusion from the conference call was clear enough: We want a world in which knowledge is created by humans. But is it already too late for that?

John Gertner, “Wikipedia’s Moment of Truth”, New York Times Magazine, July 18th, 2023, Updated on July 19th, 2023.

It is a quandary, that’s for sure. Speaking for myself, I prefer having citations on a Wikipedia page that I can research on my own – there seem to be at least some of us that trample our way through footnotes – and large language models don’t cite anything, which is the main problem I have with them.

In the facts category, I would say Wikipedia should win.

Unfortunately, time and again, the world has demonstrated that facts are sometimes a liability for selling a story, and so the concern I have is real.

Yet it could be useful to combine the two somehow.

ChatGPT Migrations.

I haven’t really mentioned the ebb and flow of data streams, but it’s not that different from what we see in nature. Birds migrate. Elephants migrate. Whales migrate.

Users migrate. Sure, they move from application/service to application/service, but during the day they are more likely to use certain software/services. Then we get into weekdays and weekends, with the holidays…

People use different stuff at different times. So ChatGPT has seen such a migration, and I found it mildly disturbing:

ChatGPT is losing users for the first time ever, and those users aren’t who you would expect. Traffic to ChatGPT’s website fell by 9.7% in June, according to estimates from Similarweb, a web analytics firm. The decline was steeper in the U.S., with a 10.3% month-on-month decline, and the number of unique visitors to ChatGPT also fell by 5.7% from the previous month.

One thing is clear to Francois Chollet, a software engineer and AI researcher at Google, who told Fortune over email that “one thing is sure: it’s not booming anymore.”

Chollet thinks he knows what’s going on: summer vacation. Instead of using ChatGPT for education-related activities, the engineer said on Twitter, kids are probably playing Minecraft or enjoying summer activities. Search interest over time for ChatGPT has steadily declined, while search interest for Minecraft has steadily increased, he pointed out. 

“ChatGPT suddenly ‘isn’t booming anymore,’ Google A.I. researcher says—and kids are the big problem”, Fortune.com, Stephen Pastis.

It’s noted in the article that doing homework ranks second among ChatGPT’s uses. Personally, I’ve got a really bad history with doing homework, so I get it, but are we truly punishing the people who say, “Nope, I didn’t do it” with bad grades while rewarding those who have ChatGPT do it? Is honesty being penalized again?

Honesty, integrity – all those things Disney remakes stories about and that we have kids sit down and watch – get penalized if they don’t have ChatGPT do their homework?

This isn’t like the calculator, which took away some of the drudgery of math. ChatGPT, prompted properly, can do an entire assignment in moments, paste that into a spreadsheet…

And we have just trained primates to copy, paste, and not learn anything, while those that might actually want to learn are at risk of getting lower grades.

I thought school was bad in my day…

Lawsuit Regarding ChatGPT

Anonymous individuals are claiming that ChatGPT stole ‘vast amounts of data’ in what they hope will become a class action lawsuit. It’s a nebulous claim about the nebulous data that OpenAI has used to train ChatGPT.

…“Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft,” they allege. The company’s popular chatbot program ChatGPT and other products are trained on private information taken from what the plaintiffs described as hundreds of millions of internet users, including children, without their permission.

Microsoft Corp., which plans to invest a reported $13 billion in OpenAI, was also named as a defendant…”

Creator of buzzy ChatGPT is sued for vacuuming up ‘vast amounts’ of private data to win the ‘A.I. arms race’“, Fortune.com, Teresa Xie, Isaiah Poritz and Bloomberg, June 28th 2023.

I’ve had suspicions myself about where their training data came from, but with no insight into the training model, how is anyone to know? That’s what makes this case interesting.

“…Misappropriating personal data on a vast scale to win an “AI arms race,” OpenAI illegally accesses private information from individuals’ interactions with its products and from applications that have integrated ChatGPT, the plaintiffs claim. Such integrations allow the company to gather image and location data from Snapchat, music preferences on Spotify, financial information from Stripe and private conversations on Slack and Microsoft Teams, according to the suit.

Chasing profits, OpenAI abandoned its original principle of advancing artificial intelligence “in the way that is most likely to benefit humanity as a whole,” the plaintiffs allege. The suit puts ChatGPT’s expected revenue for 2023 at $200 million…”

ibid (same article quoted above).

This would run contrary to what Sam Altman, CEO of OpenAI, put in writing before US Congress.

“…Our models are trained on a broad range of data that includes publicly available content, licensed content, and content generated by human reviewers.3 Creating these models requires not just advanced algorithmic design and significant amounts of training data, but also substantial computing infrastructure to train models and then operate them for millions of users…”

[Reference: 3 “Our Approach to AI Safety.” OpenAI, 5 Apr. 2023, https://openai.com/blog/our-approach-to-ai-safety.]

“Written Testimony of Sam Altman Chief Executive Officer OpenAI Before the U.S. Senate Committee on the Judiciary Subcommittee on Privacy, Technology, & the Law”, Senate.Gov, Sam Altman, CEO of OpenAI, 5-16-2023.

I would love to know who the anonymous plaintiffs are, and would love to know how they got enough information to make the allegations. I suppose we’ll find out more as this progresses.

I, for one, am curious where they got this training data from.