Why I Installed AIs (LLMs) On My Local Systems.

The last few days I’ve been doing some actual experimentation, initially begun because of Daniel Miessler’s Fabric, an open source framework for using artificial intelligence to augment us lowly humans instead of the self-lauding tech bros whose business model boils down to “move fast and break things“.

It’s hard to trust people with that sort of business model when you understand your life is potentially one of those things, and you like that particular thing.

I have generative AIs on all of my machines at home now, which was not as difficult as people might think. To impress upon someone how easy it was, I walked them through doing it in minutes over the phone on a Windows machine. I’ll write that up as my next post, since it apparently seems difficult to people.

For myself, the vision Daniel Miessler brought with his implementation, Fabric, is inspiring in its own way, though I’m not convinced that AI can make anyone a better human. I think the idea of augmenting is good, and with all the infoglut I contend with, leaning on an LLM makes sense in a world where everyone else is being sold on the idea of using one, and on how to use it.

People who wax poetic about how an AI has changed their lives in good ways are simply waxy poets, as far as I can tell.

For me, with writing and other things I do, there can be value here and there – but I want control. I also don’t want to risk my own ideas and thoughts by uploading even a hint of them to someone else’s system. As a software engineer, I have seen loads of data handed to companies by users, I know what can be done with it, and I have seen how flexible ethics can become when share prices are involved.

Why Installing Your Own LLM is a Good Idea. (Pros)

There are various reasons why, if you’re going to use an LLM, it’s a good idea to run it locally.

(1) Data Privacy and Security: If you’re an individual or a business, you should look after your data and security because nobody else really does, and some profit from your data and lack of security.

(2) Control and Customization: You can fine-tune your LLM on your own data (without compromising your privacy and security). As an example, I can feed an LLM various things I’ve written and have it summarize where the ideas I’ve written about connect – and even tell me if I have something published where my opinion has changed – without worrying about handing all of that information to someone else. I can tailor it myself, and that isn’t as hard as you think; there’s a minimal sketch after this list.

(3) Independence from subscription fees; lowered costs: The large companies will sell you as much as you can buy, and before you know it you’re stuck with subscriptions you don’t use. Also, since the technology market is full of companies that get bought out and of license agreements that change, you avoid vendor lock-in.

(4) Operating offline; possibly improved performance: With a local LLM, losing internet access during an outage does not stop me from using it. What’s more, my prompts aren’t queued, or prioritized behind someone who pays more.

(5) Quick changes are quick changes: You can iterate faster, try something with your model, and if it doesn’t work, you find out immediately. That’s convenience, and it’s cost-cutting.

(6) Integrate with other tools and systems: You can integrate your LLM with other stuff – as I intend to with Fabric.

(7) You’re not tied to one model. You can use different models with the same installation – and yes, there are lots of models.
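
To make (2) a little more concrete, here’s a minimal sketch of what leaning on a local LLM can look like. It assumes Ollama (one of the easier local runners) is serving its usual REST endpoint at http://localhost:11434 with a model such as llama3 already pulled; the model name and the file names are placeholders for your own setup.

```python
# Minimal sketch: summarize your own writing with a local LLM.
# Assumes Ollama is running locally and a model (e.g., "llama3") is pulled;
# the file names below are placeholders for your own drafts.
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the locally served model and return its reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]

# Feed it a few of your own pieces -- nothing leaves your machine.
drafts = [Path(name).read_text(encoding="utf-8")
          for name in ("draft_one.txt", "draft_two.txt")]
prompt = ("Summarize where the ideas in these pieces connect, and note any "
          "points where my opinion appears to have changed:\n\n---\n"
          + "\n---\n".join(drafts))
print(ask_local_llm(prompt))
```

The whole round trip stays on your own hardware, which is what makes (1) and (2) compatible.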

The Cons of Using an LLM Locally.

(1) You don’t get to hear someone who sounds like Scarlett Johansson tell you about the picture you uploaded¹.

(2) You’re responsible for the processing, memory and storage requirements of your LLM. This is surprisingly not as bad as you would think, but remember – backup, backup, backup.

(3) If you plan to deploy an LLM as part of a business model, it can get very complicated very quickly. I don’t know all the details, but that’s nowhere in my long-term plans.

Deciding.

In my next post, I’ll write up how to easily install an LLM. I have one on my M1 Mac Mini, my Linux desktop, and my Windows laptop. It’s amazingly easy, but going in it can seem very complicated.

What I would suggest about deciding: simply try it and see how it works for you, or simply know that it’s possible and that it will only get easier.

Oh, that quote by Diogenes at the top? No one seems to have a source. Nice thought, though a possible human hallucination.

  1. OK, that was a cheap shot, but I had to get it out of my system. ↩︎

When The Internet Eats Itself

The recent news of Stack Overflow selling its content to OpenAI was something I expected. It was a matter of time. Users of Stack Overflow were surprised, which I am surprised by, and upset, which I’m not surprised by.

That seems to me a reasonable response. Who wouldn’t be? Yet when we contribute to websites for free on the Internet and it’s not our website, it’s always a terrible bargain. You give of yourself for whatever reason – fame, prestige, or just sincerely enjoying helping – and it gets traded into cash by someone else.

But companies don’t want you to get wise. They want you to give them your content for free so that they can tie a bow around it and sell it. You might get a nice “Thank you!” email, or little awards of no value.

No Good Code Goes Unpunished.

The fallout has been disappointing. People have tried logging in and sabotaging their top answers. I spoke to one guy on Mastodon a few days ago and he got banned. It seems pretty obvious to me that Stack Overflow had already backed up the database, and that they would be keeping an eye out for sabotage. Software developers should know that. There was also some confusion between the Creative Commons licensing the site uses and the rights users grant to the owners of the website, which are separate things.

Is it slimy? You bet. It’s not new, and the companies training generative AI have been pretty slimy. The problem isn’t generative AI, it’s the way the companies decide to do business: eroding trust with the very market for their product while poisoning wells that they can no longer drink from. If you’re contributing answers for free that will be used to train an AI to give the same answers for a subscription, you’re a silly person¹.

These days, generative AI companies need to put filters on the front of their learning models to keep small children from getting sucked in.

Remember Huffington Post?

Huffington Post had this neat little algorithm for swapping around headlines until it found one that people liked, it gamed SEO, and it built itself into a powerhouse that almost no one remembers now. It was social, it was quirky, and it was fun. Volunteers put up lots of great content.

When Huffington Post sold for $315 million, the volunteers who provided the content for free and built the site up before it sold sued – and got nothing. Why? Because they had volunteered their work.

I knew a professional journalist who was building up her portfolio and added some real value – I met her at a conference in Chicago probably a few months before the sale, and I asked her why she was contributing to HuffPost for free. She said it was a good outlet to get some things out – and she was right. When it sold, she was angry. She felt betrayed, and rightfully so I think.

It seems people weren’t paying attention to that. I did².

You live, you learn, and you don’t do it again. With firsthand and secondhand experience: if I write on a website and I don’t get paid, it’s my website. Don’t trust anyone who says, “Contribute and good things will happen!” Yeah, they might, but it’s unlikely they will happen for you.

If your content is good enough for a popular site, it’s good enough to get paid to be there. You in the LinkedIn section – pay attention.

Back To AI’s Intake Manifold.

I’ve written about companies with generative AI models scraping around looking for content, with contributed works to websites being a part of the training models. It’s their oil, it’s what keeps them burning through cash as they try to… replace the people whose content they use. In return, the Internet gets slop generated all over, and you’ll know the slop when you read it – it lacks soul and human connection, though it fakes it from time to time like the pornographic videos that make the inexperienced think that’s what sex is really like. Nope.

The question we should be asking is whether it’s worth putting anything on the Internet at this point, just to have it folded into a statistical algorithm that chews up our work and spits out something like it. Sure, there are copyright lawsuits happening. The transformative-works argument doesn’t hold up well in a sane mind given the exponentially larger amount of content used to create a generative AI at this point.

So what happens when fewer people contribute their own work? One thing is certain: the social aspect of the Internet will not thrive as well.

Social.

The Stack Overflow website was mainly an annoyance for me over the years, but I understand that many people had a thriving society of a sort there. It was largely a meritocracy, like open source, at least at its core. You’ll note that I’m writing of it in the past tense – I don’t think anyone with any bit of self-worth will contribute there anymore.

The annoyance aspect for me came from (1) not finding solutions to the quirky problems that people hired me to solve³, and (2) finding code fragments I tracked down to Stack Overflow poorly (if at all) adapted to the employer’s or client’s needs. I had also learned not to give away valuable things for free, so I didn’t get involved. Most, if not all, of the work I did required my silence on how things worked, and on a site like Stack Overflow your keyboard might just get you in trouble. Yet the problem wasn’t the site itself, but those who borrowed code like it was a cup of sugar instead of a recipe.

Beyond us software engineers, developers, whatever they call themselves these days, there are a lot of websites with social interaction that are likely getting their content shoved into an AI learning model at some point. LinkedIn – owned by Microsoft, annoyingly in the top search results – is ripe for being used that way.

LinkedIn doesn’t pay for content, yet if you manage to get popular, you can make money off of sponsored posts. “Hey, say something nice about our company, here’s $x”. That’s not really social, but it’s how ‘influencers’ make money these days. When you get paid to write posts that way, you might be selling your soul unless you keep a good moral compass, but when bills need to get paid, that moral compass sometimes goes out the window. I won’t say everyone is like that; I will say it’s a danger, and it’s why I don’t care much about ‘influencers’.

In my mind, anyone who is an influencer is trying to sell me something, or has an ego so large that Zaphod Beeblebrox would be insanely jealous.

Regardless, to get popular, you have to contribute content. Who owns LinkedIn? Microsoft. Who is Microsoft partnered with? OpenAI. The dots are there. Maybe they’re not connected. Maybe they are.

Other websites are out there that are building on user content. The odds are good that they have more money for lawyers than you do, that their content licensing and user agreement work for them and not you, and if someone wants to buy that content for any reason… you’ll find out what users on Stack Overflow found out.

All relationships are built on trust. All networks are built on trust. The Internet is built on trust.

The Internet is eating itself.

  1. I am being kind. ↩︎
  2. I volunteered some stuff to WorldChanging.com way back when with the understanding it would be Creative Commons licensed. I went back and forth with Alex and Jamais, as did a few other contributors, and because of that and some nastiness related to the Alert Retrieval Cache, I walked away from the site – only to find out from an editor who contacted me about their book that they wanted to use some of my work. Nope. I don’t trust futurists, and maybe you shouldn’t either. ↩︎
  3. I always seemed to be the software engineer who could make sense out of gobbledygook code, rein it in, take it to water, and convince it to drink. ↩︎

Our Technology And Ethics.

The headlines this past week have had Google’s relationship with Israel under scrutiny as they fired employees who were against what Israel has been doing and protested accordingly. I’ve looked at some of the news stories, some sympathizing with the former employees, some implicitly supporting Israel and the order that people expect within companies.

I won’t comment on that because that’s political and this isn’t about politics, or who is right or wrong.

Of Swords And Blacksmiths.

Throughout my career as a software engineer, I’ve had to deal with ethical issues, and I’ve navigated them as best I could, as challenging as some of them were – and some were personally quite challenging.

Ever since we figured out how to bonk each other over the heads with stones (stone technology), it seems we’ve found increasing occasion to do so. It could be that the first use of such weapons was for hunting or defense of the tribe from predators – likely both – but eventually we learned to turn them on ourselves.

I’m sure at some point there was a blacksmith who refused to make swords because of where the points and edges were aimed. Other blacksmiths just made them. There always seems to be someone else to kill, or to defend against. We could get into the Great Gun Debate, but we fall into the same problem with that. There’s always some human creeping around who wants to kill someone else for glorified reasons, and because of that we sleep with things under our pillows that could very well be used to kill us just as easily. It’s not a debate. It’s a criticism of humanity and an unfortunately honest one at that.

“We all lived for money, and that is what we died for.”

William T. Vollmann, No Immediate Danger: Volume One of Carbon Ideologies

Sometimes my ethics require me to move on, which I did without protest a few times over the decades: there’s always someone else who needs a job more than they care about an ethical issue, if they even see the ethical issue. In the end we try, hopefully, to do more good than bad, but both of those are subjective.

Too often we use a technology as a scapegoat, an externalized criticism of ourselves that allows us to keep doing what we do. Technology can be used for good or bad; how we use a technology says something about ourselves. When we criticize the use of technology, we implicitly criticize ourselves, but we don’t take the criticism because we have neatly placed the blame on a vague, externalized concept – a deflection at a species level, often because we are buying into the idea that the enemy is less than human. Yet we are all human, despite the ideologies, cultures, languages, and color coding that we don’t all neatly fit in.

We Are All Blacksmiths.

These days, with generative AI allowing us to paint the fence of the future once we give the corporations in control of them a few baubles, everything we do on the Internet is potentially a weapon to be used against someone else. While the firing of the Google employees who protested is news, those who still work there aren’t – which is not to say that they aren’t faced with their own ethical dilemmas. We who work in technology hope that our work is used for good.

I worked at one place that started off with robo-calling software used to annoy people during elections, and it turned itself into an emergency communications service. Things can change, businesses can change, and controlling even a part of the infrastructure of a nation’s military can have unexpected consequences for everyone involved. What happens if Google suddenly doesn’t like something and turns it off?

The future is decidedly fickle. Our personal ethics should inform our collective ethics, but they often don’t. They can.

We build tools. Sadly, they aren’t used the way we would like sometimes, and we should try to influence things if we can – but ultimately, we are subject to a fickle future and good intentions that can be misdirected.

WordPress.com, Tumblr to Sell Information For AI Training: What You Can Do.

I accidentally posted this on RealityFragments.com, but I think it’s important enough to leave it there. The audiences vary, but both have other bloggers on them.

While I was figuring out how to be human in 2024, I missed that Tumblr and WordPress posts will reportedly be used for OpenAI and Midjourney training.

This could be a big deal for people who take the trouble to write their own content rather than filling the web with generative AI text just to spam out posts.

If you’re involved with WordPress.org, it doesn’t apply to you.

WordPress.com has an option to use Tumblr as well, so when you post to WordPress.com it automagically posts to Tumblr. Therefore you might have to visit both of the posts below and adjust your settings if you don’t want your content to be used in training models.

This doesn’t mean that they haven’t already sent information to Midjourney and OpenAI. We don’t really know. But from the moment you change your settings…

  • WordPress.com: How to opt out of the AI training is available here.

    It boils down to this part in your blog settings on WordPress.com:

    [Screenshot: the third-party data sharing setting in your WordPress.com blog settings.]

  • With Tumblr.com, you should check out this post. Tumblr is trickier, and the link text is pretty small around the images. What you need to remember: after you select your blog in the left sidebar, use the ‘Blog Settings’ link in the right sidebar.

Hot Take.

When I was looking into all of this, it turned out that Automattic, the owner of WordPress.com and Tumblr.com, is doing the sale.

If you look at your settings, and if you haven’t changed them yet, you’ll see that the default was set to allow the use of content for training models. The average person who uses these sites to post their content is likely unaware, and in my opinion, if they wanted to do this the right way, the default would be opted out.

It’s unclear whether they already sent posts. I’m sure that there’s an army of lawyers who will point out that they did post it in places and that the onus was on users to stay informed. It’s rare for me to use the word ‘shitty’ on KnowProSE.com, but I think it’s probably the best way to describe how this happened.

It was shitty of them to set it up like this. See? It works.

Now some people may not care. They may not be paying users, or they just don’t care, and that’s fine. Personal data? Well, let’s hope that got scrubbed.

Some of us do care. I don’t know how many, so I can’t say a lot or a few. Yet when Automattic, the parent company of both Tumblr and WordPress.com, posts that it cares about user choices, it hardly seems appropriate that the default choice was to opt everyone in.

As a paying user of WordPress.com, I think it’s shitty to assume I would allow what I write, using my own brain, to be used for a training model that the company gets paid for. I don’t see any of that money. To add injury to that insult of my intelligence, Midjourney and ChatGPT also offer subscriptions to the trained AI, one of which (ChatGPT) I also pay for.

To make matters worse, we sort of have to take the training models on the word of those that use them. They don’t tell us what’s in them or where the content came from.

This is my opinion. It may not suit your needs, and if it doesn’t, have a pleasant day. But if you agree with this, go ahead and make sure your blog is not allowing third-party data sharing.

Personally, I’m unsurprised at how poorly this has been handled. Just follow some of the links early on in the post and revel in dismay.

So. Many. Layoffs.

I’ve been looking at getting back into the ring of software engineering, but it doesn’t seem like a great time to do it.

When Google was laying off workers, I shook my head a bit. It turns out that Google spent $800 million on layoffs just this month. Just this month!

By comparison, Google spent $2.1 billion on layoff expenses for more than 12,000 employees over the course of 2023. Other Google employees only knew about people being dismissed when emails bounced back in February last year.

With so many layoffs, hopefully they’re getting better at it. Well, maybe not. Google employees have been told more layoffs are coming this year.

I imagine that there are some pretty high quality resumes floating around. As far as the tech field goes, Google is probably considered top tier, and landing a position against someone with Google on their resume is going to be tough.

There’s a problem with that, though. More than 25,000 tech workers from 100 companies got the axe in the first few weeks of 2024. Meta, Amazon, Microsoft, Google, TikTok and Salesforce are included in that… and the Microsoft numbers may account for the Blizzard/Activision layoffs that happened this past week, sadly.

Blizzard was one of those dream jobs I had as a significantly younger developer way back when. They were often late on delivery for a new game, but it was pretty much worth it. I still play Starcraft II.

It’s become an employer’s job market – maybe it was before, but definitely more so now – in an era when artificial intelligence may be becoming more attractive to companies for software development, as well as other things. For all we know, they may have consulted artificial intelligence for some of the layoffs. It wouldn’t be the first time that happened, though that was in Russia.

I can’t imagine that Google, Microsoft, Meta and Amazon aren’t using big data and AI for this, at least behind the scenes, but it’s probably not being explained because of the blowback that might cause. ‘Fired by AI’ is not something that people would like to see.

When tech companies axe workers, Wall Street rewards them, so stock prices go up – and there are more unemployed technology folk in a period when AI tools are making so many types of productivity easier. Maybe too easy.

This reminds me so much of the 1990s. The good news is that tech survived the 1990s despite the post-merger layoffs.

Of course, the correction on the NPR article (at the bottom) is something I wish I had caught earlier. “Nearly 25,000 tech workers were laid in the first weeks of 2024. Why is that?” would definitely be an article worth reading.

When Is An Algorithm ‘Expressive’?

Yesterday, I was listening to the webinar on Privacy Law and the United States First Amendment when I heard that lawyers for social networks are claiming both that the network itself has free speech as a speaker, and that it is not the speaker, simply presenting content users have expressed under their own freedom of speech. How the arguments were presented I don’t know, and despite showing up for the webinar, I am not a lawyer¹. The case before the Supreme Court was being discussed, but that’s not my focus here.

I’m exploring how it would be possible to claim that a company’s algorithms that impact how a user perceives information could be considered ‘free speech’. I began writing this post about that and it became long and unwieldy², so instead I’ll write a bit about the broader impact of social networks and their algorithms and tie it back.

Algorithms Don’t Make You Obese or Diabetic.

If you say the word ‘algorithm’ around some people, their eyes immediately glaze over. It’s really not that complicated; a repeatable procedure is basically an algorithm. A recipe, when in use, is an algorithm. Instructions from Ikea are algorithms. Both hopefully give you what you want, and if they don’t, they are ‘buggy’.

Let’s go with the legal definition of what an algorithm is³. Laws don’t work without definitions, and code doesn’t either.

Per Cornell’s Legal Information Institute, an algorithm is:

“An algorithm is a set of rules or a computational procedure that is typically used to solve a specific problem. In the case of Vidillion, Inc. v. Pixalate Inc. an algorithm is defined as “one or more process(es), set of rules, or methodology (including without limitation data points collected and used in connection with any such process, set of rules, or methodology) to be followed in calculations, data processing, data mining, pattern recognition, automated reasoning or other problem-solving operations, including those that transform an input into an output, especially by computer.” With the increasing automation of services, more and more decisions are being made by algorithms. Some examples are; criminal risk assessments, predictive policing, and facial recognition technology.”

By this definition, and perhaps in its simplest form, adding two numbers is an algorithm, which also fits just about any technical definition out there. That’s not at issue.
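
In code, that simplest case looks like this – a complete set of rules that transforms an input into an output, exactly as the definition above requires:

```python
def add(a: float, b: float) -> float:
    """An entire algorithm: two inputs, one rule, one output."""
    return a + b

print(add(2, 3))  # 5 -- input transformed into output, nothing mysterious
```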

What is at issue in the context of social networks is how algorithms impact what we view on a social networking website. We should all understand in the broad strokes that Facebook, Twitter, TikTok and their ilk are in the business of showing people what they want to see, and to do this they analyze what people view so that they can give people what they want.

Ice cream and brownies for breakfast, everyone!

Let’s agree every individual bit of content you see that you can act on, such as liking or re-transmitting, is a single item. Facebook sees you like ice cream, Facebook shows you posts of ice cream incessantly. Maybe you go out and eat ice cream all the time because of this and end up with obesity and diabetes. Would Facebook be guilty of making you obese and diabetic?

Fast food restaurants aren’t considered responsible for making people obese and diabetic. We have choices about where we eat, just as we have choices about what we do with our lives outside of a social network context. Further, almost all of these social networks give you options to not view content, from blocking to reporting to randomly deleting your posts and waving a finger at you for being naughty – without telling you how.

Timelines: It’s All A Story.

As I wrote elsewhere, we all choose our own social media adventures. Most of my social networks are pretty well tuned to feed me new things to learn every day, while doing a terrible job of providing me information on what all my connections are up to. It’s a real estate problem on social network sites, and not everyone can be in that timeline. Algorithms pick and choose, and if there are paid advertisements to give you free access, they need space too.

Think of it all as a personal newspaper. Everything you see is picked for you based on what the algorithms decide, and all of that information is competing to get into your eyeballs, maybe even your brain. Every story is shouting ‘pick me! pick me!’ with catchy titles, wonderful images, and maybe even some content – because everyone wants you to click through to their website so they can hammer you with advertising⁴.

Yet when we step back from those individual stories, the social networking site is curating things in a chronological order. Let’s assume that what it thinks you like to see the most is at the top and it goes down in priority based on what the algorithms have learned about you.

Now think of each post as a page in a newspaper. What’s on the front page affects how you perceive everything in the newspaper. Unfortunately, because it’s all shoved into a prioritized list for you, you get things that are sometimes in a strange order, giving a weird context.

Sometimes you get stray things you’re not interested in because the algorithms have grouped you with others. Sometimes the priority of what you last wrote about will suddenly have posts related to it covering every page in that newspaper.

You might think you’re picking your own adventure through social media, but you’re not directly controlling it. You’re randomly hitting a black box to see what comes out in the hope that you might like it, and you might like the order that it comes in.

We’re all beta testers of social networks in that regard. They are constantly tweaking algorithms to try to do better, but doing better isn’t necessarily for you. It’s for them, and it’s also for training their artificial intelligences more than likely. It’s about as random as human interests are.
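
For the curious, here is a toy sketch of the kind of prioritization being described. Every signal and weight in it is invented for illustration; real networks combine far more signals, none of them disclosed, which is exactly why the box stays black.

```python
# Toy feed-ranking sketch. All signals and weights are invented for
# illustration; real networks use many more, and undisclosed, signals.
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    topic: str
    likes: int
    hours_old: float
    sponsored: bool = False

def score(post: Post, topic_affinity: dict[str, float]) -> float:
    affinity = topic_affinity.get(post.topic, 0.1)  # learned from your clicks
    engagement = post.likes / (post.likes + 50)     # how others reacted
    recency = 1.0 / (1.0 + post.hours_old)          # newer floats higher
    boost = 2.0 if post.sponsored else 1.0          # paid placement weighting
    return boost * (0.6 * affinity + 0.3 * engagement + 0.1 * recency)

feed = [
    Post("friend", "ice cream", likes=12, hours_old=2.0),
    Post("stranger", "politics", likes=900, hours_old=1.0),
    Post("brand", "ice cream", likes=40, hours_old=5.0, sponsored=True),
]
affinity = {"ice cream": 0.9, "politics": 0.2}  # what the box thinks you like

# Your "front page", top to bottom -- the timeline is just this sorted list.
for post in sorted(feed, key=lambda p: score(p, affinity), reverse=True):
    print(f"{score(post, affinity):.3f}  {post.author}: {post.topic}")
```

Note that the ‘sponsored’ boost is the overt, coordinated kind of weighting discussed in the next section; the rest is just bookkeeping about attention.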

Developing Algorithms.

Having written software in various companies over the decades, I can tell you that if there’s a conscious choice to express something with them, to get people to think one way or the other (the point of ‘free speech’), it would have to be very coordinated.

Certain content would have to be weighted as is done with advertising. Random content churning through feeds would not fire things off with the social networking algorithms unless they manually chose to do so across users. That requires a lot of coordination, lots of meetings, and lots of testing.

It can be done. With advertising as an example, it has been done overtly. Another example is the recent push against fake news, which attempted to proactively check content with independent fact-checkers.

Is that free speech? Is that freedom of expression of a company? If you look at this case again, you will likely draw your own conclusions. Legally, I have no opinion because I’m not a lawyer.

But as a software engineer, I look at it and wonder if this is a waste of the Court’s time.

  1. It should be in the interest of software engineers and others to understand the legal aspects of what we have worked on and will work on. Ethics are a thing. ↩︎
  2. It still is, and I apologize if it’s messy. This is a post I’ll likely have to revisit and edit. ↩︎
  3. Legal definitions of what an algorithm is might vary around the world. It might be worth searching for a legal definition where you are. ↩︎
  4. This site has advertising. It doesn’t really pay and I’m not going to shanghai viewers by misrepresenting what I write. It’s a choice. Yet to get paid for content, that’s what many websites do. If you are here, you’re appreciated. Thanks! ↩︎

AI, Confirmation Bias and Our Own Insanity.

In unsurprising news, if you feed artificial intelligences the output of artificial intelligences, they become a bit insane. I’d covered that before in Synthetic Recursion, which seemed pretty intuitive even before I wrote it, but scientists at Rice and Stanford University wrote a paper: “Self-Consuming Generative Models Go MAD“.

So, we can say that’s been verified.
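
A toy illustration of the effect, not the paper’s method: fit a Gaussian to data, sample from the fit, keep only the ‘typical’ samples (a stand-in for the curation bias the paper flags as an accelerant), then refit on them. The spread of the training data collapses within a few generations.

```python
# Cartoon of a self-consuming model: fit, sample, curate, refit, repeat.
# Each generation trains only on the previous generation's curated output.
import random
import statistics

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(500)]  # generation 0: "real" data

for generation in range(1, 9):
    mu, sigma = statistics.fmean(data), statistics.stdev(data)
    synthetic = sorted(random.gauss(mu, sigma) for _ in range(500))
    data = synthetic[50:450]  # curation: keep the middle 80%, drop the tails
    print(f"generation {generation}: training-data std = {statistics.stdev(data):.3f}")

# The standard deviation shrinks every generation: the "model" steadily
# forgets the variety of the original data. That's the MADness in miniature.
```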

What’s even worse: apparently Taylor Swift, Selena Gomez and Kim Kardashian have been quoted saying things that they did not say – organized disinformation that has appeared all over – and in vacuuming up content, OpenAI’s ChatGPT might get infected by it. It’s not just artificial intelligences; output from people willfully misleading others can easily make it in.

Fortunately, I verified with ChatGPT 4, and it got it right by… using Bing. I don’t use Bing. Why does ChatGPT 4? For the same reason you can’t have a Coke with your Kentucky Fried Chicken.

While this time it has been caught – it started in November 2023 – it demonstrates how inaccuracies can crop up, how biases can be pushed, and how many problems we still have with misinformation even without involving artificial intelligence. Every time we get anything on social media these days we have to fact-check it, and then we immediately get blowback about fact-checking being flawed.

Why? It fits their confirmation biases. Given the way large language models are trained, we can say that we’re getting a lot right, and yet we’re also collectively under the delusion that what humanity has collected is right. What is true is that what we believe we know just hasn’t been proven wrong yet, with the thresholds for that varying from person to person.

With science, there’s a verification process, but science has been under fire increasingly because of who pays for the papers to be written and published. Academia has to be funded, and we don’t fund it as much as we should, so others sometimes do, to their own ends. That’s why it’s important to read the papers, but not everyone has the time to do that. There is good science happening, and I’d like to think more of it is good than bad.

With AI tools, I imagine more papers will be written more quickly, which creates a larger problem. Maybe even an exponentially larger problem.

We accept a lot, and since we don’t know what’s in learning models, we don’t know what has been verified until we find things that aren’t. This means we need to be skeptical, just like when we use Wikipedia. There are some people who don’t like doing that footwork because what they see fits their confirmation biases.

Should we be surprised that our tools would have them too based on what we feed them?

It’s almost as if we need to make sure we’re feeding these learning models with things of value. That should come at a cost, because when we write, when we express ourselves in any way, it’s based largely on experience, sometimes hard won.

Meanwhile, artificial intelligence tools are being created to write summaries of books authors took years to write. Amazon is being flooded with them, apparently, and if I see another advertisement on Facebook for microlearning that seems to use these sorts of précis notes, I might throw up through my monitor onto someone else’s keyboard.

Copyright, Innovation, and the Jailbreak of the Mouse.

Not produced by Disney, generated by deepai.

On one hand, we have the jailbreak of Steamboat Willie into the public domain despite the best efforts of Disney. I’m not worried about it either way; I generated the image using Deepai. If Disney is upset about it, I have no problem taking it down.

There’s a great write-up on the 1928 version of Mickey over at the Duke Center for the Study of the Public Domain, and through some of the links there you can see what you can and cannot do with the character.

So we have that aspect, where the Mickey Mouse Protection Act of 1998 extended the copyright protection further. As Lessig pointed out in Free Culture, much of the Disney franchise was built on the public domain, where they copyrighted their own versions of works already in the public domain.

Personally, it doesn’t matter too much to me. I’ve never been a fan of Mickey Mouse, I’m not a big fan of Disney, and I have read much of the original works that Disney built off of and I like them better. You can find most of them at Gutenberg.org.

In other news, OpenAI has admitted that it can’t train its AIs without copyrighted works.

Arguably, if there were more content in the public domain, OpenAI could train its AIs on stuff that is in the public domain. Then there’s the Creative Commons licensed content that could also be used, but… well, that’s inconvenient.

So on one hand, we have a corporation making sure people don’t overstep with using Mickey of the Public Domain, which has happened, and on the other hand we have a corporation complaining that copyright is too restrictive.

On one hand, we have a corporation defending what it has under copyright (which people think went into the public domain, but didn’t – just that version of Mickey), and on the other hand we have a corporation defending its wanton misuse of copyrighted materials.

Clearly something is not right with how we view copyright or innovation. Navigating that with lawyers seems like a disservice to everyone, but here we are.

The Ongoing Copyright Issue with Generative AI.

It’s a strange time. OpenAI (and Microsoft) are being sued by the New York Times and they’re claiming ‘Fair Use’ as if they’re having some coffee and discussing what they read in the New York Times, or are about to write a blog post about the entire published NYT archives, on demand.

It’s not just the New York Times, either. More and more authors are doing the same, or started before NYT.

IEEE’s article, “Generative AI Has a Visual Plagiarism Problem”, demonstrates issues that back up the copyright claims. This is not regurgitation, and it is not fair use – there is line-by-line text from the New York Times, amongst other things.

As I noted yesterday, OpenAI is making deals now for content and only caught this morning that, ‘The New York Times, too, had been in conversations with OpenAI to establish a “high-value” partnership involving “real-time display” of its brand in ChatGPT, OpenAI’s AI-powered chatbot.‘.

Clearly, the discussions didn’t work out. I was going to link the New York Times article on it, but it seems I’ve used up my freebies, so I can’t actually read it right now unless I subscribe¹. At this end of things, as a simple human being, I’m subject to paywalls for content, but OpenAI hasn’t been. If I can’t read and cite an article from the New York Times for free, why should they be able to?

On the other hand, when I get content that originated from news sites like the New York Times, there is fair use happening. People transform what they have read and regurgitate it, some more intelligibly than others, much like an artificial intelligence, but there is at least etiquette – linking the source, at the least. This is not something OpenAI does. It doesn’t give credit. It just inhales large amounts of text, and the algorithms decide on the best ways to spit it out to answer prompts. Like blogging, only faster, and like blogging, sometimes it just makes stuff up.

This is not unlike a well read person doing the same. Ideas, thoughts, even memes are experiences we draw upon. What makes these generative artificial intelligences different? Speed. They also consume a lot more water, apparently.

The line has to be drawn somewhere, and since OpenAI isn’t living up to the first part of its name and is not being transparent, people are left poking a black box to see if their copyrighted work has been sucked in without permission, mention, or recompense.

That does seem a bit like unfair use. This is not to say that the copyright system couldn’t use an overhaul, but apparently so could how generative AIs get their content.

What does that ‘Open’ in OpenAI mean, anyway?

  1. They do seem to have a good deal right now; I did try to subscribe, but it failed for some obscure reason. I’ll try again later. $10 for a year of the New York Times is a definite deal, if only they could process my payment this morning. ↩︎

AI and the Media, Misinformation and Narratives.

    Rendition of Walter Cronkite.

News was once trusted more, when the people presenting it were themselves trusted to give people the facts. There were narratives even then, yet there was a balance because of the integrity of the people involved.

Nowadays, this seems to have changed with institutional distrust, political sectarianism, and the battle between partisan and ideological identities and anti-establishment orientations.

In short, things are wonky.

Now the world’s first news network entirely generated by artificial intelligence is set to launch next year¹. This seems a bit odd given that the Dictionary.com word of the year is ‘hallucinate’ because of artificial intelligence, as I’ve written about before.

What could possibly go wrong with a news source that is completely powered by artificial intelligence?

Misinformation. Oddly enough, Dr. Daniel Williams wrote an interesting article on misinformation, pointing out that misinformation could be a symptom instead of the actual problem. He makes some good points, though it does seem a chicken-and-egg issue at this point. Which came first? I don’t think anyone can know the answer to that, and if they did, they’d probably not be trusted, because things have gotten that bad.

At the same time, I look through my Facebook memories just about every day and note more and more content that I had shared is… gone. Deleted. There’s no reasoning given, and when I do find out that something I shared has been deleted, it’s as informative as a random nun wandering around with a ruler, rapping people’s knuckles and not telling them why she’s doing it.

Algorithms. I don’t know that it’s censorship, but they sure do weed a lot of content and that makes me wonder how much content gets weeded elsewhere. I’m not particularly terrible with my Facebook account or any other account. Like everyone else, I have shared things that I thought to be true that ended up not being true, but I don’t do that very often because I’m skeptical.

We would like to believe integrity is inherent in journalism, but the water got muddied somewhere along the way, when news narratives and editorials became more viewed than the actual facts. With the facts, it’s easy to build one’s own narrative – though not easy enough when people are too busy making a living to do so. Further, we have a tendency toward viewing that which fits our own world view: the ‘echo chambers’ that pop up now and then, such as echoed extremism. To have time to expand beyond our echo chambers, we need to find the time to do so and be willing to have our own world views challenged.

Instead, most people are off chasing the red dots, mistaking being busy for being productive. At a cellular level, we’re all very busy, but that doesn’t mean we’re productive, that we’re adding value to the world around us somehow. There is something to Dr. Daniel Williams’ points on societal malaise.

A news network run completely by artificial intelligence, mixed with the world as we have it now, doesn’t seem ideal. Yet the idea has its selling points, because media itself isn’t trusted – largely because media is built around business, business is built around advertising, and advertising in turn is a game of numbers: to get the numbers, you have to get eyeballs looking at the content. Thus, propping up people’s world views matters more when the costs of doing all of that are higher. Is it possible that decreasing the costs would decrease the need to prop up world views for advertising?

We’ll be finding out.

  1. 2024 ↩︎