When Is An Algorithm ‘Expressive’?

Yesterday, I was listening to a webinar on Privacy Law and the United States First Amendment when I heard that lawyers for social networks are claiming both that the networks themselves have free speech as speakers, and that they are not the speaker at all, merely presenting content that users have expressed under the Freedom of Speech. How the arguments were presented I don’t know, and despite showing up for the webinar I am not a lawyer1. The case before the Supreme Court was being discussed, but that’s not my focus here.

I’m exploring how it would be possible to claim that a company’s algorithms that impact how a user perceives information could be considered ‘free speech’. I began writing this post about that and it became long and unwieldy2, so instead I’ll write a bit about the broader impact of social networks and their algorithms and tie it back.

Algorithms Don’t Make You Obese or Diabetic.

If you say the word ‘algorithm’ around some people, their eyes immediately glaze over. It’s really not that complicated; a repeatable set of steps is basically an algorithm. A recipe, when in use, is an algorithm. Instructions from Ikea are algorithms. Both hopefully give you what you want, and if they don’t, they are ‘buggy’.

Let’s go with the legal definition of what an algorithm is3. Laws don’t work without definitions, and code doesn’t either.

Per Cornell’s Legal Information Institute, an algorithm is:

“An algorithm is a set of rules or a computational procedure that is typically used to solve a specific problem. In the case of Vidillion, Inc. v. Pixalate Inc. an algorithm is defined as “one or more process(es), set of rules, or methodology (including without limitation data points collected and used in connection with any such process, set of rules, or methodology) to be followed in calculations, data processing, data mining, pattern recognition, automated reasoning or other problem-solving operations, including those that transform an input into an output, especially by computer.” With the increasing automation of services, more and more decisions are being made by algorithms. Some examples are; criminal risk assessments, predictive policing, and facial recognition technology.”

By this definition, and perhaps in its simplest form, adding two numbers is an algorithm, which also fits just about any technical definition out there. That’s not at issue.

What is at issue in the context of social networks is how algorithms impact what we view on a social networking website. We should all understand in the broad strokes that Facebook, Twitter, TikTok and their ilk are in the business of showing people what they want to see, and to do this they analyze what people view so that they can give people what they want.

Ice cream and brownies for breakfast, everyone!

Let’s agree every individual bit of content you see that you can act on, such as liking or re-transmitting, is a single item. Facebook sees you like ice cream, Facebook shows you posts of ice cream incessantly. Maybe you go out and eat ice cream all the time because of this and end up with obesity and diabetes. Would Facebook be guilty of making you obese and diabetic?

Fast food restaurants aren’t considered responsible for making people obese and diabetic. We have choices about where we eat, just as we have choices about what we do with our lives outside of a social network context. Further, almost all of these social networks give you options to not view content, from blocking to reporting to randomly deleting your posts and waving a finger at you for being naughty – without telling you how.
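For what it’s worth, the mechanism in the ice cream example isn’t mysterious. None of us outside these companies gets to see the real code, so treat the following as a toy sketch of the general idea rather than anything Facebook actually runs: count what a user acts on, then rank what they’re shown by those counts.

```python
from collections import Counter

def build_interest_profile(engagements):
    """Tally how often a user liked or re-shared each topic."""
    return Counter(topic for topic, action in engagements
                   if action in ("like", "share"))

def rank_feed(candidate_posts, profile):
    """Put the topics the user has engaged with most at the top."""
    return sorted(candidate_posts,
                  key=lambda post: profile.get(post["topic"], 0),
                  reverse=True)

# A user who keeps liking ice cream...
engagements = [("ice cream", "like"), ("ice cream", "share"), ("gardening", "like")]
profile = build_interest_profile(engagements)

candidates = [{"id": 1, "topic": "gardening"},
              {"id": 2, "topic": "ice cream"},
              {"id": 3, "topic": "politics"}]
print(rank_feed(candidates, profile))  # ...gets ice cream at the top, incessantly.
```

The feedback loop described above is just this run at scale: what you act on becomes the profile, and the profile decides what you see next.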

Timelines: It’s All A Story.

As I wrote elsewhere, we all choose our own social media adventures. Most of my social networks are pretty well tuned to feed me new things to learn every day, while doing a terrible job of telling me what all my connections are up to. It’s a real estate problem on social network sites: a timeline only has so much space, and not everyone can be in it. Algorithms pick and choose, and since paid advertisements are what give you free access, they need space too.

Think of it all as a personal newspaper. Everything you see is picked for you based on what the algorithms decide, and yet all of that information is competing to get into your eyeballs, maybe even your brain. Every story is shouting ‘pick me! pick me!’ with catchy titles, wonderful images, and maybe even some content – because everyone wants you to click through to their website so they can hammer you with advertising.4

Yet when we step back from those individual stories, the social networking site is curating things into an order for you. Let’s assume that what it thinks you like to see the most is at the top, and that it goes down in priority based on what the algorithms have learned about you.

Now think of each post as a page in a newspaper. What’s on the front page affects how you perceive everything in the newspaper. Unfortunately, because it’s all shoved into a prioritized list for you, you get things that are sometimes in a strange order, giving a weird context.

Sometimes you get stray things you’re not interested in because the algorithms have grouped you with others. Sometimes whatever you last wrote about suddenly jumps in priority, and posts related to it cover every page of that newspaper.

You might think you’re picking your own adventure through social media, but you’re not directly controlling it. You’re randomly hitting a black box to see what comes out in the hope that you might like it, and you might like the order that it comes in.

We’re all beta testers of social networks in that regard. They are constantly tweaking algorithms to try to do better, but doing better isn’t necessarily for you. It’s for them, and it’s also for training their artificial intelligences more than likely. It’s about as random as human interests are.

Developing Algorithms.

Having written software at various companies over the decades, I can tell you that if there were a conscious choice to express something through these algorithms, to get people to think one way or the other (the point of ‘free speech’), it would have to be very coordinated.

Certain content would have to be weighted, as is done with advertising. Random content churning through feeds would not push things in any particular direction unless the company manually chose to weight it that way across users. That requires a lot of coordination, lots of meetings, and lots of testing.
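To be concrete about what ‘weighting’ would mean, here is a toy sketch – not any platform’s real code, with the names and numbers invented – of how a deliberate, company-wide thumb on the scale would have to sit on top of the ordinary per-user scoring:

```python
def score_post(post, user_profile, editorial_boosts):
    """Base score comes from the user's own engagement history; the boost table
    is where a deliberate, company-wide thumb on the scale would have to live."""
    base = user_profile.get(post["topic"], 0)
    boost = editorial_boosts.get(post["topic"], 1.0)  # 1.0 means no interference
    return base * boost

# Promoting one viewpoint across all users means someone has to maintain this
# table -- which is exactly the coordination, meetings, and testing described above.
editorial_boosts = {"preferred_take": 3.0, "other_take": 0.2}  # hypothetical values
user_profile = {"preferred_take": 1, "other_take": 4}

for topic in ("preferred_take", "other_take"):
    print(topic, score_post({"topic": topic}, user_profile, editorial_boosts))
```

The point is that the boost table doesn’t emerge from user behavior on its own; someone has to decide to put it there, and keep it there.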

It can be done. With advertising as an example, it has been done overtly. Another example is the recent push against fake news, which has attempted to proactively check content with independent fact checkers.

Is that free speech? Is that freedom of expression of a company? If you look at this case again, you will likely draw your own conclusions. Legally, I have no opinion because I’m not a lawyer.

But as a software engineer, I look at it and wonder if this is a waste of the Court’s time.

  1. It should be in the interest of software engineers and others to understand the legal aspects of what we have worked on and will work on. Ethics are a thing. ↩︎
  2. It still is, and I apologize if it’s messy. This is a post I’ll likely have to revisit and edit. ↩︎
  3. Legal definitions of what an algorithm is might vary around the world. It might be worth searching for a legal definition where you are. ↩︎
  4. This site has advertising. It doesn’t really pay and I’m not going to shanghai viewers by misrepresenting what I write. It’s a choice. Yet to get paid for content, that’s what many websites do. If you are here, you’re appreciated. Thanks! ↩︎

Firing Up Recommendations: Pyrorank.

Being a bit busy with other things, I didn’t get to write a little bit about a new recommendation algorithm which involves artificial intelligence: Pyrorank. It has some lofty claims, largely hidden by academia and academic verbiage.

I initially read about it in Researchers devise algorithm to break through ‘search bubbles’ on July 10th and put it into the stack of things I consider interesting. ‘Search’ and ‘Bubbles’ mean something to some of us who, in the days of antiquity, wrote our own search and sort algorithms from scratch.

What this new algorithm is claimed to do is give you better recommendations by “reducing the impact of users’ profiles and broadening recommendations that still reflect the focus of the search, producing more diverse and useful results.”

In other words, recommendations on Netflix, or Amazon.com, or even advertising on Facebook could become less annoyingly predictable, instead of showing us the same things over and over – some of which we may have already seen, purchased, or passed over before. Personally, it wasn’t long before I started seeing present recommendation algorithms as a tyranny, and with my eclectic tastes that can be supremely annoying.

Recommendation systems, used by Google, Netflix, and Spotify, among others, are algorithms that use data to suggest or recommend products or choices to consumers based on the users’ past purchases, search history, and demographics. However, these parameters bias search outcomes because they put users in filter bubbles.

“The traditional way recommendation systems work is by basing recommendations on the notion of similarity,” explains Bari, who leads the Courant Institute’s Predictive Analytics and AI Research Lab. “This means that you will see similar items in the choice and recommended lists based on either users similar to you or similar items you have bought. For instance, if I am an Apple product user, I will increasingly see more and more Apple products in my recommendations.”

The limitations of existing recommendation systems have become evident in striking ways. For instance, political partisans may be largely directed to news content that aligns with their pre-existing views. More significantly, recommender systems have turned up self-harm videos to susceptible individuals.

Researchers devise algorithm to break through ‘search bubbles‘, New York University, 10 July 2023.

This sounds hopeful, particularly for social media algorithms which, as we have seen, have reinforced polarized views. It could increase the size of the echo chambers (there are always echo chambers) while adding diversity to them.
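I haven’t read the paper yet (I get to that below), so I can’t show what Pyrorank actually does. As a rough illustration of what diversity-aware re-ranking means in general – my assumption about the family of ideas the press release is describing, not the paper’s own method – here is a greedy sketch that penalizes repeating the same category:

```python
def rerank_for_diversity(candidates, k=5, penalty=0.5):
    """Greedy re-rank: every time a category repeats, later items from that
    category keep only `penalty` of their relevance, so the top-k drifts
    away from 'more of the same'.
    candidates: list of (item_id, category, relevance) tuples."""
    chosen, seen, pool = [], {}, list(candidates)
    while pool and len(chosen) < k:
        best = max(pool, key=lambda c: c[2] * (penalty ** seen.get(c[1], 0)))
        chosen.append(best)
        pool.remove(best)
        seen[best[1]] = seen.get(best[1], 0) + 1
    return chosen

recs = [("A1", "apple", 0.95), ("A2", "apple", 0.93), ("A3", "apple", 0.90),
        ("B1", "books", 0.70), ("C1", "camping", 0.60)]
print(rerank_for_diversity(recs, k=3))
# [('A1', 'apple', 0.95), ('B1', 'books', 0.7), ('C1', 'camping', 0.6)]
```

Pyrorank itself is described as nature-inspired, modeled on pyrodiversity, so its mechanism will differ; the common thread is only that a little raw similarity gets traded for breadth.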

Thus I did some digging this morning and found this paper, “Pyrorank: A Novel Nature-Inspired Algorithm to Promote Diversity in Recommender Systems”, which is unfortunately paywalled. I have requested the full text, and if I manage to get it and find anything particularly interesting in it, I’ll post a bit more on it.

I also looked for different perspectives on it; others would have looked at the paper and found different things worth highlighting. My focus was on how it actually compares.

A comparison was made between Pyrorank and the traditional recommendation system to test the viability of each system. This experiment was carried out on large Movielens, Good Books, and Goodreads datasets. The objective of the testing was to find out which system stays true to the purpose of accurate recommendations while simultaneously providing diversification and unrepeated results.

The results were hugely in favor of Pyrorank, which not only stuck to genuine core recommendations but also gave mixed results that did not align with the past results of product purchases or to someone who is a similar user.

Pyrorank – A New Pathway For Your Search Engine, DigitalWorld, Web Desk, Wednesday, July 12, 2023

Unfortunately, it doesn’t say how rigorous the testing was, but it does sound a little promising.

What is interesting is that this is based on pyrodiversity, which is something that I had never considered, so to me this is a little new and exciting – once it lives up to its claimed results.

There may be hope for recommendations yet.

Exploring Beyond Code 2.0: Into A World of AI.

It’s become a saying on the Internet without many people understanding it: “Code is Law”. This is a reference to one of the works of Lawrence Lessig, already revised since its original publication.

Code Version 2.0 dealt with many of the nuances of Law and Code in an era where we are connected by code. The fact that you’re reading this implicitly means that the Code allowed it.

Here’s an example that weaves its way throughout our society.

One of the more disturbing things to consider is that when Alexis de Tocqueville wrote Democracy in America 1, he recognized the jury as a powerful mechanism for democracy itself.

“…If it is your intention to correct the abuses of unlicensed printing and to restore the use of orderly language, you may in the first instance try the offender by a jury; but if the jury acquits him, the opinion which was that of a single individual becomes the opinion of the country at large…”

Alexis de Tocqueville, Volume 1 of Democracy in America, Chapter XI: Liberty of the Press In the United States (direct link to the chapter within Project Gutenberg’s free copy of the book)

In this, he makes the point that public opinion on an issue is summarized by the jury, for better and worse. Implicit in that is the discussion within the Jury itself, as well as the public opinion at the time of the trial. This is indeed a powerful thing, because it allows the people to decide instead of those in authority. Indeed, the jury gives authority to the people.

‘The People’, of course, means the citizens of a nation, and within that there is discourse between members of society regarding whether something is or is not right, or ethical, within the context of that society. In essence, it allows ethics to breathe, and in so doing, it allows Law to be guided by the ethics of a society.

It’s likely no mistake that some of the greatest concerns in society stem from divisions in what people consider to be ethical. Abortion is one of those key issues, where the ethics of the rights of a woman are put into conflict with the rights of an unborn child. On either side of the debate, people have an ethical stance based on their beliefs without compromise. Which is more important? It’s an extreme example, and one that is still playing out in less than complimentary ways for society.

Clearly no large language model will solve it, since large language models are trained with implicitly biased training data and algorithms, which is why they shouldn’t be involved, and the same would likely go for the general artificial intelligences of the future. Machine learning, or deep learning, learns from us, and every learning model is developed by its own secret jury, whose stewed biases may not reflect the whole of society.

In fact, they would reflect a subset of society that is as disconnected from society as the companies that make them, since each company hires people based on its own values to move toward its version of success. Companies are about making money. Creating value is a very subjective thing for human society, but money is its currency.

With artificial intelligence involved in so many things, and becoming more involved all the time, people should at the least be concerned:

  • AI-powered driving systems are trained to identify people, yet they have been shown to identify people with darker skin tones less reliably.
  • AI-powered facial recognition systems are trained on datasets of facial images. The code that governs these systems determines which features of a face are used to identify individuals, and how those features are compared to the data in the dataset. As a result, the code can have a significant impact on the accuracy and fairness of these systems, which has been shown to have an ethnic bias.
  • AI-powered search engines are designed to rank websites and other online content according to their relevance to a user’s query. The code that governs these systems determines how relevance is calculated, and which factors are considered. As a result, the code can have a significant impact on the information that users see, and therefore what they discuss, and how they are influenced (a toy sketch of this follows the list).
  • AI-powered social media platforms are designed to connect users with each other and to share content. The code that governs these platforms determines how users are recommended to each other, and how content is filtered and ranked. As a result, the code can have a significant impact on the experiences of users on these platforms – aggregating into echo chambers.
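The search-ranking bullet deserves a concrete illustration. The formula below is made up – it is not any real engine’s code, and the factors and weights are invented – but it shows how the weights a developer hard-codes decide which result a user ever reads:

```python
def relevance(result, query_terms, weights):
    """A made-up relevance formula: the factors that get the bigger weights
    decide what lands on top, and therefore what the user ever reads."""
    title_hits = sum(term in result["title"].lower() for term in query_terms)
    return (weights["title"] * title_hits
            + weights["reputation"] * result["reputation"]
            + weights["popularity"] * result["click_rate"])

results = [
    {"title": "Measured analysis of the new bill", "reputation": 0.9, "click_rate": 0.02},
    {"title": "You won't BELIEVE this bill", "reputation": 0.2, "click_rate": 0.60},
]
query = ["bill"]
careful = {"title": 1.0, "reputation": 3.0, "popularity": 0.5}
engagement_first = {"title": 1.0, "reputation": 0.5, "popularity": 5.0}

for weights in (careful, engagement_first):
    top = max(results, key=lambda r: relevance(r, query, weights))
    print(top["title"])  # same query, same content, different winner
```

Nothing about the user changed between the two runs; only the weights did, and the weights are code that somebody wrote.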

We were already behind before artificial intelligence reared its head recently with the availability of large language models, separating ourselves in ways that polarized us and made compromise impossible.

Maybe it’s time for Code Version 3.0. Maybe it’s time we really got to talking about how our technology will impact society beyond a few smart people.

1 This was covered in Volume 1 of ‘Democracy in America‘, available for free here on Project Gutenberg.

Why Social Media Moderation Fails

A clear parody image of a Ukrainian military tractor pulling the Moskva.

Moderation of content has become a bit ridiculous on social media sites of late. Given that this post will show up on Facebook, and the image at the top will be shown, it’s quite possible that the Facebook algorithms that have run amok with me over similar things, clear parody, may further restrict my account. I clearly marked the image as a parody.

Let’s see what happens. I imagine they’ll just toss more restrictions on me, which is why Facebook and I aren’t as close as we once were. Anyone who thinks a tractor pulling the sunken Moskva really happened should probably have their head examined, but this is the issue with such algorithms left unchecked. It quite simply is impossible, implausible, and… yes, funny, because Ukrainian tractors have invariably been the heroes of the conflict, even having been blown up when their owners were simply trying to reap their harvests.

But this is not about that.

This is about understanding how social media moderation works, and doesn’t, and why it does, and doesn’t.

What The Hell Do You Know?

Honestly, not that much. As a user, I’ve steered clear of most problems with social networks simply by knowing it’s not a private place where I can do as I please – and even where I can, I have rules of conduct I live by that are generally compatible with the laws of society.

What I do know is that when I was working on the Alert Retrieval Cache way back when, before Twitter, the problem I saw with this disaster communication software was the potential for bad information. Since I couldn’t work on it by myself because of the infrastructural constraints of Trinidad and Tobago (constraints that still get in the way of emergency communications there), I started working on the other aspects of it, and the core issue was ‘trusted sources’.

Trusted Sources.

To understand this problem, you go to a mechanic for car problems, you go to a doctor for medical problems, and so on. Your mechanic is a trusted source for your car (you would hope). But what if your mechanic specializes in your make of car, while your friend has a BMW that spends more time in the shop than on the road? For that BMW, your friend might be the trusted source.

You don’t see a proctologist when you have a problem with your throat, though maybe some people should. And this is where the General Practitioner comes in to basically give you directions on what specialist you should see. With a flourish of a pen in alien handwriting, you are sent off to a trusted source related to your medical issue – we hope.

In a disaster situation, you have the people you have on the ground. You might be lucky enough to have doctors, nurses, EMTs, and people with some experience in dealing with whatever variety of disaster is on the table, and so you have to do the best with what you have. For information, some sources will be better than others. For getting things done, again, it depends a lot on the person on the ground.
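If you tried to mechanize that, it might look something like the sketch below. This is not the Alert Retrieval Cache’s code; the roles and numbers are invented purely for illustration:

```python
# Hypothetical roles and trust values -- in practice these would have to be set,
# and constantly revised, by humans who know who is actually on the ground.
ROLE_TRUST = {"emt": 0.9, "nurse": 0.85, "local_resident": 0.6, "unknown": 0.3}

def score_report(report):
    """Weight an incoming report by who sent it and whether anyone corroborates it."""
    base = ROLE_TRUST.get(report["role"], ROLE_TRUST["unknown"])
    corroboration = min(report["confirmations"] * 0.1, 0.3)  # capped: a crowd isn't truth
    return min(base + corroboration, 1.0)

reports = [
    {"text": "Bridge on Main St is out", "role": "emt", "confirmations": 2},
    {"text": "Bridge on Main St is fine", "role": "unknown", "confirmations": 0},
]
for report in sorted(reports, key=score_report, reverse=True):
    print(round(score_report(report), 2), report["text"])
```

The arithmetic is trivial; the table is the hard part, because who counts as trusted keeps shifting.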

So the Alert Retrieval Cache I was working on, after its instantiation, was going to have to deal with these very human issues, and the best way to deal with that is with other humans. We’re kind of good at that, and it’s not something that AI is very good at, because AI is built by specialists and, beyond job skills, most people are generalists. You don’t have to be a plumber to fix a toilet, and you don’t have to be a doctor to put a bandage on someone. What’s more, people can grow beyond their pasts, despite Human Resources’ infatuation with the past.

Nobody hires you to do what you did; they hire you to do what they want to do in the future.

So even in a disaster scenario, trusted sources are fluid. In an open system not confined to disasters, open to all manner of cute animal pictures, wars, protests, and even politicians (the worst of the lot, in my opinion), trusted sources are a complete crapshoot. This leads everyone to trust nothing, or some to trust everything.

Generally, if it goes with your cognitive bias, you run with it. We’re all guilty of it to some degree. The phrase, “Trust but verify” is important.

In social media networks, ‘fact checking’ became the greatest thing since giving up one’s citizenship before a public offering. So fact checking happens, and for the most part it is good – but, when applied to parody, it fails. Why? Because algorithms don’t have a sense of humor. It’s either a fact, or it’s not. And so when I posted the pictures of Ukrainian tractors towing everything, Facebook had a hissy fit, restricted my account, and apparently had a field day going through past things I posted that were also parody. It’s stupid, but that’s their platform and they don’t have to defend themselves to me.

Is it annoying? You bet. Particularly since no one knows how their algorithms work. I sincerely doubt that they do. But this is a part of how they moderate content.

In protest, does it make sense to post even more of the same sort of content? Of course not. That would be shooting oneself in the foot (as I may be doing now when this posts to Facebook), but if you’ve already lost your feet, how much does that matter?

Social media sites fail when they don’t explain their policies. But it gets worse.

Piling on Users.

One thing I’ve seen on Twitter that has me shaking my head, as I mentioned in the more human side of Advocacy and Social Networks, is the ‘pile on’, where a group of people get onto a thread and overload someone’s ability to respond to one of their posts. On most networks there is some ‘slow down’ mechanism to avoid that happening, and I imagine Twitter is no different, but that might apply only to one specific account at a time. Get enough accounts doing the same thing to the same person and it can get overwhelming on the technical side, and if it’s coordinated – maybe everyone has the same sort of avatar, as an example – well, that’s a problem, because it’s basically a Distributed Denial of Service on another user.

Now, this could be about all manner of stuff, but the algorithms involved don’t care about how passionate people might feel about a subject. They could easily see commonalities in the ‘attack’ on a user’s post, and even on the user. A group could easily be identified as doing pile ons, and their complaints could be ‘demoted’ on the platform, essentially making it an eyeroll and, “Ahh. These people again.”

It has nothing to do with the content. Should it? I would think it should, but then I would want them to agree with my perspective, because if they didn’t, I would say it’s unfair. As Lessig wrote, Code is Law. So there could well be algorithms watching for that. Are there? I have no earthly idea, but it’s something I could see easily being implemented.
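I have no idea whether any platform actually does this, so treat the following as speculation in code: flag clusters of accounts that keep converging on the same posts, and let the platform quietly demote whatever they report.

```python
from collections import defaultdict
from itertools import combinations

def find_pileon_pairs(replies, min_shared_targets=3):
    """replies: a list of (account, target_post) events. Accounts that keep
    converging on the same posts get paired up -- a crude stand-in for
    'known associates'."""
    targets_by_account = defaultdict(set)
    for account, target in replies:
        targets_by_account[account].add(target)

    pairs = []
    for a, b in combinations(targets_by_account, 2):
        shared = targets_by_account[a] & targets_by_account[b]
        if len(shared) >= min_shared_targets:
            pairs.append((a, b))
    return pairs

replies = [("clown1", "p1"), ("clown2", "p1"), ("clown1", "p2"), ("clown2", "p2"),
           ("clown1", "p3"), ("clown2", "p3"), ("bystander", "p1")]
print(find_pileon_pairs(replies))  # [('clown1', 'clown2')]
```

Once something like this exists, demoting the pair’s complaints is one more line of code, and the content of those complaints never enters into it, which is exactly the worry.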

And if you’re one of the people doing it, and this happens? It could well cause problems for the very users trying to advocate a position. Traffic lights can be a real pain.

Not All In The Group Are Saints.

If we assume that everyone in our group can do no wrong, we’re idiots. As groups grow larger, the likelihood of getting something wrong increases. As groups grow larger, there’s increased delineation from other groups, there’s a mob mentality and there’s no apology to be had because there’s no real structure to many of these collective groups. When Howard Rheingold wrote about Smart Mobs, I waited for him to write about “Absolutely Stupid Mobs”, but I imagine that book would not have sold that well.

Members of groups can break terms of service. Now, we assume that each account is looked at individually. What happens if they can be loosely grouped? We have the technology for that. Known associates, etc, etc. You might be going through your Twitter home page and find someone you know being attacked by a mob of angry clowns – it’s always angry clowns, no matter how they dress – and jump in, passionately supporting someone who may well have caused the entire situation.

Meanwhile, Twitter, Facebook, all of them simply don’t have the number of people to handle what must be a very large flaming bag of complaints on their doorstep every few microseconds. Overwhelmed, they may just go with what the algorithms say and call it a night so that they can go home before the people in the clown cars create traffic.

We don’t know.

We have Terms of Service for guidelines, but we really don’t know the algorithms these social media sites run to check things out. It has to be at least a hybrid system, if not almost completely automated. I know people on Twitter who are on their third accounts. I just unfollowed one today because I didn’t enjoy the microsecond updates on how much fun they were having jerking the chains of some group that I won’t get into. Why is it their third account? They broke the Terms of Service.

What should you not do on a network? Break the Terms of Service.

But when the terms of service are ambiguous, how much do they really know? What constitutes an ‘offensive’ video? An ‘offensive’ image? An ‘offensive’ word? Dave Chappelle could wax poetic about it, I’m sure, as could Ricky Gervais, but they are comedians – people who show us the humor in an ugly world, when permitted.

Yet, if somehow the group becomes known to the platform, and enough members break the Terms of Service, could the platform act against the group as a whole? Would they? Should they?

We don’t know. And people could be shooting themselves in the foot.

It’s Not Our Platform.

As someone who has developed platforms – not the massive social media platforms we have now, but I’ve done a thing or two here and there – I know that behind the scenes things can get hectic. Bad algorithms happen. Good algorithms can have bad consequences. Bad algorithms can have good consequences. Meanwhile, these larger platforms have stock prices to worry about, shareholders to impress, and if they screw up some things, well, shucks, there’s plenty of people on the platform.

People like to talk about freedom of speech a lot, but that’s not really legitimate when you’re on someone else’s website. They can make it as close to that as they can, following the rules and laws of many nations or those of a few, but really, underneath it all, their algorithms can cause issues for anyone. They don’t have to explain to you why the picture of your stepmother with her middle finger up was offensive, or why a tractor towing a Russian flagship needed to be fact checked.

In the end, there’s hopefully a person at the end of the algorithm who could be having a bad day, or could just suck at their job, or could even just not like you because of your picture and name. We. Don’t. Know.

So when dealing with these social networks, bear that in mind.