A Spanish Wired.com

When my friend, Miguel Ángel Pérez Álvarez (MAPA), posted an article from Wired.com, I was surprised to find out that Wired has a Spanish version marketed toward Argentina, Chile, Colombia, Spain, Mexico, Panama and Peru. It doesn’t appear to have any crossover with the English version, which is peculiar given how I know MAPA.

MAPA and I had made each other’s acquaintance through MISTICA and CARDICIS, which were both about breaking down language and cultural barriers using technology, CARDICIS more so than MISTICA.

I was writing about what he actually wrote but I kept coming back to this point. His article is as poignant in Spanish as English, and I’m not sure why Wired.com hasn’t thought to translate the articles between the two.

That it has existed and I didn’t know about it disturbs me a bit. I don’t know what good articles I missed there just because I happen to be an anglophone. Fortunately, my Spanish is much improved.

We talk about bias, and here we have a language bias. Again.

Distilling Traffic

Having pulled Data Transfer out of cars, I’ll revisit traffic itself:

“…Each of them is a physical record of their ancestors, dating back to their, marked by life events – living memory. In minds alone, each human brain is 100 terabytes, with a range of 1 Terabyte to 2.5 Petabytes according to present estimates. Factor in all the physical memory of our history and how we lived, we’re well past that…”

me, Traffic, RealityFragments, June 6th 2023

So while we’re all moving memory in traffic, we’re also moving history. Our DNA holds about 750 megabytes, according to some sources, of our individual ancestry as well as a lot of tweaks to our physiology that make us different people. Let’s round off the total memory to 2 Terabytes, 1 conservative terabyte for what our brain holds and roughly another terabyte of DNA (conservative here, liberal there…). 100 cars with only drivers is 200 Terabytes.

Conservatively. Sort of. Guesstimate built of guesstimates. It’s not so much about the values as the weight, as you’ll see.
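
For anyone who wants to poke at the guesstimate, here is a minimal sketch of the arithmetic above. The function and constant names are made up for illustration, and the numbers are just the rounded figures from this post (1 terabyte for the brain, roughly 1 terabyte for DNA), not measurements.

```python
# Rounded guesstimates from the post, in terabytes (assumptions, not data).
BRAIN_TB = 1   # conservative figure for what a brain holds
DNA_TB = 1     # liberal rounding for DNA
PER_PERSON_TB = BRAIN_TB + DNA_TB


def traffic_memory_tb(cars, occupants_per_car=1):
    """Total 'memory in traffic' for a number of cars, in terabytes."""
    return cars * occupants_per_car * PER_PERSON_TB


# 100 cars with only drivers
print(traffic_memory_tb(100))
```

Change the occupants per car or the per-person figures and the total scales with them, which is the point: the weight matters more than the exact values.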

Nature uses only the longest threads to weave her patterns, so that each small piece of her fabric reveals the organization of the entire tapestry.

Richard Feynman, Chapter 1, The Law of Gravitation, p. 34 – The Character of Physical Law (1965)

Now, from all that history, we have ideas that have been passed on from generation to generation. Books immediately come to mind, as do other things like language, culture and tradition. All of these pass along ideas from generation to generation, distilling things toward specific ends even while we distill our own environment to our own ends, or lack thereof, which is an end. That’s a lot of information linked together, and that information is linked to the ecological systems that we’re connected to and their history.

Now, we’re beginning to train artificial intelligences on training models. What is in those training models? In the case of large language models, probably lots of human writing. In the case of images, lots of images. And so on. But these models are disconnected in ways that we are not, and we are connected in ways that we’re still figuring out.

I mean, we’re still learning some really interesting stuff about photosynthesis, something most of us were likely taught about in school. So the data models that AIs are being trained on through deep learning are subject to change, and have to be changed as soon as information in them is outdated.

Who chooses what gets updated? It’s likely not you or me since we don’t even know what’s in these training models. For all we know, it’s data from our cellphones tracking us in real time, which isn’t that farfetched, but for now we can be fairly sure it’s someone who has decided what is in the machine learning models in the first place. Which, again, isn’t us.

What if they decide to omit… your religious text of choice? Or let’s say that they only want to train it on Mein Kampf and literature of that ilk. Things could go badly, and while that’s not really in the offing right now… we don’t know.

This impacts future generations and what they will do and how they will do it. It even impacts present generations. This seems like something we should be paying attention to.

We all live in our own little bubbles, after all, and our bubbles don’t have much influence on learning models for artificial intelligence. That could be a problem. How do we deal with it?

First, we have to start with understanding the problem, and most people including myself are only staring at pieces of the problem from our own little bubbles. Applications like ChatGPT just distill bubbles depending on their models.

Language as a Communication Technology

We don’t talk about how much language is a communication technology with its own compatibilities and incompatibilities. Until around 2004, I had no idea how much of an impact it had. CARDICIS opened my eyes to a lot, as simple as it was.

Growing up, the education system and/or my school decided that it was a brilliant idea to teach both Spanish and French at the same time, and I quickly decided neither was worth pursuing, not because of my teachers, but because I did not understand how important it was and because it was initially difficult for me. Then I got behind, then I got further behind, and then I was past the point of no return for the academic aspect in secondary school.

If the system was dumb, so was I, but the system was older and I had the excuse of being young and ignorant.

I bring this up for a few reasons. In the vein of what I have been writing about bias, medium and messages and other stuff in the context of artificial intelligence, language is a bigger deal than most monoglots (people who speak only one language) might begin to understand.

“Because language flows in the same direction as other elements of culture… For the most part, language flows in the other direction, from the conquerors to the conquered.”

“Speaking Of Tongues: Justin E.H. Smith On The Mysteries Of Language”, Justin E.H. Smith, Professor of Philosophy at the University of Paris

Language through conquest has long been a topic of colonialism. The Treaty of Tordesillas that Spain and Portugal agreed to started off with a papal decree. Conquest was religious, divided by language because of the division of European nations. England, France and the Netherlands simply chose to ignore the treaty. England was still Catholic when the treaty was written, and continued to be until 1534.

The medium was the language, and the message was the language, but the message brought a new medium, religion, which had its own message, and so on.

“…I think we’re now moving into a period when we will leave it to the machines to speak to each other. A lot of the tedious work of coding came during an early phase of computing. We’re developing artificial intelligence to do that for us. When we have only machines speaking machine, however, it’s going to be a big problem, because their language is going to proliferate beyond our ability to fully grasp even how it’s proliferating.”

“Speaking Of Tongues: Justin E.H. Smith On The Mysteries Of Language”, Justin E.H. Smith, Professor of Philosophy at the University of Paris.

Well, artificial intelligences did start talking to each other – we’ve already seen it happen, as in 2017 when Facebook chatbots were designed to negotiate with each other. In 2022, DALL-E 2 was noted using its own language as well. In fact, I wouldn’t be surprised if it had happened before and we didn’t hear about it. We certainly will be hearing more about it.

None of this is simple. Different languages evolved largely because of geographic isolation until people started wandering between civilizations. There are other types of isolation too, but in the end what is remarkable is that these large language models, and machine learning/deep learning with multilingual content, may be some of the best ways for us to get that universal translator that Star Trek always has breaking (probably Microsoft updates), and that Douglas Adams simplified to a Babel fish.

The Great Wave off Kanagawa, Hokusai.

And it goes beyond that.

Consider The Great Wave Off Kanagawa. Some of you reading this, if not most at the time of this writing, read left to right so when we view the image, we see the wave first, then the boats. Many people don’t even notice the boats and the people on them.

Yet the artist Hokusai was Japanese, and Japanese is traditionally read from right to left. A native Japanese speaker would likely see the boats first, and therefore the peril those in the boats are in seems more real. A simple thing like the direction of reading impacts how we view images even beyond language.

As a software engineer, I wrestled with internationalization, different keyboard types, etc – but if we accept language as a communication technology, as well as an art, it’s pretty clear that transmitting and receiving information lacks the depth of interpreting.

That can be much more complicated for us, and for our creations when we expect them to present unbiased output to us.

Justin E. H. Smith, whom I quoted from an interview, has a fresh book coming out that you can pre-order now: The Internet Is Not What You Think It Is: A History, a Philosophy, a Warning

That Other Linguistic Bias…

Hands off my tags! Michael Gaida

It seemed a bit strange to me to write about the bias in English when I have also been aware of the linguistic diversity of the Internet for some time. I didn’t shove that in because I was not up to date on the latest data regarding language and those connecting to the Internet. As luck would have it, I just found it here in the form of a spreadsheet, updated as of this month of this year.
 
It shows promise. We went from 64% of humans connected to 67% in one year. More languages from the continent of Africa are represented. Information like this reveals an implicit bias that most people are not aware of – the invisible 33%. 
Our framing on the Internet tends to neglect them. We have a tendency to believe that everyone is connected. We’re not.

What’s more, that simple bit of information also demonstrates that training a large language model or an AI that leaves 33% of humanity out should give us pause. It won’t, but it should. 33% of humanity can’t access the Internet. Cultures and languages aren’t represented.
But technology waits for no one because tech companies wait for no one because they need us to keep buying technology.

The Babel Problem

Some self-centric perspectives shared on social media, and the communication failure they created, got me thinking more about information and how it affects us as individuals, and how it affects humanity. It’s also something that I’ve been researching off and on, and one which has me working on a hobby software project related to it.

Information is everywhere. We’re all pattern recognition and information analysis experts in our own right. It’s a part of being human, as Steven Pinker wrote about in the context of language, which is one of the ways that we process and communicate information. There is the nature aspect, and there is the nurture aspect, which is often seen as a matter of which has more influence.

This is particularly interesting in this day and age for a variety of reasons, particularly when interacting using social media.

Language is the most obvious barrier, and translation algorithms are getting much better – but interpretation of translations leaves much to be desired at times. Another aspect is dialects, born of geography, which do not always translate well. There are some who will argue about cultural identity, but if cultural identity isolates, what use is that identity?

Another aspect is the ability of people to actually read and write to be understood.  While we may have a lot more literacy in the world than we did some decades ago, functional literacy is something different and is something that educational systems only measure within their own dialects. This leads to how people think, because people typically communicate as clearly as they think. And what affects how we think?

We get into world views – a factor of nurture, largely, and the ability to process the information of our world clearly. The most obvious aspect of these prejudices has to do with the color of skin of human beings – something that haunts us despite scientific evidence that there are no actual races. Other things are less obvious.

There are commonalities, as mentioned in a very thorough exploration by Pierre Levy in “The Semantic Sphere”, that weave commonality through concepts around the world despite language – but they can fail in that last mile of neurons, as people may have very different reactions to the same concepts.

When it comes to all of this, I live a very different life and look at things, at times, in very different ways than others. This has allowed me to sometimes solve problems that others could not solve.

Everyone looks at things differently, but commonly, people don’t look at things that differently when they read what everyone else reads, watch what everyone else watches, and thus think fairly closely to what other people think.

That, in turn, gives us the codification of problems in a way that is sometimes more popular than correct, and thus any solution may be solving the wrong problem. It’s a convoluted mess when you start thinking about it (and worse, trying to express it as I am here).

And that, really, is the core of this post. A thought of why the people who come up with appropriate solutions are typically the ones who can identify what the problems actually are… in a world of popularity.

 

Media Responsibility and Learning

I often cringe when I read what people share on social media. Aside from the inner proofreader that was so necessary as a youth, I run across things like, “TTPS: Illegal entry into T&T is a crime”.

What else is illegal that is a crime? 

If the goal was to make the Trinidad and Tobago Police Service look illiterate – mission accomplished. If the goal was to make The Morning Brew, a local program, look a bit foolish – mission accomplished. And it’s there for all the world to see.

If you watch the video, though, the headline does not represent what was actually said – a distillation that demonstrates a lack of thought and consideration.

Who came up with this headline, and do they even understand their mistake?

This prompted me to immediately mock it, of course – pondering with a friend as to what else that is illegal might be a crime.

Murder is illegal, so is it a crime?  Littering is illegal, so is it a crime? And so on and so forth – which amused me for a few minutes, but then it struck me:

There are people who may seriously be thinking in that way.

Words have a power all their own, and the way we all learn is not by reading dictionaries but through context.

So yes, I’m picking on this particular headline, which is unfair. In a world where all too often people share without reading the associated link, we’re implicitly showing people how to communicate by example. There could be a secondary school student right now writing an essay who may pull the ‘illegal’ and ‘crime’ thing out of their bag unwittingly… only to be openly mocked by an English teacher and their class.

Why? Because they made the mistake of learning from a media headline.

Global Audiences, Global Publishing


I wrote ‘Local Context In All Context In A Networked World’ a few weeks before I wrote ‘Writers Without Borders’.

That there’s a common theme is not a mistake. On a planet where we can now know almost instantaneously what is happening on other parts of the planet, we as a whole aren’t really that good at communicating across the very same planet. Beyond the obvious, where lack of internet connection is a problem, we face other human challenges.

Language remains a barrier. There have been strides in automatic translation, but it’s still far from perfect and may always be. Our language evolves, enough such that ‘figuratively’ and ‘literally’ mean the same in our newest dictionaries – both figuratively and literally. Colloquialisms defy translation because they are so easily misinterpreted in other parts of the world.

‘Paw paw’, using Google Translate today, translates to the Spanish ‘garra’ – which translates back to ‘Claw’. In Trinidad and Tobago, ‘paw paw’ is a colloquialism for ‘papaya’. A green paw paw is not a green claw, at least in Trinidad and Tobago.

Babel. It’s all meaningless babel. And in a world that makes more and more use of Natural Language Processing, such that large amounts of information are analyzed and presented to a human without human interaction, there could be a human at the other end of that software wondering why people in Trinidad and Tobago eat claws.

Then we get into different acronyms – there are so many acronyms around the world.

Now, one can argue that other people need to learn everything. One can spend a lot of time doing that, and being insulted by people who don’t understand what you’re trying to communicate – or worse, insulting people who don’t understand what you’re trying to communicate. Is the goal to fight over these things or is it to be understood?

For me, it’s to be understood. For corporations, it’s about being understood. For governments… well, maybe not, but at least some of us think that the goal of governments should be to be understood.

‘Think Global, Act Local’ doesn’t make as much sense on a planet where we actually do act globally by sharing information.

We need to think global and act global – and still act local.

This is a hard thing to think about. It’s alien. Our societies evolved as much through distance from other societies as other things – in fact, the distance was a large part of helping define a society. Immigration departments have taken over that job, and while they do serve a purpose, I have yet to hear someone happy about immigration. In fact, if they were happy, immigration would probably detain them.

But… Writing?

But what does that mean for writing in particular? Honestly, not as much as one would think if writers adhere to some good practice developed over the course of the roughly 5,000-year history of writing. Things like, when using a potentially unknown acronym, expanding it the first time. With technology that is now a few decades old, we can link to a reference.

We can give appropriate context. We can tag our content, and for the sake of the space-time continuum, we should have dates and times instead of simply “yesterday” or “tomorrow” or… These have been standard communication guidelines for centuries, if not millennia.
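
As a small illustration of the dates point, here is a minimal sketch that turns a relative word into an absolute date, given the day the piece was written. The function name and the tiny word list are my own inventions for the example, and the date used is only illustrative; real text would need far more than three words.

```python
from datetime import date, timedelta

# Hypothetical, deliberately tiny vocabulary: word -> offset in days.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def absolute_date(word, written_on):
    """Return the ISO date a relative word refers to, given the writing date."""
    offset = RELATIVE_DAYS[word.lower()]
    return (written_on + timedelta(days=offset)).isoformat()


# A post written on June 6th 2023 that says "Tomorrow"
print(absolute_date("Tomorrow", date(2023, 6, 6)))
```

A reader on the other side of the planet, a year later, can do something with ‘2023-06-07’. They can do nothing with ‘tomorrow’.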

This is not hard.

Mediation, Media, Social Media, Journalism

El Mercurio Newsroom, by JD Lasica.

We use language and communication so much that sometimes we take it for granted.

‘Media’, ‘mediation’ – when we look at these words, it’s all but impossible not to note the same first 5 letters. This is no coincidence. They both derive from the noun ‘medium’. Digging further gets you to a Proto-Indo-European root, ‘*medhyo’, something you can drill further down into if you wish.

It’s an interesting history, not of words, but of concepts and thoughts. ‘Medium’ has been used to describe an ‘intermediate agency, channel of communication’ since around 1600. The basis of ‘media’ and ‘mediate’ is medium. Are they so different in concept?

In theory, no. In practice these days, it’s hard to say.

Mediation

As mentioned before, I took the first level of training in Mediation at the Conflict Resolution and Media Center of Trinidad and Tobago, and after hours I began thinking about the common etymology of ‘media’ and ‘mediate’ which got us to where we are, here. Yet when I look at the two as they are now, through a fresh lens, that seems to be the only way in which they are linked other than through some serendipity.

Mediation is a confidential process that works toward resolution of conflict through communication facilitated by a neutral third party. I did learn a few things.

Media, on the other hand, has come to mean any communication over one or more mediums. Newspapers use paper and literacy, radio uses sound and radio frequencies, television uses sound, video technologies, and sometimes literacy, and the Internet combines all of these to varying extents. ‘Social Media’ is redundant, really, because all media is social – it’s really media that allows easier feedback and, these days, allows things to be shared faster than other forms of media, driven by the interests of users.

From Media To Journalism

‘Media’ encapsulates entertainment, education, and news. However, these days, we hear it used in the context of ‘news’ a lot. The lines between entertainment, education and news have blurred with the ‘talking heads’ and the prevalence of bias to sell advertising or simply to keep it. So when we hear about ‘The Media’ in this context, it’s about a specific use of the media. It’s about what we are given as news. And journalism is where ‘news’ is supposed to come from, or where we say it’s supposed to come from.

If you talk to anyone with a point of view, they will say that there is bias in published journalism – be it published in print, on radio, on television, or on the Internet – and that’s where things can get fuzzy. So does the question of what a journalist actually is. As Mark Lyndersay points out in “What Is A Journalist?”:

…Paul Richards asked, “Who or what constitutes a journalist and should be protected by this?”

“And more importantly, who should not be considered a journalist?”

The American Press Institute notes, “Asking who is a journalist is the wrong question, because journalism can be produced by anyone.”

As the Institute explains on a series of pages on its website dedicated to considering the role of journalism professionals (report here), the journalist is a “committed observer.”

In 2011, “We Are All Journalists Now” by Scott Gant covered the same issue. It’s 7 years later, and I’m not sure society has changed enough to deal with it sensibly. And if we get into the etymology of ‘journalist’, we find this:

1690s, “one whose work is to write or edit public journals or newspapers,” from French journaliste.

As A.J. Liebling wrote, “Freedom of the press is guaranteed only to those who own one.” The Internet gave everyone with access to the Internet access to such a press. To publish publicly without a media organization, potentially publishing things less biased by advertisers – but then, to make money, advertising became necessary, and all that happened was the atomizing of the same business model.

What all of this really gets to, though, is a phrase attributed to Edmund Burke, supposedly used in a debate in 1787 when the House of Commons of Great Britain was opened to the press: the Fourth Estate.

Indirect But Significant Influence

Dictionary.com gives 2 definitions of the Fourth Estate:

  1. the journalistic profession or its members; the press.
  2. a group other than the usual powers, as the three estates of France, that wields influence in the politics of a country.

The first definition fit better before the Internet, where there was a more substantial difference between journalists and the general public. The second definition fits better in modern times, where we can all publish. And there you have the link between journalism and the public as it shifts in one definition.

These days, the more popular what you share is, the more influence you have – for better or worse. How much others share what you have shared demonstrates how much influence you have as well – a closed circuit.

Thus, if we can get past definitions of ‘journalist’ and ‘journalism’, words doomed to a period when journalists broadcast instead of interacted, we get back to us all being a part of the Fourth Estate.

But what does this all have to do with mediation? Not that much right now, it seems, and yet, maybe it should. The Fourth Estate is necessarily not confidential, but maybe it could be more neutral. Maybe that’s what they should have in common. Maybe that ‘neutral third party’ should be everyone publishing to some metaphorical public journal. Maybe we should all be facilitating facts instead of regurgitating hearsay – after all, hearsay is heresy.

An informed public, after all, is what I expect from journalism. What I get, on the other hand, hardly seems to fit Journalistic Ethics and Standards. I can’t criticize what happens in the industry, because all I know is hearsay – but I can make a few distinctions that I believe can be accepted and agreed upon as truths in the context of the journalism aspect of the media:

  • When it comes to the media in the context of news, people need to be informed. They want to be entertained. The two are separate.
  • Publishers are the ‘media’, journalists are not the media unless they self-publish. If they don’t self-publish, they just work for the media.
  • With the atomization of the Fourth Estate, anyone who publishes has a greater responsibility when using their influence.

In these ways and more, we might get ‘media’ and ‘mediation’ to make more sense together when we see those common five letters.

 

Language And Tech (2014)


It’s official, for better or worse: ‘Tweet’ is now recognized in the Oxford dictionary despite breaking at least one OED rule: It’s not 10 years old yet.

‘Big Data’ also made it in, as did ‘crowdsourcing’, ‘e-reader’, ‘mouseover’ and ‘redirect’ (new context). There’s a better writeup in the June 2013 update of the Oxford English Dictionary (OED) that also dates the use of the phrase, “don’t have a cow, man” back to 1959 – to the chagrin of Bart’s fans everywhere, I’m sure.

As a sidenote, those that use twitter are discouraged from being twits and ‘sega’ is actually a dance from the Mascarene Islands.

It’s always interesting to watch how language evolves and sometimes it’s a little disturbing. I honestly don’t know how I should feel about ‘tweet’ making it in as the brand ‘twitter’ is based on the word ‘twit’… see above link… but hey. Oxford says it’s ok and twits and tweeters everywhere can now rejoice.

Image courtesy Nancy L. Stockdale and made available through this Creative Commons License.