It’s All Statistics.

Everyone’s out to protect you online because everyone’s out to get you online. It’s a weird mix of people who want to use your data for profit and those who want to use your data for profit.

Planned obsolescence has become ubiquitous in this technological age. It wasn’t always this way. Things used to be built to last, not to be replaced, and that is something to ponder before joining the line for the latest iPhone, or when a software manufacturer shifts from a purchase model (where the license sometimes indicates you don’t really own the software anyway!) to a subscription model.

The case has been made that software can’t be produced or maintained for free. The case has also been made, with less of a marketing department behind it, that Free Software and Open Source software can do the same at a reduced cost. The negotiations are ongoing, but those who built their corporations from dumpster diving for code printouts definitely have the upper hand.

Generally speaking, the average user doesn’t need complicated software. In fact, the average user just wants a computer where they can browse the internet and write simple documents and spreadsheets. Corporations producing software on the scale of Microsoft, Google, Amazon, and so on don’t really care much about what you need; they care about maintaining market share so they can keep making money. Software has more features than the average user knows what to do with.

Where the business decisions are made, it’s about the bottom line. It’s oddly like something else we’re seeing a lot of lately: it seems unrelated, yet it’s pretty close to the same thing when you think about it.

“…This is true of the cat detector, and it is true of GPT-4 — the difference is a matter of the length and complexity of the output. The AI cannot distinguish between a right and wrong answer — it only can make a prediction of how likely a series of words is to be accepted as correct. That is why it must be considered the world’s most comprehensively informed bullshitter rather than an authority on any subject. It doesn’t even know it’s bullshitting you — it has been trained to produce a response that statistically resembles a correct answer, and it will say anything to improve that resemblance...

…It’s the same reason AI can produce a Monet-like painting that isn’t a Monet — all that matters is it has all the characteristics that cause people to identify a piece of artwork as his. Today’s AI approximates factual responses the way it would approximate “Water Lilies.”…”

“The Great Pretender,” Devin Coldewey, TechCrunch, April 3, 2023.

Abstracted away, large language models aren’t that different from business teams – except, maybe, that business teams could actually care about their consumers, but instead rely on statistics – just like large language models do. It’s a lot like the representations of Happy, Strong and Tough that I wrote about with AI-generated images. It’s an approximation based on what the models and algorithms are trained on – which is… us.
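To make the “statistically resembles a correct answer” point concrete, here’s a toy sketch in Python. It’s a bigram word-counter, nowhere near GPT-4 in scale or method, and the training sentences are invented for illustration – but it shows how prediction by frequency confidently produces a wrong answer whenever the wrong answer is simply more common:

```python
from collections import Counter, defaultdict

# Toy "language model": it only knows which word most often follows
# another in its (invented) training text. The wrong answer appears
# more often on purpose.
training = [
    "the capital of france is paris",
    "the capital of france is lovely",
    "the capital of australia is sydney",  # wrong (it's Canberra)
    "the capital of australia is sydney",  # but common "in the wild"
]

follows = defaultdict(Counter)
for sentence in training:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1

def predict(prompt: str, steps: int = 3) -> str:
    """Greedily append the statistically most likely next word."""
    words = prompt.split()
    for _ in range(steps):
        nxt = follows.get(words[-1])
        if not nxt:
            break
        words.append(nxt.most_common(1)[0][0])
    return " ".join(words)

# No notion of truth, only frequency: the model "bullshits" with
# full statistical confidence.
print(predict("the capital of australia"))
# -> the capital of australia is sydney
```

The model doesn’t know Canberra from Sydney; it only knows which string of words best resembles its training data. Scale that up by a few hundred billion parameters and you have the approximation Coldewey is describing.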

There could be a soul to the Enterprise, I suppose, but maybe the Enterprise needs to remember where it comes from.

The Limits of Open Data and Big Data

A couple of days ago, one of the many political memes rolling around compared the number of police shootings under different presidencies. People were, of course, spewing rhetoric about the number of lethal shootings under one administration in the 1980s versus one in the present. I’m being deliberately vague because this is not about politics.

The data presented showed that there were fewer shootings under one administration than the other, but it was just a raw number. It had no context: the number of police at the time, the number of arrests, or the number of police per capita.

I decided to dig into that.

The U.S. population grew from roughly 227 million (circa 1980) to 318.9 million as of 2014. That’s fairly substantial. But how many police were there in the 1980s? A web search for the number of police officers in the 1980s was simply useless.
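To be clear about why the raw number is useless on its own, here’s a minimal sketch of the normalization I was after. The population figures are the ones above; the shooting counts are hypothetical placeholders, not real statistics:

```python
# Population figures from above; shooting counts are INVENTED
# placeholders for illustration only.
population = {"circa 1980": 227_000_000, "2014": 318_900_000}
shootings = {"circa 1980": 300, "2014": 400}  # hypothetical

for era, people in population.items():
    # Normalize the raw count by population to get a comparable rate.
    per_million = shootings[era] / people * 1_000_000
    print(f"{era}: {shootings[era]} raw, {per_million:.2f} per million residents")
```

A per-officer rate would need the number of police officers as the denominator – which is the number I went looking for next.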

I went to the Bureau of Justice Statistics page on local police. It turns out that they have only conducted a census of police officers from 1992 to the present, in four-year increments, which means they don’t have data from the 1980s. If that’s true – if no data was collected before then – it would mean that decisions were being made without basic data analysis back then, but it also means that we hit a limit of open data.

And I’d expended too much time on it (shouldn’t that be easy to find?), so I left it at that.

If the data simply does not exist, then the open data is limited at collection: nobody gathered it. I find it difficult to believe that this is the case, but I’ll assume good faith. Either way, the open data is limited.

If the data exists but simply is not made available, then the open data is limited by access.

The point here is that open data has limits, defined either by a simple lack of data or by a lack of access to the data. It is also limited by collection method (where bias can be introduced), by the level of participation, and so forth.

And as far as that meme goes, I have no opinion. “Insufficient data” – a phrase I’ve used more often than I should be comfortable with in this day and age.