Of Digital Shadows And Digital Ghosts

In writing about shadows and ghosts, it’s hard not to draw the line to how we process data – the phrase big data gets tossed around a lot in this way.

Data science allows us to create constructs of data – interpreted and derived, insinuated and insulated – when in fact we know about as much about that data as we do about the people in our own lives: typically not enough to understand them as people, something I alluded to here.

Data only tells us what has happened; it doesn’t tell us what will happen, and it is entirely bounded by how we frame the data and by what data is available to us. We can create shadows from that data, but the real value of data is in the ghosts – the collected data in contexts beyond our frames and availability.

This is the implicit flaw in machine learning and even some types of AI. It’s where ethics intersects technology when the technologies have the capacity to affect human lives for better and worse, because it becomes a problem of whether it’s fair.

And we really aren’t very good at ‘fair’.


The Limits of Open Data and Big Data

A couple of days ago, one of the many political memes rolling around was about how many police shootings there were under different presidencies. People were, of course, spewing rhetoric comparing the number of lethal shootings under one administration in the 1980s with the number under the present one. I’m being deliberately vague because this is not about politics.

The data presented showed that there were fewer shootings under one administration than another, but it was just a raw number. It was not put in the context of the number of police at the time, the number of arrests, or the number of police per capita.

I decided to dig into that.

The U.S. population has gone from roughly 227 million people (circa 1980) to 318.9 million as of 2014. That’s a fairly substantial change. But how many police were there in the 1980s? A search on how many police officers there were in the 1980s was simply useless.
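To make the raw-number problem concrete, here is a minimal sketch of why a count needs a denominator. The two population figures are the ones quoted above; the shooting count is a made-up placeholder, not real data, and the function name is just something I picked for illustration.

```python
# Population figures quoted in the text above.
POP_1980 = 227_000_000
POP_2014 = 318_900_000


def rate_per_100k(count, population):
    """Return the number of events per 100,000 people."""
    return count / population * 100_000


# Fractional population growth between the two dates.
growth = POP_2014 / POP_1980 - 1

# HYPOTHETICAL placeholder, not real data: the same raw count in both periods.
placeholder_count = 1_000

rate_1980 = rate_per_100k(placeholder_count, POP_1980)
rate_2014 = rate_per_100k(placeholder_count, POP_2014)

print(f"Population growth: {growth:.1%}")
print(f"Rate circa 1980:   {rate_1980:.3f} per 100k")
print(f"Rate in 2014:      {rate_2014:.3f} per 100k")
```

With the population up by about 40%, an identical raw count is a noticeably lower per-capita rate in 2014 than around 1980. The same logic applies to the number of police or the number of arrests, which is exactly the denominator I could not find.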

I went to the Bureau of Justice Statistics page on local police. It turns out that they have only done any form of police officer census from 1992 to the present day, in four-year increments, which means that they didn’t have the data from the 1980s. If that’s true – if there was no data collected prior – it would mean that decisions were being made without basic data analysis back then, but it also means that we hit a limit of open data.

And I’d expended too much time on it (shouldn’t that be easy to find?), so I left it at that.

Assuming that the data simply does not exist, the limit comes from the data never having been collected. I find it difficult to believe that this is the case, but I’ll assume good faith. So the open data is limited.

Assuming that the data exists but is simply not available, it means that the open data is limited.

The point here is that open data has limits, either defined by a simple lack of data or a lack of access to the data. It has limits by collection method (where bias can be inserted), by the level of participation, and so forth.

And as far as that meme goes, I have no opinion. “Insufficient data” – a phrase I’ve used more often than I should be comfortable with in this day and age.