The Limits of Open Data and Big Data

Open Data Awards 2015A couple of days ago, one of the many political memes rolling around was how many police shootings there were under different presidencies. People were, of course, spewing rhetoric on the number of lethal shootings there were between one administration in the 1980s and one in the present. I’m being obtuse because this is not about politics.

The data presented showed that there were less shootings under one administration than another, but it was just a raw number. It had no basis in the number of police at the time, the number of arrests, or the number of police per capita.

I decided to dig into that.

The U.S. population has gone from roughly 227 million people (circa 1980) in that time to 318.9 million as of 2014. That’s fairly substantial. But how many police were there in the 1980s? A search on how many police officers there were in the 1980s was simply useless.

I went to the Bureau of Justice Statistics page on local police. It ends up that they only did any form of police officer census from 1992 to the present day in 4 year increments, which means that they didn’t have the data from the 1980s. If that’s true – if there was no data collected prior – it would mean that decisions were being made without basic data analysis back then, but it also means that we hit a limit of open data.

And I’d expended too much time on it (shouldn’t that be easy to find?), so I left it at that.

Assuming that the data simply does not exist, it means that the limit of the data is by simply not collecting it. I find it difficult to believe that this is the case, but I’ll assume good. So the open data is limited.

Assuming that the data exists but is simply not available, it means that the open data is limited.

The point here is that open data has limits, either defined by a simple lack of data or a lack of access to the data. It has limits by collection method (where bias can be inserted), by the level of participation, and so forth.

And as far as that meme, I have no opinion. “Insufficient data” – a phrase I’ve used more often than I should be comfortable with in this day and age.