Artificial Intelligences and Responsibility

MIT Technology Review has a meandering article, “A.I. Can Be Made Legally Responsible for Its Decisions”. In its own way, it tries to chart the territory of trade secrets and corporations, threading a needle through laws we may actually need to change to adapt to using Artificial Intelligence (AI).

One of the things that surprises me in such writing and conversations is not that it revolves around protecting trade secrets – I’m sorry, if you put your self-changing code out there and are willing to take the risk, I see that as part of it – but that it focuses on the decision process. Almost all the bad decisions in code I have encountered came about because the developers were hidden in a silo, behind a process that isolated them… sort of like what happens with an AI, only twofold.

If the decision process is flawed, the first thing to look at is the source data behind the decisions – and in an AI, this can be a daunting task, as it builds learning algorithms based on… data. And so, you have to delve into whether the data used to build those algorithms was corrupt or incomplete – the former is an issue we keep getting better at minimizing; the latter cannot be solved, if only because we as individuals, and more so as a society, are terrible at identifying what we don’t know.
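To make that concrete, here’s a minimal sketch of what delving into that data might look like, assuming the source data sits in a pandas DataFrame; the “region” column and its expected categories are hypothetical, purely for illustration.

```python
import pandas as pd

# Hypothetical: the categories we expect the data to cover.
EXPECTED_REGIONS = {"north", "south", "east", "west"}

def audit_training_data(df: pd.DataFrame) -> dict:
    """Flag signs of corrupt or incomplete source data."""
    return {
        # Corruption proxies: missing values and exact duplicate rows.
        "null_counts": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # Completeness proxy: expected categories that never appear.
        # This only catches known unknowns -- the gaps we can name.
        "missing_regions": sorted(EXPECTED_REGIONS - set(df["region"])),
    }

sample = pd.DataFrame({
    "region": ["north", "north", "south"],
    "outcome": [1, None, 0],
})
print(audit_training_data(sample))
# {'null_counts': {'region': 0, 'outcome': 1}, 'duplicate_rows': 0,
#  'missing_regions': ['east', 'west']}
```

Note that the completeness check can only catch the gaps we already know to name – which is exactly the problem.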

So, when it comes to legal responsibility for code on a server, AI or not, who is responsible? The publishing company, of course – though if you look at software licensing over the decades, you’ll find that software companies have become pretty good at divesting themselves of responsibility. “If you use our software, we are not responsible for anything” is the short version of most end user license agreements and software licenses, and by clicking through the OK, you’re basically indemnifying the publisher. That, you see, is the crux of the problem when we speak of AI and responsibility.

In those legal frameworks, camped armies of lawyers wait on retainer for anything to happen so that they can defend their well-paying clients, simply by pointing at a contract that puts all responsibility on the user. Lawyers can argue that point, but they get paid to and I don’t. I’m sure there are some loopholes. I’m sure that when a company is pushed into a corner by another company with similar or better legal resources, ‘settle’ becomes a word used more frequently.

So, if companies can’t be held responsible for their non-AI code, how can they be held responsible for their AI code?

Free Software and Open Source software advocates such as myself have made these points time and again, in so many ways – but this AI discussion extends into data as well, which pulls the Open Data Initiative into the spotlight too.

The system is flawed in this regard, so to discuss whether an AI can be responsible for its decisions is silly. The AI won’t pay a fine, the AI won’t go to jail (what does ‘life’ mean for an AI, anyway?). Largely, it’s the court of public opinion that guides things – and that narrative is easily changed by PR people who have a side door to the legal department.

So let’s not discuss AI and responsibility. Let’s discuss code, data and responsibility – let’s get back to the root of the problem. I’m not an MIT graduate, but I do understand Garbage In, Garbage Out (GIGO).

The Limits of Open Data and Big Data

A couple of days ago, one of the many political memes rolling around compared how many police shootings there were under different presidencies. People were, of course, spewing rhetoric about the number of lethal shootings under one administration in the 1980s versus one in the present. I’m being obtuse because this is not about politics.

The data presented showed that there were fewer shootings under one administration than the other, but it was just a raw number. It had no basis in the number of police at the time, the number of arrests, or the number of police per capita.

I decided to dig into that.

The U.S. population has gone from roughly 227 million people circa 1980 to 318.9 million as of 2014. That’s fairly substantial growth. But how many police were there in the 1980s? A search for how many police officers there were in the 1980s was simply useless.
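It’s worth a back-of-the-envelope sketch of why the raw number alone misleads, using the population figures above; the shooting count below is a hypothetical placeholder, since real, comparable figures were exactly what I couldn’t pin down.

```python
POP_1980 = 227_000_000   # U.S. population circa 1980
POP_2014 = 318_900_000   # U.S. population as of 2014

def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw count into a per-capita rate."""
    return count / population * 100_000

count = 400  # hypothetical placeholder, identical in both eras
print(f"{rate_per_100k(count, POP_1980):.3f}")  # 0.176 per 100k circa 1980
print(f"{rate_per_100k(count, POP_2014):.3f}")  # 0.125 per 100k as of 2014
```

The same raw count becomes a noticeably different rate once the population grows by roughly 40% – and normalizing by the number of officers would require data that, as it turned out, I couldn’t find.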

I went to the Bureau of Justice Statistics page on local police. It turns out that they have only done any form of police officer census from 1992 to the present day, in four-year increments, which means they don’t have the data from the 1980s. If that’s true – if no data was collected before then – it would mean that decisions were being made back then without basic data analysis, but it also means that we hit a limit of open data.

And I’d expended too much time on it (shouldn’t that be easy to find?), so I left it at that.

Assuming the data simply does not exist, the limit comes from it never having been collected. I find it difficult to believe that this is the case, but I’ll assume good faith. So the open data is limited.

Assuming the data exists but is simply not available, the open data is limited just the same.

The point here is that open data has limits, defined either by a simple lack of data or by a lack of access to it. It also has limits imposed by the collection method (where bias can be inserted), by the level of participation, and so forth.

And as far as that meme, I have no opinion. “Insufficient data” – a phrase I’ve used more often than I should be comfortable with in this day and age.