Deep Learning, Information Bottlenecks – and Osmosis.

I’ve experimented with deep learning in a few different ways in the past, coming up with my own thoughts on how things work and why they work. When I stopped in 2016, it was apparent that I was missing something, and that I needed some distance between myself and the topic at hand. I gave up those Pine64s and, as it happened, moved away from where I was doing it – more importantly, divorcing myself from a Software Engineering world where ‘solutions right now’ always trumped ‘solutions’: the former the harbinger of problems, the latter the Holy Grail of every software engineer who dares to dream in a world that, except for a minority, requires lockstep precision within an industry that spends its time firefighting because of solutions-right-now.

It’s disenchanting. Being disenchanted allows for little in the way of real solutions, at least for me.

And today I read, “New Theory Cracks Open The Black Box of Deep Neural Networks“. Of course, deep learning is not that new, and the ‘Information Bottleneck’ idea stems from the original 1999 work, the Information Bottleneck Method. That works, perhaps, in explaining things on a surface level and on an informational level – but as I read it, I was reminded of secondary school biology: osmosis. No one seems to have connected the two when they are so suitably connected, and I’d wager that osmosis scales better, since information bottlenecks, when themselves arranged in a matrix, would pretty much mimic a tunable osmosis.
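For reference, the core of that 1999 method is a single tunable trade-off: compress the input X into a representation T while keeping as much information about the output Y as possible. A sketch of the standard objective, where the multiplier β is the knob – roughly the ‘tunable’ part I have in mind with osmosis:

```latex
% The Information Bottleneck objective (Tishby, Pereira & Bialek, 1999):
% choose an encoding p(t|x) that compresses X into T (small I(X;T))
% while preserving information about Y (large I(T;Y)).
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

Dial β up and the network keeps predictive detail; dial it down and it squeezes harder – not unlike a concentration gradient setting the direction and rate of flow across a membrane.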

That said, I’ve found the major problem with deep learning to be that we rigidly define inputs when, quite possibly, we should be looser in our definitions of what we put in. This aligns better with chaos theory – something the Wired article seems to dismiss:

…When Schwab and Mehta applied the deep belief net to a model of a magnet at its “critical point,” where the system is fractal, or self-similar at every scale, they found that the network automatically used the renormalization-like procedure to discover the model’s state. It was a stunning indication that, as the biophysicist Ilya Nemenman said at the time, “extracting relevant features in the context of statistical physics and extracting relevant features in the context of deep learning are not just similar words, they are one and the same.”

The only problem is that, in general, the real world isn’t fractal. “The natural world is not ears on ears on ears on ears; it’s eyeballs on faces on people on scenes,” Cranmer said…

Pragmatically, this is what we see when we work on projects – but the problem is not what we see, it’s what we don’t see. It’s the things we don’t intuitively connect ourselves because of our own limitations; with simple deep learning we may get away with what we see, but on a much larger scale, we may be looking at the flap of a butterfly’s wings on one side of the world causing a tipping point that creates a hurricane on the other.
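The classic toy demonstration of that sensitivity – my own illustration here, not something from the article – is two chaotic trajectories that start a millionth apart and end up nowhere near each other:

```python
# Sensitivity to initial conditions in the logistic map x -> r*x*(1-x),
# the textbook stand-in for the butterfly effect. Two runs differing by
# one part in a million diverge to completely different values.

r = 3.9  # a parameter value in the map's chaotic regime
x, y = 0.500000, 0.500001  # a one-in-a-million difference at the start

for step in range(1, 41):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step:2d}: x={x:.6f} y={y:.6f} gap={abs(x - y):.6f}")
```

By around step 40, that one-in-a-million difference has grown to the same order as the values themselves – which is the worry with inputs we define too tightly.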

Of course, this is all theory, and hardly some earth-shattering change in the way we look at things – but a small change in approach could well be what we need to move forward at various intersections. In this, I am trying to be a simple butterfly flapping his wings.
