**Why does Deep Learning work?**

This is the big question on everyone’s mind these days. C’mon, we all know the answer already:

*“the long-term behavior of certain neural network models is governed by the statistical mechanics of infinite-range Ising spin-glass Hamiltonians”* [1]

In other words,

*Multilayer Neural Networks are just Spin Glasses*

Ok, so what is this and what does it imply?
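To make the phrase concrete: an infinite-range (mean-field) Ising p-spin glass assigns an energy to every configuration of N binary spins. The form below is the standard textbook Hamiltonian, not a formula quoted from [1], and normalization conventions vary by author:

```latex
% Infinite-range Ising p-spin glass Hamiltonian (standard mean-field form)
H_p(\mathbf{s}) \;=\; -\frac{1}{N^{(p-1)/2}}
  \sum_{i_1 < i_2 < \cdots < i_p} J_{i_1 i_2 \cdots i_p}\,
  s_{i_1} s_{i_2} \cdots s_{i_p},
\qquad s_i \in \{-1, +1\}
```

Here the couplings \(J_{i_1 \cdots i_p}\) are i.i.d. Gaussian random variables, fixed once and for all (quenched disorder), and every spin interacts with every other (infinite range). It is exactly this combination of frustration and randomness that produces a rugged energy landscape with exponentially many local minima.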

In a recent paper [1], LeCun and co-workers attempt to extend our understanding of training neural networks by studying the SGD approach to solving the multilayer neural network optimization problem. They claim:

*“None of these works however make the attempt to explain the paradigm of optimizing the highly non-convex neural network objective function through the prism of spin-glass theory and thus in this respect our approach is very novel.”*

And this is kinda true.

But here’s the thing…we already have a good idea of what the Energy Landscape of multiscale spin glass models* looks like, thanks to early theoretical protein folding work by Wolynes, Dill, and others [2,3,4]. In fact, here is a typical surface:

*[technically these are Ising spin models with multi-spin interactions]

Let us consider the nodes above, which represent partially folded states, as nodes in a multiscale spin glass, or, say, a multilayer neural network. Immediately we see the analogy and the appearance of the ‘Energy funnel’. In fact, researchers studied these ‘folding funnels’ of spin glass models over 20 years ago [2,3,4].

Note: the Wolynes protein-folding spin-glass model is significantly different from the p-spin Hopfield model that LeCun discusses because it contains multi-scale, multi-spin interactions. These details matter.

So with a surface like this, it is not so surprising that an SGD method might be able to find the Energy minimum (called the Native State in protein folding theory). We just need to go down, and jump around enough to get past the local barriers and saddle points. This, in fact, is what defines a so-called ‘folding funnel’. Indeed, such a surface seemed, at the time, to be necessary to resolve Levinthal’s paradox [4].
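The “go down, but jump around enough” intuition is easy to see on a toy model. The sketch below is entirely my own 1-D caricature, not the model from [1]: `energy` is a quadratic funnel decorated with sinusoidal ruggedness, and `noisy_descent` is plain gradient descent plus decaying Gaussian noise (all names and parameter values are invented for illustration):

```python
import math
import random

def energy(x):
    # Toy 1-D "funnel": an overall quadratic trend (the funnel)
    # decorated with rugged local minima (the glassy roughness).
    return x * x + 2.0 * math.sin(8.0 * x)

def grad(x):
    # Exact derivative of energy(x).
    return 2.0 * x + 16.0 * math.cos(8.0 * x)

def noisy_descent(x0, steps=5000, lr=0.01, noise=2.0, seed=0):
    # Gradient descent with decaying Gaussian noise: jump around
    # early to hop over local barriers, then settle down late.
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        sigma = noise / (1.0 + 0.01 * t)
        x -= lr * (grad(x) + sigma * rng.gauss(0.0, 1.0))
    return x

x_final = noisy_descent(1.0)
print(x_final, energy(x_final))
```

Because the overall slope of a funnel always points roughly toward the bottom, even this crude noisy scheme ends up far below its starting energy; on a “golf-course” landscape with no net slope, the same trick would wander forever.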

*So it is not surprising at all, in fact, that SGD may work.*

Then again, *a real theory* of protein folding, one that could actually fold a protein correctly (e.g. Freed’s approach [5]), would be a lot more detailed than a simple spin glass model. Likewise, real Deep Learning systems have a lot more engineering details (Dropout, Pooling, Momentum) than a theoretical spin glass model. Still, hopefully we can learn something by looking at the spin glass models of theoretical chemistry and physics. So…

In this next series of posts, we are going to look in some depth at LeCun’s recent analysis, at some fundamentals of the condensed matter theory of spin glasses, and at just what is going on in current Deep Learning research and applied methods.

[1] LeCun et al., *The Loss Surfaces of Multilayer Networks*, 2015
[2] *Spin Glasses and the Statistical Mechanics of Protein Folding*, PNAS, 1987
[3] *Theory of Protein Folding: The Energy Landscape Perspective*, Annu. Rev. Phys. Chem., 1997
[4] *From Levinthal to Pathways to Funnels*, Nature, 1997
[5] *Mimicking the Folding Pathway to Improve Homology-Free Protein Structure Prediction*, 2008