On the Power of Shallow Learning


A deluge of recent work has explored equivalences between wide neural networks and kernel methods. A central theme is that one can analytically find the kernel corresponding to a given wide network architecture, but despite major implications for architecture design, no work to date has asked the converse question: given a kernel, can one find a network that realizes it? We affirmatively answer this question for fully-connected architectures, completely characterizing the space of achievable kernels. Furthermore, we give a surprising constructive proof that any kernel of any wide, deep, fully-connected net can also be achieved with a network with just one hidden layer and a specially-designed pointwise activation function. We experimentally verify our construction and demonstrate that, by just choosing the activation function, we can design a wide shallow network that mimics the generalization performance of any wide, deep, fully-connected network.
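As a minimal illustration of the network-to-kernel direction described above: for a one-hidden-layer ReLU network at infinite width, the corresponding kernel has a known closed form (the arc-cosine kernel of Cho & Saul, 2009), which can be checked against a Monte Carlo average over random hidden units. This sketch is illustrative only and is not the construction from the paper; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Monte Carlo estimate of the infinite-width kernel:
# average phi(w.x) * phi(w.y) over many random hidden units w ~ N(0, I)
W = rng.standard_normal((1_000_000, d))
relu = lambda z: np.maximum(z, 0.0)
k_mc = np.mean(relu(W @ x) * relu(W @ y))

# Analytic arc-cosine kernel for ReLU (Cho & Saul, 2009):
# K(x, y) = ||x|| ||y|| (sin(theta) + (pi - theta) cos(theta)) / (2 pi),
# where theta is the angle between x and y
nx, ny = np.linalg.norm(x), np.linalg.norm(y)
theta = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
k_analytic = nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

print(k_mc, k_analytic)  # the two estimates should agree closely
```

The converse direction, designing an activation function so that a shallow network realizes a *given* kernel, is the subject of the paper.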

The paper is currently under submission to NeurIPS 2021. The preprint can be found at arXiv:2106.03186.

Sajant Anand
Physics PhD Student, UC Berkeley

I work at the intersection of quantum information and condensed matter, designing tensor network applications to simulate quantum systems on both classical and quantum computers.