FIELD Phys:StringTheory
DATE November 23 (Mon), 2020
TIME 17:00-18:00
PLACE Online
SPEAKER de Mello Koch, Robert
HOST Papadimitriou, Ioannis
INSTITUTE University of the Witwatersrand, South Africa
TITLE Why deep networks generalize
ABSTRACT Training a deep network involves applying an algorithm that fixes the parameters of the network. The trained network is then evaluated by measuring its performance on unseen test data. The difference between how the network performs on the training data and on unseen data defines a generalization error. Networks that perform as well on unseen data as they did on the training data have a small generalization error.
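The definition above can be written as a formula. As a sketch, assuming a loss function ℓ, a trained network f_θ, n training pairs (x_i, y_i) and m held-out test pairs (x'_j, y'_j) (notation introduced here for illustration, not taken from the talk), the generalization error is commonly estimated as the gap between the average test loss and the average training loss:

```latex
\mathcal{E}_{\text{gen}}(\theta) \approx
  \frac{1}{m}\sum_{j=1}^{m} \ell\big(f_\theta(x'_j),\, y'_j\big)
  \;-\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big)
```

A network that performs as well on unseen data as on its training data has \mathcal{E}_{\text{gen}} close to zero.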

We have definite expectations for the size of the generalization error, based essentially on common sense. If the training data set is much smaller than the number of parameters in the network, training can fit any data perfectly, so that errors and noise in the data are captured during training. Typical deep-learning applications use networks with hundreds of millions of parameters, trained on data sets with tens of thousands of examples. Clearly, then, we are squarely in the regime of large generalization errors. Remarkably, however, for typical deep learning applications the generalization error is small. This raises the question: why do deep nets generalize?
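The common-sense expectation above can be illustrated in a toy setting. The following is a minimal sketch, assuming a hypothetical sine-plus-noise data set and using NumPy's `polyfit` in place of a deep network: a polynomial with as many coefficients as training points interpolates the noisy data, so its training error is essentially zero while its error on fresh samples is not. It illustrates only the classical argument, not deep networks or the renormalization-group explanation developed in the talk. The names `true_function` and `generalization_gap` are introduced here for illustration.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))


def true_function(x):
    """Hypothetical ground-truth signal, chosen only for illustration."""
    return np.sin(2 * np.pi * x)


def generalization_gap(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial with (degree + 1) coefficients to the training data.

    Returns (train_mse, test_mse, gap), where gap = test_mse - train_mse,
    i.e. the difference between performance on unseen and training data.
    """
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = mse(np.polyval(coeffs, x_train), y_train)
    test_mse = mse(np.polyval(coeffs, x_test), y_test)
    return train_mse, test_mse, test_mse - train_mse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = 0.3  # assumed noise level on the labels

    # 10 noisy training points and a larger set of fresh "unseen" points.
    n_train = 10
    x_train = np.linspace(0.0, 1.0, n_train)
    y_train = true_function(x_train) + noise * rng.standard_normal(n_train)
    x_test = rng.uniform(0.0, 1.0, 1000)
    y_test = true_function(x_test) + noise * rng.standard_normal(1000)

    # Few parameters (4) versus as many parameters as data points (10).
    for degree in (3, n_train - 1):
        train_mse, test_mse, gap = generalization_gap(
            degree, x_train, y_train, x_test, y_test
        )
        print(f"{degree + 1:2d} params: train={train_mse:.4f} "
              f"test={test_mse:.4f} gap={gap:.4f}")
```

With degree `n_train - 1` the fitted polynomial passes through every training point, so its training error sits at numerical zero and the whole test error shows up as generalization gap; this is the behaviour the naive argument predicts for heavily overparameterized models.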

In this talk we develop parallels between deep learning and the renormalization group to explain why deep networks generalize.