Deep Learning - Nature

Deep-learning methods compose simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.

  • The first layer of representation typically represents the presence or absence of edges at particular orientations and locations in the image.
  • The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions.
  • The third layer may assemble motifs into larger combinations that correspond to parts of familiar objects, and subsequent layers would detect objects as combinations of these parts.

Stochastic Gradient Descent (SGD)

  • showing the input vectors for a few examples (a small set)
  • computing the outputs and the errors
  • computing the average gradient for those examples
  • adjusting the weights accordingly

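These steps are repeated for many small sets of examples from the training set until the average of the objective function stops decreasing. It is called stochastic because each small set of examples gives a noisy estimate of the average gradient over all examples.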
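
The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model (a line y = w*x + b), the squared-error loss, the toy data, and the names sgd_linear, lr, batch_size and epochs are all hypothetical choices made here, not taken from the paper.

```python
import random

def sgd_linear(data, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b by minibatch SGD on the loss 0.5 * error**2."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # a new random split into small sets each pass
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]        # a few examples
            # Compute the outputs and the errors for this small set.
            errors = [(w * x + b) - y for x, y in batch]
            # Average gradient over the small set: a noisy estimate of the
            # average gradient over all examples.
            grad_w = sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            # Adjust the weights against the gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy, noise-free data drawn from y = 2x + 1 (an assumption for illustration).
points = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_linear(points)
print(w, b)
```

On this toy data the estimates should approach w = 2 and b = 1. With noisy data they would instead fluctuate around the best fit, because each small set only approximates the full gradient.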

The key advantage of deep learning is that good features can be learned automatically using a general-purpose learning procedure.

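Earlier neural networks used smoother non-linearities such as tanh(z) or 1/(1 + exp(-z)), but the ReLU, f(z) = max(z, 0), typically learns much faster in networks with many layers, allowing training of a deep supervised network without unsupervised pre-training.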
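
A small sketch of the ReLU next to two smoother non-linearities may help. The comparison with the sigmoid and tanh, and the helper names below, are assumptions made for illustration. One commonly cited intuition, not stated in these notes, is that the ReLU gradient stays at 1 for positive inputs while the sigmoid and tanh gradients shrink toward 0 as the input grows.

```python
import math

def relu(z):
    """Rectified linear unit: f(z) = max(z, 0)."""
    return max(z, 0.0)

def relu_grad(z):
    """Derivative of the ReLU (taken as 0 at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

# Print the three gradients at a negative, a small and a large input.
for z in (-4.0, 0.5, 4.0):
    print(z, relu_grad(z), round(sigmoid_grad(z), 4), round(tanh_grad(z), 4))
```

In a deep network, gradients are multiplied layer by layer during backpropagation, so factors well below 1 can make the signal reaching early layers very small.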

Convolutional neural networks

Key ideas:

  • local connections
  • shared weights
  • pooling
  • the use of many layers
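
The first three ideas can be illustrated with a minimal pure-Python sketch. The function names, the 6x6 toy image and the hand-written edge kernel are hypothetical. In a real network the kernel weights would be learned, not set by hand.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation.

    Local connections: each output value depends only on a small patch.
    Shared weights: the same kernel is reused at every position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_map(fmap):
    """Apply max(v, 0) to every entry of a feature map."""
    return [[max(v, 0.0) for v in row] for row in fmap]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 6x6 image with a vertical edge between columns 2 and 3.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1.0, 1.0], [-1.0, 1.0]]

features = max_pool2d(relu_map(conv2d(image, vertical_edge)))
print(features)
```

The fourth idea, many layers, would correspond to feeding features into further convolution and pooling stages, each with its own kernels.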