Backpropagation Details

Backpropagation performs gradient descent over the entire network weight vector.
It can be generalized to handle arbitrary directed graphs.
It will find a local minimum, which might not be the global minimum. In practice, this has not been a large problem: local minima are usually good enough, and a high-dimensional weight space offers more escape routes.
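The dependence on the starting point can be seen in a minimal sketch. It uses a toy one-dimensional error function, not a neural network; the function `E`, the learning rate, and the starting points are illustrative assumptions chosen so that the curve has two minima of different depth.

```python
# Toy error surface with two minima (an assumption for illustration only).
def E(w):
    return w**4 - 3 * w**2 + w

# Its derivative, dE/dw.
def dE(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w, eta=0.01, steps=2000):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w -= eta * dE(w)
    return w

# Same procedure, two different initial weights.
left = gradient_descent(-2.0)   # settles in the deeper minimum (w < 0)
right = gradient_descent(2.0)   # settles in the shallower minimum (w > 0)

print(left, E(left))
print(right, E(right))
```

Both runs stop where the gradient vanishes (roughly w = -1.30 and w = 1.13), but only the first reaches the lower error. Gradient descent has no way to tell the two apart from local information alone.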
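A popular technique is to add a momentum term with coefficient $α$ ($0 ≤ α < 1$), so that each update carries over part of the previous one: $Δ w_{i, j} (n) = η δ_{j} x_{i, j} + α Δ w_{i, j} (n - 1)$, where $n$ indexes the iteration.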
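The momentum rule can be sketched for a single weight as follows. This is a simplified illustration, not a full backpropagation implementation: the argument `step_term` stands in for $δ_{j} x_{i, j}$, and the function names, the default values of `eta` and `alpha`, and the quadratic test error are assumptions.

```python
def momentum_update(step_term, prev_delta, eta=0.1, alpha=0.5):
    """Return delta_w(n) = eta * step_term + alpha * delta_w(n - 1).

    step_term plays the role of delta_j * x_ij, i.e. the negative
    gradient of the error with respect to this weight.
    """
    return eta * step_term + alpha * prev_delta

def run(step_term_fn, w0, steps, eta=0.1, alpha=0.5):
    """Apply the momentum rule repeatedly, starting from delta_w(0) = 0."""
    w, delta = w0, 0.0
    for _ in range(steps):
        delta = momentum_update(step_term_fn(w), delta, eta, alpha)
        w += delta
    return w, delta

# On a constant slope the step size grows toward eta / (1 - alpha).
_, delta_flat = run(lambda w: 1.0, 0.0, 50)

# With alpha = 0 the rule reduces to plain gradient descent.
_, delta_plain = run(lambda w: 1.0, 0.0, 50, alpha=0.0)

# On E(w) = 0.5 * w**2 the negative gradient is -w; w converges to 0.
w_final, _ = run(lambda w: -w, 1.0, 200)

print(delta_flat, delta_plain, w_final)
```

On a flat region where the gradient barely changes, the accumulated term makes the effective step larger (here it approaches 0.2 instead of 0.1), which is the intuition behind momentum helping the search roll through plateaus and small local minima.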
As always, it minimizes the error over the training examples. Will it generalize?
It can be very slow to train.
But once trained, the network can classify new instances very quickly.
Inductive bias: very hard to quantify, but it can be roughly characterized as smooth interpolation between data points.