The perceptron training rule is guaranteed to succeed (to find a weight vector that classifies every training example correctly) if the
training examples are linearly separable and the learning rate
is sufficiently small.
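A minimal sketch of the perceptron training rule, w_i <- w_i + eta*(t - o)*x_i, applied only when an example is misclassified; the learning rate eta, epoch limit, and the tiny OR-style dataset are illustrative assumptions, not part of the text above:

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    X: (n_samples, n_features) inputs; y: targets in {-1, +1}.
    Returns the learned weights, with the bias folded in as w[0].
    """
    # Fold the bias into the weight vector via a constant input of 1.
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(X.shape[1])

    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(X, y):
            o = 1 if np.dot(w, x) >= 0 else -1   # thresholded output
            if o != t:
                w += eta * (t - o) * x           # update only on mistakes
                errors += 1
        if errors == 0:                          # all examples classified correctly
            break
    return w

# Toy linearly separable data (illustrative only): OR-like labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
print(perceptron_train(X, y))
```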
The linear unit training rule (gradient descent) will
converge to the hypothesis with minimum squared error, given a
sufficiently small learning rate, even when the training data
contain noise and are not linearly separable.
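A sketch of batch gradient descent for a single linear unit, minimizing the squared error E(w) = 1/2 * sum_d (t_d - o_d)^2 with o_d = w . x_d; the dataset, learning rate, and epoch count below are assumptions chosen only for illustration:

```python
import numpy as np

def linear_unit_gradient_descent(X, y, eta=0.01, epochs=500):
    """Batch gradient descent for a linear unit o = w . x,
    minimizing E(w) = 1/2 * sum_d (t_d - o_d)^2.
    Gradient component: dE/dw_i = -sum_d (t_d - o_d) * x_id.
    """
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # bias term as w[0]
    w = np.zeros(X.shape[1])

    for _ in range(epochs):
        o = X @ w                      # unthresholded linear outputs
        grad = -(y - o) @ X            # gradient of the squared error
        w -= eta * grad                # step in the direction of steepest descent
    return w

# Noisy targets (illustrative): y ~ 2*x1 - x2 + noise, so no exact fit exists.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=50)
print(linear_unit_gradient_descent(X, y))
```

Because the outputs are not thresholded during training, the rule converges toward the minimum-error weights whether or not the data are separable.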
Gradient descent has two problems:
Convergence to a local minimum can sometimes be slow.
If the error surface contains multiple local minima, there is no guarantee that the procedure will find the global minimum.
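The squared-error surface of a single linear unit has only one minimum, so the second problem matters mainly for more complex error surfaces. The toy 1-D objective below is an assumption chosen purely to illustrate how the result of gradient descent can depend on the starting point:

```python
def grad_descent_1d(grad, x0, eta=0.01, steps=2000):
    """Plain gradient descent on a 1-D function, given its derivative."""
    x = x0
    for _ in range(steps):
        x -= eta * grad(x)
    return x

# Non-convex toy objective f(x) = x^4 - 3x^2 + x, which has two local minima.
f = lambda x: x**4 - 3 * x**2 + x
grad_f = lambda x: 4 * x**3 - 6 * x + 1

# Different starting points settle into different minima;
# only one of them is the global minimum.
for x0 in (-2.0, 2.0):
    x = grad_descent_1d(grad_f, x0)
    print(f"start {x0:+.1f} -> x = {x:+.3f}, f(x) = {f(x):+.3f}")
```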