Artificial Neural Networks
Stochastic versus Batch Gradient Descent
Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if the learning rate η is made small enough.
In stochastic gradient descent the weights are updated after examining each training example.
Batch gradient descent takes longer per weight update because its inner loop sums the error gradient over all the training examples, but that sum lets it use a larger step size.
Stochastic gradient descent can often avoid falling into local minima, since it follows the gradient of each example's error rather than the gradient of the total error.
NOTE: although these methods assume unthresholded linear units, they can easily be modified to work on regular perceptrons.
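To make the contrast concrete, here is a minimal sketch (in Python with NumPy, not from the original slides) of both training rules for an unthresholded linear unit; the function names, the squared-error objective, and the fixed learning rate eta are assumptions made for the illustration.

import numpy as np

def batch_gradient_descent(X, y, eta=0.01, epochs=100):
    # Batch rule: sum the error gradient over all examples before each weight update.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y)   # gradient of the squared error over the whole training set
        w -= eta * grad
    return w

def stochastic_gradient_descent(X, y, eta=0.01, epochs=100):
    # Incremental rule: update the weights after examining each example.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= eta * (xi @ w - yi) * xi
    return w

# Tiny usage example (hypothetical data): fit w so that X @ w approximates y.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w_batch = batch_gradient_descent(X, y)
w_sgd = stochastic_gradient_descent(X, y)

Because the stochastic version takes many small, noisier steps, it typically needs a smaller eta than the batch version for the same data.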