Artificial Neural Networks
Stochastic versus Batch Gradient Descent
Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if the learning rate η is made small enough.
In stochastic gradient descent the weights are updated after examining each training example.
Batch gradient descent takes longer per weight update because its inner loop sums the error gradient over all the training examples, but that sum lets it use a larger step size.
Stochastic gradient descent can often avoid falling into local minima, since it follows the gradient of each example's error rather than the gradient of the total error.
NOTE: although these methods assume unthresholded linear units, they can easily be modified to work on regular perceptrons.
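To make the contrast concrete, here is a minimal sketch (in Python with NumPy, not from the original slides) of both training rules for an unthresholded linear unit; the function names, the squared-error objective, and the fixed learning rate eta are assumptions made for the illustration.

import numpy as np

def batch_gradient_descent(X, y, eta=0.01, epochs=100):
    # Batch rule: sum the error gradient over all examples before each weight update.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y)   # gradient of the squared error over the whole training set
        w -= eta * grad
    return w

def stochastic_gradient_descent(X, y, eta=0.01, epochs=100):
    # Incremental rule: update the weights after examining each example.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= eta * (xi @ w - yi) * xi
    return w

# Tiny usage example (hypothetical data): fit w so that X @ w approximates y.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w_batch = batch_gradient_descent(X, y)
w_sgd = stochastic_gradient_descent(X, y)

Because the stochastic version takes many small, noisier steps, it typically needs a smaller eta than the batch version for the same data.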