Artificial Neural Networks

*Incremental (stochastic) gradient descent can approximate batch gradient descent arbitrarily closely if $\eta$ is made small enough.*

- In stochastic gradient descent the weights are updated after examining each training example.
- Batch gradient descent takes longer in its inner loop (it sums the error over all examples), but because of this sum it can use a larger step size.
- Stochastic gradient descent can often avoid falling into local minima.

*NOTE: although these methods assume unthresholded linear units, they can easily be modified to work on regular perceptrons.*
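The contrast between the two update rules can be sketched as follows. This is a minimal illustration, not a definitive implementation: the linear unit, the learning rate `eta`, and the randomly generated data are all assumptions for the example. The batch rule applies the full-gradient update $w \leftarrow w + \eta \sum_d (t_d - o_d)\,x_d$ once per pass, while the stochastic rule applies $w \leftarrow w + \eta\,(t_d - o_d)\,x_d$ after every individual example.

```python
import numpy as np

def batch_update(w, X, y, eta):
    """One batch gradient-descent step for an unthresholded linear unit.

    The gradient of E(w) = 1/2 * sum_d (t_d - o_d)^2 is -sum_d (t_d - o_d) x_d,
    so the whole training set is summed before the weights move once.
    """
    errors = y - X @ w            # (t_d - o_d) for every example at once
    return w + eta * X.T @ errors

def stochastic_update(w, X, y, eta):
    """One pass of incremental (stochastic) gradient descent:
    the weights are updated after examining each example."""
    for x_d, t_d in zip(X, y):
        w = w + eta * (t_d - x_d @ w) * x_d
    return w

# Hypothetical data: targets generated by true weights (2, -1), no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0])

w = np.zeros(2)
for _ in range(200):
    w = stochastic_update(w, X, y, eta=0.01)
print(w)  # approaches the true weights [2, -1]
```

Note that with a small `eta` the sequence of per-example updates tracks the batch step closely, which is the claim in the note above.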
