Overfitting ANNs

The error on the training set decreases monotonically as training proceeds.
When measured against the validation set, however, the error is non-monotonic: it typically falls at first and then rises as the network begins to overfit. This validation curve is used to estimate the generalization accuracy of the network.
The second figure shows that it is hard to pick the minimum, because the validation curve can have local minima.
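A minimal Python sketch of why picking the minimum is tricky. The validation errors below are made-up illustrative numbers, not measured data, and the function names and the `patience` parameter are hypothetical. Patience-based stopping is one common heuristic, not necessarily the method these notes have in mind.

```python
# Hypothetical validation-error curve (illustrative numbers only):
# it dips, rises briefly (a local minimum), then falls to a lower minimum.
val_errors = [0.50, 0.40, 0.35, 0.37, 0.38, 0.33, 0.30, 0.29, 0.31, 0.34, 0.36]


def stop_at_first_increase(errors):
    """Naive rule: stop as soon as the validation error goes up."""
    for epoch in range(1, len(errors)):
        if errors[epoch] > errors[epoch - 1]:
            return epoch - 1  # last epoch before the error rose
    return len(errors) - 1


def early_stopping(errors, patience):
    """Remember the best epoch so far; stop only after `patience`
    consecutive epochs without improvement, then return the best epoch."""
    best_epoch, best_error, waited = 0, errors[0], 0
    for epoch in range(1, len(errors)):
        if errors[epoch] < best_error:
            best_epoch, best_error, waited = epoch, errors[epoch], 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


print(stop_at_first_increase(val_errors))          # 2 (the local minimum)
print(early_stopping(val_errors, patience=3))      # 7 (the lower minimum)
print(early_stopping(val_errors, patience=2))      # 2 (gave up too early)
```

In a real training loop, the weights would be saved at each new best epoch and restored when training stops.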
Some techniques used to fight overfitting include:

Decrease each weight by some small factor during each iteration (weight decay). Keeping the weights small biases learning against complex decision surfaces.
Provide a validation set and use it to monitor the error during training, keeping the weights that give the lowest validation error. But be careful not to stop at a minimum too early.
With small data sets, use k-fold cross-validation: divide the data into k disjoint subsets. In each of k runs, one subset serves as the validation set and the other k-1 subsets are the training data, so every example is used for validation exactly once.
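Weight decay can be sketched as a plain gradient-descent update in which every weight is also shrunk by a factor slightly less than 1. The function name, the list-of-floats representation, and the default `learning_rate` and `decay` values are assumptions for illustration. This form is equivalent to gradient descent on the error plus a penalty proportional to the sum of squared weights.

```python
def weight_decay_step(weights, gradients, learning_rate=0.1, decay=0.01):
    """Hypothetical helper: one gradient-descent step with weight decay.

    Each weight is multiplied by (1 - decay), pulling it toward zero, and
    then moved against its error gradient.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    return [(1.0 - decay) * w - learning_rate * g
            for w, g in zip(weights, gradients)]


# With a zero gradient, decay alone shrinks every weight geometrically.
weights = [1.0, -2.0]
for _ in range(10):
    weights = weight_decay_step(weights, [0.0, 0.0], decay=0.1)
print(weights)  # roughly [0.349, -0.697], i.e. 0.9**10 times the start
```

Weights that the error gradient keeps pushing on stay large; weights that contribute little drift toward zero, which is what biases the network toward simpler surfaces.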
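A small sketch of the k-fold split described above. The function name is hypothetical, and assigning examples to folds round-robin is an assumption; in practice the data would usually be shuffled first (for example with `random.shuffle`).

```python
def k_fold_splits(examples, k):
    """Yield (training, validation) pairs for k-fold cross-validation.

    The examples are divided into k disjoint folds of nearly equal size.
    In run i, fold i is the validation set and the other k-1 folds together
    form the training set.
    """
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    # Fold i takes every k-th example starting at index i (round-robin).
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


for training, validation in k_fold_splits(list(range(10)), k=3):
    print(validation, len(training))
# [0, 3, 6, 9] 6
# [1, 4, 7] 7
# [2, 5, 8] 7
```

The k validation errors are then typically averaged to get a single estimate.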