ANNs provide a method for learning real-valued and
vector-valued functions over continuous and discrete-valued
attributes.
Backpropagation is the most common training algorithm;
in practice it works well on a wide range of problems.
The hypothesis space consists of all the functions that
can be represented by the network. For three layers this means
essentially any function, although in some cases an excessive
number of hidden nodes might be needed.
Backpropagation uses gradient descent and so is only
guaranteed to converge to a local minimum of the error.
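The gradient-descent updates above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the text: a two-input, four-hidden-unit, one-output sigmoid network trained on XOR (a function no network without a hidden layer can represent) by batch gradient descent; the network size, learning rate, and iteration count are arbitrary choices for the sketch.

```python
import numpy as np

# Minimal backpropagation sketch: sigmoid units, squared error,
# batch gradient descent on the XOR training set.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(2, 4)); b1 = np.zeros(4)  # input -> hidden
W2 = rng.normal(scale=1.0, size=(4, 1)); b2 = np.zeros(1)  # hidden -> output
eta = 0.5                                                  # learning rate
losses = []

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)            # forward pass: hidden activations
    out = sigmoid(h @ W2 + b2)          # forward pass: network output
    err = out - y
    losses.append(float((err ** 2).sum()))
    d_out = err * out * (1 - out)       # backward pass: output-unit deltas
    d_h = (d_out @ W2.T) * h * (1 - h)  # backward pass: hidden-unit deltas
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h;   b1 -= eta * d_h.sum(axis=0)
```

Because the updates only follow the local gradient, a different random seed can land in a different (possibly worse) local minimum; the error is driven down but a global optimum is not guaranteed.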
The hidden layers can invent new features not explicit
in the input representation.
Over-fitting is common and results in networks that
generalize poorly to unseen data.
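One standard guard against over-fitting is early stopping: hold out a validation set, track its error during training, and keep the weights from the epoch where validation error was lowest. The sketch below applies this to a small sigmoid network on noisy 1-D regression data; the data, network size, and learning rate are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Noisy 1-D regression data, split into training and validation halves.
x = rng.uniform(-1, 1, size=(40, 1))
t = np.sin(3 * x) + rng.normal(scale=0.2, size=x.shape)
X_tr, t_tr = x[:30], t[:30]
X_va, t_va = x[30:], t[30:]

W1 = rng.normal(scale=0.5, size=(1, 10)); b1 = np.zeros(10)
W2 = rng.normal(scale=0.5, size=(10, 1)); b2 = np.zeros(1)
eta = 0.05

def mse(X, t):
    # Mean squared error of the current weights (linear output unit).
    return float(((sigmoid(X @ W1 + b1) @ W2 + b2 - t) ** 2).mean())

best_val, best_weights = float("inf"), None
val_errors = []
for epoch in range(2000):
    h = sigmoid(X_tr @ W1 + b1)
    out = h @ W2 + b2
    d_out = 2 * (out - t_tr) / len(X_tr)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X_tr.T @ d_h; b1 -= eta * d_h.sum(axis=0)
    v = mse(X_va, t_va)
    val_errors.append(v)
    if v < best_val:                # snapshot the best weights seen so far
        best_val = v
        best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
```

Training error keeps falling with more epochs, but the weights returned are the snapshot with the lowest validation error, which is the set expected to generalize best.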