ANNs provide a method for learning real-valued and
vector-valued functions over continuous and discrete-valued
attributes.
Backpropagation is the most common training algorithm;
in practice it works well on a wide range of problems.
The hypothesis space consists of all the functions that
can be represented by the network. For three layers this means
essentially any function, although in some cases an excessive
number of hidden nodes might be needed.
Backpropagation uses gradient descent and so is only
guaranteed to converge to a local minimum of the error.
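The gradient-descent updates above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the text: a two-input, four-hidden-unit, one-output sigmoid network trained on XOR (a function no network without a hidden layer can represent) by batch gradient descent; the network size, learning rate, and iteration count are arbitrary choices for the sketch.

```python
import numpy as np

# Minimal backpropagation sketch: sigmoid units, squared error,
# batch gradient descent on the XOR training set.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(2, 4)); b1 = np.zeros(4)  # input -> hidden
W2 = rng.normal(scale=1.0, size=(4, 1)); b2 = np.zeros(1)  # hidden -> output
eta = 0.5                                                  # learning rate
losses = []

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)            # forward pass: hidden activations
    out = sigmoid(h @ W2 + b2)          # forward pass: network output
    err = out - y
    losses.append(float((err ** 2).sum()))
    d_out = err * out * (1 - out)       # backward pass: output-unit deltas
    d_h = (d_out @ W2.T) * h * (1 - h)  # backward pass: hidden-unit deltas
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h;   b1 -= eta * d_h.sum(axis=0)
```

Because the updates only follow the local gradient, a different random seed can land in a different (possibly worse) local minimum; the error is driven down but a global optimum is not guaranteed.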
The hidden layers can invent new features not explicit
in the input representation.
Over-fitting is common and results in networks that
generalize poorly to unseen data.
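One standard guard against over-fitting is early stopping: hold out a validation set, track its error during training, and keep the weights from the epoch where validation error was lowest. The sketch below applies this to a small sigmoid network on noisy 1-D regression data; the data, network size, and learning rate are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Noisy 1-D regression data, split into training and validation halves.
x = rng.uniform(-1, 1, size=(40, 1))
t = np.sin(3 * x) + rng.normal(scale=0.2, size=x.shape)
X_tr, t_tr = x[:30], t[:30]
X_va, t_va = x[30:], t[30:]

W1 = rng.normal(scale=0.5, size=(1, 10)); b1 = np.zeros(10)
W2 = rng.normal(scale=0.5, size=(10, 1)); b2 = np.zeros(1)
eta = 0.05

def mse(X, t):
    # Mean squared error of the current weights (linear output unit).
    return float(((sigmoid(X @ W1 + b1) @ W2 + b2 - t) ** 2).mean())

best_val, best_weights = float("inf"), None
val_errors = []
for epoch in range(2000):
    h = sigmoid(X_tr @ W1 + b1)
    out = h @ W2 + b2
    d_out = 2 * (out - t_tr) / len(X_tr)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X_tr.T @ d_h; b1 -= eta * d_h.sum(axis=0)
    v = mse(X_va, t_va)
    val_errors.append(v)
    if v < best_val:                # snapshot the best weights seen so far
        best_val = v
        best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
```

Training error keeps falling with more epochs, but the weights returned are the snapshot with the lowest validation error, which is the set expected to generalize best.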