Define $C_G$ to be the $G$-composition of $C$: the set of all functions that can be computed by the network $G$ when each of its units computes a function from the class $C$. Here $C$ is the concept class representable by each individual unit (neuron).
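Written out as a set (a restatement of the definition above, where $f_{G,\vec{c}}$ is notation introduced here for the function $G$ computes when each internal node $v$ is assigned the function $c_v$):
\[ C_G = \{ f_{G,\vec{c}} : c_v \in C \text{ for each internal node } v \text{ of } G \}. \]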
If $G$ is a layered DAG with $n$ input nodes and $s \geq 2$ internal nodes, each having at most $r$ inputs, and $VC(C) = d$, then
\[ VC(C_G) \leq 2ds \cdot \log(es). \]
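As a quick sanity check (an illustration, not part of the statement above): if each unit computes a linear threshold function of its at most $r$ inputs, then $d = VC(C) = r + 1$, and the bound reads
\[ VC(C_G) \leq 2(r+1)s \cdot \log(es), \]
so the network's capacity grows roughly linearly in its total number of weights ($\approx rs$), up to the $\log(es)$ factor.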
We can use this to bound the number of training examples sufficient to learn (with probability at least $1-\delta$) any target concept from $C_G$ to within error $\epsilon$.
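Concretely, a standard sample-complexity bound for consistent learners (the form below follows Blumer et al.; it is a sketch of how the VC bound gets applied, not a statement taken from these notes) says that
\[ m \geq \max\!\left( \frac{4}{\epsilon}\log\frac{2}{\delta},\; \frac{8\,VC(C_G)}{\epsilon}\log\frac{13}{\epsilon} \right) \]
examples suffice; substituting $VC(C_G) \leq 2ds \cdot \log(es)$ gives
\[ m = O\!\left( \frac{1}{\epsilon}\left( ds\log(es)\log\frac{1}{\epsilon} + \log\frac{1}{\delta} \right) \right). \]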