Radial Basis Function Networks

We can view the new $\hat{f}(x)$ as a two-layer network were the first layer computes the various $K$ and the second does the linear combination of them.
The network can be trained in two stages:

The number $k$ of hidden units is determined and each unit $u$ is defined by choosing the values of $x_u$ and $\sigma_u^2$ that define its kernel function $K_u(d(x_u,x))$ (e.g., with EM).
The weights $w_u$ are trained to maximize the fit of the network to the data using the global squared error criterion.

Since the $K_u$ are fixed for the second step it can be trained with efficient linear function fit methods.