Radial Basis Functions
- Instead of using a linear function, we try to learn a function of the form
\[ \hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u,x)) \]
where each $x_u$ is an instance from $X$ where the kernel function
$K_u(d(x_u,x))$ is defined so that it decreases as the distance
$d(x_u,x)$ increases.
- $k$ is a user-defined constant that specifies the number
of kernel functions to be included.
- Note that, even though $\hat{f}(x)$ is a global approximation to
$f(x)$, the contribution from each of the $K_u$ terms is
localized to a region near $x_u$.
- It is common to choose $K$ to be a Gaussian centered around $x_u$
\[
K_u(d(x_u,x)) = e^{-\frac{d^2(x_u,x)}{2\sigma_u^2}}
\]
- It has been shown that this $\hat{f}(x)$ can approximate
any function with arbitrarily small error, given a sufficiently
large $k$ and given that each variance $\sigma_u^2$ can be
separately specified.
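As a concrete illustration, the prediction $\hat{f}(x)$ above can be sketched in a few lines of NumPy. This is a minimal sketch, not a full RBF training procedure: the function names are ours, the distance $d$ is assumed Euclidean, and the centers, widths, and weights are supplied by hand rather than learned.

```python
import numpy as np

def gaussian_kernel(d, sigma):
    # K_u(d) = exp(-d^2 / (2 sigma_u^2)): decreases as distance d grows
    return np.exp(-d**2 / (2 * sigma**2))

def rbf_predict(x, centers, sigmas, w0, weights):
    # f_hat(x) = w0 + sum_u w_u K_u(d(x_u, x)), with d Euclidean
    d = np.linalg.norm(centers - x, axis=1)
    return w0 + weights @ gaussian_kernel(d, sigmas)

# usage: k = 2 Gaussian kernels in one dimension (values chosen arbitrarily)
centers = np.array([[0.0], [1.0]])
sigmas = np.array([0.5, 0.5])
weights = np.array([1.0, 1.0])
print(rbf_predict(np.array([0.0]), centers, sigmas, 0.0, weights))
```

At $x = 0$ the first kernel contributes its full weight while the second is attenuated by $e^{-2}$, showing the localized influence of each $K_u$.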
José M. Vidal