Local Regression Descent
- If we choose Eq. 3, we can derive a gradient descent rule using
the same arguments as we used for neural nets.
- As such, we can adjust the weights of $\hat{f}(x)$ with
\[
\Delta w_j \equiv \eta \sum_{x \in \text{$k$ nearest nbrs of } x_q} K(d(x_q,x))\,(f(x)-\hat{f}(x))\,a_j(x)
\]
- This equation performs gradient descent on the weights of
$\hat{f}(x)$ so as to minimize its error against $f(x)$ (see the
sketch after this list).
- There are many other, much more efficient, methods for fitting
linear functions to a fixed set of training examples (see the book
for references).
- Locally Weighted Regression typically uses linear or quadratic
functions because more complex forms are costly to fit and provide,
at best, marginal benefits.