Learning A Real-Valued Function
- Consider any real-valued target function $f$.
- Training examples $\langle x_{i}, d_{i} \rangle$, where $d_{i}$ is a noisy
training value: $d_{i} = f(x_{i}) + e_{i}$, and $e_{i}$ is a random
variable (noise) drawn independently for each $x_{i}$ from a
Gaussian distribution with mean zero.
- Recall the maximum likelihood hypothesis $h_{ML}$ we defined earlier:
\[
\begin{aligned}
h_{ML} &= \argmax_{h \in H} p(D\,|\,h) \\
&= \argmax_{h \in H} \prod_{i=1}^{m} p(d_{i}\,|\,h)
\end{aligned}
\]
Substituting the Gaussian density for this probability and simplifying (the intermediate steps are shown below the result) yields
\[
h_{ML} = \argmin_{h \in H} \sum_{i=1}^{m} \left(d_{i} - h(x_{i})\right)^{2}
\]
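In more detail (a standard derivation; $\sigma^{2}$ denotes the noise variance, which the slide leaves unspecified): substitute the Gaussian density for $p(d_{i}\,|\,h)$, maximize the logarithm instead, and drop the terms that do not depend on $h$:
\[
\begin{aligned}
h_{ML} &= \argmax_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^{2}}}
 \exp\!\left(-\frac{(d_{i} - h(x_{i}))^{2}}{2\sigma^{2}}\right) \\
&= \argmax_{h \in H} \sum_{i=1}^{m} \left(-\frac{1}{2}\ln(2\pi\sigma^{2})
 - \frac{(d_{i} - h(x_{i}))^{2}}{2\sigma^{2}}\right) \\
&= \argmin_{h \in H} \sum_{i=1}^{m} \left(d_{i} - h(x_{i})\right)^{2}
\end{aligned}
\]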
- Therefore, $h_{ML}$ is the hypothesis that minimizes the
sum of the squared errors, provided the observations are generated by
adding zero-mean Normal noise to the true values.
- Under these conditions, any learning algorithm that minimizes the squared error will output a maximum likelihood hypothesis.
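As a minimal illustrative sketch (not part of the original slides; the target $\sin$, the noise level $0.2$, and the degree-3 polynomial hypothesis space are arbitrary choices), the least-squares fit below is the maximum likelihood hypothesis within that hypothesis space under the zero-mean Gaussian noise model:

```python
import numpy as np

# Illustrative sketch: assumed target f(x) = sin(x), hypothesis
# space H = degree-3 polynomials (both choices are hypothetical).
rng = np.random.default_rng(0)

f = np.sin                           # true target function f
m = 50                               # number of training examples
x = rng.uniform(0.0, 2 * np.pi, m)  # inputs x_i
e = rng.normal(0.0, 0.2, m)         # zero-mean Gaussian noise e_i
d = f(x) + e                         # noisy training values d_i = f(x_i) + e_i

# Least-squares fit over H; under the Gaussian noise model above,
# the minimizer of sum_i (d_i - h(x_i))^2 is exactly h_ML.
coeffs = np.polyfit(x, d, deg=3)
h_ml = np.poly1d(coeffs)

sse = np.sum((d - h_ml(x)) ** 2)    # sum of squared errors of h_ML
print(f"sum of squared errors: {sse:.3f}")
```

Any other squared-error minimizer over a different hypothesis space would illustrate the same point.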