Learning A Real-Valued Function
- Consider any real-valued target function $f$.
- Training examples $\langle x_{i}, d_{i} \rangle$, where $d_{i}$ is a noisy
training value: $d_{i} = f(x_{i}) + e_{i}$, and $e_{i}$ is a random
variable (noise) drawn independently for each $x_{i}$ from a
Gaussian distribution with mean zero.
- Recall the maximum likelihood hypothesis $h_{ML}$ we defined earlier:
\[
\begin{aligned}
h_{ML} &= \argmax_{h \in H} p(D\,|\,h) \\
&= \argmax_{h \in H} \prod_{i=1}^{m} p(d_{i}\,|\,h)
\end{aligned}
\]
Substituting the Gaussian density for this probability and simplifying (the intermediate steps are shown below the result) yields
\[
h_{ML} = \argmin_{h \in H} \sum_{i=1}^{m} \left(d_{i} - h(x_{i})\right)^{2}
\]
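In more detail (a standard derivation; $\sigma^{2}$ denotes the noise variance, which the slide leaves unspecified): substitute the Gaussian density for $p(d_{i}\,|\,h)$, maximize the logarithm instead, and drop the terms that do not depend on $h$:
\[
\begin{aligned}
h_{ML} &= \argmax_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^{2}}}
 \exp\!\left(-\frac{(d_{i} - h(x_{i}))^{2}}{2\sigma^{2}}\right) \\
&= \argmax_{h \in H} \sum_{i=1}^{m} \left(-\frac{1}{2}\ln(2\pi\sigma^{2})
 - \frac{(d_{i} - h(x_{i}))^{2}}{2\sigma^{2}}\right) \\
&= \argmin_{h \in H} \sum_{i=1}^{m} \left(d_{i} - h(x_{i})\right)^{2}
\end{aligned}
\]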
- Therefore, $h_{ML}$ is the hypothesis that minimizes the
sum of the squared errors, provided the observations are generated by
adding zero-mean Normal noise to the true values.
- Under these conditions, any learning algorithm that minimizes the squared error will output a maximum likelihood hypothesis.
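As a minimal illustrative sketch (not part of the original slides; the target $\sin$, the noise level $0.2$, and the degree-3 polynomial hypothesis space are arbitrary choices), the least-squares fit below is the maximum likelihood hypothesis within that hypothesis space under the zero-mean Gaussian noise model:

```python
import numpy as np

# Illustrative sketch: assumed target f(x) = sin(x), hypothesis
# space H = degree-3 polynomials (both choices are hypothetical).
rng = np.random.default_rng(0)

f = np.sin                           # true target function f
m = 50                               # number of training examples
x = rng.uniform(0.0, 2 * np.pi, m)  # inputs x_i
e = rng.normal(0.0, 0.2, m)         # zero-mean Gaussian noise e_i
d = f(x) + e                         # noisy training values d_i = f(x_i) + e_i

# Least-squares fit over H; under the Gaussian noise model above,
# the minimizer of sum_i (d_i - h(x_i))^2 is exactly h_ML.
coeffs = np.polyfit(x, d, deg=3)
h_ml = np.poly1d(coeffs)

sse = np.sum((d - h_ml(x)) ** 2)    # sum of squared errors of h_ML
print(f"sum of squared errors: {sse:.3f}")
```

Any other squared-error minimizer over a different hypothesis space would illustrate the same point.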