Learning To Predict Probabilities
- Consider predicting survival probability from patient data
- Training examples $\langle x_i, d_i \rangle$, where $d_i$ is 1 or 0
- Want to train neural network to output a probability $h(x_i) = P(d_i = 1 \mid x_i)$ given $x_i$ (not a 0 or 1)
- In this case can show that the maximum likelihood hypothesis is
  $h_{ML} = \operatorname{argmax}_{h \in H} \sum_i d_i \ln h(x_i) + (1 - d_i) \ln(1 - h(x_i))$
  The negation of this quantity is known as the cross entropy.
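As a minimal sketch of the cross-entropy quantity, the following computes the negated log-likelihood for boolean targets and predicted probabilities (the labels and probabilities here are hypothetical, chosen only for illustration):

```python
import math

def cross_entropy(d, h):
    """Negative log-likelihood of boolean targets d under predicted
    probabilities h: -sum_i [d_i ln h_i + (1 - d_i) ln(1 - h_i)]."""
    return -sum(di * math.log(hi) + (1 - di) * math.log(1 - hi)
                for di, hi in zip(d, h))

# Hypothetical survival labels and predicted survival probabilities.
d = [1, 0, 1, 1]
h = [0.9, 0.2, 0.8, 0.6]
print(cross_entropy(d, h))  # ≈ 1.0625; lower is better
```

Maximizing the log-likelihood is the same as minimizing this cross entropy, which is why the two views are interchangeable.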
- In order to maximize that we would need to do gradient
  ascent on it, wrt the edge weight $w_{jk}$. This weight update works
  out to be $w_{jk} \leftarrow w_{jk} + \Delta w_{jk}$,
  where $\Delta w_{jk} = \eta \sum_i (d_i - h(x_i))\, x_{ijk}$
- This is the same rule used by Backpropagation, except that
  Backpropagation multiplies by an extra term $h(x_i)(1 - h(x_i))$, which is the derivative of the sigmoid
  function.
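The contrast between the two update rules can be sketched for a single sigmoid unit as follows; the feature vectors, labels, and learning rate are hypothetical, and the only difference between the two functions is the extra sigmoid-derivative factor h(1 - h):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy_update(w, xs, ds, eta):
    """One gradient-ascent step on the log-likelihood for a single
    sigmoid unit: Delta w_j = eta * sum_i (d_i - h(x_i)) * x_ij."""
    grads = [0.0] * len(w)
    for x, d in zip(xs, ds):
        h = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j, xj in enumerate(x):
            grads[j] += (d - h) * xj
    return [wj + eta * g for wj, g in zip(w, grads)]

def squared_error_update(w, xs, ds, eta):
    """Same step under the squared-error (Backpropagation) objective:
    the gradient carries the extra sigmoid derivative h(1 - h)."""
    grads = [0.0] * len(w)
    for x, d in zip(xs, ds):
        h = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j, xj in enumerate(x):
            grads[j] += (d - h) * h * (1 - h) * xj
    return [wj + eta * g for wj, g in zip(w, grads)]

# Hypothetical patient feature vectors (first input is a constant bias) and outcomes.
xs = [[1.0, 0.5, 1.2], [1.0, -0.3, 0.4]]
ds = [1, 0]
w = [0.0, 0.0, 0.0]
print(cross_entropy_update(w, xs, ds, eta=0.1))  # larger step
print(squared_error_update(w, xs, ds, eta=0.1))  # same direction, scaled by h(1-h)
```

With zero initial weights, h(x) = 0.5 for every example, so the squared-error step is exactly h(1 - h) = 0.25 times the cross-entropy step; near-saturated outputs would shrink it much further.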
- Backpropagation updates seek the ML hypothesis under the
  assumption that the training data can be modeled by Normal noise
  added to the target function value.
- Cross-entropy updates seek the ML hypothesis under the
  assumption that the observed boolean value is a probabilistic
  function of the input instance.