Minimum Description Length Principle
- We can interpret the definition of $h_{MAP}$ in terms of the basic concepts of information theory
\[
\array{
h_{MAP} &= &\arg \max_{h \in H}P(D\,|\,h) P(h) \\
&= &\arg \max_{h \in H} \log_{2} P(D\,|\,h) + \log_{2} P(h) \\
&= &\arg \min_{h \in H} - \log_{2} P(D\,|\,h) - \log_{2} P(h)
}
\]
This equation can be interpreted as a statement that short hypotheses
are preferred!
- $- \log_{2} P(h)$ is length of $h$ under optimal
code
- $- \log_{2} P(D\,|\,h)$ is length of $D$ given $h$ under optimal code
- The minimum description length principle
recomends choosing the hypothesis that minimizes the sum of
these two description lengths.
- If a representation of hypotheis is chosen so that the
size of $h$ is $- \log_2 P(h)$ and if a representation for
exceptions is chosen so that the encoding length of $D$ given
$h$ is $- \log_2 P(D\,|\,h)$ then the MDL principle produces MAP
hypotheses.
José M. Vidal
.
11 of 39