Use the old $error_D(h)$ definition: the proportion
of examples from $D$ misclassified by $h$.
But what about the theory $B$?
Define $error_B(h)$ to be the probability that $h$ will
disagree with $B$ on the classification of a randomly drawn
instance. Then find
\[ h = \argmin_{h \in H} \; k_D \, error_D(h) + k_B \, error_B(h) \]
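To make the objective concrete, here is a minimal Python sketch. Everything in it is illustrative rather than from the text: it assumes the caller supplies the hypothesis space $H$, the labeled data $D$, the theory $B$ as a classifier over instances, and a sampler for the instance distribution. Since $error_B(h)$ is a probability over randomly drawn instances, the sketch estimates it by Monte Carlo sampling.

```python
import random

def error_D(h, data):
    # Proportion of labeled examples (x, y) in D that h misclassifies.
    return sum(h(x) != y for x, y in data) / len(data)

def error_B(h, B, draw_instance, n_samples=1000):
    # Monte Carlo estimate of Pr[h(x) != B(x)] for a randomly drawn instance x.
    disagreements = 0
    for _ in range(n_samples):
        x = draw_instance()
        if h(x) != B(x):
            disagreements += 1
    return disagreements / n_samples

def best_hypothesis(H, data, B, draw_instance, k_D, k_B):
    # h = argmin_{h in H} of k_D * error_D(h) + k_B * error_B(h)
    return min(H, key=lambda h: k_D * error_D(h, data)
                              + k_B * error_B(h, B, draw_instance))

if __name__ == "__main__":
    # Toy setting: instances are 1-d points; B labels x positive iff x > 0,
    # while the data was generated with the true threshold at 0.2.
    random.seed(0)
    data = [((x,), int(x > 0.2))
            for x in (random.uniform(-1, 1) for _ in range(50))]
    H = [lambda x, t=t: int(x[0] > t) for t in (-0.5, 0.0, 0.2, 0.5)]
    B = lambda x: int(x[0] > 0)
    draw = lambda: (random.uniform(-1, 1),)
    h = best_hypothesis(H, data, B, draw, k_D=1.0, k_B=0.5)
```

The weights `k_D=1.0` and `k_B=0.5` in the demo are arbitrary; how to set them is exactly the question raised next.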
But what should we set $k_D$ and $k_B$ to?
Which is more reliable, the data or the theory?