Gibbs Algorithm
- The Bayes optimal classifier gives the best possible predictions, but it can be expensive to compute when there are many hypotheses.
- Gibbs algorithm (sketch below):
  - Choose one hypothesis at random, according to $P(h\,|\,D)$.
  - Use it to classify the new instance.
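A minimal sketch of this procedure in Python, under illustrative assumptions not in the slides: each hypothesis is a callable mapping an instance to a label, and the posterior weights stand in for $P(h\,|\,D)$.

```python
import random

def gibbs_classify(hypotheses, posterior, x):
    """Gibbs algorithm: draw one hypothesis h with probability P(h|D),
    then classify x with it (all names here are illustrative)."""
    h = random.choices(hypotheses, weights=posterior, k=1)[0]
    return h(x)

# Toy usage: three threshold hypotheses with made-up posterior weights.
hs = [lambda x: x > 0, lambda x: x > 1, lambda x: x > 2]
print(gibbs_classify(hs, [0.5, 0.3, 0.2], 1.5))
```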
- Surprising fact: assume target concepts are drawn at random from $H$ according to the prior over $H$. Then:
\[ E[error_{Gibbs}] \leq 2\,E[error_{\text{Bayes optimal}}] \]
So, if the learner correctly assumes a uniform prior distribution over $H$, then:
  - Pick any hypothesis from the version space (VS) with uniform probability.
  - Its expected error is no worse than twice that of the Bayes optimal classifier (see the sketch after this list).
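Under the uniform-prior assumption the posterior is uniform over the version space, so the sketch above reduces to a uniform draw (again, the names are illustrative):

```python
def gibbs_classify_uniform(version_space, x):
    """With a uniform prior over consistent hypotheses, P(h|D) is
    uniform over the version space, so Gibbs reduces to classifying
    with a uniformly chosen consistent hypothesis."""
    h = random.choice(version_space)
    return h(x)
```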