Naive Bayes Issues
- Conditional independence assumption is often violated
\[ P(a_{1}, a_{2}, \ldots, a_{n}\,|\,v_{j}) = \prod_{i} P(a_{i} \,|\, v_{j}) \]
but it works surprisingly well anyway.
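A minimal sketch of this product rule, using a hypothetical toy data set (the counts and attribute values here are invented for illustration, not from the slides):

```python
from collections import defaultdict

# Hypothetical toy training set: each row is (attribute tuple, class label).
train = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("rain", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]

def naive_bayes(train, x):
    """Return the class maximizing P(v) * prod_i P(a_i | v), estimated from raw counts."""
    class_count = defaultdict(int)
    attr_count = defaultdict(int)   # keyed by (class, attribute position, value)
    for attrs, v in train:
        class_count[v] += 1
        for i, a in enumerate(attrs):
            attr_count[(v, i, a)] += 1
    n_total = len(train)
    best, best_score = None, -1.0
    for v, n in class_count.items():
        score = n / n_total                          # prior P(v)
        for i, a in enumerate(x):
            score *= attr_count[(v, i, a)] / n       # conditional P(a_i | v)
        if score > best_score:
            best, best_score = v, score
    return best

print(naive_bayes(train, ("rain", "mild")))  # "yes"
```

Note that each attribute contributes one factor $P(a_i\,|\,v_j)$, regardless of any correlations among the attributes.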
- We don't need the estimated
posteriors $\hat{P}(v_j\,|\,x)$ to be correct; we only need that
\[\argmax_{v_{j} \in V} \hat{P}(v_{j}) \prod_{i} \hat{P}(a_{i} \,|\, v_{j}) =
\argmax_{v_{j} \in V} P(v_{j}) P(a_{1}, \ldots, a_{n} \,|\, v_{j}) \]
- Naive Bayes posteriors are often unrealistically close to 1 or 0.
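One way to see why: when the independence assumption is violated, correlated evidence gets counted multiple times, pushing the normalized posterior toward the extremes. A small sketch (the probabilities 0.6/0.4 and the factor of 10 are invented for illustration):

```python
def posterior(likelihoods, priors):
    """Normalize class scores P(v) * prod_i P(a_i | v) into posteriors."""
    scores = {v: priors[v] * p for v, p in likelihoods.items()}
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

priors = {"v1": 0.5, "v2": 0.5}

# One mildly informative attribute: P(a|v1) = 0.6, P(a|v2) = 0.4.
one = posterior({"v1": 0.6, "v2": 0.4}, priors)           # posterior ~0.6 vs ~0.4

# The same evidence effectively counted 10 times (independence violated):
ten = posterior({"v1": 0.6**10, "v2": 0.4**10}, priors)   # posterior ~0.98 vs ~0.02
```

The argmax is unchanged, which is why classification accuracy survives, but the posterior itself is badly miscalibrated.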
- If none of the training instances with target value $v_j$ have attribute
value $a_i$, then
\[ \hat{P}(a_i\,|\,v_j) = 0 \text{, and...}\]
\[ \hat{P}(v_{j}) \prod_{i} \hat{P}(a_{i} \,|\, v_{j}) = 0 \]
The typical solution is a Bayesian estimate for $\hat{P}(a_{i} \,|\, v_{j})$:
\[ \hat{P}(a_{i} \,|\, v_{j}) \leftarrow \frac{n_{c} + mp}{n + m} \]
where
- $n$ is the number of training examples for which $v=v_j$,
- $n_c$ is the number of examples for which $v=v_j$ and $a=a_i$,
- $p$ is the prior estimate for $\hat{P}(a_{i} \,|\, v_{j})$,
- $m$ is the weight given to the prior (i.e., the number of ``virtual'' examples).
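The m-estimate above is a one-liner; the concrete counts below ($n=10$, three attribute values, $m=3$) are invented for illustration:

```python
def m_estimate(n_c, n, p, m):
    """Bayesian (m-)estimate of P(a_i | v_j): (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# Attribute value a_i never seen with class v_j (n_c = 0) among n = 10 examples;
# uniform prior p = 1/3 over three attribute values, m = 3 virtual examples.
print(m_estimate(0, 10, 1/3, 3))  # 1/13, no longer zero
```

With $n_c = 0$ the estimate degrades gracefully to $mp/(n+m)$ instead of zeroing out the whole product, and as $n$ grows the data dominates the prior.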
José M. Vidal