Naive Bayes Issues
- Conditional independence assumption is often violated
\[ P(a_{1}, a_{2}, \ldots, a_{n}\,|\,v_{j}) = \prod_{i} P(a_{i} \,|\, v_{j}) \]
but it works surprisingly well anyway.
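A minimal sketch of this product rule, using a hypothetical toy data set (the counts and attribute values here are invented for illustration, not from the slides):

```python
from collections import defaultdict

# Hypothetical toy training set: each row is (attribute tuple, class label).
train = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("rain", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]

def naive_bayes(train, x):
    """Return the class maximizing P(v) * prod_i P(a_i | v), estimated from raw counts."""
    class_count = defaultdict(int)
    attr_count = defaultdict(int)   # keyed by (class, attribute position, value)
    for attrs, v in train:
        class_count[v] += 1
        for i, a in enumerate(attrs):
            attr_count[(v, i, a)] += 1
    n_total = len(train)
    best, best_score = None, -1.0
    for v, n in class_count.items():
        score = n / n_total                          # prior P(v)
        for i, a in enumerate(x):
            score *= attr_count[(v, i, a)] / n       # conditional P(a_i | v)
        if score > best_score:
            best, best_score = v, score
    return best

print(naive_bayes(train, ("rain", "mild")))  # "yes"
```

Note that each attribute contributes one factor $P(a_i\,|\,v_j)$, regardless of any correlations among the attributes.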
- We don't need the estimated
posteriors $\hat{P}(v_j\,|\,x)$ to be correct; we only need that
\[\argmax_{v_{j} \in V} \hat{P}(v_{j}) \prod_{i} \hat{P}(a_{i} \,|\, v_{j}) =
\argmax_{v_{j} \in V} P(v_{j}) P(a_{1}, \ldots, a_{n} \,|\, v_{j}) \]
- Naive Bayes posteriors are often unrealistically close to 1 or 0.
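One way to see why: when the independence assumption is violated, correlated evidence gets counted multiple times, pushing the normalized posterior toward the extremes. A small sketch (the probabilities 0.6/0.4 and the factor of 10 are invented for illustration):

```python
def posterior(likelihoods, priors):
    """Normalize class scores P(v) * prod_i P(a_i | v) into posteriors."""
    scores = {v: priors[v] * p for v, p in likelihoods.items()}
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

priors = {"v1": 0.5, "v2": 0.5}

# One mildly informative attribute: P(a|v1) = 0.6, P(a|v2) = 0.4.
one = posterior({"v1": 0.6, "v2": 0.4}, priors)           # posterior ~0.6 vs ~0.4

# The same evidence effectively counted 10 times (independence violated):
ten = posterior({"v1": 0.6**10, "v2": 0.4**10}, priors)   # posterior ~0.98 vs ~0.02
```

The argmax is unchanged, which is why classification accuracy survives, but the posterior itself is badly miscalibrated.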
- If none of the training instances with target value $v_j$ have attribute
value $a_i$, then
\[ \hat{P}(a_i\,|\,v_j) = 0 \text{, and...}\]
\[ \hat{P}(v_{j}) \prod_{i} \hat{P}(a_{i} \,|\, v_{j}) = 0 \]
The typical solution is a Bayesian estimate for $\hat{P}(a_{i} \,|\, v_{j})$:
\[ \hat{P}(a_{i} \,|\, v_{j}) \leftarrow \frac{n_{c} + mp}{n + m} \]
where
- $n$ is the number of training examples for which $v=v_j$,
- $n_c$ is the number of examples for which $v=v_j$ and $a=a_i$,
- $p$ is the prior estimate for $\hat{P}(a_{i} \,|\, v_{j})$,
- $m$ is the weight given to the prior (i.e., the number of ``virtual'' examples).
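The m-estimate above is a one-liner; the concrete counts below ($n=10$, three attribute values, $m=3$) are invented for illustration:

```python
def m_estimate(n_c, n, p, m):
    """Bayesian (m-)estimate of P(a_i | v_j): (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# Attribute value a_i never seen with class v_j (n_c = 0) among n = 10 examples;
# uniform prior p = 1/3 over three attribute values, m = 3 virtual examples.
print(m_estimate(0, 10, 1/3, 3))  # 1/13, no longer zero
```

With $n_c = 0$ the estimate degrades gracefully to $mp/(n+m)$ instead of zeroing out the whole product, and as $n$ grows the data dominates the prior.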
José M. Vidal