Bayesian Learning

Learn Naive Bayes Text($Examples$, $V$) — $Examples$ is a set of labeled documents and $V$ is the set of possible target values.

  1. Collect all words and other tokens that occur in Examples.
    • $Vocabulary \leftarrow$ all distinct words and other tokens in $Examples$
  2. Calculate the required $P(v_{j})$ and $P(w_{k}\,|\,v_{j})$ probability terms.
    • for each target value $v_{j}$ in $V$
      • $docs_{j} \leftarrow$ subset of $Examples$ for which the target value is $v_{j}$
      • $P(v_{j}) \leftarrow \frac{|docs_{j}|}{|Examples|}$
      • $Text_{j} \leftarrow$ a single document created by concatenating all members of $docs_{j}$
      • $n \leftarrow$ total number of words in $Text_{j}$ (counting duplicate words multiple times)
      • for each word $w_{k}$ in $Vocabulary$
        • $n_{k} \leftarrow$ number of times word $w_{k}$ occurs in $Text_{j}$
        • $P(w_{k}\,|\,v_{j}) \leftarrow \frac{n_{k} + 1}{n + |Vocabulary|}$
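
The pseudocode maps directly onto a short Python function. The sketch below is one possible rendering, assuming each training example is a (token list, label) pair; the function names and the toy spam/ham data are illustrative, not part of the slides. The add-one term in the likelihood is the Laplace smoothing from the last bullet above.

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples):
    """Illustrative sketch of Learn Naive Bayes Text.

    `examples` is a list of (words, label) pairs, where `words` is a
    list of tokens.  Returns the vocabulary, the class priors P(v_j),
    and the Laplace-smoothed word likelihoods P(w_k | v_j).
    """
    # Step 1: Vocabulary <- all distinct tokens in Examples.
    vocabulary = {w for words, _ in examples for w in words}

    priors = {}       # P(v_j)
    likelihoods = {}  # likelihoods[v_j][w_k] = P(w_k | v_j)
    targets = {label for _, label in examples}

    # Step 2: estimate the probability terms for each target value.
    for v in targets:
        # docs_j: the examples whose target value is v.
        docs_v = [words for words, label in examples if label == v]
        priors[v] = len(docs_v) / len(examples)

        # Text_j: concatenation of all documents labeled v;
        # n: total word count, duplicates counted multiple times.
        counts = Counter(w for words in docs_v for w in words)
        n = sum(counts.values())

        # P(w_k | v_j) = (n_k + 1) / (n + |Vocabulary|).
        likelihoods[v] = {
            w: (counts[w] + 1) / (n + len(vocabulary))
            for w in vocabulary
        }

    return vocabulary, priors, likelihoods


def classify(words, vocabulary, priors, likelihoods):
    """Hypothetical companion classifier (not on this slide): pick the
    target value maximizing P(v) * prod_k P(w_k | v), using log
    probabilities to avoid underflow and skipping unknown words."""
    def score(v):
        return math.log(priors[v]) + sum(
            math.log(likelihoods[v][w]) for w in words if w in vocabulary)
    return max(priors, key=score)


# Toy usage with made-up data:
examples = [
    (["free", "money", "now"], "spam"),
    (["meeting", "tomorrow", "noon"], "ham"),
]
vocab, priors, likelihoods = learn_naive_bayes_text(examples)
print(classify(["free", "meeting"], vocab, priors, likelihoods))
```

Note that the smoothing term $+1$ in the numerator and $+|Vocabulary|$ in the denominator guarantees $P(w_{k}\,|\,v_{j}) > 0$ even for words that never occur in $Text_{j}$, so a single unseen word cannot zero out the whole product.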
