Learn Naive Bayes Text
- Given $Examples$, a set of text documents each labelled with a target value, and $V$, the set of all possible target values.
- Collect all words and other tokens that occur in
Examples.
- $Vocabulary \leftarrow$ all distinct words and other tokens in $Examples$
- Calculate the required $P(v_{j})$ and $P(w_{k}\,|\,v_{j})$ probability terms.
- $docs_{j} \leftarrow $ subset of $Examples$ for which the target value is $v_{j}$
- $P(v_{j}) \leftarrow \frac{|docs_{j}|}{|Examples|}$
- $Text_{j} \leftarrow $ a single document created by
concatenating all members of $docs_{j}$
- $n \leftarrow$ total number of words in $Text_{j}$ (counting
duplicate words multiple times)
- for each word $w_{k}$ in $Vocabulary$
- $n_{k} \leftarrow$ number of times word $w_{k}$ occurs in
$Text_{j}$
- $P(w_{k}\,|\,v_{j}) \leftarrow \frac{n_{k} + 1}{n + |Vocabulary|}$
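The steps above can be sketched in Python. The function and variable names (`learn_naive_bayes_text`, the `(tokens, target)` pair format) are illustrative choices, not part of the original pseudocode; the estimates themselves follow the formulas on this slide, including the add-one (Laplace) smoothing in the final step.

```python
from collections import Counter

def learn_naive_bayes_text(examples, V):
    """Estimate P(v_j) and P(w_k | v_j) from labelled documents.

    examples: list of (list_of_tokens, target_value) pairs.
    V: iterable of all possible target values.
    """
    # Vocabulary <- all distinct words and other tokens in Examples
    vocabulary = {w for doc, _ in examples for w in doc}
    priors, cond = {}, {}
    for v_j in V:
        # docs_j <- subset of Examples with target value v_j
        docs_j = [doc for doc, target in examples if target == v_j]
        priors[v_j] = len(docs_j) / len(examples)   # P(v_j)
        # Text_j <- single document concatenating all members of docs_j
        text_j = [w for doc in docs_j for w in doc]
        n = len(text_j)            # total words, counting duplicates
        counts = Counter(text_j)   # n_k for each word w_k
        # P(w_k | v_j) <- (n_k + 1) / (n + |Vocabulary|)
        cond[v_j] = {w: (counts[w] + 1) / (n + len(vocabulary))
                     for w in vocabulary}
    return priors, cond, vocabulary
```

For example, with `examples = [(["good", "movie"], "pos"), (["bad", "movie"], "neg")]` and `V = ["pos", "neg"]`, the vocabulary has 3 words, so for class `pos` (where $n = 2$) the word "good" gets $(1+1)/(2+3) = 0.4$ and the unseen word "bad" still gets the smoothed estimate $(0+1)/(2+3) = 0.2$.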
José M. Vidal