Bayesian Learning

Classify Naive Bayes Text

Given a document $Doc$:

$positions \leftarrow$ all word positions in $Doc$ that contain tokens found in $Vocabulary$
Return $v_{NB}$, where \[v_{NB} = \argmax_{v_{j} \in V} P(v_{j}) \prod_{i \in positions}P(a_{i}\,|\,v_{j}) \]
Here $V$ is the set of target classes and $a_{i}$ is the word found at position $i$ of $Doc$.
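
The classification step can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the function name, the dictionary layout for $P(v_{j})$ and $P(a_{i}\,|\,v_{j})$, and the toy numbers are all assumptions made for the example. It also assumes every probability is nonzero (for instance, smoothed estimates from a separate learning step), and it sums logarithms rather than multiplying probabilities, which leaves the argmax unchanged while avoiding numeric underflow on long documents.

```python
import math

def classify_naive_bayes_text(doc, vocabulary, priors, word_probs):
    """Return the class v_j maximizing P(v_j) * prod_i P(a_i | v_j).

    doc        -- list of tokens (hypothetical input format)
    vocabulary -- set of known tokens
    priors     -- dict mapping class -> P(v_j)
    word_probs -- dict mapping class -> {word: P(word | v_j)}, all nonzero
    """
    # positions: keep only the tokens of doc that appear in the vocabulary.
    words = [w for w in doc if w in vocabulary]

    best_class, best_score = None, -math.inf
    for v, prior in priors.items():
        # Log of the product in the formula; same argmax, no underflow.
        score = math.log(prior) + sum(math.log(word_probs[v][w]) for w in words)
        if score > best_score:
            best_class, best_score = v, score
    return best_class

# Toy, made-up probabilities purely for illustration.
vocabulary = {"puck", "goal", "orbit", "launch"}
priors = {"hockey": 0.5, "space": 0.5}
word_probs = {
    "hockey": {"puck": 0.4, "goal": 0.4, "orbit": 0.1, "launch": 0.1},
    "space": {"puck": 0.1, "goal": 0.1, "orbit": 0.4, "launch": 0.4},
}

print(classify_naive_bayes_text(
    ["the", "puck", "hit", "the", "goal"], vocabulary, priors, word_probs))
```

Tokens outside the vocabulary ("the", "hit") are ignored, so only "puck" and "goal" contribute to the score in this call.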

This algorithm was shown to classify Usenet articles into their appropriate newsgroups with 89% accuracy.
A similar approach was proposed by Paul Graham in "A Plan for Spam". Several implementations exist, such as SpamBayes.

José M. Vidal
