Bayesian Learning

This talk is based on Tom Mitchell's Machine Learning textbook [1] and his accompanying chapter slides [2].

1 Introduction

2 Bayes Theorem

2.1 Choosing a Hypothesis

2.2 Example

2.3 Probability Formulas

3 Brute Force Bayes Concept Learning

3.1 Relation to Concept Learning

3.2 MAP Hypothesis and Consistent Learners

4 Learning A Real-Valued Function

5 Learning To Predict Probabilities

6 Minimum Description Length Principle

7 Bayes Optimal Classifier

7.1 Bayes Optimal Classification

8 Gibbs Algorithm

9 Naive Bayes Classifier

9.1 Naive Bayes Classifier

9.2 Naive Bayes Algorithm

  1. Naive_Bayes_Learn(examples)
    1. For each target value $v_j$
      1. $\hat{P}(v_j) \leftarrow $ estimate $P(v_j)$
      2. For each attribute value $a_i$ of each attribute $a$: $\hat{P}(a_i\,|\,v_j) \leftarrow$ estimate $P(a_i\,|\,v_j)$
  2. Classify_New_Instance($x$): \[ v_{NB} = \argmax_{v_{j} \in V} \hat{P}(v_{j}) \prod_{a_i \in x} \hat{P}(a_{i} \,|\, v_{j}) \]
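
As a concrete illustration, here is a minimal Python sketch of this pseudocode, assuming training examples are given as (attribute-dictionary, label) pairs; the estimates are plain relative frequencies (no m-estimate smoothing), and all function and variable names are illustrative, not from the talk.

```python
from collections import Counter, defaultdict

def naive_bayes_learn(examples):
    """examples: list of (attributes, label) pairs, where attributes maps
    attribute name -> value.  Returns frequency estimates of P(v_j) and
    P(a_i | v_j)."""
    label_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)          # per-label (attr, value) counts
    for attributes, label in examples:
        for attr_value in attributes.items():
            value_counts[label][attr_value] += 1
    p_v = {v: n / len(examples) for v, n in label_counts.items()}
    p_a_given_v = {v: {av: n / label_counts[v] for av, n in counts.items()}
                   for v, counts in value_counts.items()}
    return p_v, p_a_given_v

def classify_new_instance(x, p_v, p_a_given_v):
    """x: dict of attribute name -> value.  Returns the v_NB maximizing
    P(v_j) * prod_i P(a_i | v_j); unseen attribute values get probability 0."""
    def score(v):
        s = p_v[v]
        for attr_value in x.items():
            s *= p_a_given_v[v].get(attr_value, 0.0)
        return s
    return max(p_v, key=score)

# Tiny usage example (made-up data in the spirit of the PlayTennis example):
examples = [({"Outlook": "Sunny", "Wind": "Weak"}, "No"),
            ({"Outlook": "Overcast", "Wind": "Weak"}, "Yes"),
            ({"Outlook": "Rain", "Wind": "Strong"}, "No")]
p_v, p_a = naive_bayes_learn(examples)
print(classify_new_instance({"Outlook": "Sunny", "Wind": "Weak"}, p_v, p_a))  # -> "No"
```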

9.3 Naive Bayes Example

9.4 Naive Bayes Issues

10 Learning to Classify Text

10.1 Text Attributes

10.2 Learn Naive Bayes Text

  1. Collect all words and other tokens that occur in Examples.
    • $Vocabulary \leftarrow$ all distinct words and other tokens in $Examples$
  2. Calculate the required $P(v_{j})$ and $P(w_{k}\,|\,v_{j})$ probability terms. For each target value $v_{j}$ in $V$:
    • $docs_{j} \leftarrow $ subset of $Examples$ for which the target value is $v_{j}$
    • $P(v_{j}) \leftarrow \frac{|docs_{j}|}{|Examples|}$
    • $Text_{j} \leftarrow $ a single document created by concatenating all members of $docs_{j}$
    • $n \leftarrow$ total number of words in $Text_{j}$ (counting duplicate words multiple times)
    • for each word $w_{k}$ in $Vocabulary$
      • $n_{k} \leftarrow$ number of times word $w_{k}$ occurs in $Text_{j}$
      • $P(w_{k}\,|\,v_{j}) \leftarrow \frac{n_{k} + 1}{n + |Vocabulary|}$
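
The following Python sketch mirrors this learning procedure, under the assumption that each example is already a (token-list, label) pair, i.e. tokenization has been done; the $(n_k + 1)/(n + |Vocabulary|)$ smoothing matches the estimate above, and all names are illustrative.

```python
from collections import Counter

def learn_naive_bayes_text(examples):
    """examples: list of (tokens, v_j) pairs, where tokens is a list of words.
    Returns (vocabulary, P(v_j), P(w_k | v_j)) using the smoothed estimates."""
    vocabulary = {w for tokens, _ in examples for w in tokens}
    p_v, p_w_given_v = {}, {}
    for v in {label for _, label in examples}:
        docs_j = [tokens for tokens, label in examples if label == v]
        p_v[v] = len(docs_j) / len(examples)
        text_j = [w for tokens in docs_j for w in tokens]   # concatenate docs_j
        n = len(text_j)                                     # total word count
        counts = Counter(text_j)                            # n_k for each word
        p_w_given_v[v] = {w: (counts[w] + 1) / (n + len(vocabulary))
                          for w in vocabulary}
    return vocabulary, p_v, p_w_given_v
```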

10.3 Classify Naive Bayes Text

  1. $positions \leftarrow$ all word positions in $Doc$ that contain tokens found in $Vocabulary$
  2. Return $v_{NB}$, where \[v_{NB} = \argmax_{v_{j} \in V} P(v_{j}) \prod_{i \in positions}P(a_{i}\,|\,v_{j}) \]
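
A matching classification sketch, assuming the vocabulary and probability tables were produced by a learner like the one sketched above; it works in log space to avoid numerical underflow on long documents, which is an implementation detail rather than part of the pseudocode.

```python
import math

def classify_naive_bayes_text(doc_tokens, vocabulary, p_v, p_w_given_v):
    """doc_tokens: the word tokens of the document to classify.
    Returns v_NB = argmax_v P(v) * prod_i P(a_i | v), computed with logs."""
    positions = [w for w in doc_tokens if w in vocabulary]
    def log_score(v):
        return math.log(p_v[v]) + sum(math.log(p_w_given_v[v][w])
                                      for w in positions)
    return max(p_v, key=log_score)
```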

11 Bayesian Belief Networks

11.1 Conditional Independence

11.2 Bayesian Belief Network

[Figure: Bayes net]

11.3 Inference in Bayesian Networks

11.4 Learning Bayesian Networks

11.4.1 Learning Bayesian Networks

11.4.2 Gradient Ascent for Bayes Nets

11.5 The Expectation Maximization Algorithm

  1. Calculate probabilities of unobserved variables, assuming $h$.
  2. Calculate new $w_{ijk}$ to maximize $E[\ln P(D\,|\,h)]$ where $D$ now includes both observed and (calculated probabilities of) unobserved variables.

11.5.1 When To Use EM Algorithm

11.5.2 EM Example: Generating Data from k Gaussians

[Figure: two Gaussians]

11.5.3 EM for Estimating k Means

  1. E step: Calculate the expected value $E[z_{ij}]$ of each hidden variable $z_{ij}$, assuming the current hypothesis $h = \langle \mu_1, \mu_2 \rangle$ holds. \[ \begin{array}{rcl} E[z_{ij}] & = & \dfrac{p(x=x_i \,|\, \mu = \mu_j)}{\sum_{n=1}^{2} p(x = x_i \,|\, \mu=\mu_n)} \\[10pt] & = & \dfrac{e^{-\frac{1}{2 \sigma^2} (x_i - \mu_j)^2}}{\sum_{n=1}^{2} e^{-\frac{1}{2 \sigma^2} (x_i - \mu_n)^2}} \end{array} \]
  2. M step: Calculate a new maximum likelihood hypothesis $h' = \langle \mu_1', \mu_2' \rangle$, assuming the value taken on by each hidden variable $z_{ij}$ is its expected value $E[z_{ij}]$ calculated above. Replace $h = \langle \mu_1, \mu_2 \rangle$ by $h' = \langle \mu_1', \mu_2' \rangle$. \[ \mu_j \leftarrow \frac{\sum_{i=1}^m E[z_{ij}] x_i}{\sum_{i=1}^m E[z_{ij}]} \]
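
A small, dependency-free Python sketch of these two steps for the two-Gaussian case, assuming a known shared variance $\sigma^2$, equal mixing weights, and a list of observed points; the initialization and fixed iteration count are illustrative choices, not part of the algorithm above.

```python
import math
import random

def em_two_means(xs, sigma=1.0, iterations=50):
    """EM estimate of the means of a mixture of two equal-weight Gaussians
    with known shared variance sigma^2.  xs: list of observed points x_i."""
    mu = [random.choice(xs), random.choice(xs)]        # illustrative initialization
    for _ in range(iterations):
        # E step: E[z_ij] = relative likelihood of x_i under each current mean.
        e = []
        for x in xs:
            w = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in mu]
            e.append([wj / sum(w) for wj in w])
        # M step: each new mean is the E[z_ij]-weighted average of the x_i.
        mu = [sum(e[i][j] * xs[i] for i in range(len(xs))) /
              sum(e[i][j] for i in range(len(xs)))
              for j in range(2)]
    return mu

# Usage: sample from two Gaussians and recover their means (approximately).
data = [random.gauss(0, 1) for _ in range(200)] + \
       [random.gauss(4, 1) for _ in range(200)]
print(em_two_means(data))   # expect values near 0 and 4 (in some order)
```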

11.5.4 EM Algorithm

11.5.5 General EM Problem

Given: observed data $X$, unobserved data $Z$, and a parameterized probability distribution $P(Y\,|\,h)$ over the full data $Y$ (the observed $X$ together with the unobserved $Z$), where $h$ are the parameters.

Determine:
  1. $h$ that (locally) maximizes $E[\ln P(Y\,|\,h)]$

11.5.6 General EM Method

  1. E step: Calculate $Q(h'\,|\,h)$ using the current hypothesis $h$ and the observed data $X$ to estimate the probability distribution over $Y$. \[ Q(h' \,|\, h) \leftarrow E[ \ln P(Y \,|\, h') \,|\, h, X ] \]
  2. M step: Replace hypothesis $h$ by the hypothesis $h'$ that maximizes this $Q$ function. \[ h \leftarrow \argmax_{h'} Q(h' \,|\, h) \]
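
As a hedged sketch of how the general method specializes to the two-Gaussian setting of 11.5.3 (using the same $m$, $\sigma$, $z_{ij}$, and $\mu_j$ as above): the full-data log likelihood is \[ \ln P(Y \,|\, h') = \sum_{i=1}^{m} \left( -\frac{1}{2\sigma^2} \sum_{j=1}^{2} z_{ij} (x_i - \mu_j')^2 \right) + \mathrm{const}, \] which is linear in the $z_{ij}$, so \[ Q(h' \,|\, h) = \sum_{i=1}^{m} \left( -\frac{1}{2\sigma^2} \sum_{j=1}^{2} E[z_{ij}] (x_i - \mu_j')^2 \right) + \mathrm{const}. \] Setting $\partial Q / \partial \mu_j' = 0$ then yields exactly the M-step update $\mu_j' = \frac{\sum_{i=1}^{m} E[z_{ij}] x_i}{\sum_{i=1}^{m} E[z_{ij}]}$ used above.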

11.6 Summary of Bayes Nets

URLs

  1. Machine Learning book at Amazon, http://www.amazon.com/exec/obidos/ASIN/0070428077/multiagentcom/
  2. Slides by Tom Mitchell on Machine Learning, http://www-2.cs.cmu.edu/~tom/mlbook-chapter-slides.html
  3. Wikipedia:Bayes Theorem, http://www.wikipedia.org/wiki/Bayes_theorem
  4. A Plan for Spam by Paul Graham, http://www.paulgraham.com/spam.html
  5. Spambayes Homepage, http://spambayes.sourceforge.net/

This talk available at http://jmvidal.cse.sc.edu/talks/bayesianlearning/
Copyright © 2009 José M. Vidal. All rights reserved.

25 February 2003, 02:16PM