Information Gain
- The information gain is the expected reduction
in entropy caused by partitioning the examples with respect to
an attribute.
- Given $S$ is the set of examples, $A$ the attribute, and
$S_v$ the subset of $S$ for which attribute $A$ has value $v$:
\[ Gain(S,A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_{v}|}{|S|}
Entropy(S_{v}) \]
- That is, current entropy minus new entropy.
- Using our set of examples we can now calculate:
- Original Entropy = 0.94
- Humidity = High entropy = 0.985
- Humidity = Normal entropy = 0.592
- $Gain (S,Humidity) = .94 - \left(\frac{7}{14}\right).985 - \left(\frac{7}{14}\right).592 = .151$
- Wind = Weak entropy = 0.811
- Wind = Strong entropy = 1.0
- $Gain (S,Wind) = .94 - \left(\frac{8}{14}\right).811 - \left(\frac{6}{14}\right)1.0 = .048$
- So Humidity provides a greater information gain.
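The calculation above can be sketched in a few lines of Python. The entropy and gain functions follow the formula on this slide; the class counts (9 positive, 5 negative overall; the per-value splits for Humidity and Wind) are the standard PlayTennis counts consistent with the entropies quoted above.

```python
import math

def entropy(pos, neg):
    """Entropy of a boolean-labeled set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

# Overall: 9 positive, 5 negative examples -> entropy about 0.94.
original = entropy(9, 5)

# Humidity = High (3+, 4-) and Normal (6+, 1-), 7 examples each.
gain_humidity = original - (7/14) * entropy(3, 4) - (7/14) * entropy(6, 1)

# Wind = Weak (6+, 2-) and Strong (3+, 3-).
gain_wind = original - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)

print(round(gain_humidity, 3), round(gain_wind, 3))
```

The unrounded gains come out near 0.152 and 0.048; the slide's 0.151 reflects rounding the intermediate entropies.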
José M. Vidal