Entropy as Encoding Length
- We can also say that $Entropy(S)$ equals the expected
number of bits needed to encode the class ($\oplus$ or $\ominus$)
of a randomly drawn member of $S$ using the optimal,
shortest-length code.
- Why?
- Information theory: the optimal-length code assigns
$-\log_{2} p$ bits to a message having probability $p$.
- Imagine I'm choosing elements from $S$ at random and
telling you whether each one is $\oplus$ or $\ominus$. How many
bits per element will I need? (We work out the encoding
beforehand.)
- If a message has probability 1, then its encoding length is
0. Why? Because the outcome is certain, and $-\log_{2} 1 = 0$.
- If the probability is 0.5, then we need 1 bit (the maximum),
since $-\log_{2} 0.5 = 1$.
- So, the expected number of bits needed to encode whether a random
member of $S$ is $\oplus$ or $\ominus$ is (see the sketch below):
\[ p_{\oplus} (-\log_{2} p_{\oplus}) + p_{\ominus} (-\log_{2} p_{\ominus}), \]
which is exactly
\[ Entropy(S) \equiv - p_{\oplus} \log_{2} p_{\oplus} - p_{\ominus} \log_{2} p_{\ominus}. \]
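As a concrete check, here is a minimal Python sketch (an illustrative addition, not part of the original slide; the sample with 9 positive and 5 negative examples is a made-up example) that computes the optimal code length $-\log_2 p$ for the two special cases above and the entropy of a boolean sample:

```python
import math

def optimal_code_length(p):
    """Bits an optimal code assigns to a message of probability p: -log2(p)."""
    return -math.log2(p)

def entropy(p_pos):
    """Entropy of a boolean sample with positive-class proportion p_pos.

    Uses the convention that 0 * log2(0) contributes 0 bits.
    """
    total = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:
            total += p * optimal_code_length(p)  # p * (-log2 p)
    return total

# A certain message (probability 1) needs 0 bits; probability 0.5 needs 1 bit.
print(optimal_code_length(1.0))   # -0.0, i.e. 0 bits
print(optimal_code_length(0.5))   # 1.0 bit

# Entropy of a hypothetical sample with 9 positive and 5 negative examples.
print(entropy(9 / 14))            # ~0.940 bits per example
```

The printed entropy (about 0.94 bits) is the expected number of bits per example under the optimal code, matching the formula above.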