Decision Tree Learning

This talk is based on Tom Mitchell's Machine Learning book and his accompanying slides (see the URLs at the end).

1 Introduction

1.1 Uses

For example, the following tree for predicting C-section risk was learned from the medical records of 1000 women (negative examples are C-sections); each node shows the [positive, negative] example counts and the corresponding class fractions:

[833+,167-] .83+ .17-
Fetal_Presentation = 1: [822+,116-] .88+ .12-
| Previous_Csection = 0: [767+,81-] .90+ .10-
| | Primiparous = 0: [399+,13-] .97+ .03-
| | Primiparous = 1: [368+,68-] .84+ .16-
| | | Fetal_Distress = 0: [334+,47-] .88+ .12-
| | | | Birth_Weight < 3349: [201+,10.6-] .95+ .05-
| | | | Birth_Weight >= 3349: [133+,36.4-] .78+ .22-
| | | Fetal_Distress = 1: [34+,21-] .62+ .38-
| Previous_Csection = 1: [55+,35-] .61+ .39-
Fetal_Presentation = 2: [3+,29-] .11+ .89-
Fetal_Presentation = 3: [8+,22-] .27+ .73-

2 Representation

A decision tree classifies an instance by sorting it down the tree from the root to a leaf: each internal node tests an attribute, each branch corresponds to one of the attribute's possible values, and each leaf assigns a classification. Decision trees can therefore represent disjunctions of conjunctions of constraints on the attribute values.

3 When to Consider Decision Trees

  1. Instances are describable by attribute-value pairs.
  2. The target function is discrete-valued.
  3. A disjunctive hypothesis may be required.
  4. Possibly noisy training data.
  5. The training data may contain missing attribute values.

4 Building a Decision Tree

  1. $A \leftarrow$ the best decision attribute for the next $node$.
  2. Assign $A$ as the decision attribute for $node$.
  3. For each value of $A$, create a new descendant of $node$.
  4. Sort the training examples to the leaf nodes.
  5. If the training examples are perfectly classified then STOP; else iterate over the new leaf nodes (see the sketch below).

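A minimal Python sketch of this loop, assuming examples are represented as dicts mapping attribute names to values and that a choose_best_attribute helper (defined in the ID3 section below) picks the attribute to split on; it is an illustration of the procedure above, not a reference implementation:

    def build_tree(examples, attributes, target, choose_best_attribute):
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1:            # perfectly classified: STOP
            return labels[0]
        if not attributes:                   # no attributes left: majority vote
            return max(set(labels), key=labels.count)
        best = choose_best_attribute(examples, attributes, target)
        node = {best: {}}                    # internal node: {attribute: {value: subtree}}
        for value in set(ex[best] for ex in examples):
            subset = [ex for ex in examples if ex[best] == value]
            remaining = [a for a in attributes if a != best]
            node[best][value] = build_tree(subset, remaining, target,
                                           choose_best_attribute)
        return node
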
4.1 Choosing the Best Attribute

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

4.2 Entropy

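Following Mitchell's definitions: for a collection $S$ containing a proportion $p_+$ of positive and $p_-$ of negative examples, the entropy of $S$ measures its impurity,

$Entropy(S) \equiv -p_+ \log_2 p_+ - p_- \log_2 p_-$

with the convention $0 \log_2 0 = 0$. For the 14 PlayTennis examples above (9 positive, 5 negative), $Entropy(S) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) \approx 0.940$.
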
4.3 Entropy as Encoding Length

4.4 Non Boolean Entropy

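When the target attribute can take $c$ different values, the same definition generalizes to

$Entropy(S) \equiv \sum_{i=1}^{c} -p_i \log_2 p_i$

where $p_i$ is the proportion of $S$ belonging to class $i$.
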
4.5 Information Gain

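Information gain is the expected reduction in entropy from partitioning $S$ on attribute $A$:

$Gain(S, A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$

where $Values(A)$ is the set of values $A$ can take and $S_v$ is the subset of $S$ for which $A = v$. The best attribute in the tree-building loop above is the one with the highest gain.
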
4.6 ID3

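A minimal Python sketch of ID3's attribute selection, using the same example representation as the build_tree sketch above; it illustrates the computation, not Quinlan's actual ID3/C4.5 code:

    import math
    from collections import Counter

    def entropy(examples, target):
        # Entropy of the class distribution in `examples`.
        counts = Counter(ex[target] for ex in examples)
        total = len(examples)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    def information_gain(examples, attribute, target):
        # Expected reduction in entropy from splitting on `attribute`.
        total = len(examples)
        remainder = 0.0
        for value in set(ex[attribute] for ex in examples):
            subset = [ex for ex in examples if ex[attribute] == value]
            remainder += (len(subset) / total) * entropy(subset, target)
        return entropy(examples, target) - remainder

    def choose_best_attribute(examples, attributes, target):
        # ID3's choice: the attribute with the highest information gain.
        return max(attributes, key=lambda a: information_gain(examples, a, target))
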
4.7 ID3 Example

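Applying the gain formula to the 14 PlayTennis examples at the root ($Entropy(S) \approx 0.940$) gives approximately

    Gain(S, Outlook)     = 0.246
    Gain(S, Humidity)    = 0.151
    Gain(S, Wind)        = 0.048
    Gain(S, Temperature) = 0.029

so ID3 selects Outlook for the root. The Overcast branch contains only positive examples and becomes a Yes leaf; the Sunny and Rain branches are expanded recursively in the same way.
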
5 Hypothesis Space Search by ID3

6 Inductive Bias

6.1 Restriction and Preference Biases

6.2 Occam's Razor

7 Issues in Decision Tree Learning

7.1 Overfitting

A hypothesis $h \in H$ overfits the training data if there is an alternative hypothesis $h' \in H$ such that $h$ has smaller error than $h'$ over the training examples, but $h'$ has smaller error than $h$ over the entire distribution of instances.

7.1.1 Dealing With Overfitting

7.1.2 Reduced-Error Pruning

  1. Do until further pruning is harmful:
    1. Evaluate the impact on the validation set of pruning each possible node (plus the subtree below it).
    2. Greedily remove the node whose removal most improves validation-set accuracy (see the sketch below).

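A Python sketch of this procedure, building on the nested-dict trees from the earlier sketches; the classify helper and the majority-label choice for a pruned node are assumptions of this illustration, and here a prune is kept whenever it does not reduce validation accuracy:

    def classify(tree, example, default=None):
        # Sort one example down a nested-dict tree; leaves are class labels.
        while isinstance(tree, dict):
            attribute = next(iter(tree))
            branches = tree[attribute]
            if example.get(attribute) not in branches:
                return default
            tree = branches[example[attribute]]
        return tree

    def accuracy(tree, examples, target):
        return sum(classify(tree, ex) == ex[target] for ex in examples) / len(examples)

    def internal_nodes(tree, path=()):
        # Yield the (attribute, value) path leading to every internal node.
        if isinstance(tree, dict):
            yield path
            attribute = next(iter(tree))
            for value, subtree in tree[attribute].items():
                yield from internal_nodes(subtree, path + ((attribute, value),))

    def replace_node(tree, path, label):
        # Copy of `tree` with the subtree at `path` collapsed to a leaf.
        if not path:
            return label
        attribute, value = path[0]
        branches = dict(tree[attribute])
        branches[value] = replace_node(branches[value], path[1:], label)
        return {attribute: branches}

    def reduced_error_prune(tree, train, validation, target):
        # Greedily prune while validation accuracy does not get worse.
        while True:
            best_tree, best_acc = tree, accuracy(tree, validation, target)
            for path in internal_nodes(tree):
                reaching = [ex for ex in train
                            if all(ex.get(a) == v for a, v in path)]
                if not reaching:
                    continue
                labels = [ex[target] for ex in reaching]
                majority = max(set(labels), key=labels.count)
                candidate = replace_node(tree, path, majority)
                acc = accuracy(candidate, validation, target)
                if acc >= best_acc:
                    best_tree, best_acc = candidate, acc
            if best_tree is tree:
                return tree
            tree = best_tree
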
7.1.3 Rule Post-Pruning

  1. Infer the decision tree from the training data, fitting it as well as possible (allowing overfitting).
  2. Convert the tree to an equivalent set of rules, one rule per root-to-leaf path (see the example below).
  3. Prune each rule by removing any preconditions whose removal improves its estimated accuracy.
  4. Sort the final rules by their estimated accuracy and consider them in this sequence when classifying.

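As an illustration, in the PlayTennis tree rooted at Outlook the path Outlook = Sunny, Humidity = High leads to a No leaf, which becomes the rule

    IF (Outlook = Sunny) AND (Humidity = High) THEN PlayTennis = No

Post-pruning would then consider dropping each of the two preconditions independently, keeping whichever removals do not hurt the rule's estimated accuracy.
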
7.2 Continuous-Valued Attributes

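A common approach is to turn a continuous attribute $A$ into boolean tests of the form $A < c$, taking candidate thresholds $c$ midway between adjacent sorted values whose classifications differ, and scoring each candidate test by information gain. A minimal sketch, using the same example representation as above:

    def candidate_thresholds(examples, attribute, target):
        # Midpoints between adjacent sorted values whose class labels differ.
        ordered = sorted(examples, key=lambda ex: ex[attribute])
        thresholds = []
        for a, b in zip(ordered, ordered[1:]):
            if a[target] != b[target] and a[attribute] != b[attribute]:
                thresholds.append((a[attribute] + b[attribute]) / 2)
        return thresholds

Each threshold $c$ then defines a boolean attribute ($A < c$) that competes with the other attributes on information gain.
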
7.3 Attributes with Many Values

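Information gain favors attributes with many values (an attribute such as Date can split the training data perfectly while generalizing poorly). One standard remedy is to use the gain ratio instead:

$GainRatio(S, A) \equiv \frac{Gain(S, A)}{SplitInformation(S, A)}$, where $SplitInformation(S, A) \equiv -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$

and $S_1, \dots, S_c$ are the subsets produced by partitioning $S$ on the $c$ values of $A$.
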
7.4 Unknown Attribute Values

7.5 Attributes With Costs

URLs

  1. Machine Learning book at Amazon, http://www.amazon.com/exec/obidos/ASIN/0070428077/multiagentcom/
  2. Slides by Tom Mitchell on Machine Learning, http://www-2.cs.cmu.edu/~tom/mlbook-chapter-slides.html
  3. Wikipedia definition of Occam's Razor, http://www.wikipedia.org/wiki/Occam%27s_Razor
  4. C4.5 source code page, http://www.cse.unsw.edu.au/~quinlan/

This talk is available at http://jmvidal.cse.sc.edu/talks/decisiontrees/
Copyright © 2009 José M. Vidal . All rights reserved.

22 January 2003, 07:51AM