Learning Sets of Rules

This talk is based on Tom Mitchell's Machine Learning book (see the URLs at the end).

1 Learning Rules

2 Rules

3 Sequential Covering

;Examples is the set of examples;
; e.g.
; 1- Wind(d1)=Strong, Humidity(d1)=Low, Outlook(d1)=Sunny, PlayTennis(d1)=No
; 2- Wind(d2)=Weak, Humidity(d2)=Med, Outlook(d2)=Sunny, PlayTennis(d2)=Yes
; 3- Wind(d3)=Med, Humidity(d3)=Med, Outlook(d3)=Rain, PlayTennis(d3)=No
;Target_attribute is the attribute we wish to learn.
; e.g. PlayTennis(x)
;Attributes is the set of all possible attributes.
; e.g. Wind(x), Humidity(x), Outlook(x)
; Threshold is the desired performance.
;
Sequential_covering (Target_attribute, Attributes, Examples, Threshold) :
  Learned_rules = {}
  Rule = Learn-One-Rule(Target_attribute, Attributes, Examples)

  while Performance(Rule, Examples) > Threshold :
    Learned_rules = Learned_rules + Rule
    Examples = Examples - {examples correctly classified by Rule}
    Rule = Learn-One-Rule(Target_attribute, Attributes, Examples)

  Learned_rules = sort Learned_rules according to performance over Examples
  return Learned_rules
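
A minimal Python sketch of the same loop is below. It is an illustration, not part of the original slides: it assumes each example is a dict mapping attribute names to values, a rule is a pair (preconditions, prediction) where preconditions is a dict, and learn_one_rule and performance are supplied (for instance, as in the Learn-One-Rule sketch later in this talk).

def covers(rule, example):
    """True if every precondition of the rule matches the example."""
    preconditions, _ = rule
    return all(example.get(a) == v for a, v in preconditions.items())

def classifies_correctly(rule, example, target):
    _, prediction = rule
    return covers(rule, example) and example[target] == prediction

def sequential_covering(target, attributes, examples, threshold,
                        learn_one_rule, performance):
    learned_rules = []
    rule = learn_one_rule(target, attributes, examples)
    while examples and performance(rule, examples, target) > threshold:
        learned_rules.append(rule)
        # remove the examples this rule already classifies correctly,
        # then learn the next rule on what remains
        examples = [e for e in examples
                    if not classifies_correctly(rule, e, target)]
        if not examples:
            break
        rule = learn_one_rule(target, attributes, examples)
    learned_rules.sort(key=lambda r: performance(r, examples, target),
                       reverse=True)
    return learned_rules

Because each pass only removes the examples already covered and never revisits earlier rules, the procedure is a greedy, non-backtracking covering of the training data.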

3.1 Learn-One-Rule

Learn one Rule

3.1.1 Learn-One-Rule Algorithm

Learn-One-Rule(target_attribute, attributes, examples, k)
;Returns a single rule that covers some of the Examples
  best-hypothesis = the most general hypothesis
  candidate-hypotheses = {best-hypothesis}
  while candidate-hypotheses is not empty :    ;Generate the next more specific candidate-hypotheses
    all-constraints = all "attribute=value" constraints
    new-candidate-hypotheses = all specializations of candidate-hypotheses by adding all-constraints
    remove from new-candidate-hypotheses any that are duplicates, inconsistent, or not maximally specific
    ;Update best-hypothesis
    best-hypothesis = $\argmax_{h \in \text{new-candidate-hypotheses}}$ Performance(h,examples,target_attribute)
    ;Update candidate-hypotheses
    candidate-hypotheses = the k best from new-candidate-hypotheses according to Performance.
  prediction = most frequent value of target_attribute from examples that match best-hypothesis
  return IF best-hypothesis THEN prediction


Performance(h, examples, target_attribute)
  h-examples = the set of examples that match h
  return - Entropy(h-examples) wrt target_attribute
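
A Python sketch of this beam search is below (same assumptions as the sequential-covering sketch above; the names are illustrative, and the "not maximally specific" pruning is omitted for brevity).

import math
from collections import Counter

def neg_entropy(covered, target):
    """Performance(h): negative entropy of the target attribute over the
    examples that h matches (empty coverage is treated as worst possible)."""
    if not covered:
        return float("-inf")
    counts = Counter(e[target] for e in covered)
    n = len(covered)
    return sum((c / n) * math.log2(c / n) for c in counts.values())

def matches(preconditions, example):
    return all(example.get(a) == v for a, v in preconditions.items())

def learn_one_rule(target, attributes, examples, k=5):
    """Greedy beam search from the most general hypothesis (no preconditions)."""
    def perf(h):
        return neg_entropy([e for e in examples if matches(h, e)], target)

    best = {}                              # most general hypothesis: matches everything
    candidates = [best]
    all_constraints = [(a, v) for a in attributes
                       for v in {e[a] for e in examples}]
    while candidates:
        # specialize every candidate by every attribute=value constraint
        new_candidates = []
        for h in candidates:
            for a, v in all_constraints:
                if a not in h:                   # skip inconsistent specializations
                    s = dict(h, **{a: v})
                    if s not in new_candidates:  # skip duplicates
                        new_candidates.append(s)
        if not new_candidates:
            break
        best = max(new_candidates + [best], key=perf)  # update best-hypothesis
        new_candidates.sort(key=perf, reverse=True)
        candidates = new_candidates[:k]                # keep the k best
    covered = [e for e in examples if matches(best, e)]
    prediction = Counter(e[target] for e in covered).most_common(1)[0][0]
    return (best, prediction)

Keeping the k best candidates rather than only the single best makes this a beam search, which is less prone to the mistakes of pure greedy hill climbing.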

3.1.2 Learn-One-Rule Example

Day Outlook Temp Humid Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Low Weak Yes
D6 Rain Cool Low Strong No
D7 Overcast Cool Low Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Low Weak Yes
D10 Rain Mild Low Weak Yes
D11 Sunny Mild Low Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Low Weak Yes
D14 Rain Mild High Strong No
  1. best-hypothesis = IF T THEN PlayTennis(x) = Yes
  2. candidate-hypotheses = {best-hypothesis}
  3. all-constraints = {Outlook(x)=Sunny, Outlook(x)=Overcast, Temp(x)=Hot, ......}
  4. new-candidate-hypotheses = {IF Outlook=Sunny THEN PlayTennis=YES, IF Outlook=Overcast THEN PlayTennis=YES, ...}
  5. best-hypothesis = IF Outlook=Overcast THEN PlayTennis=YES (it covers only Yes examples, so it maximizes -Entropy; see the computation below)
  6. candidate-hypotheses = the k best specializations, e.g. {IF Outlook=Overcast THEN PlayTennis=YES, IF Humid=Low THEN PlayTennis=YES, ...}
  7. .....
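
Step 5 follows directly from the performance measure. Over the 14 examples, Outlook=Sunny covers D1, D2, D8 (No) and D9, D11 (Yes); Outlook=Rain covers D4, D5, D10 (Yes) and D6, D14 (No); Outlook=Overcast covers D3, D7, D12, D13, all Yes. So \[-\text{Entropy}(\text{Sunny}) = \frac{2}{5}\log_2\frac{2}{5} + \frac{3}{5}\log_2\frac{3}{5} \approx -0.971 = -\text{Entropy}(\text{Rain}),\] while \[-\text{Entropy}(\text{Overcast}) = 1\cdot\log_2 1 = 0,\] the best possible value, so IF Outlook=Overcast THEN PlayTennis=YES is chosen as best-hypothesis. The next best single-constraint rule is IF Humid=Low THEN PlayTennis=YES, with $-\text{Entropy} \approx -0.592$.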

3.1.3 Learn One Summary

3.2 CN2

3.3 Other Variations

3.4 Other Performance Measures

  1. Relative frequency: Let $n$ be the number of examples the rule matches and $n_c$ be the number that it classifies correctly. \[\frac{n_c}{n}\]
  2. m-estimate of accuracy: Let $p$ be the prior probability that a random example will be classified correctly and let $m$ be the weight given to this prior. \[\frac{n_c + mp}{n+m}\]
  3. Entropy: Let $S$ be the set of examples that match the rule precondition, $c$ be the number of distinct values the target function may take on, and $p_i$ the proportion of examples for which the target function takes the $i$th value; the performance is the negative entropy (a sketch computing all three measures follows this list). \[-\text{Entropy}(S) = \sum_{i=1}^c p_i\log_2 p_i\]
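
The sketch below computes the three measures in Python; the names are illustrative: covered is the list of examples the rule matches, prediction is the rule's predicted target value, and m, p are the m-estimate parameters.

import math
from collections import Counter

def relative_frequency(covered, target, prediction):
    n = len(covered)
    n_c = sum(1 for e in covered if e[target] == prediction)
    return n_c / n

def m_estimate(covered, target, prediction, m, p):
    n = len(covered)
    n_c = sum(1 for e in covered if e[target] == prediction)
    return (n_c + m * p) / (n + m)

def neg_entropy(covered, target):
    # same measure used by Learn-One-Rule above
    n = len(covered)
    counts = Counter(e[target] for e in covered)
    return sum((c / n) * math.log2(c / n) for c in counts.values())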

4 Learning Rule Sets Summary

5 Learning Rule Sets Summary

6 Learning First-Order Rules

6.1 First-Order Logic Definitions

6.2 Learning First-Order Horn Clauses

6.3 FOIL

FOIL(target-predicate, predicates, examples)
  pos = those examples for which target-predicate is true
  neg = those examples for which target-predicate is false
  learnedRules = {}
  while pos do
    ;learn a new rule
    newRule = the rule that predicts target-predicate with no preconditions
    newRuleNeg = neg
    while newRuleNeg do
      ;add a new literal to specialize newRule
      candidateLiterals = candidate new literals for newRule based on predicates
      bestLiteral = $\argmax_{l \in \text{candidateLiterals}}$ Foil-gain(l, newRule)
      add bestLiteral to preconditions of newRule
      newRuleNeg = subset of newRuleNeg that satisfies the preconditions of newRule
    learnedRules = learnedRules + newRule
    pos = pos - {members of pos covered by newRule}
  return learnedRules
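
A structural Python sketch of these two nested loops is given below. The first-order details (generating candidate literals and testing whether an example satisfies a rule) are abstracted into supplied functions candidate_literals, foil_gain, and satisfies; those names are illustrative and not part of the original algorithm description. Examples are assumed to be dicts in which e[target_predicate] is a boolean.

def foil(target_predicate, predicates, examples,
         candidate_literals, foil_gain, satisfies):
    pos = [e for e in examples if e[target_predicate]]
    neg = [e for e in examples if not e[target_predicate]]
    learned_rules = []
    while pos:
        # learn a new rule: start with no preconditions, then specialize
        new_rule = (target_predicate, [])        # (head, list of body literals)
        new_rule_neg = list(neg)
        while new_rule_neg:
            literals = candidate_literals(new_rule, predicates)
            if not literals:                     # nothing left to add
                break
            best = max(literals,
                       key=lambda l: foil_gain(l, new_rule, pos, new_rule_neg))
            new_rule = (target_predicate, new_rule[1] + [best])
            new_rule_neg = [e for e in new_rule_neg if satisfies(new_rule, e)]
        newly_covered = [e for e in pos if satisfies(new_rule, e)]
        if not newly_covered:                    # guard: rule covers no new positives
            break
        learned_rules.append(new_rule)
        pos = [e for e in pos if e not in newly_covered]
    return learned_rules

Note that the inner loop is a pure hill climb (no beam), guided by the Foil-gain measure of section 6.7 below.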

6.4 Generating Candidate Specializations

6.5 FOIL Example

6.6 Guiding Search in FOIL

6.7 Foil-Gain
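
For reference, the gain measure used by Quinlan's FOIL (and called as Foil-gain in the pseudocode above) is the following, where $p_0$ and $n_0$ are the numbers of positive and negative bindings of rule $R$, $p_1$ and $n_1$ are the numbers for the specialized rule obtained by adding literal $L$, and $t$ is the number of positive bindings of $R$ still covered after adding $L$: \[\text{Foil\_Gain}(L, R) = t\left(\log_2\frac{p_1}{p_1 + n_1} - \log_2\frac{p_0}{p_0 + n_0}\right)\]

A direct Python transcription of this formula (the function and parameter names are illustrative):

import math

def foil_gain(p0, n0, p1, n1, t):
    """Foil_Gain(L, R) from the counts of positive/negative bindings before
    (p0, n0) and after (p1, n1) adding literal L; t is the number of positive
    bindings of R still covered afterwards."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))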

6.8 Learning Recursive Rule Sets

6.9 FOIL Summary

7 Induction As Inverted Deduction

7.1 Inverted Example

7.2 Induction and Deduction

Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; ... it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction...
(Jevons 1874)

7.3 Inverse Entailment

7.4 Inverse Entailment Pros and Cons

Pros: Cons:

7.5 Resolution Rule

7.6 Inverting Resolution

7.7 Learning With Inverted Resolution

  1. Select a training example that is not yet covered by learned clauses.
  2. Use inverse resolution rule to generate candidate hypothesis h that satisfies B ∧ h ∧ x → f(x), where B = background knowledge plus any learned clauses (a small example follows this list).
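
For example (an illustration constructed here, not taken from the original slides): suppose a training example has description x containing Father(Tom, Bob) and classification f(x) = Child(Bob, Tom). Resolving the candidate clause h = Child(u, v) ∨ ¬Father(v, u), i.e. IF Father(v, u) THEN Child(u, v), against Father(Tom, Bob) under the substitution {u/Bob, v/Tom} yields exactly Child(Bob, Tom), so this h satisfies B ∧ h ∧ x → f(x). The inverse resolution operator constructs such an h by running that resolution step backwards, starting from the example and the known fact.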

7.8 First-Order Resolution

7.9 Inverting First-Order Resolution

7.10 Inverted First-Order Example

7.11 Inverse Resolution Summary

8 Generalization, θ-Subsumption, and Entailment

9 PROGOL

URLs

  1. Machine Learning book at Amazon, http://www.amazon.com/exec/obidos/ASIN/0070428077/multiagentcom/
  2. Slides by Tom Mitchell on Machine Learning, http://www-2.cs.cmu.edu/~tom/mlbook-chapter-slides.html
  3. wikipedia:First-order_logic, http://www.wikipedia.org/wiki/First-order_logic
  4. wikipedia:Prolog, http://www.wikipedia.org/wiki/Prolog
  5. CN2 Homepage, http://www.cs.utexas.edu/users/pclark/software.html#cn2
  6. wikipedia:Soundness, http://www.wikipedia.org/wiki/Soundness
  7. wikipedia:Completeness, http://www.wikipedia.org/wiki/Completeness

This talk available at http://jmvidal.cse.sc.edu/talks/learningrules/
Copyright © 2009 José M. Vidal . All rights reserved.

08 April 2003, 01:53PM