# CSCE 883: Test 2

## Name:___________________________________________________

Show your work for every question but make clear which one is the
final answer by drawing a box around it. Remember to copy the
question number to your answer booklet.

**[15 points]** The data is represented with the
attributes `temperature`, `direction`, and
`color`, and each example must be classified into one
of `Alice`, `Bob`, or `Clarke`. You are
given the set of training examples
Temperature |
Direction |
Color |
Classification |

80 |
N |
Red |
Alice |

70 |
N |
Blue |
Bob |

90 |
S |
Red |
Clarke |

60 |
E |
Red |
Alice |

85 |
W |
Blue |
Bob |

and the three hypothesis
- if temp > 70 and color=red then its Alice else its Bob.
- if temp > 70 and color=blue then its Bob else its Clarke.
- if temp <= 70 then its Clarke else its Alice.

What is the most probable (Bayes optimal classification) of a new instance?
**[15 points]** Given the set of examples in the table above, how would a
naive Bayes classifier classify the example:
Temperature |
Direction |
Color |

80 |
S |
Red |

**[5 points]** What are some of the main issues that
arise when trying to use Bayesian Belief networks (Bayes nets)
to solve a machine learning problem?
**[10 points]** You are given the problem of building a
machine learning algorithm to classify DNA sequences into one
of two categories: can-sing, cannot-sing. The sequences
consist of strings of 100 characters, where each character can
take on one of four values (C,G,T,A). You are asked to
consider all possible hypothesis. Give an upper bound on the
number of randomly drawn training examples sufficient to
assure that for any concept any consistent learner will, with
probability 90% output a hypothesis with error at most
.2.
**[15 points]** You are given a sample space that
consists of points on a plane which can be either + or -. You
are also given a hypothesis space where each hypothesis
consists of two circles that can be placed in any location and
can have any radius. Any example that lies within a circle is
+. Show, with a picture, why the VC dimension of this
hypothesis space is at least 4.
**[10 points]** Given the following set of examples
(all points in the plane):

X |
Y |
Classification |

0 |
0 |
+ |

0 |
1 |
+ |

1 |
0 |
- |

2 |
1 |
- |

1 |
2 |
+ |

0 |
2 |
- |

2 |
2 |
+ |

2 |
0 |
- |

How would 3-NN (3 nearest neighbors) classify a point (1,1)?
**[10 points]** Explain why help-desk applications
usually use case-based reasoning.
**[10 points]** Describe how you would implement a
program that can play tic-tac-toe. The program should learn to
play using a genetic algorithm. Make sure to describe your
encoding method.
**[10 points]** Answer the previous question but now
using genetic programming. Make sure to describe your encoding
method.

Jose M. Vidal