CSCE 883: Test 2

Name:___________________________________________________

Show your work for every question but make clear which one is the final answer by drawing a box around it. Remember to copy the question number to your answer booklet.

  1. [15 points] The data is represented with the attributes temperature, direction, and color, and each example must be classified into one of Alice, Bob, or Clarke. You are given the set of training examples
    Temperature Direction Color Classification
    80 N Red Alice
    70 N Blue Bob
    90 S Red Clarke
    60 E Red Alice
    85 W Blue Bob

    and the three hypothesis
    1. if temp > 70 and color=red then its Alice else its Bob.
    2. if temp > 70 and color=blue then its Bob else its Clarke.
    3. if temp <= 70 then its Clarke else its Alice.
    What is the most probable (Bayes optimal classification) of a new instance?
  2. [15 points] Given the set of examples in the table above, how would a naive Bayes classifier classify the example:
    Temperature Direction Color
    80 S Red
  3. [5 points] What are some of the main issues that arise when trying to use Bayesian Belief networks (Bayes nets) to solve a machine learning problem?
  4. [10 points] You are given the problem of building a machine learning algorithm to classify DNA sequences into one of two categories: can-sing, cannot-sing. The sequences consist of strings of 100 characters, where each character can take on one of four values (C,G,T,A). You are asked to consider all possible hypothesis. Give an upper bound on the number of randomly drawn training examples sufficient to assure that for any concept any consistent learner will, with probability 90% output a hypothesis with error at most .2.
  5. [15 points] You are given a sample space that consists of points on a plane which can be either + or -. You are also given a hypothesis space where each hypothesis consists of two circles that can be placed in any location and can have any radius. Any example that lies within a circle is +. Show, with a picture, why the VC dimension of this hypothesis space is at least 4.
  6. [10 points] Given the following set of examples (all points in the plane):
    X Y Classification
    0 0 +
    0 1 +
    1 0 -
    2 1 -
    1 2 +
    0 2 -
    2 2 +
    2 0 -
    How would 3-NN (3 nearest neighbors) classify a point (1,1)?
  7. [10 points] Explain why help-desk applications usually use case-based reasoning.
  8. [10 points] Describe how you would implement a program that can play tic-tac-toe. The program should learn to play using a genetic algorithm. Make sure to describe your encoding method.
  9. [10 points] Answer the previous question but now using genetic programming. Make sure to describe your encoding method.

Jose M. Vidal