Artificial Neural Networks

This talk is based on Tom Mitchell's Machine Learning book and his accompanying chapter slides (see the URLs at the end of this talk).

1 Introduction

1.1 The Human Brain

Neuron

[Figure: A neuron]

1.2 Neural Network Representation

Artificial Neuron

[Figure: An artificial neuron]

2 When to Use Neural Networks

2.1 ALVINN

[Figure: The ALVINN autonomous driving system]

3 Perceptrons

3.1 Representational Power of Perceptrons

Linearly separable

[Figure: Decision surface of a two-input (x1 and x2) perceptron]
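
As a concrete illustration of what a single perceptron can represent, here is a minimal Python sketch of a two-input threshold unit computing AND over 0/1 inputs. The weights are one choice that happens to work; no comparable weight vector exists for XOR, because XOR is not linearly separable.

    def perceptron(x, w):
        # Threshold unit: output 1 if w0 + w1*x1 + w2*x2 > 0, else -1.
        s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        return 1 if s > 0 else -1

    # Illustrative weights that realize AND over 0/1 inputs.
    w_and = [-0.8, 0.5, 0.5]
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(x, w_and))   # only (1, 1) maps to +1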

3.2 Perceptron Training
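
A minimal sketch of the perceptron training rule, w_i ← w_i + η(t − o)x_i, applied example by example; the toy data set, learning rate, and epoch count below are illustrative assumptions.

    def output(x, w):
        # x includes a leading 1 so that w[0] acts as the threshold weight.
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

    def train_perceptron(examples, w, eta=0.1, epochs=20):
        # Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i
        for _ in range(epochs):
            for x, t in examples:
                o = output(x, w)
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        return w

    # Linearly separable toy data: learn AND (x = [1, x1, x2], t in {-1, +1}).
    data = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
    w = train_perceptron(data, w=[0.0, 0.0, 0.0])
    print(w, [output(x, w) for x, _ in data])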

3.3 Perceptron Training Rule Convergence

3.4 Gradient Descent

3.4.1 Gradient Descent Landscape

[Figure: Parabolic error surface over weight space]
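
The landscape being descended is the training error as a function of the weight vector. For a linear unit the usual measure is the sum of squared errors over the training set D, which yields a parabolic surface with a single global minimum:

    E(\vec{w}) \;\equiv\; \frac{1}{2} \sum_{d \in D} \left( t_d - o_d \right)^2

where t_d is the target output and o_d the unit's output for training example d.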

3.4.2 Calculating the Gradient Descent
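
For reference, differentiating E for a linear unit o = w · x gives the gradient components and the weight update used in the algorithm that follows (this is the standard derivation, included here as a worked step):

    \frac{\partial E}{\partial w_i}
      \;=\; \frac{1}{2} \sum_{d \in D} 2\,(t_d - o_d)\,\frac{\partial}{\partial w_i}\bigl(t_d - \vec{w}\cdot\vec{x}_d\bigr)
      \;=\; -\sum_{d \in D} (t_d - o_d)\, x_{i,d}

    \Delta w_i \;=\; -\eta\,\frac{\partial E}{\partial w_i} \;=\; \eta \sum_{d \in D} (t_d - o_d)\, x_{i,d}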

3.4.3 Gradient Descent Algorithm

  1. Gradient-Descent(training-examples, η)
    • Each training example is a pair of the form ⟨x, t⟩, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).
  2. Initialize each w_i to some small random value.
  3. Until the termination condition is met, Do
    1. Initialize each Δw_i to zero.
    2. For each ⟨x, t⟩ in training-examples, Do
      1. Input the instance x to the unit and compute the output o.
      2. For each linear unit weight w_i, Do Δw_i ← Δw_i + η(t − o)x_i
    3. For each linear unit weight w_i, Do w_i ← w_i + Δw_i
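
A minimal Python sketch of the algorithm above for a single linear unit; the training data, learning rate, and epoch count are illustrative, and each x carries a leading 1 so that w[0] acts as the bias weight.

    def gradient_descent(training_examples, eta=0.05, epochs=500):
        # Batch gradient descent for a linear unit o = w . x
        n = len(training_examples[0][0])
        w = [0.0] * n                        # small initial weights (zero here)
        for _ in range(epochs):
            delta = [0.0] * n                # accumulate Delta w_i over the whole set
            for x, t in training_examples:
                o = sum(wi * xi for wi, xi in zip(w, x))
                for i in range(n):
                    delta[i] += eta * (t - o) * x[i]
            w = [wi + di for wi, di in zip(w, delta)]   # w_i <- w_i + Delta w_i
        return w

    # Toy data generated from t = 1 + 2*x1, with x = [1, x1]
    data = [([1, 0.0], 1.0), ([1, 1.0], 3.0), ([1, 2.0], 5.0)]
    print(gradient_descent(data))            # approaches [1.0, 2.0]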

3.5 Perceptron Learning Summary, so far

3.6 Incremental (Stochastic) Gradient Descent

3.6.1 Stochastic versus Batch Gradient Descent
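
For contrast with the batch version above, a sketch of the incremental (stochastic) variant, which applies the delta rule immediately after every example instead of summing the updates over the whole training set (same illustrative data format as before). With a sufficiently small η it closely approximates the batch result.

    def stochastic_gradient_descent(training_examples, eta=0.05, epochs=500):
        # Incremental delta rule: w_i <- w_i + eta * (t - o) * x_i per example
        n = len(training_examples[0][0])
        w = [0.0] * n
        for _ in range(epochs):
            for x, t in training_examples:
                o = sum(wi * xi for wi, xi in zip(w, x))
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        return w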

4 Multilayer Networks

[Figure: Separation achievable by a multilayer network]

4.1 Sigmoid Unit

[Figure: The sigmoid unit]
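
A minimal sketch of the sigmoid unit: it computes a weighted sum of its inputs and squashes it with the logistic function, whose derivative has the convenient form σ'(y) = σ(y)(1 − σ(y)) exploited by backpropagation.

    import math

    def sigmoid(y):
        # Logistic squashing function: sigma(y) = 1 / (1 + e^-y)
        return 1.0 / (1.0 + math.exp(-y))

    def sigmoid_unit(x, w):
        # o = sigma(w . x), where x[0] = 1 supplies the bias weight w[0]
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

    def sigmoid_derivative(o):
        # Derivative expressed in terms of the output: sigma'(y) = o * (1 - o)
        return o * (1.0 - o)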

4.2 Error Gradient for Sigmoid Unit
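
For reference, carrying the chain rule through the sigmoid gives the gradient this section derives; with E the sum of squared errors and o_d = σ(w · x_d),

    \frac{\partial E}{\partial w_i} \;=\; -\sum_{d \in D} (t_d - o_d)\, o_d\,(1 - o_d)\, x_{i,d}

The extra o_d(1 − o_d) factor is exactly the sigmoid derivative from the previous section.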

4.3 Backpropagation

  1. Initialize all weights to small random numbers.
  2. For each training example, Do
    1. Input the training example to the network and compute the network outputs.
    2. For each output unit k, calculate its error term δ_k ← o_k(1 − o_k)(t_k − o_k)
    3. For each hidden unit h, calculate its error term δ_h ← o_h(1 − o_h) Σ_{k ∈ outputs} w_{h,k} δ_k
    4. Update each network weight w_{i,j} ← w_{i,j} + Δw_{i,j}, where Δw_{i,j} = η δ_j x_{i,j}
  3. Go to 2 if the termination condition is not met.
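
A minimal Python sketch of the stochastic backpropagation loop above for a network with one hidden layer of sigmoid units; the layer sizes, weight-initialization range, learning rate, and fixed epoch count are illustrative assumptions.

    import math, random

    def sigmoid(y):
        return 1.0 / (1.0 + math.exp(-y))

    def backprop(examples, n_in, n_hidden, n_out, eta=0.3, epochs=5000):
        # Train a one-hidden-layer sigmoid network; each layer gets an implicit bias input of 1.
        rnd = lambda: random.uniform(-0.05, 0.05)
        w_ih = [[rnd() for _ in range(n_in + 1)] for _ in range(n_hidden)]   # input -> hidden
        w_ho = [[rnd() for _ in range(n_hidden + 1)] for _ in range(n_out)]  # hidden -> output
        for _ in range(epochs):
            for x, t in examples:
                xb = [1.0] + list(x)
                h = [sigmoid(sum(w * xi for w, xi in zip(w_ih[j], xb)))
                     for j in range(n_hidden)]
                hb = [1.0] + h
                o = [sigmoid(sum(w * hi for w, hi in zip(w_ho[k], hb)))
                     for k in range(n_out)]
                # Error terms: delta_k = o_k(1-o_k)(t_k-o_k),
                #              delta_h = h(1-h) * sum_k w_{h,k} delta_k
                d_out = [o[k] * (1 - o[k]) * (t[k] - o[k]) for k in range(n_out)]
                d_hid = [h[j] * (1 - h[j]) *
                         sum(w_ho[k][j + 1] * d_out[k] for k in range(n_out))
                         for j in range(n_hidden)]
                # Weight updates: w <- w + eta * delta * input
                for k in range(n_out):
                    for j in range(n_hidden + 1):
                        w_ho[k][j] += eta * d_out[k] * hb[j]
                for j in range(n_hidden):
                    for i in range(n_in + 1):
                        w_ih[j][i] += eta * d_hid[j] * xb[i]
        return w_ih, w_ho

Trained on the eight one-hot patterns with n_in = n_out = 8 and n_hidden = 3, a network like this learns the kind of compact hidden encodings shown in the 8-3-8 table in Section 4.3.2.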

4.3.1 Backpropagation Details

4.3.2 Hidden Layer Representation

Hidden unit encodings learned by the 8-3-8 network:

Input      H1   H2   H3   Output
10000000   .89  .04  .08  10000000
01000000   .15  .99  .99  01000000
00100000   .01  .97  .27  00100000
00010000   .99  .97  .71  00010000
00001000   .03  .05  .02  00001000
00000100   .01  .11  .88  00000100
00000010   .80  .01  .98  00000010
00000001   .60  .94  .01  00000001

4.3.3 8-3-8 Plots

[Figure: Hidden unit encoding]

[Figure: Weights from inputs to a hidden unit]

[Figure: Sum of squared errors]

4.3.4 Backpropagation Convergence

Gradient descent over the backpropagation error surface is only guaranteed to find a local minimum, not necessarily the global one. Common ways to cope with this are to:

  1. add momentum (see the update rule after this list),
  2. use stochastic gradient descent,
  3. train multiple nets with different initial weights.
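
The momentum variant mentioned in item 1 adds a fraction α of the previous update to the current one, which helps the search roll through small local minima and speed up along flat stretches of the error surface:

    \Delta w_{j,i}(n) \;=\; \eta\, \delta_j\, x_{j,i} \;+\; \alpha\, \Delta w_{j,i}(n-1), \qquad 0 \le \alpha < 1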

4.4 Representational Power of ANNs

4.5 Overfitting ANNs

[Figure: Overfitting in ANNs]
  1. Decrease each weight by some small factor during each iteration (weight decay). Keeping the weights small biases the network against learning overly complex decision surfaces.
  2. Provide a validation set and use it to monitor the error, but be careful not to stop at a minimum too early (see the sketch after this list).
  3. With small data sets, use k-fold cross-validation: divide the data into k disjoint sets; each time, one of the sets serves as the validation set and the other k−1 as the training data.
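
A sketch combining items 1 and 2: shrink every weight slightly on each iteration and keep the weights that did best on a held-out validation set, rather than stopping at the first uptick in validation error. The helpers train_epoch and validation_error, the decay factor, and the weight layout (a list of weight lists) are illustrative assumptions, not part of the original talk.

    import copy

    def train_with_early_stopping(weights, train_epoch, validation_error,
                                  decay=0.0001, max_epochs=1000):
        # weights          : list of weight lists, updated in place by train_epoch
        # train_epoch      : callable performing one training pass (e.g., backpropagation)
        # validation_error : callable returning error on the held-out validation set
        best_w, best_err = copy.deepcopy(weights), validation_error(weights)
        for _ in range(max_epochs):
            train_epoch(weights)
            for row in weights:              # weight decay: bias toward small weights
                for i in range(len(row)):
                    row[i] *= (1.0 - decay)
            err = validation_error(weights)
            if err < best_err:               # remember the best weights seen so far
                best_w, best_err = copy.deepcopy(weights), err
        return best_w                        # weights with the lowest validation error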

5 Face Recognition Example

[Figure: Example face images with head poses labeled left, straight, right, and up]

6 Alternative Error Functions

7 Recurrent Networks

[Figure: A recurrent network]

8 Dynamically Modifying Network Structure

9 Summary

URLs

  1. Machine Learning book at Amazon, http://www.amazon.com/exec/obidos/ASIN/0070428077/multiagentcom/
  2. Slides by Tom Mitchell on Machine Learning, http://www-2.cs.cmu.edu/~tom/mlbook-chapter-slides.html
  3. Alvinn Project Homepage, http://www.ri.cmu.edu/projects/project_160.html
  4. Software for face recognition, http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

This talk available at http://jmvidal.cse.sc.edu/talks/ann/
Copyright © 2009 José M. Vidal . All rights reserved.

01 February 2003, 09:41AM