Artificial Neural Networks
Gradient Descent
The delta rule is designed to converge even if the examples are not linearly separable.
It performs gradient descent on the hypothesis space.
Consider a simpler linear unit, where
$o = w_0 + w_1 x_1 + \cdots + w_n x_n$
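As a concrete illustration, here is a minimal Python sketch of such a linear unit; the weight values and inputs are invented for the example, and the bias weight $w_0$ is stored as the first entry of the weight vector.

```python
import numpy as np

# A minimal sketch of the linear unit above: o = w_0 + w_1*x_1 + ... + w_n*x_n.
# The weight vector w includes the bias term w_0 as its first entry.
def linear_unit_output(w, x):
    return w[0] + np.dot(w[1:], x)

# Illustrative (assumed) weights and inputs for a unit with two inputs.
w = np.array([0.5, -1.0, 2.0])   # w_0, w_1, w_2
x = np.array([3.0, 1.5])         # x_1, x_2
print(linear_unit_output(w, x))  # 0.5 + (-1.0)*3.0 + 2.0*1.5 = 0.5
```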
Let's learn the $w_i$'s that minimize the squared error
$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$
where $D$ is the set of training examples.
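A short sketch of this error computation follows; the training matrix `X` (one example per row, without the bias input), the target vector `t`, and the weight values are assumptions made up for illustration.

```python
import numpy as np

# E(w) = 1/2 * sum over d in D of (t_d - o_d)^2, where o_d is the
# linear unit's output on training example d.
def squared_error(w, X, t):
    o = X @ w[1:] + w[0]              # outputs o_d for every example d
    return 0.5 * np.sum((t - o) ** 2)

# Illustrative data: three training examples with two inputs each.
X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
t = np.array([1.0, 0.0, 2.0])
w = np.array([0.1, 0.2, -0.3])        # hypothetical initial weights
print(squared_error(w, X, t))
```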
Let's try to minimize this error.
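One way to do this is to repeatedly step the weights in the direction of the negative gradient of $E$. The sketch below uses the standard batch update $\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{i,d}$; the learning rate `eta` and the number of epochs are assumed hyperparameters, not values from the slide.

```python
import numpy as np

# A minimal batch gradient descent sketch on E(w).
# eta (learning rate) and epochs are assumed hyperparameters.
def gradient_descent(X, t, eta=0.01, epochs=1000):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1 for the bias w_0
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        o = Xb @ w                     # outputs o_d for all d in D
        w += eta * Xb.T @ (t - o)      # Delta w_i = eta * sum_d (t_d - o_d) * x_{i,d}
    return w

X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
t = np.array([1.0, 0.0, 2.0])
print(gradient_descent(X, t))
```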