# Backpropagation
When training a [[Neural Networks (NNs)|neural network]] with a training data set (see [[AI Data Sets]]), the algorithm:
- Calculates the error between the ground truth `T` (i.e., the correct value in the data set) and the predicted output. This error, called `E`, is the value of the cost (or loss) function (cf. [[Gradient Descent]]).
- Propagates the error back into the network and updates each weight and bias as per the following equations:
- Update weights: `wi = wi - n * (delta E / delta wi)`, where `n` is the learning rate
- Update biases: `bi = bi - n * (delta E / delta bi)`
- The error is calculated as the mean squared error: `E = 1/2 * (T - a2)^2` for a single sample, and `E = 1/(2m) * sum (i = 1 -> m) (Ti - a2,i)^2` over `m` samples
- Where `T` is the ground truth
- Where `a2` is the predicted output of the neural network
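The error formulas and the generic update rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names (`squared_error`, `mean_squared_error`, `gradient_step`) are hypothetical and chosen only for this note.

```python
def squared_error(T, a2):
    """Single-sample error: E = 1/2 * (T - a2)^2."""
    return 0.5 * (T - a2) ** 2


def mean_squared_error(targets, predictions):
    """Batch error: E = 1/(2m) * sum over i of (Ti - a2,i)^2."""
    m = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / (2 * m)


def gradient_step(param, grad, n):
    """Generic update: param = param - n * (delta E / delta param)."""
    return param - n * grad


# Illustrative values only (not taken from any data set).
print(squared_error(T=1.0, a2=0.5))
print(mean_squared_error([1.0, 0.0], [0.5, 0.5]))
print(gradient_step(param=1.0, grad=0.5, n=0.1))
```

The same `gradient_step` applies to weights and biases alike; only the gradient passed in differs.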
Note that the backpropagation algorithm is repeated for a fixed number of passes over the training data, called epochs, or until the mean squared error falls below a chosen threshold.
In the above example, the backpropagation algorithm will update `w2, b2, w1, b1`.
To backpropagate the error from `a2` to `w2`, we use the update rule `w2 = w2 - n * (delta E / delta w2)`, and we know that:
- `E` is a function of `a2`, as per `E = 1/2 * (T - a2)^2`
- `a2` is a function of `z2`, as per the sigmoid activation `a2 = f(z2) = 1 / (1 + e^-z2)`
- `z2` is a function of `w2`, as per `z2 = a1 * w2 + b2`
That's why we can use the chain rule and:
- take the derivative of `E` with respect to `a2`
- `E = 1/2 * (T - a2)^2 => delta E / delta a2 = -(T - a2)`
- take the derivative of `a2` with respect to `z2`
- `a2 = f(z2) = 1 / (1 + e^-z2) => delta a2 / delta z2 = a2 * (1 - a2)`
- take the derivative of `z2` with respect to `w2`
- `z2 = a1 * w2 + b2 => delta z2 / delta w2 = a1`
Then the derivative of the error with respect to `w2` is simply the product of these individual derivatives:
- `w2 -> w2 - n * (delta E / delta w2)`
- Thus: `delta E / delta w2 = (delta E / delta a2) * (delta a2 / delta z2) * (delta z2 / delta w2)`
![[Backpropagation-derivatives.png]]
Where:
- `delta E / delta w2 = (-(T - a2)) * (a2 * (1 - a2)) * (a1)`
Thus, `w2` is updated as per the following equation:
- `w2 -> w2 - n * (-(T - a2)) * (a2 * (1 - a2)) * (a1)`
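As a sanity check on the chain-rule result for `w2`, the sketch below computes the gradient from the three factors and compares it with a central finite-difference estimate of `delta E / delta w2`. The numeric values of `a1`, `w2`, `b2`, `T` and `n` are arbitrary illustrations, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid activation: f(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def error(a1, w2, b2, T):
    """E = 1/2 * (T - a2)^2 with a2 = f(a1 * w2 + b2)."""
    a2 = sigmoid(a1 * w2 + b2)
    return 0.5 * (T - a2) ** 2


def grad_w2(a1, w2, b2, T):
    """Chain rule: (delta E/delta a2) * (delta a2/delta z2) * (delta z2/delta w2)."""
    a2 = sigmoid(a1 * w2 + b2)
    dE_da2 = -(T - a2)
    da2_dz2 = a2 * (1.0 - a2)
    dz2_dw2 = a1
    return dE_da2 * da2_dz2 * dz2_dw2


def numerical_grad_w2(a1, w2, b2, T, eps=1e-6):
    """Central finite-difference estimate of delta E / delta w2."""
    return (error(a1, w2 + eps, b2, T) - error(a1, w2 - eps, b2, T)) / (2 * eps)


# Arbitrary illustrative values (assumptions, not from the note).
a1, w2, b2, T, n = 0.6, 0.4, 0.1, 1.0, 0.5

analytic = grad_w2(a1, w2, b2, T)
numeric = numerical_grad_w2(a1, w2, b2, T)
w2_new = w2 - n * analytic  # one update step for w2

print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
print("updated w2:       ", w2_new)
```

If the two gradients agree to several decimal places, the derivation is very likely correct; this kind of gradient check is a common way to validate hand-derived backpropagation formulas.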
To update `w1`, we then use the following equation:
- `w1 -> w1 - n * (delta E / delta w1)`
![[Backpropagation-updating-w1.png]]
![[Backpropagation-updating-w1-1.png]]
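The gradient for `w1` follows the same chain rule, extended by two more factors because the error has to pass through the hidden neuron. The sketch below assumes a single input `x` and a hidden layer defined by `z1 = x * w1 + b1` and `a1 = f(z1)` with the same sigmoid; these definitions are assumptions made for illustration and may differ from the network in the figures.

```python
import math


def sigmoid(z):
    """Sigmoid activation: f(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def error(x, T, w1, b1, w2, b2):
    """Forward pass through both neurons, then E = 1/2 * (T - a2)^2."""
    a1 = sigmoid(x * w1 + b1)   # assumed hidden layer: z1 = x * w1 + b1
    a2 = sigmoid(a1 * w2 + b2)  # output layer: z2 = a1 * w2 + b2
    return 0.5 * (T - a2) ** 2


def grad_w1(x, T, w1, b1, w2, b2):
    """delta E / delta w1 as a product of five local derivatives."""
    a1 = sigmoid(x * w1 + b1)
    a2 = sigmoid(a1 * w2 + b2)
    dE_da2 = -(T - a2)
    da2_dz2 = a2 * (1.0 - a2)
    dz2_da1 = w2
    da1_dz1 = a1 * (1.0 - a1)
    dz1_dw1 = x
    return dE_da2 * da2_dz2 * dz2_da1 * da1_dz1 * dz1_dw1


# Arbitrary illustrative values (assumptions, not from the note).
x, T, n = 1.0, 1.0, 0.5
w1, b1, w2, b2 = 0.3, 0.1, 0.4, 0.1

g = grad_w1(x, T, w1, b1, w2, b2)

# Central finite-difference estimate as a cross-check.
eps = 1e-6
numeric = (error(x, T, w1 + eps, b1, w2, b2)
           - error(x, T, w1 - eps, b1, w2, b2)) / (2 * eps)

w1_new = w1 - n * g  # one update step for w1

print("analytic gradient:", g)
print("numeric gradient: ", numeric)
print("updated w1:       ", w1_new)
```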
The same chain-rule procedure yields the updates for all four parameters:
![[Backpropagation-updating-w2.png]]
![[Backpropagation-updating-b2.png]]
![[Backpropagation-updating-w1-2.png]]
![[Backpropagation-updating-b1.png]]
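Putting the four updates together, a complete training loop for this two-neuron network might look like the sketch below. It is written under stated assumptions: a single training sample, a hidden layer `z1 = x * w1 + b1` with sigmoid activation, and hypothetical values for the initial parameters, learning rate `n`, epoch limit and error threshold. None of these come from the note itself.

```python
import math


def sigmoid(z):
    """Sigmoid activation: f(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Forward pass; returns the hidden and output activations."""
    a1 = sigmoid(x * w1 + b1)   # assumed hidden layer
    a2 = sigmoid(a1 * w2 + b2)  # output layer
    return a1, a2


def loss(x, T, params):
    """E = 1/2 * (T - a2)^2 for the current parameters."""
    _, a2 = forward(x, **params)
    return 0.5 * (T - a2) ** 2


def gradients(x, T, w1, b1, w2, b2):
    """Backward pass: gradients of E for w2, b2, w1, b1 via the chain rule."""
    a1, a2 = forward(x, w1, b1, w2, b2)
    delta2 = -(T - a2) * a2 * (1.0 - a2)    # delta E / delta z2
    delta1 = delta2 * w2 * a1 * (1.0 - a1)  # delta E / delta z1
    return {"w2": delta2 * a1, "b2": delta2, "w1": delta1 * x, "b1": delta1}


def train(x, T, params, n=0.5, max_epochs=10000, threshold=1e-3):
    """Repeat the updates until E < threshold or max_epochs is reached."""
    params = dict(params)  # work on a copy; leave the caller's dict untouched
    epochs_run = 0
    while epochs_run < max_epochs and loss(x, T, params) >= threshold:
        grads = gradients(x, T, **params)
        for name in params:
            params[name] -= n * grads[name]  # param = param - n * gradient
        epochs_run += 1
    return params, loss(x, T, params), epochs_run


# Hypothetical starting point and training sample.
initial = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}
trained, final_error, epochs_run = train(x=1.0, T=1.0, params=initial)

print("epochs run: ", epochs_run)
print("final error:", final_error)
print("parameters: ", trained)
```

Computing `delta2` and `delta1` once and reusing them for both the weight and the bias of each layer avoids recomputing the shared factors of the chain rule.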
In