# Backpropagation

When training a [[Neural Networks (NNs)]] with a training data set (see [[AI Data Sets]]), the algorithm:
- Calculates the error between the ground truth `T` (i.e., the correct value in the data set) and the predicted output. This error, called `E`, is the cost or loss function (cf. [[Gradient Descent]])
- Propagates the error back into the network and updates each weight and bias as per the following equations, where `n` is the learning rate
    - Update weights: `wi = wi - n * (delta E / delta wi)`
    - Update biases: `bi = bi - n * (delta E / delta bi)`
- The error is calculated as the mean squared error: `E = 1/2 * (T - a2)^2` for a single sample and `E = 1/(2m) * sum (i = 1 -> m) (Ti - a2,i)^2` over `m` samples
    - Where `T` is the ground truth
    - Where `a2` is the predicted output of the neural network

Note that the backpropagation algorithm is repeated for a number of iterations, called epochs, or until the mean squared error falls below a certain threshold.

In the above example, the backpropagation algorithm will update `w2`, `b2`, `w1` and `b1`.

To backpropagate the error from `a2` to `w2`, we use the following update rule: `w2 = w2 - n * (delta E / delta w2)`

We know that:
- `E` is a function of `a2`, as per `E = 1/2 * (T - a2)^2`
- `a2` is a function of `z2`, as per `a2 = f(z2) = 1 / (1 + e^-z2)`
- `z2` is a function of `w2`, as per `z2 = a1 * w2 + b2`

That is why we can use the chain rule and:
- take the derivative of `E` with respect to `a2`
    - `E = 1/2 * (T - a2)^2 => delta E / delta a2 = -(T - a2)`
- take the derivative of `a2` with respect to `z2`
    - `a2 = f(z2) = 1 / (1 + e^-z2) => delta a2 / delta z2 = a2 * (1 - a2)`
- take the derivative of `z2` with respect to `w2`
    - `z2 = a1 * w2 + b2 => delta z2 / delta w2 = a1`

The derivative of the error with respect to `w2` is then simply the product of these individual derivatives:
- `w2 -> w2 - n * (delta E / delta w2)`
- Thus: `delta E / delta w2 = (delta E / delta a2) * (delta a2 / delta z2) * (delta z2 / delta w2)`

![[Backpropagation-derivatives.png]]

Where:
- `delta E / delta w2 = (-(T - a2)) * (a2 * (1 - a2)) * (a1)`

Thus, `w2` is updated as per the following equation:
- `w2 -> w2 - n * (-(T - a2)) * (a2 * (1 - a2)) * (a1)`

To update `w1`, we then use the following equation:
- `w1 -> w1 - n * (delta E / delta w1)`

![[Backpropagation-updating-w1.png]]
![[Backpropagation-updating-w1-1.png]]

And so on...

![[Backpropagation-updating-w2.png]]
![[Backpropagation-updating-b2.png]]
![[Backpropagation-updating-w1-2.png]]
![[Backpropagation-updating-b1.png]]

In
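
The update rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. It assumes a network with one neuron per layer and a scalar input `x`, and it assumes the first layer mirrors the second (`z1 = x * w1 + b1`, `a1 = f(z1)` with the same sigmoid), which the text above does not state explicitly. The function names (`forward`, `gradients`, `train`) and the default values for `n`, `epochs` and `threshold` are hypothetical.

```python
import math


def sigmoid(z):
    """Activation used in the note: f(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Forward pass. The first layer (z1, a1) is an assumption."""
    z1 = x * w1 + b1
    a1 = sigmoid(z1)
    z2 = a1 * w2 + b2
    a2 = sigmoid(z2)
    return a1, a2


def error(T, a2):
    """Single-sample error: E = 1/2 * (T - a2)^2."""
    return 0.5 * (T - a2) ** 2


def gradients(x, T, w1, b1, w2, b2):
    """Chain-rule derivatives of E with respect to w1, b1, w2, b2."""
    a1, a2 = forward(x, w1, b1, w2, b2)
    dE_da2 = -(T - a2)            # delta E / delta a2
    da2_dz2 = a2 * (1.0 - a2)     # delta a2 / delta z2
    dE_dz2 = dE_da2 * da2_dz2
    dE_dw2 = dE_dz2 * a1          # delta z2 / delta w2 = a1
    dE_db2 = dE_dz2               # delta z2 / delta b2 = 1
    # Continue the chain through a1 and z1 to reach the first layer.
    dE_dz1 = dE_dz2 * w2 * a1 * (1.0 - a1)
    dE_dw1 = dE_dz1 * x           # delta z1 / delta w1 = x
    dE_db1 = dE_dz1               # delta z1 / delta b1 = 1
    return dE_dw1, dE_db1, dE_dw2, dE_db2


def train(x, T, w1, b1, w2, b2, n=0.5, epochs=1000, threshold=1e-6):
    """Repeat the updates for `epochs` iterations or until E < threshold."""
    for _ in range(epochs):
        _, a2 = forward(x, w1, b1, w2, b2)
        if error(T, a2) < threshold:
            break
        g_w1, g_b1, g_w2, g_b2 = gradients(x, T, w1, b1, w2, b2)
        w1 -= n * g_w1
        b1 -= n * g_b1
        w2 -= n * g_w2
        b2 -= n * g_b2
    return w1, b1, w2, b2
```

Calling `train(0.1, 0.25, 0.15, 0.40, 0.45, 0.65)` with these arbitrary starting values returns the updated `w1, b1, w2, b2`. A quick way to sanity-check the chain-rule derivatives is to compare each value from `gradients` against a finite-difference estimate of `error`.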