# Vanishing Gradient

A problem with the [[Sigmoid Function]] and other saturating activation functions. During backpropagation, the gradient for an early layer is a product of one factor per later layer, and each factor includes the derivative of the activation function. The derivative of the Sigmoid function is at most 0.25 (and close to 0 when the neuron saturates), so unless the weights are large enough to compensate, the product shrinks roughly exponentially as we move backward through the network. As a result, neurons in earlier layers learn much more slowly than neurons in later layers, so training takes too long and accuracy suffers. This is why the [[Sigmoid Function]] and similar saturating functions are generally avoided as activation functions in the hidden layers of deep networks.
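
To make the shrinking concrete, here is a minimal sketch of how the backpropagated factor decays with depth. It rests on simplifying assumptions that are not part of the note: every weight is 1, every neuron has the same pre-activation value, and the helper names (`sigmoid_derivative`, `gradient_scale`) are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(num_layers: int, pre_activation: float = 0.0) -> float:
    """Factor by which a gradient is scaled after passing back through
    num_layers sigmoid layers.

    Assumption (for illustration only): all weights are 1 and every layer
    sees the same pre-activation, so the factor is just the sigmoid
    derivative raised to the number of layers.
    """
    return sigmoid_derivative(pre_activation) ** num_layers


if __name__ == "__main__":
    # Best case for the sigmoid: pre-activation 0, where the derivative is largest.
    for depth in (1, 5, 10, 20):
        print(f"{depth:2d} layers -> gradient scaled by {gradient_scale(depth):.3e}")
```

Even in this best case, where each layer contributes the maximum derivative of 0.25, the factor drops below one in a million after 10 layers. With saturated neurons (large positive or negative pre-activations) it shrinks faster still.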