# Gradient Descent
Gradient Descent is an iterative optimization algorithm for finding the minimum of a function.
To find the minimum of a function using Gradient Descent, we take steps proportional to the negative of the gradient of the function at the current point. We start with a random initial value of w, w0, and we take steps towards the minimum. To determine the direction we move toward, we compute the gradient of the loss function at the current value of w. The gradient is given by the slope of the tangent at w=current value. Then, the magnitude of the step we take depends on a parameter called the **learning rate**. The larger the learning rate, the bigger the step we take, and the smaller the learning rate, the smaller step we take. Then, we take a step and move to w1. We compute w1 as `w1 = w0 - learning rate * (gradient at w0)`. Then we keep repeating the same process until we hit the minimum or a value of the cost function that is very close to the minimum, within a very small predefined threshold.
## Influence of the learning rate
A large learning rate can lead to big steps and eventually missing the minimum. On the other hand, a small learning rate can lead to many small steps and a lot of time to find the minimum.
## Example
Assuming we have data and that we want to find the line that best fits the data set, we define a cost or a loss function. Common example of a cost or loss function:
![[Gradient Descent Algorithm - cost or loss function example.png]]
In this function, we take the difference between the z values and the the product of w and x's. We square that and we sum the squared difference across all values of z and x. The best value of w is then the value that results in the minimum value of this cost or loss function.
![[Gradient Descent Algorithm - example.png]]
The cost or loss function above is a parabola. It's interesting because it has one global minimum, or one unique solution.
Example of progress with Gradient Descent:
![[Gradient Descent Algorithm progress example.png]]
In this case, the ideal value of w that makes this cost function minimum is w=2, meaning `z=2x`, which corresponds to a line that fits the data points perfectly:
![[Gradient Descent Algorithm - minimum.png]]
In real life data sets, it's very rare to have such a simple case.