Gradients
Tracking error through the network
When we propagate the loss (error) back through the network, we compute the gradient of the loss with respect to each parameter. Gradients are the partial derivatives of the loss function with respect to the parameters (weights and biases) of the network; they indicate the direction and magnitude of the change required to minimise the loss.
Once the network's gradients are computed, an optimisation algorithm adjusts the weights and biases to minimise the loss function; the gradients determine the direction and size of these adjustments. Note that the loss function itself only measures how wrong a prediction is. It is the derivative of the loss function that gives the gradient of the loss with respect to the prediction, and backpropagation then applies the chain rule to turn this into gradients for every parameter as we move back through the layers of the network.
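As an illustration of this loop, here is a minimal sketch in plain Python, assuming a hypothetical one-weight linear model with made-up input, target, and weight values (not any particular framework's API):

```python
# Hypothetical one-weight "network": prediction y_hat = w * x.
# All values below are made up for illustration.
w = 0.4          # a single weight (the parameter we want to train)
x, y = 2.0, 1.0  # one input and its target (actual) value

# Forward pass: make a prediction and measure the loss
# (MSE with a 1/2 factor, as defined in the example below).
y_hat = w * x                   # 0.8
loss = 0.5 * (y_hat - y) ** 2   # 0.02

# Backward pass: the derivative of the loss gives dL/dy_hat,
# and the chain rule carries it back to the weight.
dL_dyhat = y_hat - y            # -0.2
dL_dw = dL_dyhat * x            # -0.4: direction and magnitude of change for w

print(loss, dL_dyhat, dL_dw)
```

An optimisation algorithm would then nudge `w` a small step against `dL_dw`, which is exactly the update described at the end of this page.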
Example
We compute the loss from a predicted value and a target value, i.e. a measure of how incorrect the prediction was. Let's look at an example using the Mean Squared Error calculation. (We will look more in depth at loss functions on the next page.)
Mean Squared Error (MSE) Loss:
The MSE loss for a single sample is defined as $L = \frac{1}{2}(\hat{y} - y)^2$, where $L$ is the loss, $\hat{y}$ is the predicted value, and $y$ is the actual value. (The factor of $\frac{1}{2}$ is a common convention that keeps the derivative simple.)
Gradient of the Loss:
To find the gradient of the loss with respect to the predicted value $\hat{y}$, we need to compute the partial derivative of $L$ with respect to $\hat{y}$:

$$\frac{\partial L}{\partial \hat{y}} = \frac{\partial}{\partial \hat{y}} \left[ \frac{1}{2} (\hat{y} - y)^2 \right]$$

Using the chain rule of differentiation:

$$\frac{\partial L}{\partial \hat{y}} = \frac{1}{2} \cdot 2 (\hat{y} - y) \cdot \frac{\partial}{\partial \hat{y}} (\hat{y} - y) = (\hat{y} - y) \cdot 1$$

Simplifying this, we get:

$$\frac{\partial L}{\partial \hat{y}} = \hat{y} - y$$
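If you want to double-check the algebra, a quick symbolic differentiation reproduces the same result (this uses SymPy purely as a convenience; it is not part of the training code):

```python
import sympy as sp

y_hat, y = sp.symbols('y_hat y')
L = sp.Rational(1, 2) * (y_hat - y) ** 2  # the MSE loss defined above

# Differentiate the loss with respect to the prediction
grad = sp.diff(L, y_hat)
print(grad)  # y_hat - y
```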
Example Calculation
Actual Value: $y = 1.0$, Predicted Value: $\hat{y} = 0.8$

Calculate the Loss:

$$L = \frac{1}{2}(\hat{y} - y)^2 = \frac{1}{2}(0.8 - 1.0)^2 = \frac{1}{2}(0.04) = 0.02$$

Calculate the Gradient:

$$\frac{\partial L}{\partial \hat{y}} = \hat{y} - y = 0.8 - 1.0 = -0.2$$
The gradient of -0.2 indicates that increasing the predicted value will decrease the loss, as it suggests that $\hat{y}$ is less than the actual value $y$.
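The same numbers can be reproduced in a few lines of Python, with a finite-difference check (using a small, arbitrary epsilon) confirming the analytic gradient:

```python
y, y_hat = 1.0, 0.8  # actual and predicted values from the example

loss = 0.5 * (y_hat - y) ** 2   # 0.02
grad = y_hat - y                # -0.2 (analytic gradient dL/dy_hat)

# Numerical sanity check: central finite difference of the loss
eps = 1e-6
numeric = (0.5 * (y_hat + eps - y) ** 2
           - 0.5 * (y_hat - eps - y) ** 2) / (2 * eps)

print(round(loss, 6), round(grad, 6), round(numeric, 6))  # 0.02 -0.2 -0.2
```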
The optimisation algorithm then uses each parameter's gradient to adjust the model parameters in the direction opposite to the gradient, because we want to move towards the minimum of the loss function.
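Continuing the example with an assumed learning rate of 0.1 (any small positive value would do), a single step against the gradient moves the prediction toward the target:

```python
lr = 0.1                 # assumed learning rate (not specified on this page)
y_hat, grad = 0.8, -0.2  # prediction and gradient from the example above

# Step in the direction opposite to the gradient
y_hat_new = y_hat - lr * grad
print(round(y_hat_new, 4))  # 0.82, i.e. closer to the target 1.0
```

In practice it is the weights and biases that are updated, each using its own gradient obtained via the chain rule, but the principle is the same: the parameters move a small step in the direction that reduces the loss.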