Computation Overview
Core concepts of neural network computation
Input * Weight
There are a few mathematical concepts used in deep learning that, when combined, result in powerful computation systems. The basic calculation is multiplying an input by a weight to get an output.
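As a minimal sketch of this idea (the values below are illustrative, not taken from the text):

```python
# One connection: a single input scaled by a single weight.
input_value = 0.5   # illustrative value
weight = 0.8        # illustrative value

output = input_value * weight
print(output)  # 0.4
```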
Weighted Sum/Dot Product
We can then apply this to a 3-to-1 design: three input nodes feeding a single output node.
This introduces the Weighted Sum, also known as the Dot Product. It is the sum of all the multiplications between connected nodes. In this example, the result value is 1.5.
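A minimal sketch in Python, with hypothetical inputs and weights chosen so the weighted sum matches the 1.5 above:

```python
import numpy as np

# Hypothetical values for a 3-to-1 layout; the article's
# original numbers are not shown in the text.
inputs = np.array([0.5, 1.0, 2.0])
weights = np.array([1.0, 0.5, 0.25])

# Weighted sum / dot product: accumulate the products
# of each input with its corresponding weight.
result = np.dot(inputs, weights)  # 0.5*1.0 + 1.0*0.5 + 2.0*0.25
print(result)  # 1.5
```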
Activation Functions
When working with numerical ranges we often use -1.0 to 1.0 or 0 to 1.0, because a fixed range allows normalisation, scaling and so on. Neural networks use a critical component called an activation function, which enables or disables nodes in a network based on their value, like an on/off switch. Let's apply a common activation function used in hidden layers, tanh, to control the node's activation.
We will cover tanh in detail later on; for now, note how the result value was modified to fall within the -1.0 to 1.0 range.
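For example, applying tanh to the 1.5 from the weighted sum above:

```python
import numpy as np

weighted_sum = 1.5
activated = np.tanh(weighted_sum)
print(activated)  # ~0.905, squashed into the -1.0 to 1.0 range
```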
Three Layer Network
Going a step further, we can design a three layer network, where we apply another activation function, known as the sigmoid, to the output node.
Let's work through the calculations.

Variables:

Input nodes: $x = [x_1, x_2, x_3]$

Weights from the input to hidden layer: six values, reshaped into a $3 \times 2$ matrix $W^{(1)}$

Weights from hidden to output layer: $W^{(2)} = [w^{(2)}_1, w^{(2)}_2]$

Hidden layer activation: $\tanh$

Output layer activation: $\sigma$ (sigmoid)
Calculations:
Step 1: Input to Hidden Layer
Reshape weights for input to hidden layer:

$W^{(1)} = \begin{bmatrix} w^{(1)}_{11} & w^{(1)}_{12} \\ w^{(1)}_{21} & w^{(1)}_{22} \\ w^{(1)}_{31} & w^{(1)}_{32} \end{bmatrix}$

Calculate the weighted sum for each hidden node:

Hidden node 1: $z_1 = x_1 w^{(1)}_{11} + x_2 w^{(1)}_{21} + x_3 w^{(1)}_{31}$

Apply activation: $h_1 = \tanh(z_1)$

Hidden node 2: $z_2 = x_1 w^{(1)}_{12} + x_2 w^{(1)}_{22} + x_3 w^{(1)}_{32}$

Apply activation: $h_2 = \tanh(z_2)$

Hidden layer output: $h = [h_1, h_2]$
Step 2: Hidden to Output Layer
Weights for hidden to output layer: $W^{(2)} = [w^{(2)}_1, w^{(2)}_2]$

Calculate the weighted sum for the output node: $z_o = h_1 w^{(2)}_1 + h_2 w^{(2)}_2$

Apply sigmoid activation: $o = \sigma(z_o) = \frac{1}{1 + e^{-z_o}}$
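Putting the two steps together, here is a sketch of the full forward pass with hypothetical weight values (the article's original numbers are not shown):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.0, 2.0])  # 3 input nodes, illustrative values

# Flat list of six weights reshaped into a 3x2 matrix
# (input -> hidden), mirroring the reshape step above.
w1 = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]).reshape(3, 2)
w2 = np.array([0.7, 0.8])  # hidden -> output

# Step 1: input to hidden layer
z_hidden = x @ w1        # weighted sums z1, z2
h = np.tanh(z_hidden)    # hidden activations h1, h2

# Step 2: hidden to output layer
z_out = h @ w2           # weighted sum for the output node
output = sigmoid(z_out)
print(h, output)
```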
Bias
The bias term is an additional parameter added to each node in a layer, except the input layer. It allows the model to better fit the data by giving each node the ability to shift the activation function, adjusting its threshold independently. This adds flexibility to the model, allowing it to learn more complex relationships in the data, and improves the model's ability to generalise.

Formula: For a neuron with input $x$, weights $w$, and bias $b$:

$z = w \cdot x + b$

where $z$ is the input to the activation function.
We can store a vector of bias terms, one for each node in a layer.
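A sketch of a layer's forward pass with a bias vector added (values illustrative):

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0])   # 3 inputs
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])       # 3x2 weights, input -> hidden
b = np.array([0.1, -0.2])        # one bias term per hidden node

z = x @ W + b    # z = w . x + b, the input to the activation
h = np.tanh(z)
print(h)
```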
Loss Function
Once we have sent the input tensor through the network, we compare the predicted result with the true sample value using a loss function. The loss calculation produces both a scalar value and a gradient. The gradient tells us how much we need to adjust the prediction, and in what direction.
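As an example, mean squared error is a common choice of loss function (the text does not name a specific one); it yields both a scalar loss and a gradient with respect to the prediction:

```python
import numpy as np

prediction = np.array([0.8])  # illustrative network output
target = np.array([1.0])      # true sample value

loss = np.mean((prediction - target) ** 2)      # scalar loss value
grad = 2 * (prediction - target) / len(target)  # dL/dprediction
print(loss, grad)  # grad is negative: the prediction should increase
```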
Once we know the loss gradient, we can propagate the error back through the network, a process known as backpropagation. We use the chain rule to determine how each weight, connecting two nodes, influenced the final loss value.
These gradients are then passed to an optimiser algorithm and used to update the weights accordingly, iteratively minimising the loss of the network.
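The simplest such optimiser is plain gradient descent; a single hypothetical update looks like this:

```python
learning_rate = 0.1
weight = 0.7             # illustrative current weight
weight_gradient = -0.35  # illustrative dL/dweight from backpropagation

# Step the weight against the gradient to reduce the loss.
weight = weight - learning_rate * weight_gradient
print(weight)  # 0.735
```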
The Chain Rule
The chain rule is a fundamental principle in calculus that lets us compute the derivative of a composite function by decomposing it into the product of simpler derivatives. In the context of neural networks, the chain rule is essential for backpropagation, which is used to calculate gradients and update the model's parameters (weights and biases).

Formula: If we have two functions $f$ and $g$, and we form a composite function $y = f(g(x))$, the chain rule states that the derivative of $y$ with respect to $x$ is:

$\frac{dy}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$
Example Calculation:
Forward Pass:
Input: x
Hidden layer: h = g(x)
Output layer: o = f(h)
Loss: L(o, y)
Backward Pass:
We want to compute $\frac{\partial L}{\partial x}$.

Then we apply the chain rule:

$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial o} \cdot \frac{\partial o}{\partial h} \cdot \frac{\partial h}{\partial x}$
We calculate all the gradients for the network and then the optimiser updates the weights and biases accordingly before the next training epoch starts.
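A minimal sketch of this backward pass, using concrete stand-ins for the abstract functions above (g = tanh, f = sigmoid, L = squared error; these choices are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 1.5, 1.0  # illustrative input and target

# Forward pass
h = np.tanh(x)    # h = g(x)
o = sigmoid(h)    # o = f(h)
L = (o - y) ** 2  # L(o, y)

# Backward pass: the chain rule, one factor at a time
dL_do = 2 * (o - y)            # dL/do for squared error
do_dh = o * (1 - o)            # derivative of sigmoid
dh_dx = 1 - np.tanh(x) ** 2    # derivative of tanh

dL_dx = dL_do * do_dh * dh_dx  # dL/dx = dL/do * do/dh * dh/dx
print(dL_dx)
```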