Backpropagation
To understand backpropagation, we first need to understand forward propagation: the process of moving the input data through the net to get the output. Backpropagation goes the other way: it is the process of moving the error back through the net to adjust the weights and biases. Backpropagation involves calculating the gradient of the error with respect to the weights and biases of the neural network.
Chain Rule in Calculus
In machine learning, once you have computed the error, you need to update the parameters (weights and biases) in order to minimize it. This is done using the gradient of the error with respect to the parameters, and that gradient is computed with the chain rule of calculus: the composite function $f(g(x))$ has the derivative $f'(g(x)) \cdot g'(x)$ with respect to $x$.
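To make the chain rule concrete, here is a minimal numeric sketch. The functions `f(u) = u^2` and `g(x) = 3x + 1` are arbitrary choices made for illustration only; they are not taken from the article. The analytic derivative from the chain rule is compared against a finite-difference estimate.

```python
def g(x):
    """Inner function (illustrative choice): g(x) = 3x + 1."""
    return 3.0 * x + 1.0


def f(u):
    """Outer function (illustrative choice): f(u) = u^2."""
    return u ** 2


def chain_rule_derivative(x):
    """d/dx f(g(x)) = f'(g(x)) * g'(x), with f'(u) = 2u and g'(x) = 3."""
    f_prime_at_g = 2.0 * g(x)
    g_prime = 3.0
    return f_prime_at_g * g_prime


def numerical_derivative(func, x, h=1e-6):
    """Central finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x = 2.0
analytic = chain_rule_derivative(x)  # 2 * g(2) * 3 = 2 * 7 * 3 = 42
numeric = numerical_derivative(lambda v: f(g(v)), x)
print(analytic, numeric)
```

The two numbers should agree up to a small numerical error, which is a handy sanity check whenever a derivative is written by hand.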
Let's apply the chain rule of calculus to the given context. The error is a function of the output of the neural network and the output of the neural network is a function of the weights and biases.
- Identify the outer function $f$ and the inner function $g$.
- Replace $f$ with "the error as a function of the output of the neural network".
- Replace $g$ with "the output of the neural network as a function of the weights and biases".
- Apply the chain rule: $$ \big(f(g(x))\big)' = f'(g(x)) \cdot g'(x) $$
Chain Rule Applied to Backpropagation
Given:
- $f$ = "the error as a function of the output of the neural network"
- $g$ = "the output of the neural network as a function of the weights and biases"
The chain rule states: $$ \frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x) $$
Which in this context is: $$ \frac{dE}{dW} = \frac{dE}{dO} \cdot \frac{dO}{dW} $$
Where:
- $E$ is the error.
- $O$ is the output of the neural network.
- $W$ represents the weights and biases.
So the derivative of the error with respect to the weights and biases is the derivative of the error with respect to the output of the neural network times the derivative of the output of the neural network with respect to the weights and biases. Psst... chain rule. Altogether, this is called backpropagation.
### In short
We calculate the error by taking the difference (actual - expected), then move back through the net, adjusting each of the weight matrices and biases by a small step (the learning rate) in the direction that reduces the error.
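The product $\frac{dE}{dO} \cdot \frac{dO}{dW}$ can be sketched on a single neuron. Everything specific here is an assumption made for illustration: a sigmoid neuron, a squared error $E = \frac{1}{2}(O - t)^2$, and the hypothetical starting values for `w`, `b`, `x` and `target`. Because the output passes through a sigmoid, $\frac{dO}{dW}$ is itself split once more by the chain rule into `dO_dz * dz_dw`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """Output O of a single sigmoid neuron."""
    return sigmoid(w * x + b)


def squared_error(output, target):
    """E = 1/2 * (O - t)^2 (an assumed loss, chosen for simplicity)."""
    return 0.5 * (output - target) ** 2


def gradients(w, b, x, target):
    """dE/dw and dE/db, built factor by factor with the chain rule."""
    output = forward(w, b, x)
    dE_dO = output - target          # derivative of the error w.r.t. the output
    dO_dz = output * (1.0 - output)  # derivative of the sigmoid
    dz_dw = x                        # z = w * x + b
    dz_db = 1.0
    return dE_dO * dO_dz * dz_dw, dE_dO * dO_dz * dz_db


# Hypothetical values, for illustration only
w, b, x, target = 0.5, -0.2, 1.5, 1.0
learning_rate = 0.5

dE_dw, dE_db = gradients(w, b, x, target)

# Finite-difference check of dE/dw
h = 1e-6
numeric_dE_dw = (
    squared_error(forward(w + h, b, x), target)
    - squared_error(forward(w - h, b, x), target)
) / (2.0 * h)

# One small step against the gradient should reduce the error
error_before = squared_error(forward(w, b, x), target)
w_new = w - learning_rate * dE_dw
b_new = b - learning_rate * dE_db
error_after = squared_error(forward(w_new, b_new, x), target)
print(dE_dw, numeric_dE_dw, error_before, error_after)
```

Here the output is below the target, so the gradient with respect to `w` is negative and the update increases `w`, which moves the output toward the target.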
In this article (https://www.prettylagom.me/regression.html) I tried to implement gradient descent in Python. The implementation of gradient descent for logistic regression is highly relevant here, since gradient descent is the algorithm used to minimize the loss function in many machine learning models, including deep learning models.
The pseudocode is below. The backward pass multiplies the derivative of the outer function by the derivative of the inner one (it simply follows the chain rule).
# Forward pass
output = neural_network(input, weights, biases)
# Compute the loss function
error = loss(output, target)
# Backward pass
gradient = error_gradient(output, target)
# Update the weights and biases
weights, biases = backpropagate(gradient, weights, biases)
In a real implementation, these steps are repeated over multiple epochs to minimize the loss function and improve the model's performance.
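As a sketch of what such an epoch loop could look like with more than one layer, here is a tiny two-layer network trained with hand-written backpropagation. The XOR toy data, the layer sizes, the mean squared error loss, the learning rate and the number of epochs are all assumptions chosen for illustration; none of them come from the article.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)

# Toy data (assumed): XOR, one sample per row
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Parameters of a 2 -> 4 -> 1 network (sizes are an arbitrary choice)
W1 = rng.normal(0.0, 1.0, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, size=(4, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses = []

for epoch in range(2000):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    losses.append(float(np.mean(0.5 * (output - y) ** 2)))

    # Backward pass: apply the chain rule layer by layer, from output to input
    d_output = (output - y) * output * (1.0 - output) / len(X)
    grad_W2 = hidden.T @ d_output
    grad_b2 = d_output.sum(axis=0, keepdims=True)
    d_hidden = (d_output @ W2.T) * hidden * (1.0 - hidden)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0, keepdims=True)

    # Update the weights and biases
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

print(losses[0], losses[-1])
```

Note how `d_hidden` reuses `d_output`: the error signal computed for the last layer is passed backwards and multiplied by the local derivative of the previous layer, which is the "moving the error back through the net" part.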
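# One gradient-descent step for logistic regression
import numpy as np
def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))
# Illustrative toy setup (assumed here so that the step can run on its own)
input_features = np.array([[-1.5, -0.5, 0.5, 1.5]])  # shape (num_features, num_samples)
true_labels = np.array([[0, 0, 1, 1]])               # shape (1, num_samples)
num_samples = input_features.shape[1]
weights = np.zeros((input_features.shape[0], 1))     # shape (num_features, 1)
bias = 0.0
learning_rate = 0.1
# Predict
# Compute the linear combination of the weights and input features
linear_combination = np.dot(weights.T, input_features) + bias
# Apply the sigmoid activation function to get the predicted probabilities
activation_output = sigmoid(linear_combination)
# Gradient descent
# Compute the difference between the predicted probabilities and the actual labels
error = activation_output - true_labels
# Compute the gradient of the loss with respect to the weights
gradient_weights = (1/num_samples) * np.dot(input_features, error.T)
# Compute the gradient of the loss with respect to the bias
gradient_bias = (1/num_samples) * np.sum(error)
# Update
# Update the weights by subtracting the product of the learning rate and the gradient
weights = weights - learning_rate * gradient_weights
# Update the bias similarly
bias = bias - learning_rate * gradient_bias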
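The single update step above can be wrapped in a loop over epochs. The following is a minimal sketch, not the article's original implementation: the function names `train_logistic_regression` and `predict`, the toy data, and the default hyperparameters are hypothetical and only serve the illustration. The shapes follow the same convention as the step above (features in rows, samples in columns).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(input_features, true_labels,
                              learning_rate=0.1, num_epochs=1000):
    """Repeat the predict / gradient / update step for num_epochs epochs."""
    num_features, num_samples = input_features.shape
    weights = np.zeros((num_features, 1))
    bias = 0.0
    for _ in range(num_epochs):
        # Predict
        activation_output = sigmoid(np.dot(weights.T, input_features) + bias)
        # Gradients of the loss with respect to weights and bias
        error = activation_output - true_labels
        gradient_weights = (1 / num_samples) * np.dot(input_features, error.T)
        gradient_bias = (1 / num_samples) * np.sum(error)
        # Update
        weights = weights - learning_rate * gradient_weights
        bias = bias - learning_rate * gradient_bias
    return weights, bias


def predict(input_features, weights, bias):
    """Turn predicted probabilities into 0/1 labels with a 0.5 threshold."""
    probabilities = sigmoid(np.dot(weights.T, input_features) + bias)
    return (probabilities >= 0.5).astype(int)


# Toy data (assumed): one feature, negative values labelled 0, positive labelled 1
input_features = np.array([[-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]])
true_labels = np.array([[0, 0, 0, 1, 1, 1]])

weights, bias = train_logistic_regression(input_features, true_labels)
predictions = predict(input_features, weights, bias)
print(predictions)
```

Logistic regression has no hidden layers, so the "backward pass" is a single application of the chain rule; with more layers the same step is repeated once per layer.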